The Study of Racialism Forum Index
The Study of Racialism
Discussion of U.S. Racialism
Please read The Rules before posting.
 

Search Engine Sucks.

 
Salsassin
Suspended
Joined: 04 Apr 2005
Posts: 3508
Posted: Mon 22 May 2006 14:37
Post subject: Search Engine Sucks.

Seriously.

Just look at this search for Brasil.

10 results?


Is there no way to get a better search engine going? I have seen other websites that use this format, but their search engines do not go through Google and they actually work.

fwsweet
Administrator
Joined: 26 Nov 2004
Posts: 4585
Location: Palm Coast, FL
Posted: Mon 22 May 2006 17:35

Okay. Apparently, the Google spider no longer likes the site, because of all the session-ids, avatars, and excessive links. I have tweaked the code to make it easier for Google (and other bots) to swallow. Unfortunately, this makes it harder for non-members, since it now treats all anonymous users as the same person. Hence, non-members have lost functions. They can no longer get "posts since last visit," nor see member avatars, nor see lists of moderators. Also, the most-recently-posted user name no longer appears in the index next to each forum.

Let me know if these changes are unacceptable and I shall put it back the way it was and seek another solution. Otherwise, I would like to give it a week or so to see if these changes better attract the bots.

Salsassin
Suspended
Joined: 04 Apr 2005
Posts: 3508
Posted: Mon 22 May 2006 18:00

Hey Frank, it still only pulls up 10. For example, I do a search for Brazil and it will not pull up this thread:

http://backintyme.com/odr/viewtopic.php?t=1615


Even though Brazil is a word in that thread.

fwsweet
Administrator
Joined: 26 Nov 2004
Posts: 4585
Location: Palm Coast, FL
Posted: Mon 22 May 2006 18:26

That is why we need to give it a week or two. There are two main ways of indexing a bulletin board: by internal update or by external scanning.

Internal update is how most boards work. Each time someone posts something, the indexes are updated on the fly. The advantage is that we control when it happens. The drawback is that when you have high activity plus large archives (as we do), posting gets slower and slower. In our case, after a few months it becomes intolerable. There are ways of dealing with the slowdown, but that is it in a nutshell.
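
To make "updated on the fly" concrete, here is a minimal sketch of an index that is maintained inside the posting step itself. It is only a toy illustration, not the board's actual code: the class name `OnTheFlyIndex`, its methods, and the sample posts are all invented for this example.

```python
import re
from collections import defaultdict


class OnTheFlyIndex:
    """Toy inverted index that is updated every time a post is saved."""

    def __init__(self):
        # Maps each word to the set of post ids that contain it.
        self.postings = defaultdict(set)

    def add_post(self, post_id, text):
        # The indexing work happens as part of saving the post,
        # so the poster waits for it on every submission.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(post_id)

    def search(self, word):
        # Look up a single word; returns matching post ids in order.
        return sorted(self.postings.get(word.lower(), set()))


index = OnTheFlyIndex()
index.add_post(1, "Census categories in Brazil")   # made-up sample posts
index.add_post(2, "Nothing relevant here")
index.add_post(3, "Brazil and the U.S. compared")
print(index.search("brazil"))  # [1, 3]
```

The point of the sketch is that a new post is searchable the moment it is saved, at the price of doing the index work during every posting request.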

External scanning means that a program (called a "bot" or a "spider") scans through the entire archive of posts every so often. After each scan, the bot/spider builds/rebuilds indexes on very fast powerful computers at central sites (not here). That is what we are doing, relying on Google's bot (which scans our site every couple of days) to maintain the search indexes. It has the advantage that there is no posting slowdown, no matter how active or big the site gets (the bots/spiders usually come by around 3:00 A.M.). Another advantage is that strangers can be directed to the site without even knowing of its existence. The drawback is that bots can be finicky about what they will index. Google's bot was not doing a thorough scan on our site because it was too much "trouble" (as measured by Google's business tradeoff of bot time versus ad display time).

By making the site easier for the bots to scan, I made it more likely that during the Google bot's visits over the next week or so, it will do deeper scans of our site, build better indexes of our site, and so will not miss as much stuff. For now, the search is still using the superficial indexes it built over the past few weeks. Give it a week or so and see if the indexes improve. As you are aware, the measure of success is how many hits you get on a frequently-used term. If it does not rise sharply by next Monday, I will look at other possibilities.

One other thing. Using external bot scanning (rather than internal updating) means that messages posted since the last bot scan cannot show up in a search. So if the bot comes by every 48 hours (and does a thorough scan) you have an excellent chance of finding a two-day-old message, a 50-50 chance of finding a one-day-old message, but no chance at all of finding a message that was just posted. In general, the older the message, the more likely it is to be found in Google's indexes.
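
The odds above can be worked out with a small sketch. It rests on two simplifying assumptions that are not guaranteed in practice: the bot does a thorough scan exactly once per interval, and a message is equally likely to have been posted at any moment between two scans. The function name `chance_indexed` is made up for this illustration.

```python
def chance_indexed(age_hours, scan_interval_hours=48.0):
    """Chance that a message of the given age has been scanned at least once.

    Assumes one thorough scan per interval and a posting time that is
    uniformly random relative to the scan schedule.
    """
    if age_hours < 0 or scan_interval_hours <= 0:
        raise ValueError("age must be >= 0 and interval must be > 0")
    # A message older than one full interval has certainly been scanned.
    return min(age_hours / scan_interval_hours, 1.0)


for age in (0, 24, 48):
    print(age, chance_indexed(age))
# 0 0.0
# 24 0.5
# 48 1.0
```

With a 48-hour interval this reproduces the figures in the post: no chance for a brand-new message, 50-50 at one day, and certainty at two days.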

Salsassin
Suspended
Joined: 04 Apr 2005
Posts: 3508
Posted: Mon 22 May 2006 19:11

GOTCHA

fwsweet
Administrator
Joined: 26 Nov 2004
Posts: 4585
Location: Palm Coast, FL
Posted: Thu 25 May 2006 14:53

I have finished tweaking the site to make it more spider-friendly. Let us see if Google's indexing improves in the next week or so.

FYI, as a side-effect of my changes, users who are not logged in now see exactly what spiders see. Specifically, until you register and log in, you can no longer use or see: searching, memberlists, usergroups, profiles, private messages, posts since last visit, your posts, unanswered posts, number of users, number of posts, latest user, or current users online. In addition, if you are not logged in, all of the site's URLs show up as "html" and not as "php." This makes them all look like static web pages, rather than dynamic content.
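
As a rough illustration of the "html" instead of "php" change, the sketch below turns a dynamic URL into a static-looking path and drops the session id along the way. The naming scheme (`viewtopic-t-1615.html`) and the function `static_looking_path` are hypothetical, since the exact rewrite rules used on the site are not described here; the `sid` value in the example is also made up.

```python
from urllib.parse import parse_qsl, urlsplit


def static_looking_path(url):
    """Map a dynamic .php URL to a static-looking .html path (hypothetical scheme)."""
    parts = urlsplit(url)
    script = parts.path.rsplit("/", 1)[-1]
    if not script.endswith(".php"):
        return parts.path  # nothing to rewrite
    # Drop the session id so every anonymous visitor (and bot) sees one URL.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "sid"]
    stem = script[: -len(".php")]
    suffix = "".join(f"-{key}-{value}" for key, value in params)
    directory = parts.path[: len(parts.path) - len(script)]
    return f"{directory}{stem}{suffix}.html"


print(static_looking_path(
    "http://backintyme.com/odr/viewtopic.php?t=1615&sid=abc123"
))
# /odr/viewtopic-t-1615.html
```

A bot that sees one stable URL per topic has far fewer duplicate pages to crawl than one that sees a new session id on every visit.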

In short, visitors who are not logged in (like spiders) now see a very stripped-down version of the site, which looks to their browsers like a large static html website.

I am also looking into other solutions if Google does not come through for us.

fwsweet
Administrator
Joined: 26 Nov 2004
Posts: 4585
Location: Palm Coast, FL
Posted: Wed 31 May 2006 15:28

Well, the changes made the site spider-friendly, but Google still seems to be weak. From scuttlebutt among forum managers, it could take many weeks before Google updates its indexes.

Accordingly, I have gone to Plan B. I am using a search function that is unique to the database engine that we use (MySQL). A search for the word "Brazil" now turns up 370-odd posts reasonably quickly. If it works well, I shall leave it in place (but I shall also check Google now and then).
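
The post does not say which MySQL feature is involved. One plausible candidate is MySQL's FULLTEXT search (`MATCH ... AGAINST`), and the sketch below only builds such a query as text without running it. The table and column names are assumptions based on a typical phpBB 2 layout, the helper `build_fulltext_query` is invented for this example, and the query would also require a FULLTEXT index on the listed columns.

```python
def build_fulltext_query(term, table="phpbb_posts_text",
                         columns=("post_subject", "post_text"), limit=50):
    """Build a parameterized MySQL full-text query (assumed schema, not confirmed)."""
    # Identifiers cannot be passed as query parameters, so check them strictly.
    for name in (table, *columns):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    column_list = ", ".join(columns)
    sql = (
        f"SELECT post_id FROM {table} "
        f"WHERE MATCH({column_list}) AGAINST (%s) "
        f"LIMIT {int(limit)}"
    )
    # The search term travels separately so the database driver can escape it.
    return sql, (term,)


sql, params = build_fulltext_query("Brazil")
print(sql)
print(params)
```

Because `MATCH ... AGAINST` is MySQL syntax rather than standard SQL, a search written this way would have to be rewritten if the board ever moved to another database engine.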

The only drawback to the new search method is that it locks us in to one database engine. Please try it out and let me know of any problems/comments/suggestions/etc.

William
Moderator
Joined: 30 Mar 2005
Posts: 1057
Location: New Jersey
Posted: Wed 31 May 2006 15:31

Frank wrote:
The only drawback to the new search method is that it locks us in to one database engine.


Does this mean people querying information ODR deals with via Google will no longer be directed here?

fwsweet
Administrator
Joined: 26 Nov 2004
Posts: 4585
Location: Palm Coast, FL
Posted: Wed 31 May 2006 15:47

William wrote:
Does this mean people querying information ODR deals with via Google will no longer be directed here?

No, it does not mean that. The recent change will not affect Google's spider nor its indexes.

In fact, when anyone wants to search for something, I would be grateful if everyone would try both methods. Click here for the Google Search. And click here for the MySQL Search.

Right now, the MySQL Search method yields a lot more stuff (370-odd "Brazil" posts) but it is available only on our site. The Google Search is what anyone would get from Googling the same words (but with results limited to our site). As Jaime pointed out, it coughs up only ten posts for "Brazil."

With any luck, the Google Search will get better and better as Google's spiders crawl the now spider-friendly site. I hope that both methods will eventually converge to give the same results.

Salsassin
Suspended
Joined: 04 Apr 2005
Posts: 3508
Posted: Wed 31 May 2006 20:27

Wohoooo!!!

All times are GMT

 


Powered by phpBB © 2001, 2005 phpBB Group