Since the site’s inception, we’ve been amassing large amounts of content on which millions of people have come to depend. We have numerous ways of getting to the content, but the quickest and easiest way to find specific information is to search for it.

AnandTech Search 1.0 (ColdFusion Verity)

The first version of the site used a search server included with ColdFusion named “Verity”. Most people have heard of Verity; they are one of the industry leaders in enterprise search software. The version of Verity that was included with ColdFusion back then was a light version of the full-blown Verity Search server. Although it did quite well at locating content via Boolean searches, it lacked flexibility and wasn’t all that performant.

AnandTech Search 2.0 (Microsoft FullText Search)

After we migrated to Microsoft SQL Server, we decided to use the Full Text search that is built into SQL Server. SQL Server Full Text was introduced in version 7.0, and allows you to create catalogs that can contain multiple indexes on text column types. You can then configure Full Text to index the data in the background, or perform one-time or scheduled indexing of the data.

There are, however, a couple of caveats with Microsoft Full Text search. The first is that it throws errors when your search criteria contain “noise words”. By default, Full Text search is configured with a list of “noise words”. Microsoft (like many other search engine vendors) considers words like “because, been, before, being, between, both, but, by” to be common words that should not be contained in an index. Of course, you can trap this error easily in your application, but realistically, the search engine should just filter the words out of the search phrase itself.
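
As a rough illustration of the application-side workaround, the noise words can be stripped from the phrase before it ever reaches the search engine. The sketch below is not the code we ran on the site; the function names are hypothetical, the noise-word set is only the handful of words quoted above (the list that ships with SQL Server is longer), and the AND-joined output assumes the phrase is destined for something like a Full Text CONTAINS predicate.

```python
# Assumption: a tiny sample of the noise-word list, taken from the words
# quoted in the article. The real default list is considerably longer.
NOISE_WORDS = frozenset(
    "because been before being between both but by".split()
)


def strip_noise_words(phrase, noise_words=NOISE_WORDS):
    """Return the terms of `phrase` with noise words removed.

    Terms are split on whitespace and compared case-insensitively;
    the original spelling of each surviving term is preserved.
    """
    return [term for term in phrase.split() if term.lower() not in noise_words]


def build_search_condition(phrase):
    """Build an AND-joined search condition, or None if no terms survive.

    Hypothetical helper: each term is wrapped in double quotes, and any
    double quotes inside a term are dropped so the condition stays valid.
    """
    terms = strip_noise_words(phrase)
    if not terms:
        # Every word was a noise word; the caller should skip the query
        # rather than send an empty condition to the search engine.
        return None
    quoted = ['"{}"'.format(term.replace('"', "")) for term in terms]
    return " AND ".join(quoted)
```

A caller would check for `None` and show a "please refine your search" message instead of issuing the query. The resulting condition should still be passed to the database as a bound parameter, not concatenated into the SQL text.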

The second and more important issue is how Full Text handles acronyms and numerical values in search strings. We never really did get to the bottom of the problem, but even with all of the noise words removed from Full Text, certain search phrases that contained acronyms and numerical data wouldn’t return results. Since our data is full of technical acronyms and numerical model numbers, this was a major issue for us.

Along came Google

48 Comments

  • zmagaw - Tuesday, September 06, 2005 - link

    there are a few methods - including creating separate collections by user type, or filtering URLs in or out by pattern matching
  • Brickster - Wednesday, September 07, 2005 - link

    Actually, I found you can achieve that with an upgrade... of course :)

    http://www.google.com/enterprise/feature_compariso...

    Here is the Appliance feature:
    Secure Content API - Search across secure content using Google's Authorization API to integrate into existing access control systems.

    Looks like the Mini doesn't support secured content.
  • zmagaw - Tuesday, September 06, 2005 - link

    we signed a non-disclosure that said we couldn't open the google search appliances... although the hardware looks simple and run of the mill... the software is not... a lot of open source stuff on that puppy but execution is everything... the support we got though was horrible... 2 day response times... so not easy because the software is full of bugs that are not easily diagnosed... hardware failures - disks... and speedy google working with large corporations has been seen as a daunting task for the bright people at Google
  • nadirshakur - Tuesday, September 06, 2005 - link

    What is the warranty on these puppies? Hey, didn't AnandTech void theirs by opening it like that and showing the whole world they did?
  • flatblastard - Tuesday, September 06, 2005 - link

    That's okay, if the RAM/CPU goes bad, I'll sell them my old P3 450MHz system I got laying around for spare parts. Heck, I'll even give them a sweet deal.....$1999.95 and I'll even throw in Windows 98 (not SE)..... ;)
  • deathwalker - Tuesday, September 06, 2005 - link

    Is this really important when it comes to your experience when visiting the AnandTech website? I guess I'll get blasted for that statement!! So much really good stuff that could be the news of the day...this article is just cannon fodder...something to fill the need for a new article to read on this day.
  • Jason Clark - Tuesday, September 06, 2005 - link

    Guys, we don't crawl every day :) It crawls 3-4 times a week. Since large articles are on the front page, searching for them is pretty unnecessary.
  • Gooberslot - Tuesday, September 06, 2005 - link

    Is it normal for servers to give you no control like that? I wouldn't want anything that had a bios password that I couldn't change.

    I'm also surprised that you can even get a P3 or a P3 motherboard anymore.
  • zmagaw - Tuesday, September 06, 2005 - link

    i think when you are buying appliances, yes... the reason google does this... you would be able to decompile their software on that hard drive, which is formatted in a google HD format - or so I have heard
  • smn198 - Tuesday, September 06, 2005 - link

    You could remove the BIOS password but then you lose all the BIOS info as well, and maybe they are doing something special there.
