Indexing the Right Stuff
So, let’s get back to whether you need a search engine. Let’s assume that you do intend to slap a search engine on top of your web site. Shouldn’t be a problem, right? Just point the indexer at the directory where all the pages live and, voilà! Searchable site!
Of course, you knew it wasn’t that simple. Searching only works well when the stuff that’s being searched is the same as the stuff that users want. This means you may not want to index the entire site. We’ll explain.
Indexing the Entire Site
Search engines are frequently used to index an entire site without regard for its content and how that content might vary. Every word of every page gets indexed, whether the page holds real content or help information, advertising, navigation menus, and so on.
However, searching works much better when the information space is defined narrowly and contains homogeneous content. In other words, the more your indices mix apples and oranges, the worse your retrieval results will be. After all, when you search a site, you’re probably looking for apples only, not oranges. As already discussed, a site’s content is usually a mix of apples, oranges, kumquats, bell peppers, chainsaws, and Barbie dolls to begin with. So when you tell your search engine to index your entire site, the site’s users will be searching against all kinds of stuff at once: navigation pages, destination pages, and other kinds of pages. What they retrieve can often be ugly.
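To make the apples-and-oranges point concrete, here is a minimal Python sketch, assuming each page has already been labeled with a content type such as "destination" or "navigation". The pages, URLs, and the `build_index` and `search` helpers are hypothetical illustrations; real search engines provide their own settings for choosing which pages get indexed.

```python
from collections import defaultdict

# Hypothetical toy pages: (url, page_type, text). The URLs, page types,
# and text are illustrative assumptions, not taken from any real site.
PAGES = [
    ("/products/widget.html", "destination", "the widget ships with a two year warranty"),
    ("/support/warranty.html", "destination", "warranty claims are handled by phone"),
    ("/index.html", "navigation", "home products support warranty contact"),
    ("/support/index.html", "navigation", "support warranty faq contact"),
]


def build_index(pages, include_types=None):
    """Build a simple inverted index mapping each word to the set of URLs
    that contain it. If include_types is given, only pages whose type is
    in that set are indexed; otherwise every page is indexed."""
    index = defaultdict(set)
    for url, page_type, text in pages:
        if include_types is not None and page_type not in include_types:
            continue  # leave this page out of the index entirely
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Return the URLs matching a single-word query, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))


whole_site = build_index(PAGES)
destinations_only = build_index(PAGES, include_types={"destination"})

print(search(whole_site, "warranty"))
print(search(destinations_only, "warranty"))
```

Against the whole-site index, a query for "warranty" also returns the two navigation pages, which merely mention the word in a menu. Restricting the index to destination pages returns only the pages that are actually about warranties.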
Let’s try an example to see ...