The content indexer

The content indexer is another core component of the Links 'R' Us project. It performs two distinct functions.

First, the component maintains a full-text index of all documents retrieved by the crawler. Any new or updated document emitted by the content extractor component is propagated to the content indexer so that the index can be updated.

Second, an index that cannot be searched is of little use. The content indexer therefore exposes mechanisms that allow other components to perform full-text searches against the index and to order the results by retrieval date and/or PageRank score.
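To make the two responsibilities concrete, the following is a minimal in-memory sketch in Go, not the project's actual implementation. The names `Document`, `InMemoryIndex`, `SortOrder`, `Index`, and `Search` are hypothetical and chosen only for illustration, and naive substring matching stands in for a real full-text engine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Document is a hypothetical index entry; the field names are assumptions.
type Document struct {
	URL       string
	Title     string
	Content   string
	IndexedAt time.Time // when the crawler retrieved the document
	PageRank  float64
}

// SortOrder selects how search results are ordered.
type SortOrder int

const (
	ByPageRank SortOrder = iota
	ByRetrievalDate
)

// InMemoryIndex is a toy index keyed by URL (illustrative only).
type InMemoryIndex struct {
	docs map[string]Document
}

func NewInMemoryIndex() *InMemoryIndex {
	return &InMemoryIndex{docs: make(map[string]Document)}
}

// Index inserts a new document or overwrites the entry with the same URL.
func (i *InMemoryIndex) Index(doc Document) {
	i.docs[doc.URL] = doc
}

// Search returns the documents whose title or content contains every
// whitespace-separated term of the query (case-insensitive substring match),
// ordered by descending PageRank or by most recent retrieval date.
func (i *InMemoryIndex) Search(query string, order SortOrder) []Document {
	terms := strings.Fields(strings.ToLower(query))
	if len(terms) == 0 {
		return nil
	}
	var hits []Document
	for _, doc := range i.docs {
		text := strings.ToLower(doc.Title + " " + doc.Content)
		matched := true
		for _, term := range terms {
			if !strings.Contains(text, term) {
				matched = false
				break
			}
		}
		if matched {
			hits = append(hits, doc)
		}
	}
	sort.Slice(hits, func(a, b int) bool {
		if order == ByRetrievalDate {
			if !hits[a].IndexedAt.Equal(hits[b].IndexedAt) {
				return hits[a].IndexedAt.After(hits[b].IndexedAt)
			}
		} else if hits[a].PageRank != hits[b].PageRank {
			return hits[a].PageRank > hits[b].PageRank
		}
		return hits[a].URL < hits[b].URL // deterministic tie-break
	})
	return hits
}

func main() {
	idx := NewInMemoryIndex()
	now := time.Now()
	idx.Index(Document{URL: "https://a.example", Content: "Go concurrency patterns", IndexedAt: now, PageRank: 0.2})
	idx.Index(Document{URL: "https://b.example", Content: "Concurrency in Go", IndexedAt: now.Add(-time.Hour), PageRank: 0.7})

	for _, doc := range idx.Search("go concurrency", ByPageRank) {
		fmt.Println(doc.URL, doc.PageRank)
	}
}
```

Keying the map by URL means that re-indexing an updated document simply replaces the previous entry, which mirrors the "new or updated document" flow described above. A production indexer would delegate storage and matching to a dedicated search backend rather than scanning every document on each query.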

