Chapter 6. Search Technology Part 2

This chapter describes some of the more sophisticated aspects of search technology. All search applications will have the technology components described in Chapter 5, but few will have all the technologies set out in this chapter. When selecting a search application there is little value in using this chapter as a checklist and drawing up a shortlist from the applications with the greatest number of ticks.

The reasons for this are:

  • Selecting a search application has to be based on user requirements, and it could be that just one of these features correctly implemented will be quite sufficient to meet these requirements.

  • The more of these features that are implemented, the greater the cost of implementation and administration. Upgrading may also become more difficult, and users may need more training and support.

Entity Extraction

The concept of entity extraction is to use the search application to identify personal names, locations and other terms automatically, so that they can be used as query terms without the need to index them manually. The technical term for this process is ‘named entity extraction’, and it analyses not just individual words but also sequences of words to determine index terms that could be of value in responding to queries. When organizations choose English they are also choosing a language with over 1,000,000 words, a result of invasions and perhaps the scale of the British Empire. The result is a language full of synonyms ...
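
To make the idea of named entity extraction more concrete, the following is a minimal sketch in Python of how a sequence of words, rather than a single word, might be recognised as a personal name or a location. It is an illustration only and does not represent how any particular search application works. The lists KNOWN_LOCATIONS and PERSON_TITLES, and the function extract_entities, are hypothetical names invented for this example. Commercial products typically rely on far larger dictionaries and on statistical or linguistic models.

```python
import re

# Hypothetical gazetteer: a tiny list of known locations, for illustration only.
KNOWN_LOCATIONS = {"London", "New York", "Hong Kong"}

# Hypothetical titles that suggest a personal name follows.
PERSON_TITLES = {"Mr", "Mrs", "Ms", "Dr", "Prof"}

# A run of one or more capitalised words, e.g. "New York" or "Dr Jane Smith".
CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract_entities(text):
    """Return (entity, type) pairs found in text, in order of appearance."""
    entities = []
    for match in CAPITALISED_RUN.finditer(text):
        words = match.group().split()
        phrase = " ".join(words)
        if words[0] in PERSON_TITLES and len(words) > 1:
            # Drop the title and keep the remaining words as the name.
            entities.append((" ".join(words[1:]), "PERSON"))
        elif phrase in KNOWN_LOCATIONS:
            entities.append((phrase, "LOCATION"))
        # Any other capitalised run (such as the first word of a sentence)
        # is ignored in this sketch.
    return entities


sample = "The report by Dr Jane Smith was sent to the office in New York last week."
print(extract_entities(sample))
# prints [('Jane Smith', 'PERSON'), ('New York', 'LOCATION')]
```

Even this simple sketch shows why sequences of words matter: "New York" is only recognised as a location when both words are considered together. Names and locations extracted in this way could then be added to the index as query terms without any manual indexing.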

