Chapter 12. Text Indexing and Lookup

Besides the “basic” indexing capabilities explained in Chapter 11, eXist also supports full-text indexes based on the Apache Lucene text search engine library. Lucene lets eXist offer search capabilities such as finding words that occur near each other, finding words that are similar to other words, combining search terms with Boolean operators, and more. Full-text indexes allow you to do much more with your content than you can with plain XPath expressions.

If your application needs to support searches based on human input, such as searching documentation, full-text indexes can really help. But things get even better: on top of full-text index searches, eXist offers keywords in context (KWIC) functionality. This makes it easy to display each search hit together with its surrounding text. We’ll examine this further in “Using Keywords in Context”.

Full-Text Index and KWIC Example

The examples for this book include a simple full-text search example. It uses the full-text index to search some very old Encyclopedia Britannica entries. Important components of the example are:

  • A full-text index on tei:p elements, defined in /db/system/config/db/apps/exist-book/indexing/data/collection.xconf:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
      <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        
        <!-- other indexes -->
        
        <lucene>
          <text qname="tei:p"/>
        </lucene>
      </index>
    </collection>
  • An ...
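
To show how such an index is typically put to use, here is a minimal XQuery sketch. It is not the book's own example script. It assumes that the indexed documents live in /db/apps/exist-book/indexing/data (inferred from the location of the collection.xconf above) and uses a made-up search phrase. The functions ft:query and ft:score belong to eXist's Lucene module, and kwic:summarize belongs to its KWIC module.

```xquery
xquery version "3.0";

(: Sketch only, under the assumptions stated above; not the book's example script. :)
import module namespace kwic = "http://exist-db.org/xquery/kwic";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Assumption: the collection that the collection.xconf shown above applies to. :)
declare variable $data-collection := "/db/apps/exist-book/indexing/data";

(: Assumption: a hypothetical search phrase; any Lucene query string could go here. :)
declare variable $search := "steam engine";

<results>{
    (: ft:query() consults the Lucene index defined on tei:p elements. :)
    for $hit in collection($data-collection)//tei:p[ft:query(., $search)]
    (: ft:score() returns the Lucene relevance score of a hit; best matches come first. :)
    order by ft:score($hit) descending
    return
        <result doc="{ util:document-name($hit) }">{
            (: kwic:summarize() renders each match with up to 40 characters of context on either side. :)
            kwic:summarize($hit, <config width="40"/>)
        }</result>
}</results>
```

The full-text test sits in a predicate on tei:p because that is the element the index is defined on. A query against an element that has no Lucene index configured would be expected to return no hits.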
