Jakarta Commons Cookbook by Timothy M. O'Brien

12.6. Creating an Index of XML Documents

Problem

You need to search a collection of XML documents quickly. To do this, you need an index of terms that keeps track of the context in which each term appears.

Solution

Use Jakarta Lucene and Jakarta Digester to create an index of Lucene Document objects at the lowest level of granularity you wish to search. For example, to search for speeches in a Shakespeare play that contain specific terms, create one Lucene Document object for each speech. This recipe assumes that the Shakespeare plays to be indexed are stored in the following XML format:

<?xml version="1.0"?>

<PLAY>
  <TITLE>All's Well That Ends Well</TITLE>

  <ACT>
    <TITLE>ACT I</TITLE>

    <SCENE>
      <TITLE>SCENE I.  Rousillon. The COUNT's palace.</TITLE>

      <SPEECH>
        <SPEAKER>COUNTESS</SPEAKER>
        <LINE>In delivering my son from me, I bury a second husband.</LINE>
      </SPEECH>

      <SPEECH>
        <SPEAKER>BERTRAM</SPEAKER>
        <LINE>And I in going, madam, weep o'er my father's death</LINE>
        <LINE>anew: but I must attend his majesty's command, to</LINE>
        <LINE>whom I am now in ward, evermore in subjection.</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>
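
To make the "one Document per speech" granularity concrete for this format, here is a minimal sketch. It is not the recipe's PlayIndexer and uses neither Lucene nor Digester, only the DOM parser that ships with the JDK. The SpeechExtractor class, its Speech holder, and the extract method are hypothetical names introduced for illustration. The sketch collects each SPEECH element into one record that holds the speaker and the joined LINE text, which is roughly the unit that would become a single Lucene Document:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of the recipe): one record per <SPEECH>.
class SpeechExtractor {

    // Plain holder for the fields a per-speech index entry would need.
    static final class Speech {
        final String speaker;
        final String text;

        Speech(String speaker, String text) {
            this.speaker = speaker;
            this.text = text;
        }
    }

    // Parses a play in the format shown above and returns its speeches
    // in document order.
    static List<Speech> extract(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList speechNodes = doc.getElementsByTagName("SPEECH");
            List<Speech> speeches = new ArrayList<>();
            for (int i = 0; i < speechNodes.getLength(); i++) {
                Element speech = (Element) speechNodes.item(i);
                String speaker = firstText(speech, "SPEAKER");
                // Join all <LINE> elements into one searchable block of text.
                NodeList lines = speech.getElementsByTagName("LINE");
                StringBuilder text = new StringBuilder();
                for (int j = 0; j < lines.getLength(); j++) {
                    if (j > 0) {
                        text.append(' ');
                    }
                    text.append(lines.item(j).getTextContent().trim());
                }
                speeches.add(new Speech(speaker, text.toString()));
            }
            return speeches;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse play XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given
    // tag name, or an empty string if there is none.
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<PLAY><ACT><SCENE><SPEECH>"
            + "<SPEAKER>COUNTESS</SPEAKER>"
            + "<LINE>In delivering my son from me, I bury a second husband.</LINE>"
            + "</SPEECH></SCENE></ACT></PLAY>";
        for (Speech s : extract(xml)) {
            System.out.println(s.speaker + ": " + s.text);
        }
    }
}
```

Joining the LINE elements means a term that spans a line break within a speech is still found in a single record. A real index would likely also store the play, act, and scene titles with each speech so that a search hit can report where it occurred.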

The following class creates a Lucene index of Shakespeare speeches. It reads the XML file for each play in the ./data/Shakespeare directory and calls PlayIndexer to create a Lucene Document object for every speech. These Document objects are then written to a Lucene index using an IndexWriter:

import java.io.File; ...
