6.3. Namespace-Aware Parsing

Problem

You need to parse an XML document with multiple namespaces.

Solution

To parse XML with multiple namespaces, make the Digester namespace-aware by calling digester.setNamespaceAware(true), and supply one RuleSet object for each namespace. Consider the following document, which contains elements from two namespaces, http://discursive.com/page and http://discursive.com/person:

<?xml version="1.0"?>

<pages xmlns="http://discursive.com/page"
       xmlns:person="http://discursive.com/person">
  <page type="standard">
    <person:person firstName="Al" lastName="Gore">
      <person:role>Co-author</person:role> 
    </person:person>
    <person:person firstName="George" lastName="Bush">
      <person:role>Co-author</person:role> 
    </person:person>
  </page>
</pages>
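Before any Digester rules are defined, it can help to see what namespace awareness changes at the parser level. The following is a minimal sketch, not part of the recipe itself: it uses only the JDK's SAX parser (Digester sits on top of a SAX parser) rather than Digester, and the class name NamespaceDemo and the method qualifiedNames are hypothetical names chosen for illustration. It parses an abbreviated copy of the document above, with a single person element, and records the namespace URI and local name reported for each start tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Commons Digester.
class NamespaceDemo {

    // Abbreviated copy of the sample document (one person element).
    static final String XML =
        "<pages xmlns=\"http://discursive.com/page\"\n"
      + "       xmlns:person=\"http://discursive.com/person\">\n"
      + "  <page type=\"standard\">\n"
      + "    <person:person firstName=\"Al\" lastName=\"Gore\">\n"
      + "      <person:role>Co-author</person:role>\n"
      + "    </person:person>\n"
      + "  </page>\n"
      + "</pages>";

    // Returns "{namespaceURI}localName" for every start tag, in document order.
    static List<String> qualifiedNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without this call the parser does not resolve prefixes to URIs.
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        final List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add("{" + uri + "}" + localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : qualifiedNames(XML)) {
            System.out.println(name);
        }
        // Prints:
        // {http://discursive.com/page}pages
        // {http://discursive.com/page}page
        // {http://discursive.com/person}person
        // {http://discursive.com/person}role
    }
}
```

The unprefixed pages and page elements resolve to the default namespace, while person:person and person:role resolve to http://discursive.com/person. This per-element namespace URI is what allows a set of rules to be tied to a single namespace.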

To parse this XML document with Digester, create a separate set of rules for each namespace, and add each RuleSet object to the Digester with addRuleSet( ). A RuleSet adds Rule objects to an instance of Digester. The following class, PersonRuleSet, extends RuleSetBase and sets namespaceURI in its default constructor to define rules for the http://discursive.com/person namespace:

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.RuleSetBase;

public class PersonRuleSet extends RuleSetBase {

    public PersonRuleSet( ) {
        this.namespaceURI = "http://discursive.com/person";
    }

    public void addRuleInstances(Digester digester) {
        digester.addObjectCreate("*/person", Person.class);
        ...
