14.3. Parsing a Complex XML Document

Problem

You have a collection of data stored in an XML document that uses an internal DTD or XML Namespaces. You want to parse the document and turn the data it contains into a collection of C++ objects.

Solution

Use Xerces’s implementation of the SAX2 API (the Simple API for XML, Version 2.0). First, derive a class from xercesc::ContentHandler; this class will receive notifications about the structure and content of your XML document as it is being parsed. Next, if you like, derive a class from xercesc::ErrorHandler to receive warnings and error notifications. Then construct a parser of type xercesc::SAX2XMLReader and register instances of your handler classes using the parser’s setContentHandler() and setErrorHandler() methods. Finally, invoke the parser’s parse() method, passing the pathname of your document as its argument.
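The division of labor among parser, content handler, and error handler can be sketched without any library at all. The types below (ContentSink, ErrorSink, ToyReader, TextCollector, ErrorLog) are hypothetical stand-ins invented for illustration, not the Xerces classes, and the toy reader replays a hard-coded event sequence instead of parsing a file. The real SAX2 callbacks carry more information (namespace URI, local name, qualified name, attributes) and use Xerces's own string type.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two handler interfaces (NOT the Xerces classes).
struct ContentSink {
    virtual ~ContentSink() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& chunk) = 0;
    virtual void endElement(const std::string& name) = 0;
};

struct ErrorSink {
    virtual ~ErrorSink() = default;
    virtual void fatalError(const std::string& message) = 0;
};

// Toy reader: holds non-owning handler pointers and pushes events to them.
class ToyReader {
public:
    void setContentHandler(ContentSink* h) { content_ = h; }
    void setErrorHandler(ErrorSink* h) { errors_ = h; }

    // Simulates parsing <item>Hello, world</item>. The text arrives in two
    // chunks, which an event-based parser is free to do.
    void parse(bool simulateFailure = false) {
        if (simulateFailure) {
            if (errors_) errors_->fatalError("simulated parse failure");
            return;
        }
        if (!content_) return;
        content_->startElement("item");
        content_->characters("Hello, ");
        content_->characters("world");
        content_->endElement("item");
    }

private:
    ContentSink* content_ = nullptr;
    ErrorSink* errors_ = nullptr;
};

// Content handler that records (element name, accumulated text) pairs.
class TextCollector : public ContentSink {
public:
    void startElement(const std::string&) override { buffer_.clear(); }
    void characters(const std::string& chunk) override { buffer_ += chunk; }
    void endElement(const std::string& name) override {
        results.emplace_back(name, buffer_);
        buffer_.clear();
    }
    std::vector<std::pair<std::string, std::string>> results;

private:
    std::string buffer_;
};

// Error handler that remembers every message it receives.
class ErrorLog : public ErrorSink {
public:
    void fatalError(const std::string& message) override {
        messages.push_back(message);
    }
    std::vector<std::string> messages;
};

// The three steps from the Solution: construct, register, parse.
std::vector<std::pair<std::string, std::string>> collectDemo() {
    TextCollector collector;
    ErrorLog errors;
    ToyReader reader;
    reader.setContentHandler(&collector);
    reader.setErrorHandler(&errors);
    reader.parse();
    assert(errors.messages.empty());
    return collector.results;
}
```

Because the handler, not the parser, owns the accumulated state, the parser can stream through a document of any size while the handler keeps only what it needs. Buffering text in characters() and consuming it in endElement() is the part of this sketch that should carry over to a real SAX2 handler.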

For example, suppose you want to parse the XML document animals.xml from Example 14-1 and construct a std::vector of Animals representing the animals listed in the document. (See Example 14-2 for the definition of the class Animal.) In Example 14-3, I showed how to do this using TinyXml. To make the problem more challenging, let’s add namespaces to the document, as shown in Example 14-5.

Example 14-5. List of circus animals, using XML Namespaces

<?xml version="1.0" encoding="UTF-8"?>
<!-- Feldman Family Circus Animals with Namespaces -->
<ffc:animalList xmlns:ffc="http://www.feldman-family-circus.com">
    <ffc:animal>
        <ffc:name>Herby</ffc:name>
        ...
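With namespace processing enabled, a SAX2 parser reports each element by its namespace URI and local name as well as its qualified name, so a handler should match on the URI and local name rather than on the ffc prefix, which is only a document-local abbreviation. The following library-free sketch illustrates that resolution step. The names ExpandedName, PrefixMap, resolve, isCircusElement, and prefixDoesNotMatter are hypothetical, not part of Xerces, and the sketch assumes a single flat set of prefix bindings, ignoring the scoping rules a real parser applies.

```cpp
#include <cassert>
#include <map>
#include <string>

// Namespace URI taken from Example 14-5.
const std::string kCircusNs = "http://www.feldman-family-circus.com";

// A name after namespace resolution: what a handler should compare against.
struct ExpandedName {
    std::string uri;
    std::string localName;
};

// Prefix -> namespace URI; the empty prefix stands for a default namespace.
typedef std::map<std::string, std::string> PrefixMap;

// Resolves a qualified name such as "ffc:name" against the given bindings.
// Returns false, leaving `out` untouched, if the prefix is undeclared or the
// local part is empty.
bool resolve(const std::string& qname, const PrefixMap& bindings,
             ExpandedName& out)
{
    const std::string::size_type colon = qname.find(':');
    const bool hasPrefix = (colon != std::string::npos);
    const std::string prefix = hasPrefix ? qname.substr(0, colon) : std::string();
    const std::string local = hasPrefix ? qname.substr(colon + 1) : qname;
    if (local.empty()) return false;

    const PrefixMap::const_iterator it = bindings.find(prefix);
    if (it != bindings.end()) {
        out.uri = it->second;      // bound prefix or default namespace
    } else if (hasPrefix) {
        return false;              // prefix used but never declared
    } else {
        out.uri.clear();           // unprefixed, no default namespace
    }
    out.localName = local;
    return true;
}

// True if the name is the given element in the circus namespace.
bool isCircusElement(const ExpandedName& n, const std::string& local)
{
    return n.uri == kCircusNs && n.localName == local;
}

// Two different prefixes bound to the same URI identify the same element.
bool prefixDoesNotMatter()
{
    PrefixMap bindings;
    bindings["ffc"] = kCircusNs;
    bindings["c"] = kCircusNs;
    ExpandedName a, b;
    const bool ok = resolve("ffc:name", bindings, a) &&
                    resolve("c:name", bindings, b);
    assert(ok && a.uri == b.uri && a.localName == b.localName);
    return ok;
}
```

A handler written this way keeps working if the document's author later renames the ffc prefix, since only the URI and local name take part in the comparison.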
