Parsing and Lexing XML

Because XML is a well-defined language, it’s a good idea to start our XML project by reviewing the W3C XML language definition.[49] Unfortunately, the XML specification (henceforth the spec) is huge, and it’s very easy to get lost in all of the details. To make our lives easier, let’s get rid of stuff we don’t need in order to parse XML files: <!DOCTYPE..> document type definitions (DTDs), <!ENTITY..> entity declarations, and <!NOTATION..> notation declarations. Besides, handling those tags wouldn’t teach us anything beyond what we need to handle the other constructs.
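To make "getting rid of" those declarations concrete, here is a minimal Python sketch of the idea rather than the grammar we will actually build. It assumes a declaration runs from `<!DOCTYPE`, `<!ENTITY`, or `<!NOTATION` to the first `>`, so it would not cope with a DOCTYPE that carries an internal subset in square brackets or with a quoted value containing `>`. The name `strip_declarations` is made up for illustration.

```python
import re

# Assumption: a declaration ends at the first ">" after its keyword.
# A DOCTYPE with an internal subset ("[ ... ]") holds nested ">" characters
# and is NOT handled by this simple pattern.
_DECLARATION = re.compile(r"<!(?:DOCTYPE|ENTITY|NOTATION)\b.*?>", re.DOTALL)


def strip_declarations(xml_text: str) -> str:
    """Return xml_text with simple DOCTYPE/ENTITY/NOTATION declarations removed."""
    return _DECLARATION.sub("", xml_text)


sample = '<!DOCTYPE note SYSTEM "note.dtd"><note>hi</note>'
print(strip_declarations(sample))
```

Elements, attributes, comments, and processing instructions pass through untouched, which matches the simplification above: the constructs we drop are exactly the ones we decided not to parse.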

We’re going to start out by building the syntactic rules for XML. The good news is that we can reuse the informal grammar rules from the spec ...
