Parsing Methodologies

Java has some built-in support for lexical analysis. Let's examine the relevant classes and then build our own XML scanner.

StringTokenizer and Scanners

StringTokenizer is a class for breaking a single string into a set of tokens: it treats each run of characters between the specified delimiter characters as a token. Though StringTokenizer is adequate for tokenizing simple, uniform strings, it is inadequate for parsing an XML file, because it cannot tell markup from character data or respect context such as quoted attribute values. Listing 2.1 demonstrates an attempt to parse an XML file with StringTokenizer.

Code Listing 2.1. A Naive Tokenizer
    /** SimpleTokenize.java */
    package sams.chp2;

    import java.io.*;
    import java.util.*;

    public class SimpleTokenize
    {
        public static void main(String args[]) ...
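To make the limitation concrete, here is a minimal sketch, separate from Listing 2.1, that runs StringTokenizer over a small XML fragment using < and > as delimiters. The class name TokenizeSketch, the helper method tokenize, and the sample input are illustrative assumptions rather than part of the book's listing, and the code assumes Java 5 or later for generics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration; not the book's SimpleTokenize.
class TokenizeSketch {

    // Splits text on any character in delims. The delimiters themselves are
    // discarded, and consecutive delimiters produce no empty tokens.
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample input: note the '>' inside the quoted attribute value,
        // which is legal XML but looks like a delimiter to StringTokenizer.
        String xml = "<note id=\"a>b\">Hi <b>there</b></note>";
        for (String token : tokenize(xml, "<>")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Running main on this input prints seven tokens. The attribute value "a>b" is split in two because the tokenizer has no notion of quoting, and nothing in the output distinguishes the element name b from the character data there. A real XML scanner has to track that context itself.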
