9.5. A Tokenizer Class

The Tokenizer class in sjm.parse.tokens uses a set of states to recognize different types of tokens. Each state is a subclass of TokenizerState, a class in the same package. A Tokenizer object reads one character from the input string and uses that character to decide which state should find the next token. The design of Tokenizer is as follows:

1. Read a character and use it to look up which TokenizerState object to use.
2. Send the TokenizerState object the initial character, and ask the TokenizerState to return a Token. The TokenizerState reads as many characters as it needs to produce a Token.
3. Repeat until there are no more characters.
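
The three steps above can be sketched in a few lines of Java. The code below is only an illustration of the dispatch idea, not the source of sjm.parse.tokens: the names SketchState, WordState, NumberState, WhitespaceState, SymbolState, and TokenizerSketch are hypothetical, tokens are plain strings rather than Token objects, and the whitespace state simply returns null to signal that it produced no token. These are all simplifying assumptions.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; this is not the sjm.parse.tokens API.
interface SketchState {
    // 'first' has already been read. Consume as many further characters as
    // the token needs and return its text, or null if no token results.
    String nextToken(PushbackReader in, int first) throws IOException;
}

class WordState implements SketchState {
    public String nextToken(PushbackReader in, int first) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append((char) first);
        int c;
        while ((c = in.read()) != -1 && Character.isLetterOrDigit(c)) {
            sb.append((char) c);
        }
        if (c != -1) in.unread(c); // give back the character that ended the word
        return sb.toString();
    }
}

class NumberState implements SketchState {
    public String nextToken(PushbackReader in, int first) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append((char) first);
        int c;
        while ((c = in.read()) != -1 && Character.isDigit(c)) {
            sb.append((char) c);
        }
        if (c != -1) in.unread(c); // give back the character that ended the number
        return sb.toString();
    }
}

class WhitespaceState implements SketchState {
    public String nextToken(PushbackReader in, int first) throws IOException {
        int c;
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
            // skip the rest of the whitespace run
        }
        if (c != -1) in.unread(c);
        return null; // whitespace yields no token in this sketch
    }
}

class SymbolState implements SketchState {
    public String nextToken(PushbackReader in, int first) {
        return String.valueOf((char) first); // a symbol is a single character here
    }
}

class TokenizerSketch {
    private final SketchState[] table = new SketchState[256];
    private final SketchState symbol = new SymbolState();

    TokenizerSketch() {
        SketchState word = new WordState();
        SketchState number = new NumberState();
        SketchState whitespace = new WhitespaceState();
        // Build the character-to-state lookup table once.
        for (int c = 0; c < table.length; c++) {
            if (Character.isLetter(c)) table[c] = word;
            else if (Character.isDigit(c)) table[c] = number;
            else if (Character.isWhitespace(c)) table[c] = whitespace;
            else table[c] = symbol;
        }
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (PushbackReader in = new PushbackReader(new StringReader(text), 1)) {
            int c;
            while ((c = in.read()) != -1) {                               // step 3: repeat
                SketchState state = c < table.length ? table[c] : symbol; // step 1: look up
                String token = state.nextToken(in, c);                    // step 2: delegate
                if (token != null) tokens.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [total, =, price2, +, 42, ;]
        System.out.println(new TokenizerSketch().tokenize("total = price2 + 42;"));
    }
}
```

The point of the lookup table is that the tokenizer itself contains no knowledge of what a word or a number looks like. Changing how a kind of token is recognized means changing one state, and changing which characters start that kind of token means changing table entries.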

Figure 9.3 shows a state diagram of the classes in sjm.parse.tokens ...
