Tokenizing Text
We end this chapter with an extended (and more complex) example in three parts. Example 2-8 is a listing of Tokenizer.java. The Tokenizer interface defines an API for tokenizing text. Tokenizing simply means breaking text into chunks; tokenizers are also known as lexers or scanners, and are commonly used when writing parsers. The Tokenizer interface is intended as an alternative to java.util.StringTokenizer, which is too simple for many uses, and java.io.StreamTokenizer, which is complex and poorly documented.
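To make "too simple for many uses" concrete, here is a minimal sketch of what java.util.StringTokenizer does with a small expression. The class name StringTokenizerDemo and its split() helper are hypothetical names introduced for this illustration; they are not part of the examples in this chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not one of the book's examples.
class StringTokenizerDemo {
    // Collects every token that StringTokenizer returns for the given delimiters.
    static List<String> split(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Only the space character separates tokens here, so the
        // parenthesized expression comes back as a single chunk.
        System.out.println(split("total = (price+tax) * 2", " "));
        // prints [total, =, (price+tax), *, 2]
    }
}
```

StringTokenizer splits only on a set of delimiter characters and says nothing about what kind of token it has found, so "(price+tax)" is returned as one undifferentiated string. A parser usually needs to tell words, numbers, and punctuation apart, which is the kind of gap a richer tokenizing API is meant to fill.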
As an interface, Tokenizer doesn't do anything itself, but Example 2-8 is followed by an implementation in Examples 2-9 and 2-10. Following a pattern that you'll also see frequently in Java platform APIs, the implementation is broken into two classes: AbstractTokenizer, an abstract class that implements Tokenizer and defines its methods in terms of a small number of abstract methods, and CharSequenceTokenizer, a concrete subclass for tokenizing String and StringBuffer (or any CharSequence) objects. To demonstrate the flexibility of this implementation scheme, we'll see other Tokenizer implementations based on AbstractTokenizer throughout this book.
ReaderTokenizer (for tokenizing character streams) is defined in Example 3-7, ChannelTokenizer (for tokenizing text read from high-performance "channels" of the New I/O API) is defined in Example 6-8, and MappedFileTokenizer (for tokenizing memory-mapped files) is defined in Example ...
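The interface, abstract class, and concrete subclass arrangement described above can be sketched in miniature. This is an illustration of the pattern only, under simplifying assumptions: WordTokenizer, AbstractWordTokenizer, CharSequenceWordTokenizer, and WordTokenizerDemo are hypothetical names, and their methods are not the Tokenizer API of Example 2-8. The sketch splits on whitespace and nothing more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified interface; NOT the book's Tokenizer API.
interface WordTokenizer {
    boolean hasNext();
    String next();
}

// Implements the public methods in terms of two abstract primitives.
abstract class AbstractWordTokenizer implements WordTokenizer {
    private int pos = 0;  // index of the next unread character

    protected abstract int length();        // total number of characters
    protected abstract char charAt(int i);  // character at index i

    // Advances past any whitespace at the current position.
    private void skipSpaces() {
        while (pos < length() && Character.isWhitespace(charAt(pos))) pos++;
    }

    public boolean hasNext() {
        skipSpaces();
        return pos < length();
    }

    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        StringBuilder sb = new StringBuilder();
        while (pos < length() && !Character.isWhitespace(charAt(pos))) {
            sb.append(charAt(pos++));
        }
        return sb.toString();
    }
}

// Concrete subclass: supplies the primitives from any CharSequence.
class CharSequenceWordTokenizer extends AbstractWordTokenizer {
    private final CharSequence text;

    CharSequenceWordTokenizer(CharSequence text) { this.text = text; }

    protected int length() { return text.length(); }
    protected char charAt(int i) { return text.charAt(i); }
}

class WordTokenizerDemo {
    // Drains a tokenizer over the given text into a list.
    static List<String> tokens(CharSequence text) {
        WordTokenizer t = new CharSequenceWordTokenizer(text);
        List<String> out = new ArrayList<>();
        while (t.hasNext()) out.add(t.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens(new StringBuffer("  breaking text  into chunks ")));
        // prints [breaking, text, into, chunks]
    }
}
```

All of the tokenizing logic lives in the abstract class, so a subclass for a different text source only has to supply the two primitives. That is the property that lets one abstract base support several concrete tokenizers.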