O'Reilly logo

Building Parsers with Java™ by Steven John Metsker

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

9.1. The Role of a Tokenizer

Most languages are easier to describe as patterns of tokens than as patterns of characters. A token represents a logical piece of a string. For example, a typical tokenizer would divide the string "1.23 <= 12.3" into three tokens: the number 1.23, a less-than-or-equal symbol, and the number 12.3. A token is a receptacle; it relies on a tokenizer to decide precisely how to divide a string into tokens. In addition to building up numbers from the characters of a string, a tokenizer provides other services that divide a string into tokens. A tokenizer typically does the following:

  • Parse numbers.
  • Build up “words” from letters and potentially other characters.
  • Treat characters such as “<” as one-character symbols.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required