Building Parsers with Java™ by Steven John Metsker

5.4. A Tokenizing Problem

The coffee grammar accepts coffee names, roasts, and countries as Word terminals. This creates a problem if any of these “words” contains a blank. For example, a coffee name might be “Toasty Rita,” from Costa Rica. By default, the class Tokenizer in sjm.parse.tokens treats a blank as the end of a word. When tokenizing the text

Toasty Rita, Italian, Costa Rica, 9.95 

a default tokenizer would return Toasty as a Word, followed by Rita as a Word. After the first word, the grammar will be looking for a comma and not another word, and a parser generated from the grammar will fail to match the input text.
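The splitting behavior described above can be reproduced with a short sketch. It does not use the book's sjm.parse.tokens.Tokenizer; as a stand-in it uses the standard library's java.io.StreamTokenizer, which by default also treats a blank as the end of a word. The class name DefaultTokenizingDemo and the helper tokenize are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only: java.io.StreamTokenizer stands in for the book's
// sjm.parse.tokens.Tokenizer. Both treat a blank as a word boundary by default.
class DefaultTokenizingDemo {

    // Returns a printable label for each token found in the text.
    static List<String> tokenize(String text) {
        StreamTokenizer t = new StreamTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            while (t.nextToken() != StreamTokenizer.TT_EOF) {
                switch (t.ttype) {
                    case StreamTokenizer.TT_WORD:
                        tokens.add("Word(" + t.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add("Num(" + t.nval + ")");
                        break;
                    default:
                        // Ordinary single-character tokens, such as the comma.
                        tokens.add("Symbol(" + (char) t.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Toasty Rita, Italian, Costa Rica, 9.95")) {
            System.out.println(token);
        }
    }
}
```

The first two tokens come back as separate words, Toasty and then Rita, so a grammar that expects a comma after the coffee name has nothing to match at the second token.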

One solution is to ask the tokenizer to allow blanks to occur inside words. The following code snippet creates such a ...
