Natural Language Processing with Java by Richard M Reese

Simple Java tokenizers

Several core Java classes support simple tokenization, including the following:

  • Scanner
  • String
  • BreakIterator
  • StreamTokenizer
  • StringTokenizer

Although these classes provide only limited support, it is useful to understand how they can be used, and for some tasks they will suffice. There is little reason to adopt an approach that is harder to understand and less efficient when a core Java class can do the job. We will cover each of these classes and how it supports the tokenization process.
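As a rough illustration of two classes from the list above, the following sketch tokenizes a sentence with Scanner and with BreakIterator. The class name, method names, and sample sentence are hypothetical and chosen only for this example. Skipping boundary spans that contain no letters or digits is one possible filtering choice, not the only one.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical class name, used only for this sketch.
class SimpleTokenizerSketch {

    // Scanner splits on whitespace by default, so punctuation stays attached to words.
    static List<String> scannerTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // BreakIterator reports word boundaries. Spans between boundaries that hold
    // no letters or digits (spaces, punctuation) are skipped here by assumption.
    static List<String> breakIteratorTokens(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator boundary = BreakIterator.getWordInstance(Locale.US);
        boundary.setText(text);
        int start = boundary.first();
        int end = boundary.next();
        while (end != BreakIterator.DONE) {
            String candidate = text.substring(start, end);
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate);
            }
            start = end;
            end = boundary.next();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Stay calm, and read on.";
        System.out.println(scannerTokens(text));       // [Stay, calm,, and, read, on.]
        System.out.println(breakIteratorTokens(text)); // [Stay, calm, and, read, on]
    }
}
```

The difference in the two results shows why the choice of class matters: the default Scanner delimiter leaves the comma and period attached to their words, while the word-boundary iterator separates them.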

The StreamTokenizer and StringTokenizer classes should not be used for new development; the String class's split method is usually a better choice. These two classes are included here in case you run across them and wonder whether they should be used. ...
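To make the comparison concrete, here is a minimal sketch that tokenizes the same text with the older StringTokenizer class and with String.split. The class and method names are hypothetical, and whitespace is assumed to be the only delimiter. Trimming before the split is a choice made for this example so that leading whitespace does not produce an empty first token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name, used only for this sketch.
class SplitVersusStringTokenizer {

    // Older style: StringTokenizer with its default whitespace delimiters.
    static List<String> legacyTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // split takes a regular expression; "\\s+" matches one or more whitespace characters.
    static List<String> splitTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return a single empty string, so return no tokens instead.
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "  Mr. Smith went   to 123 Washington avenue. ";
        System.out.println(legacyTokens(text));
        System.out.println(splitTokens(text));
    }
}
```

For whitespace-delimited input like this, both methods return the same seven tokens. The split version is shorter and accepts any regular expression as the delimiter.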
