Simple Java tokenizers

Several core Java classes support simple tokenization, including the following:

  • Scanner
  • String
  • BreakIterator
  • StreamTokenizer
  • StringTokenizer

Although these classes provide only limited support, it is useful to understand how they can be used, and for some tasks they will suffice. There is little reason to adopt an approach that is harder to understand and less efficient when a core Java class can do the job. We will cover how each of these classes supports the tokenization process.
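As a minimal sketch of a core class doing the job, the following uses Scanner with its default whitespace delimiter. The class name ScannerTokens and the method name tokenize are hypothetical names chosen for illustration; they are not taken from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the name is an assumption for illustration only.
class ScannerTokens {

    // Collects the tokens that Scanner produces with its default
    // delimiter, which is any run of whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [Let's, pause,, and, then, reflect.]
        System.out.println(tokenize("Let's pause, and then reflect."));
    }
}
```

Note that punctuation stays attached to the neighboring word ("pause," and "reflect."), which is one example of the limited support these classes offer.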

The StreamTokenizer and StringTokenizer classes should not be used for new development; the String class's split method is usually a better choice. These two classes are included here in case you run across them and wonder whether they should be used. ...
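To make the comparison concrete, here is a small sketch that tokenizes the same text with the split method and with StringTokenizer. The class and method names (SplitVersusStringTokenizer, withSplit, withStringTokenizer) are hypothetical, and the whitespace pattern "\\s+" is one reasonable choice of delimiter rather than the only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; names are assumptions for illustration only.
class SplitVersusStringTokenizer {

    // Usually the better choice: split on runs of whitespace.
    // trim() avoids an empty leading token when the text starts with whitespace.
    static List<String> withSplit(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Legacy approach, shown only so it can be recognized in existing code.
    // The one-argument constructor uses whitespace characters as delimiters.
    static List<String> withStringTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Mr. Smith went to 123 Washington avenue.";
        System.out.println(withSplit(text));
        System.out.println(withStringTokenizer(text));
    }
}
```

For ordinary whitespace-separated text, both methods return the same tokens. The split method also accepts any regular expression as a delimiter, which StringTokenizer does not.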
