August 2014
Beginner to intermediate
304 pages
7h 10m
English
Regular expressions give you complete control over how text is tokenized. Because regular expressions can get complicated very quickly, I only recommend using them if the word tokenizers covered in the previous recipe are unacceptable.
First, you need to decide how you want to tokenize a piece of text, as this will determine how you construct your regular expression. The choices are to match on the tokens themselves, or to match on the separators (gaps) between them.
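As a rough illustration of matching tokens versus matching the gaps between them, the sketch below uses Python's standard `re` module rather than NLTK. The sample sentence, the two patterns, and the variable names are assumptions chosen for the example.

```python
import re

text = "Can't is a contraction."

# Strategy 1: describe what a token looks like and collect every match.
# Here a token is a run of word characters and single quotes.
tokens_by_match = re.findall(r"[\w']+", text)

# Strategy 2: describe the gaps (here, runs of whitespace) and split on them.
tokens_by_gap = re.split(r"\s+", text)

print(tokens_by_match)
print(tokens_by_gap)
```

With these patterns, matching on tokens drops the trailing period, while splitting on whitespace leaves it attached to the last word.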
We'll start with an example of the first, matching alphanumeric tokens plus single quotes so that we don't split up contractions.
We'll create an instance of RegexpTokenizer, giving it a regular expression string to use for matching tokens: ...
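A minimal sketch of such a tokenizer, assuming NLTK is installed. The pattern `[\w']+` and the sample sentence are illustrative choices and may differ from the listing elided above.

```python
from nltk.tokenize import RegexpTokenizer

# Match runs of word characters and single quotes,
# so that contractions such as "Can't" stay in one piece.
tokenizer = RegexpTokenizer(r"[\w']+")

tokens = tokenizer.tokenize("Can't is a contraction.")
print(tokens)  # ["Can't", 'is', 'a', 'contraction']
```

`RegexpTokenizer` also accepts `gaps=True`, in which case the pattern is treated as describing the separators instead of the tokens.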