April 2019
Beginner to intermediate
386 pages
11h 20m
English
We needed a tokenizer to use in conjunction with our model. We used the IndoEuropeanTokenizerFactory class to obtain an instance of a tokenizer. Once the factory has been obtained, we used it to obtain a tokenizer using the tokenizer method.
The first argument of this method is the text to be tokenized. The second argument specifies the starting index in the string, and the third argument specifies the ending position. The incorporation of these last two arguments provides more flexibility in what text is processed:
TokenizerFactory tokenizerFactory = IndoEuropeanTokenizerFactory.INSTANCE;Tokenizer tokenizer = tokenizerFactory.tokenizer( text.toCharArray(), 0, text.length());
Before the text was tokenized, it was necessary ...
Read now
Unlock full access