Summary
We discussed many of the issues that make sentence detection a difficult task. These include problems that result from periods being used for numbers and abbreviations. The use of ellipses and embedded quotes can also be problematic.
Java does provide a couple of techniques to detect the end of a sentence. We saw how regular expressions and the BreakIterator class can be used. These techniques are useful for simple sentences, but they do not work that well for more complicated sentences.
The use of various NLP APIs was also illustrated. Some of these process the text based on rules, while others use models. We also demonstrated how models can be trained and evaluated.
In the next chapter, you will learn how to find people and things with ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access