Java: Data Science Made Easy

by Richard M. Reese, Jennifer L. Reese, Alexey Grigorev
July 2017
Beginner to intermediate
715 pages
17h 3m
English
Packt Publishing

Customizing Apache Lucene

Apache Lucene is a mature and very powerful search library. It was first written in 1999, and since then many users have not only adopted it but also created many extensions for it.

Still, Lucene's built-in NLP capabilities are sometimes not enough, and a specialized NLP library is needed.

For example, if we want to include part-of-speech (POS) tags along with the tokens, or to find named entities, then we need something such as Stanford CoreNLP. Including such an external NLP library in the Lucene workflow is not very difficult, and here we will see how to do it.

Let's use the Stanford CoreNLP library and the tokenizer we implemented in the previous section. We can call it StanfordNlpTokenizer ...
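The general idea of plugging CoreNLP into Lucene can be sketched as a custom `Tokenizer` that reads the field text, runs a CoreNLP pipeline over it, and emits one Lucene token per CoreNLP token. This is a minimal sketch, not the book's `StanfordNlpTokenizer`. It assumes Lucene 5 or later (no-argument `Tokenizer` constructor and `setReader`) and Stanford CoreNLP with its English models on the classpath. The class name `CoreNlpTokenizer`, the `posPipeline` helper, and the choice to store the POS tag in Lucene's `TypeAttribute` are our own illustrative assumptions.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

/** Hypothetical example: a Lucene Tokenizer that delegates tokenization and POS tagging to CoreNLP. */
public class CoreNlpTokenizer extends Tokenizer {
    private final StanfordCoreNLP pipeline;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

    private Iterator<CoreLabel> tokens = Collections.emptyIterator();
    private int finalOffset = 0;

    public CoreNlpTokenizer(StanfordCoreNLP pipeline) {
        this.pipeline = pipeline;
    }

    /** Hypothetical helper: a pipeline that only tokenizes, splits sentences and tags POS. */
    public static StanfordCoreNLP posPipeline() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        return new StanfordCoreNLP(props);
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // Lucene supplies a Reader, but CoreNLP annotates a whole String, so read it fully.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        String text = sb.toString();
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);
        List<CoreLabel> labels = doc.get(CoreAnnotations.TokensAnnotation.class);
        tokens = labels.iterator();
        finalOffset = correctOffset(text.length());
    }

    @Override
    public boolean incrementToken() {
        if (!tokens.hasNext()) {
            return false;
        }
        clearAttributes();
        CoreLabel label = tokens.next();
        termAtt.setEmpty().append(label.word());
        offsetAtt.setOffset(correctOffset(label.beginPosition()),
                            correctOffset(label.endPosition()));
        typeAtt.setType(label.tag()); // the POS tag assigned by CoreNLP
        return true;
    }

    @Override
    public void end() throws IOException {
        super.end();
        // Report the end of the input as the final offset, as Lucene expects.
        offsetAtt.setOffset(finalOffset, finalOffset);
    }

    @Override
    public void close() throws IOException {
        super.close();
        tokens = Collections.emptyIterator();
    }
}
```

To use it during indexing, wrap it in an `Analyzer` whose `createComponents` returns `new TokenStreamComponents(new CoreNlpTokenizer(pipeline))`. Building a `StanfordCoreNLP` pipeline loads models and is expensive, which is why the sketch takes the pipeline as a constructor argument so it can be created once and reused. Note that the whole field value is read into memory before annotation, which is acceptable for typical document sizes.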

ISBN: 9781788475655