Customizing Apache Lucene
Apache Lucene is a mature and very powerful search library. First written in 1999, it has since been widely adopted, and many extensions have been built on top of it.
Still, sometimes the built-in NLP capabilities of Lucene are not enough, and a specialized NLP library is needed.
For example, if we want to include part-of-speech (POS) tags along with tokens, or to find named entities, we need something such as Stanford CoreNLP. Including such an external, specialized NLP library in the Lucene workflow is not very difficult, and here we will see how to do it.
Let's use the StanfordNLP library and the tokenizer we have implemented in the previous section. We can call it StanfordNlpTokenizer ...
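As a rough illustration of what such a tokenizer might look like, here is a minimal sketch of a Lucene `Tokenizer` that delegates tokenization to a Stanford CoreNLP pipeline. The class name `StanfordNlpTokenizer` follows the text above; the exact implementation in the previous section may differ, so treat this as an assumption-laden sketch rather than the book's code:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.Properties;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// Sketch: a Lucene Tokenizer backed by Stanford CoreNLP tokenization.
// The class name is taken from the text; details are illustrative.
public final class StanfordNlpTokenizer extends Tokenizer {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final StanfordCoreNLP pipeline;
    private Iterator<CoreLabel> tokens;

    public StanfordNlpTokenizer() {
        Properties props = new Properties();
        // Only tokenization here; add "pos" or "ner" to the annotator
        // list to expose POS tags or named entities downstream.
        props.setProperty("annotators", "tokenize");
        this.pipeline = new StanfordCoreNLP(props);
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // Read the whole input and run CoreNLP over it up front.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        CoreDocument doc = new CoreDocument(sb.toString());
        pipeline.annotate(doc);
        tokens = doc.tokens().iterator();
    }

    @Override
    public boolean incrementToken() {
        if (tokens == null || !tokens.hasNext()) {
            return false;
        }
        clearAttributes();
        CoreLabel token = tokens.next();
        termAtt.setEmpty().append(token.word());
        offsetAtt.setOffset(correctOffset(token.beginPosition()),
                            correctOffset(token.endPosition()));
        return true;
    }
}
```

Note the design trade-off: the input is fully buffered and annotated in `reset()`, because CoreNLP works on whole documents while Lucene consumes tokens one at a time via `incrementToken()`. This is the simplest bridge between the two models, at the cost of holding the document in memory.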