July 2017
Beginner to intermediate
715 pages
17h 3m
English
Apache Lucene is an old and very powerful search library. It was first written in 1999, and since then many users have not only adopted it but also created many extensions for it.
Still, sometimes the built-in NLP capabilities of Lucene are not enough, and a specialized NLP library is needed.
For example, if we want to include part-of-speech (POS) tags along with tokens, or find named entities, then we need something such as Stanford CoreNLP. It is not difficult to plug such an external, specialized NLP library into the Lucene workflow, and here we will see how to do it.
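Before wiring in the real libraries, the general shape of the idea can be sketched in plain Java: an external tagger produces tokens with POS tags, and each pair is turned into something an index can store. This is only an illustration under stated assumptions. `PosTagger`, `ToyTagger`, `TaggedToken`, and `toTerms` are hypothetical names and are not part of Lucene or Stanford CoreNLP. The toy tagger is a lookup table standing in for a real NLP library, and the `token_TAG` term encoding is just one possible choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PosTermSketch {

    /** A token paired with its part-of-speech tag. */
    static final class TaggedToken {
        final String token;
        final String pos;

        TaggedToken(String token, String pos) {
            this.token = token;
            this.pos = pos;
        }
    }

    /** Hypothetical seam where an external NLP library would be plugged in. */
    interface PosTagger {
        List<TaggedToken> tag(String text);
    }

    /** Toy stand-in: whitespace split plus a tiny lookup table. Not a real tagger. */
    static final class ToyTagger implements PosTagger {
        private final Map<String, String> lexicon = new HashMap<>();

        ToyTagger() {
            lexicon.put("lucene", "NNP");
            lexicon.put("indexes", "VBZ");
            lexicon.put("documents", "NNS");
        }

        @Override
        public List<TaggedToken> tag(String text) {
            List<TaggedToken> result = new ArrayList<>();
            for (String raw : text.trim().split("\\s+")) {
                if (raw.isEmpty()) {
                    continue; // empty input yields a single empty string
                }
                // "UNK" is a placeholder for words missing from the toy lexicon
                String pos = lexicon.getOrDefault(raw.toLowerCase(Locale.ROOT), "UNK");
                result.add(new TaggedToken(raw, pos));
            }
            return result;
        }
    }

    /** Encodes each tagged token as one term: lowercased token, underscore, tag. */
    static List<String> toTerms(List<TaggedToken> tagged) {
        List<String> terms = new ArrayList<>();
        for (TaggedToken t : tagged) {
            terms.add(t.token.toLowerCase(Locale.ROOT) + "_" + t.pos);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<TaggedToken> tagged = new ToyTagger().tag("Lucene indexes documents");
        System.out.println(toTerms(tagged)); // prints [lucene_NNP, indexes_VBZ, documents_NNS]
    }
}
```

Keeping the tagger behind a small interface means the indexing side does not depend on which NLP library produces the tags, so the toy implementation can later be swapped for one backed by a real library.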
Let's use the Stanford CoreNLP library and the tokenizer we implemented in the previous section. We can call it StanfordNlpTokenizer ...