Named entity recognition is a subprocess in the natural language processing pipeline. It identifies the names and numbers in the input document. Names can refer to a person, a company, or a location; numbers can be amounts of money or percentages, to name a few. To perform named entity recognition, we will use the Apache OpenNLP
TokenNameFinderModel API. To invoke it from the R environment, we will use the openNLP R package:
library(rJava)
library(NLP)
library(openNLP)
txt <- " IBM is an MNC with headquarters in New York. Oracle is a cloud company in California. James works in IBM. Oracle hired John for cloud expertise. ...