Named entity recognition

Named entity recognition in a sub process in the natural language processing pipeline. We identify the names and numbers from the input document. The names can be names of a person or company, location numbers can be money or percentages, to name a few. In order to perform named entity recognition, we will use Apache OpenNLP TokenNameFinderModel API. In order to invoke the code from the R environment, we will use the OpenNLP R package:

  1. Load the required libraries:
    library(rJava)
    library(NLP)
    library(openNLP)
  2. Create a sample text; we will extract the entities from this text:
    txt <- " IBM is an MNC with headquarters in New York. Oracle is a cloud company in California. James works in IBM. Oracle hired John for cloud expertise. ...

Get Mastering Text Mining with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.