Named entity recognition
Named entity recognition is a subprocess in the natural language processing pipeline. It identifies the names and numbers in the input document. Names can be those of a person, company, or location; numbers can be amounts of money or percentages, to name a few. To perform named entity recognition, we will use the Apache OpenNLP TokenNameFinderModel API. To invoke the code from the R environment, we will use the openNLP R package:
- Load the required libraries:
library(rJava)
library(NLP)
library(openNLP)
- Create a sample text; we will extract the entities from this text:
txt <- " IBM is an MNC with headquarters in New York. Oracle is a cloud company in California. James works in IBM. Oracle hired John for cloud expertise. ...
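As an illustration of how entities might be pulled out of text like this with the openNLP package, here is a minimal sketch. It is not the book's own listing: the names `sample_txt`, `base_annotations`, and `find_entities` are hypothetical, the sample string reuses only the sentences visible above, and the entity models are assumed to be installed separately through the openNLPmodels.en package.

```r
library(NLP)
library(openNLP)

# Assumption: the English entity models are installed, for example with
# install.packages("openNLPmodels.en",
#                  repos = "http://datacube.wu.ac.at/", type = "source")

# Hypothetical sample built from the sentences shown in the text above.
sample_txt <- as.String(paste(
  "IBM is an MNC with headquarters in New York.",
  "Oracle is a cloud company in California.",
  "James works in IBM.",
  "Oracle hired John for cloud expertise."
))

# Entity annotators need sentence and word annotations to work on.
sent_annotator <- Maxent_Sent_Token_Annotator()
word_annotator <- Maxent_Word_Token_Annotator()
base_annotations <- annotate(sample_txt, list(sent_annotator, word_annotator))

# Hypothetical helper: run one entity annotator and return the matched
# substrings as a character vector (empty if nothing is found).
find_entities <- function(text, annotations, kind) {
  entity_annotator <- Maxent_Entity_Annotator(kind = kind)
  found <- entity_annotator(text, annotations)
  as.character(text[found])
}

people        <- find_entities(sample_txt, base_annotations, "person")
organizations <- find_entities(sample_txt, base_annotations, "organization")
locations     <- find_entities(sample_txt, base_annotations, "location")
```

The `kind` argument selects which pretrained model is loaded; besides the three used here, the package also accepts "date", "money", "percentage", and "misc". Which spans each model actually returns depends on the installed model files, so the results are worth inspecting rather than assuming.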