Named entity recognition

A common task in NLP is named entity recognition (NER). NER finds the real-world things that a text refers to by name, such as people, organizations, and places. Before discussing it in more detail, let's jump right in and do some hands-on NER on the first article in our dataset.

The first thing we need to do is load spaCy, in addition to the model for English language processing:

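import spacy
# 'en' is a spaCy 2.x shortcut link; spaCy 3.x removed shortcuts and needs
# the full package name, for example 'en_core_web_sm'
nlp = spacy.load('en')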

Next, we must select the text of the article from our data:

text = df.loc[0, 'content']

Finally, we'll run this piece of text through the English language model pipeline. This will create a Doc instance, something we explained earlier on in this chapter. The Doc will hold a lot of information, including the named entities:

doc = nlp(text)
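
To make "holds the named entities" concrete, here is a minimal sketch of how they could be read back out of the Doc. It relies on spaCy's doc.ents attribute and on the text and label_ attributes of each entity span. The helper names list_entities and count_entity_labels are hypothetical, introduced only for illustration, and are not part of spaCy.

```python
from collections import Counter


def list_entities(doc):
    """Return (text, label) pairs for every named entity in a spaCy Doc."""
    # doc.ents holds the entity spans found by the pipeline;
    # each span exposes its surface text and a string label such as 'ORG'.
    return [(ent.text, ent.label_) for ent in doc.ents]


def count_entity_labels(doc):
    """Tally how often each entity label occurs in the Doc."""
    return Counter(ent.label_ for ent in doc.ents)
```

With the doc created above, list_entities(doc) would give the entities in the order they appear in the article, and count_entity_labels(doc).most_common() would show which entity types dominate. Which entities and labels come back depends on the article and on the model that was loaded, so no output is shown here.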
