May 2019
A common task in NLP is named entity recognition (NER). NER is about finding the things a text explicitly refers to, such as people, organizations, and places. Before going into the details, let's jump right in and run NER on the first article in our dataset.
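The first thing we need to do is load spaCy, along with the model for English language processing. Note that the 'en' shortcut works in spaCy 2.x; from spaCy 3 onward, shortcuts are no longer supported and the full package name, such as 'en_core_web_sm', must be used instead:
import spacy
nlp = spacy.load('en')
Next, we must select the text of the article from our data: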
text = df.loc[0,'content']
Finally, we'll run this piece of text through the English language model pipeline. This creates a Doc instance, which we explained earlier in this chapter. The Doc object holds a lot of information, including the named entities:
doc = nlp(text)
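To get a first look at what the pipeline found, one option is to collect the entities from the Doc and count them by type. The following is a minimal sketch, not part of the original walkthrough: the helper name summarize_entities is hypothetical, and it relies only on the doc.ents sequence and the text and label_ attributes that spaCy exposes on each entity span.

```python
from collections import Counter


def summarize_entities(doc):
    """Collect (text, label) pairs and per-label counts from a spaCy Doc.

    Hypothetical helper for illustration: it only reads doc.ents and,
    for each entity span, its .text and .label_ attributes.
    """
    # Each entity span carries the matched text and a string label such as ORG.
    pairs = [(ent.text, ent.label_) for ent in doc.ents]
    # Count how often each entity type occurs in the document.
    counts = Counter(label for _, label in pairs)
    return pairs, counts
```

With the doc created above, calling pairs, counts = summarize_entities(doc) would give the list of recognized entities and a tally per entity type; counts.most_common() then sorts the types by frequency.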