O'Reilly logo

Mastering Data Mining with Python – Find patterns hidden in your data by Megan Squire

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Building and evaluating NER systems

Based on our discussion so far in this chapter, we know that building an NER system will start with the following steps:

  1. Separate our document into sentences.
  2. Separate our sentences into tokens.
  3. Tag each token with a part of speech.
  4. Identify named entities from this tagged token set.
  5. Identify the class of each named entity.

To help us correctly find tokens at step 2, separate the real named entities from the impostors at step 4, and to ensure that the entities are placed into the correct class at step 5, it is common to leverage a machine learning approach, similar to what NLTK and its sentiment mining functions did for us in Chapter 5, Sentiment Analysis in Text. Relying on a large set of pre-classified examples ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required