Building and evaluating NER systems

Based on our discussion so far in this chapter, we know that building an NER system will start with the following steps:

  1. Separate our document into sentences.
  2. Separate our sentences into tokens.
  3. Tag each token with a part of speech.
  4. Identify named entities from this tagged token set.
  5. Identify the class of each named entity.
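To make the five steps concrete, here is a minimal sketch that uses only the Python standard library. It is a toy and not how NLTK does it: the regular expressions, the `COMMON_WORDS` set, the `GAZETTEER` lookup table, and every function name are hypothetical stand-ins invented for illustration. The "part of speech" tagger in particular only guesses proper nouns from capitalization.

```python
import re

# Hypothetical lookup table standing in for a trained classifier (step 5).
GAZETTEER = {
    "Alice Johnson": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Paris": "LOCATION",
}

# Hypothetical list of capitalized sentence starters that are not names.
COMMON_WORDS = {"a", "an", "the", "he", "she", "it", "we", "they", "i"}


def split_sentences(text):
    # Step 1: naive split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Step 2: words (allowing inner hyphens/apostrophes) or single punctuation marks.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


def tag_tokens(tokens):
    # Step 3: crude stand-in for POS tagging. Capitalized tokens that are
    # not common words get "NNP" (proper noun); everything else gets "OTHER".
    tagged = []
    for token in tokens:
        is_proper = token[0].isupper() and token.lower() not in COMMON_WORDS
        tagged.append((token, "NNP" if is_proper else "OTHER"))
    return tagged


def chunk_entities(tagged):
    # Step 4: join runs of consecutive NNP tokens into candidate entities.
    entities, current = [], []
    for token, tag in tagged:
        if tag == "NNP":
            current.append(token)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def classify(entity):
    # Step 5: look the candidate up; anything unrecognized is "UNKNOWN".
    return GAZETTEER.get(entity, "UNKNOWN")


def find_entities(text):
    results = []
    for sentence in split_sentences(text):
        tagged = tag_tokens(tokenize(sentence))
        for entity in chunk_entities(tagged):
            results.append((entity, classify(entity)))
    return results


print(find_entities("Alice Johnson joined Acme Corp in Paris. She starts on Monday."))
```

In this sketch, Monday is picked up as a candidate at step 4 only because it is capitalized, and step 5 has no class for it. Hand-written rules like these let impostors through and miss anything outside the lookup table, which is the weakness a learned model is meant to address.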

To find tokens correctly at step 2, separate the real named entities from the impostors at step 4, and place each entity into the correct class at step 5, it is common to use a machine learning approach, similar to what NLTK and its sentiment mining functions did for us in Chapter 5, Sentiment Analysis in Text. Relying on a large set of pre-classified examples ...

