5

Getting Started with Information Extraction

In this chapter, we will cover the basics of information extraction. Information extraction is the task of pulling very specific information from text. For example, you might want to know which companies are mentioned in a news article. Instead of reading the whole article, you can use information extraction techniques to retrieve the company names almost instantly.

We will start by extracting email addresses and URLs from job announcements. Then, we will use an algorithm called Levenshtein distance to find similar strings. Next, we will extract important keywords from text. After that, we will use spaCy to find named entities in text, and later, we will train our own named entity recognition ...
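To give a feel for the first two tasks mentioned above, here is a minimal sketch using only the Python standard library. It is not the chapter's own code: the names `EMAIL_PATTERN`, `find_emails`, and `levenshtein` are hypothetical, the sample posting is made up, and the regular expression is a deliberately simplified assumption that will not match every valid email address.

```python
import re

# Simplified pattern (an assumption for illustration): a local part, "@",
# and a dotted domain. It is not a complete email address validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.findall(text)


def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Made-up job posting text, used only for this sketch.
posting = "Send your CV to jobs@example.com or hr.team@example.org."
emails = find_emails(posting)
distance = levenshtein("kitten", "sitting")
```

A small distance between two strings suggests they are near-duplicates, for example a misspelled email address and its correct form. Keeping only two rows of the dynamic programming table keeps memory use proportional to the length of the second string.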

