Chapter 4: Rule-Based Matching

Rule-based information extraction is indispensable for any NLP pipeline. Certain types of entities, such as times, dates, and telephone numbers, have distinct formats that can be recognized by a set of rules, without training a statistical model.
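To make the idea of format-based recognition concrete, here is a minimal sketch using only Python's standard `re` module. The three patterns, the labels, and the helper name `extract_entities` are illustrative assumptions and are not taken from the book; real-world time, date, and phone formats vary widely by locale.

```python
import re

# Illustrative patterns only (assumed formats, not an exhaustive grammar).
TIME_RE = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")   # e.g. 17:30
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")            # e.g. 2021-07-09
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")           # e.g. (555) 123-4567

PATTERNS = (("TIME", TIME_RE), ("DATE", DATE_RE), ("PHONE", PHONE_RE))


def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    # Sort by character offset so results follow the text order.
    return [(label, value) for _, label, value in sorted(found)]


sample = "Call (555) 123-4567 before 17:30 on 2021-07-09."
print(extract_entities(sample))
```

No model is trained here: the rules alone decide what counts as an entity, which is why this approach suits entities with rigid surface forms.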

In this chapter, you will learn how to quickly extract information from text by matching patterns and phrases. You will use morphological features, POS tags, regex, and other spaCy features to build pattern objects to feed to Matcher objects. You will then combine statistical models with rule-based matching to refine their output and improve accuracy.
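As a rough sketch of what feeding pattern objects to a Matcher can look like, the example below assumes spaCy v3 (where `Matcher.add` takes a name and a list of patterns) and uses a blank English pipeline, so no trained model is needed. The pattern names `GREETING` and `ROOM`, the helper `find_matches`, and the sample sentence are hypothetical. Patterns based on POS tags or morphological features would additionally require a trained pipeline, which this sketch deliberately avoids.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only pipeline: enough for lexical and regex token attributes.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# A greeting word, an optional punctuation token, then an alphabetic token.
greeting = [
    {"LOWER": {"IN": ["hello", "hi"]}},
    {"IS_PUNCT": True, "OP": "?"},
    {"IS_ALPHA": True},
]
# The word "room" (any casing) followed by a three-digit token, via regex.
room = [
    {"LOWER": "room"},
    {"TEXT": {"REGEX": r"^\d{3}$"}},
]

matcher.add("GREETING", [greeting])
matcher.add("ROOM", [room])


def find_matches(text):
    """Return (pattern_name, span_text) pairs ordered by token position."""
    doc = nlp(text)
    hits = sorted(
        (start, end, nlp.vocab.strings[match_id])
        for match_id, start, end in matcher(doc)
    )
    return [(name, doc[start:end].text) for start, end, name in hits]


print(find_matches("Hello, world! Hi there. Meet me in Room 204."))
```

Each pattern is a list of dictionaries, one per token, which keeps rules readable and lets token attributes and regular expressions be mixed in a single pattern.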

By the end of this chapter, you will know a vital part of information extraction. You will ...

Mastering spaCy by Duygu Altinok
