Chapter 4: Entity Extraction

4.1 Introduction

4.2 Business Context

4.3 Scraping Text Data

4.3.1 Webpage

4.3.2 File System

4.4 Basic Entity Extraction Patterns

4.4.1 Social Security Number

4.4.2 Phone Number

4.4.3 Address

4.4.4 Website

4.4.5 Corporation Name

4.5 Putting Them Together

4.6 Summary

4.1 Introduction

Figure 4.1: ERA Flow with Entity Extraction Focus

To identify the free-text entity references we want to use in all subsequent resolution and analysis activities, we first have to determine which types of entities we want to understand better, and in what data sources ...
