© Jay M. Patel 2020
J. M. Patel, Getting Structured Data from the Internet, https://doi.org/10.1007/978-1-4842-6576-5_4

4. Natural Language Processing (NLP) and Text Analytics

Jay M. Patel
Specrom Analytics, Ahmedabad, India

In the preceding chapters, we relied solely on the structure of HTML documents to scrape information from them, which is a powerful extraction method.

However, for many use cases, that still does not yield specific enough information, and we have to use algorithms and techniques that work directly on the raw text.
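As a rough illustration of that distinction, the sketch below first uses HTML structure to isolate a paragraph and then applies a text-level pattern to pull a specific item out of the raw text. The page fragment, the helper names (`ParagraphText`, `paragraph_text`, `find_emails`), and the simplified email pattern are all assumptions made for this example; a regular expression merely stands in here for the text-level techniques, and this is not the chapter's own code.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment, invented for illustration only.
HTML = """
<html><body>
  <h1 class="title">Contact</h1>
  <p class="body">For press queries write to press@example.com or call the office.</p>
</body></html>
"""


class ParagraphText(HTMLParser):
    """Structure-based step: collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def paragraph_text(html):
    """Return the text of all <p> elements joined into one string."""
    parser = ParagraphText()
    parser.feed(html)
    return " ".join(chunk.strip() for chunk in parser.chunks)


# Text-based step: a deliberately simple pattern, not a full email validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def find_emails(text):
    """Return every substring of the raw text that looks like an email address."""
    return EMAIL_RE.findall(text)


text = paragraph_text(HTML)   # what the HTML structure alone gives us
emails = find_emails(text)    # what working on the raw text adds
print(text)
print(emails)
```

The structural step can only return the whole paragraph; narrowing it down to the address requires looking inside the text itself.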

We will survey natural language processing (NLP) techniques and their common use cases in this chapter. The goal here is to present NLP methods and case studies illustrating their ...
