4. Collecting Text Data from the Web

Learning Objectives

By the end of this chapter, you will be able to:

  • Extract and process data from web pages
  • Describe different kinds of semi-structured data, such as JSON and XML
  • Extract real-time data using Application Programming Interfaces
  • Extract data from various file formats
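JSON and XML both give structure to text without a fixed table schema. Below is a minimal sketch, using only the Python standard library (`json` and `xml.etree.ElementTree`), that extracts the same field from the same records written in each format. The sample documents and the helper names `titles_from_json` and `titles_from_xml` are hypothetical, made up for illustration. They are not taken from the chapter.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records expressed in two semi-structured formats.
JSON_DOC = (
    '{"articles": ['
    '{"title": "NLP basics", "tags": ["nlp", "text"]}, '
    '{"title": "Web scraping", "tags": ["web"]}'
    ']}'
)

XML_DOC = """<articles>
  <article><title>NLP basics</title><tag>nlp</tag><tag>text</tag></article>
  <article><title>Web scraping</title><tag>web</tag></article>
</articles>"""


def titles_from_json(doc: str) -> list:
    """Parse a JSON string and collect the "title" value of each article object."""
    data = json.loads(doc)
    return [article["title"] for article in data["articles"]]


def titles_from_xml(doc: str) -> list:
    """Parse an XML string and collect the text of the <title> child of each <article>."""
    root = ET.fromstring(doc)
    return [article.findtext("title") for article in root.findall("article")]


if __name__ == "__main__":
    print(titles_from_json(JSON_DOC))
    print(titles_from_xml(XML_DOC))
```

The two parsers return the same list. JSON maps directly onto Python dicts and lists, so you index into it by key. XML is a tree of elements, so you navigate it by tag name instead.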

In this chapter, you will learn how to collect data from different file formats.


In the previous chapter, we learned how to develop a simple classifier using feature extraction methods. We also covered different algorithms that fall under supervised and unsupervised learning. In this chapter, you will learn how to collect data by scraping web pages and how to process it. You will also learn how to handle ...
