August 2020
Beginner to intermediate
452 pages
7h 42m
English
Overview
This chapter introduces you to the concept of web scraping. You will first learn how to extract data (such as text, images, lists, and tables) from pages written in HTML. You will then learn about the semi-structured data formats commonly served by websites (such as JSON and XML) and how to extract data from them. Finally, you will use the Twitter API, through the tweepy package, to extract data from Twitter.
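As a rough illustration of the three kinds of extraction described above, the sketch below pulls list items and image sources out of an HTML snippet, then reads values from equivalent JSON and XML snippets. It is a minimal sketch that assumes only Python's standard library (`html.parser`, `json`, `xml.etree.ElementTree`). The chapter itself may use other libraries. The sample documents and the `ListAndImageParser` class are hypothetical and stand in for content that would normally be fetched from a live page or API.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample inputs; in practice these would be downloaded from a web page or API.
HTML_DOC = """
<html><body>
  <h1>Articles</h1>
  <ul><li>Intro</li><li>Parsing</li></ul>
  <img src="cover.png" alt="cover">
</body></html>
"""
JSON_DOC = '{"articles": [{"title": "Intro", "tags": ["nlp", "web"]}]}'
XML_DOC = "<articles><article id='1'><title>Intro</title></article></articles>"


class ListAndImageParser(HTMLParser):
    """Hypothetical helper: collects <li> text and <img> src attributes."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.images = []
        self._in_li = False  # True while the parser is inside an <li> element

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
        elif tag == "img":
            # attrs is a list of (name, value) pairs
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a list item
        if self._in_li and data.strip():
            self.items.append(data.strip())


# HTML: walk the markup and collect list items and image sources
parser = ListAndImageParser()
parser.feed(HTML_DOC)
print(parser.items)   # ['Intro', 'Parsing']
print(parser.images)  # ['cover.png']

# JSON: decode into Python dicts/lists, then index into the structure
data = json.loads(JSON_DOC)
titles = [article["title"] for article in data["articles"]]
print(titles)  # ['Intro']

# XML: parse into an element tree, then read attributes and child text
root = ET.fromstring(XML_DOC)
records = [(el.get("id"), el.findtext("title")) for el in root.iter("article")]
print(records)  # [('1', 'Intro')]
```

HTML has to be walked tag by tag because its structure is presentational. JSON and XML already encode the data's structure, so extracting from them is mostly a matter of navigating that structure.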
In the last chapter, we developed a simple classifier using feature extraction methods. We also covered different algorithms that fall under supervised and unsupervised learning. In this chapter, you will learn how to collect text data by scraping web pages, ...