Chapter 5
Getting Comfortable with Different Kinds of Data Sources
Learning Objectives
By the end of this chapter, you will be able to:
- Read CSV, Excel, and JSON files into pandas DataFrames
- Read PDF documents and HTML tables into pandas DataFrames
- Perform basic web scraping using powerful yet easy-to-use libraries such as Beautiful Soup
- Extract structured and textual information from portals
In this chapter, you will be exposed to real-life data wrangling techniques, as applied to web scraping.
Introduction
So far in this book, we have focused on the pandas DataFrame as the main data structure for applying wrangling techniques. Now, we will learn about various techniques for reading data into a DataFrame ...
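As a first taste of reading data into a DataFrame, here is a minimal sketch using `pandas.read_csv` and `pandas.read_json`. The sample data, column names, and variable names are hypothetical and exist only for illustration; the data is held in memory with `io.StringIO` so that the sketch runs without any files on disk. It is not taken from the chapter's own exercises.

```python
import io

import pandas as pd

# Hypothetical sample data (an assumption for illustration, not from the book)
csv_text = "name,score\nAda,91\nGrace,88\n"
json_text = '[{"name": "Ada", "score": 91}, {"name": "Grace", "score": 88}]'

# Both readers accept a file path or a file-like object;
# StringIO wraps the strings so they behave like open text files.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

# Each call returns a DataFrame with one row per record
print(df_csv)
print(df_json)
```

With real files, the same calls would take a path in place of the `StringIO` object, for example `pd.read_csv("data.csv")`, where the file name is again only a placeholder.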