Chapter 5

Getting Comfortable with Different Kinds of Data Sources

Learning Objectives

By the end of this chapter, you will be able to:

  • Read CSV, Excel, and JSON files into pandas DataFrames
  • Read PDF documents and HTML tables into pandas DataFrames
  • Perform basic web scraping using powerful yet easy-to-use libraries such as Beautiful Soup
  • Extract structured and textual information from portals

In this chapter, you will be exposed to real-life data wrangling techniques, as applied to web scraping.

Introduction

So far in this book, we have focused on the pandas DataFrame as the main data structure for applying wrangling techniques. Now, we will learn about various techniques by which we can read data into a DataFrame ...
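As a rough illustration of what reading data into a DataFrame can look like, here is a minimal sketch using two of the formats named in the learning objectives, CSV and JSON. The sample data and the variable names (csv_text, json_text, df_csv, df_json) are made up for this example, and in-memory strings stand in for files on disk so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice these would be files on disk.
csv_text = "name,age\nAda,36\nGrace,45\n"
json_text = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'

# read_csv and read_json accept a path or a file-like object,
# so StringIO lets us feed them the strings above.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

Both calls produce a DataFrame with the same two columns, so the wrangling steps that follow do not need to know which format the data came from.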
