5. Getting Comfortable with Different Kinds of Data Sources

Overview

This chapter teaches you how to read CSV, Excel, and JSON files into pandas DataFrames. You will also learn how to read PDF documents and HTML tables into pandas DataFrames and how to perform basic web scraping with powerful yet easy-to-use libraries such as Beautiful Soup. You will see how to extract structured and textual information from portals. By the end of this chapter, you will be able to apply data wrangling techniques such as web scraping to real-world data sources.
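
As a small taste of what reading different sources into a DataFrame looks like, here is a minimal sketch that loads the same records from CSV text and from JSON text. The sample data and the variable names (`csv_text`, `json_text`, `csv_df`, `json_df`) are invented for illustration and are not taken from the chapter; in-memory strings stand in for files so that the snippet runs without any external data.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for this illustration only.
csv_text = "name,age\nAlice,30\nBob,25\n"
json_text = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'

# read_csv and read_json accept file paths or file-like objects;
# io.StringIO wraps each string so it behaves like an open text file.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

Both calls produce a two-row DataFrame with `name` and `age` columns. With real files, you would pass a file path in place of the `io.StringIO` object.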

Introduction

So far in this book, we have focused on studying pandas DataFrame objects as the main data structure for the application of wrangling techniques. In this chapter, we ...