6

Data Wrangling Documents and Spreadsheets

Now that we have some basic Python and data skills under our belts, let's look at how to work with two common types of data you will see in the wild: documents and spreadsheets. Most organizations use Microsoft Office, with Word and Excel, which generates huge amounts of data. There are also loads of PDF documents out there containing valuable information. If our data lies in a pile of Excel and PDF files, extracting it from those formats is a necessary part of doing data science. Once we have data loaded from these files, it's also useful to have a few basic analysis techniques at the ready. We'll learn data extraction techniques, as well as basic analysis techniques for ...

