Practical Data Cleaning with Python

Katharine Jarmul will show you how to use Python libraries to speed up the data wrangling process and automate data cleaning, how to handle messy data, and how to write data unit tests that monitor data validity.

March 16, 2015
Katharine Jarmul teaching. (source: Katharine Jarmul, used with permission)

It’s a commonly cited statistic that data scientists spend roughly 80% of their time processing, wrangling, and munging their data and only 20% actually analyzing it. Reducing the time you spend cleaning your data, even by a small amount, can lead to valuable gains down the line.

Join expert Katharine Jarmul for a hands-on, in-depth exploration of practical data cleaning with Python. She highlights tools that can speed up the data wrangling process and automate (or at least script) some of its repetitive steps. You’ll get an overview of the best libraries and tools for handling messy data and learn how to apply software development practices to data wrangling problems by writing data unit tests, which let you catch problems before they produce inaccurate data for your entire company. Along the way, you’ll explore a few case studies that apply these techniques to real-world data problems.
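To make the idea of a data unit test concrete, here is a minimal sketch. It is not taken from the course and assumes records arrive as Python dicts with hypothetical `name`, `email`, and `signup` fields. `clean_record` automates a few repetitive fixes (whitespace, casing, mixed date formats), and the `check_*` functions act as data unit tests: rather than testing code paths, they verify properties the cleaned data must satisfy, so bad rows surface before anyone downstream relies on them. The field names, accepted date formats, and validity rules are all illustrative assumptions; only the standard library is used.

```python
from datetime import date, datetime

# Hypothetical raw records, standing in for a messy CSV export.
RAW_RECORDS = [
    {"name": "  alice SMITH ", "email": "ALICE@EXAMPLE.COM", "signup": "2015-03-16"},
    {"name": "bob jones", "email": " bob@example.com", "signup": "03/02/2015"},
    {"name": "Carol", "email": "carol.example.com", "signup": "sometime"},
]

# Assumed input date formats; extend this tuple as new formats turn up.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Collapse whitespace, normalize casing, and parse the signup date."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "signup": parse_date(record["signup"]),
    }


# Data unit tests: each returns a list of problems found in the cleaned rows.
def check_signup_parsed(rows):
    """Every signup value must have been parsed into a date."""
    return [f"row {i}: unparsed signup date" for i, r in enumerate(rows) if r["signup"] is None]


def check_email_shape(rows):
    """Emails must be stripped, lowercase, and contain exactly one '@'."""
    return [
        f"row {i}: suspicious email {r['email']!r}"
        for i, r in enumerate(rows)
        if r["email"].count("@") != 1 or r["email"] != r["email"].strip().lower()
    ]


def check_no_future_signups(rows):
    """A signup later than today usually points to a parsing or entry error."""
    today = date.today()
    return [
        f"row {i}: signup {r['signup']} is in the future"
        for i, r in enumerate(rows)
        if r["signup"] is not None and r["signup"] > today
    ]


DATA_CHECKS = (check_signup_parsed, check_email_shape, check_no_future_signups)


def run_data_checks(rows):
    """Run every check and collect all problems rather than stopping at the first."""
    problems = []
    for check in DATA_CHECKS:
        problems.extend(check(rows))
    return problems


if __name__ == "__main__":
    cleaned = [clean_record(r) for r in RAW_RECORDS]
    for problem in run_data_checks(cleaned):
        print("DATA CHECK FAILED:", problem)
```

Collecting every problem instead of raising on the first one is a deliberate choice in this sketch: when the checks run in a scheduled pipeline, a single report of all bad rows is usually more useful than a stack trace. The same checks could just as well be wrapped in a test runner such as pytest.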
