4

Loading and Wrangling Data with Pandas and NumPy

Data sources come in many formats: plain text files, CSVs, SQL databases, Excel files, and more. We saw how to deal with some of these data sources in the last chapter, but one Python library takes the cake when it comes to data preparation: pandas. The pandas library is a core tool for data scientists, and in this chapter we will learn how to use it effectively. We will learn about:

  • Loading data from and saving data to several different data source types
  • Some basic exploratory data analysis (EDA) and plotting with pandas
  • Preparing and cleaning data for later use, including the imputation of missing data (filling in missing values) and outlier detection
  • Essential data ...
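
As a rough preview of the topics listed above, the following minimal sketch loads a small CSV, fills in a missing value, flags an outlier, and saves the result. The column names (`item`, `price`), the sample values, the median imputation, and the 1.5 × IQR outlier rule are all illustrative assumptions, not taken from the chapter, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data; a real workflow would pass a file path instead.
csv_text = """item,price
a,10.0
b,12.0
c,
d,11.0
e,200.0
"""

# Load: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Basic EDA: summary statistics and a count of missing values.
summary = df["price"].describe()
n_missing = int(df["price"].isna().sum())

# Imputation (assumed strategy): fill missing prices with the median.
median_price = df["price"].median()
df["price"] = df["price"].fillna(median_price)

# Outlier detection (assumed rule): flag values beyond 1.5 * IQR.
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Save: write the cleaned data back out as CSV, then reload it as a check.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))

print(summary)
print(outliers)
```

With a real dataset, `pd.read_csv("some_file.csv")` and `df.to_csv("cleaned.csv", index=False)` would replace the in-memory buffers; the file names here are placeholders.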
