8.13 Intro to Data Science: Pandas, Regular Expressions and Data Munging

Data does not always come in forms ready for analysis. It could, for example, be in the wrong format, incorrect or even missing. Industry experience has shown that data scientists can spend as much as 75% of their time preparing data before they begin their studies. Preparing data for analysis is called data munging or data wrangling. The two terms are synonyms; from this point forward, we’ll say data munging.

Two of the most important steps in data munging are data cleaning and transforming data into the optimal formats for your database systems and analytics software. Some common data cleaning examples are:

  • deleting observations with missing values,

  • substituting reasonable values ...
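
The first two cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the book's own example: the `readings` data and the helper names `drop_missing` and `fill_missing` are hypothetical, and using the mean of the known values as the "reasonable value" is only one possible choice. In pandas, `DataFrame.dropna` and `DataFrame.fillna` cover the same two steps.

```python
from statistics import mean

# Hypothetical sample data: None marks a missing temperature reading.
readings = [
    {"city": "Boston", "temp": 71.5},
    {"city": "Denver", "temp": None},
    {"city": "Miami", "temp": 88.5},
]


def drop_missing(rows, key):
    """Return only the rows whose value for key is present."""
    return [row for row in rows if row[key] is not None]


def fill_missing(rows, key, fill_value):
    """Return copies of the rows with missing values for key replaced."""
    return [
        {**row, key: fill_value if row[key] is None else row[key]}
        for row in rows
    ]


# Step 1: delete observations with missing values.
complete = drop_missing(readings, "temp")

# Step 2: substitute a value for the missing ones. The mean of the
# known readings is an assumption made for this sketch.
average = mean(row["temp"] for row in complete)
filled = fill_missing(readings, "temp", average)

print(len(complete))      # 2
print(filled[1]["temp"])  # 80.0
```

Both helpers return new lists rather than modifying `readings`, so the original observations remain available for comparison after cleaning.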
