This chapter covers the basic tasks of reading, storing, and cleaning data using Python and OpenRefine. It contains the following recipes:
Reading and writing CSV/TSV files with Python
Reading and writing JSON files with Python
Reading and writing Excel files with Python
Reading and writing XML files with Python
Retrieving HTML pages with pandas
Storing and retrieving from a relational database
Storing and retrieving from MongoDB
Opening and transforming data with OpenRefine
Exploring the data with OpenRefine
Removing duplicates
Using regular expressions and GREL to clean up the data
Imputing missing observations
Normalizing and standardizing features
Binning the observations
Encoding categorical variables
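As a rough preview of what several of these recipes involve, here is a minimal sketch that reads CSV data, removes duplicates, imputes a missing observation, normalizes a feature, and encodes a categorical variable. It is not the book's code: it uses only the Python standard library rather than pandas, and the sample data, column names, and helper function names are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical sample data; the column names are illustrative only.
RAW_CSV = """id,city,price
1,Austin,100
2,Boston,
2,Boston,
3,Austin,300
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def remove_duplicates(rows):
    """Drop rows that exactly repeat an earlier row, keeping the first."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, column):
    """Convert a column to float, filling empty cells with the column mean."""
    observed = [float(r[column]) for r in rows if r[column] != ""]
    mean = statistics.mean(observed)
    return [
        {**r, column: float(r[column]) if r[column] != "" else mean}
        for r in rows
    ]


def min_max_normalize(rows, column):
    """Add a '<column>_norm' field rescaled to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, column + "_norm": (r[column] - lo) / span} for r in rows]


def one_hot(rows, column):
    """Add one 0/1 indicator field per distinct value of a column."""
    categories = sorted({r[column] for r in rows})
    return [
        {**r, **{f"{column}_{c}": int(r[column] == c) for c in categories}}
        for r in rows
    ]


def write_rows(rows):
    """Serialize a list of dicts back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = read_rows(RAW_CSV)
rows = remove_duplicates(rows)
rows = impute_mean(rows, "price")
rows = min_max_normalize(rows, "price")
rows = one_hot(rows, "city")
print(write_rows(rows))
```

Each helper returns a new list instead of modifying its input, so the steps can be reordered or dropped independently. Mean imputation and min-max scaling are only one choice among several for each task.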
Introduction ...