Practical Data Wrangling by Allan Visochek


Extracting the core content of the data

The core content of a dataset is usually a series of data entries with identical properties. As mentioned previously, tabular data sources have a more predictable structure than hierarchical ones, so this step is rarely necessary for something like a CSV file.

Hierarchical data sources, on the other hand, often include metadata describing the dataset alongside the data itself. The structure of individual data entries may also vary, but for now, assume that all data entries have the same variables.
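
As a rough illustration of this layout, the following sketch parses a small JSON document in which metadata sits next to the data entries, then pulls the entries out into a plain list of rows. The key names `metadata` and `entries`, and every value in the document, are hypothetical and chosen only for this example; a real data source will use its own keys and nesting.

```python
import json

# Hypothetical JSON document: the keys "metadata" and "entries" and all
# values are invented for illustration, not taken from a real dataset.
raw = """
{
  "metadata": {"title": "Example dataset", "entry_count": 2},
  "entries": [
    {"id": 1, "name": "first", "value": 10},
    {"id": 2, "name": "second", "value": 20}
  ]
}
"""

document = json.loads(raw)

# The core content is the list of data entries; the rest is metadata.
metadata = document["metadata"]
entries = document["entries"]

# Confirm the assumption that every entry has the same variables.
variable_sets = {frozenset(entry.keys()) for entry in entries}
same_variables = len(variable_sets) == 1

# Put the core content in a more basic format: a header plus a list of rows.
columns = sorted(entries[0].keys())
rows = [[entry[column] for column in columns] for entry in entries]

print(columns)
for row in rows:
    print(row)
```

The check on `variable_sets` is one possible way to verify that the entries share the same variables before flattening them; if it fails, the rows would need per-entry handling instead.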

Metadata can be useful, but in order to make use of the data, you will usually need to separate the core content of the dataset and put it in a more basic format. In the following steps, you will explore ...
