The core content of a dataset is usually a series of data entries with identical properties. As mentioned previously, tabular data sources have a more predictable structure, so this step isn't really necessary for something like a CSV file.
Hierarchical data sources, on the other hand, will often include metadata with information about the dataset along with the data itself. There may also be variations in the structure of individual data entries, but for now we will assume that all data entries have the same variables.
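As a rough illustration of that layout, the sketch below parses a small, made-up JSON document in which metadata sits next to a list of data entries. The key names `meta` and `data`, and all of the values, are assumptions invented for this example; a real source will use its own names and nesting.

```python
import json

# Hypothetical hierarchical source: metadata sits alongside the data entries.
# The keys "meta" and "data" are illustrative, not taken from any real dataset.
raw = """
{
  "meta": {"title": "Example dataset", "updated": "2020-01-01"},
  "data": [
    {"id": 1, "name": "alpha", "value": 10},
    {"id": 2, "name": "beta", "value": 20}
  ]
}
"""

dataset = json.loads(raw)

metadata = dataset["meta"]  # information about the dataset
entries = dataset["data"]   # the core content: a list of data entries

# Collect the distinct sets of variable names used across the entries.
# A single distinct set means every entry has the same variables.
variable_sets = {frozenset(entry.keys()) for entry in entries}
same_variables = len(variable_sets) == 1

print(sorted(metadata))
print(len(entries), same_variables)
```

Checking the variable names up front is a cheap way to confirm the simplifying assumption that every entry has the same structure before relying on it.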
Metadata can be useful, but to work with the data itself, you will usually need to separate the core content of the dataset from the metadata and convert it to a simpler format. In the following steps, you will explore ...