6 Pivoting and Wide-long Transformations

We consider an untidy organization of data that is commonly encountered because it makes a dataset easier to read when it is inspected visually. It appears almost every time data are presented in tabular form. Examples are endless, from Eurostat (https://ec.europa.eu/eurostat/databrowser/explore/all/all_themes) to Our World in Data (https://ourworldindata.org/), to mention just two. This is the so-called wide form (also called horizontal or rectangular). The alternative is the long form (or vertical), which is typical of tidy organizations of data.

Wide and long refer to the meaning of the columns in a dataset, not just their number; even two columns can be in either wide or long form, depending on the information they represent. In the long form, a column represents one or more general features of the observed system, while in the wide form, a column represents a value of a certain feature. It is the same difference we presented to distinguish tidy from untidy organizations. Take the most basic example, with just two columns: we may have columns Republican and Democrat, each holding the number of votes, with one state per row. This is a wide form because Republican and Democrat are values of a more general feature, namely Party. The equivalent long form, still with two columns, has Party and Votes. Is it really equivalent? Not exactly. We can count the number of values in the two cases. In the first case, ...
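The Republican/Democrat example can be sketched in Python with pandas. This is only a minimal illustration under stated assumptions: the state names and vote counts are invented placeholders, and the variable names (`wide`, `long_df`, `wide_again`) are hypothetical. It uses `melt()` to go from wide to long and `pivot()` to go back.

```python
import pandas as pd

# Hypothetical data: state names and vote counts are invented for illustration.
wide = pd.DataFrame({
    "State": ["State A", "State B"],
    "Republican": [120, 80],
    "Democrat": [100, 95],
})

# Wide -> long: the column names Republican and Democrat become
# values of a new, more general column Party; the counts go to Votes.
long_df = wide.melt(id_vars="State", var_name="Party", value_name="Votes")

# Long -> wide: pivot spreads the values of Party back into columns.
wide_again = (
    long_df.pivot(index="State", columns="Party", values="Votes")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "Party" label on the column axis
)

print(long_df)
print(wide_again)
```

Printing `wide.shape` and `long_df.shape` is a quick way to compare how many cells each form uses for the same information.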
