Data Science Fundamentals with R, Python, and Open Data

by Marco Cremonini
April 2024
Content level: Beginner to intermediate
480 pages
12h 22m
English
Wiley
Content preview from Data Science Fundamentals with R, Python, and Open Data

6 Pivoting and Wide-long Transformations

We consider an untidy organization of data that is commonly encountered because it eases readability when the dataset is visually inspected. It appears almost every time data are presented in tabular form. Examples are endless, from Eurostat (https://ec.europa.eu/eurostat/databrowser/explore/all/all_themes) to Our World in Data (https://ourworldindata.org/), to mention just two. This is the so-called wide form (also called horizontal or rectangular); the alternative is the long form (or vertical), which is typical of tidy organizations of data.

Wide and long refer to the meaning of the columns in a dataset, not just their number; even two columns can be in either wide or long form, depending on what information they represent. In the long form, a column represents one or more general features of the observed system, while in the wide form, a column represents a value of a certain feature. It is the same difference we presented to distinguish tidy from untidy organizations. With just two columns, the most basic example, we may have columns Republican and Democrat, each holding the number of votes, with states on rows. This is a wide form because Republican and Democrat are values of a more general feature, Party. The equivalent long form still has two columns, Party and Votes. Is it really equivalent? Not exactly. We can count the number of values in the two cases. In the first case, ...
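The Republican/Democrat example can be made concrete with a minimal sketch in plain Python. The state labels, the vote counts, the explicit State column, and the helper name wide_to_long are all hypothetical, introduced only for illustration; they are not taken from the book.

```python
# Wide form: one column per party value.
# States "A" and "B" and all vote counts are hypothetical.
wide = [
    {"State": "A", "Republican": 120, "Democrat": 100},
    {"State": "B", "Republican": 80, "Democrat": 150},
]


def wide_to_long(rows, id_col, var_name, value_name):
    """Turn every non-id column into one (var_name, value_name) record per row."""
    return [
        {id_col: row[id_col], var_name: col, value_name: val}
        for row in rows
        for col, val in row.items()
        if col != id_col
    ]


# Long form: the party names become values of a general Party column.
long = wide_to_long(wide, "State", "Party", "Votes")
for record in long:
    print(record)
```

In pandas, the same reshaping is usually done with DataFrame.melt, passing id_vars, var_name, and value_name; the hand-written helper above only spells out what that operation does to each row.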


Publisher Resources

ISBN: 9781394213245