Data Science Fundamentals with R, Python, and Open Data

by Marco Cremonini
April 2024
Content level: Beginner to intermediate
480 pages
12h 22m
English
Wiley

3 Data Organization and First Data Frame Operations

Tabular data can be organized in different forms, with rows, columns, and values carrying information of various kinds and with different meanings. Often, a specific organization is chosen to enhance readability; in other cases, it merely reflects characteristics of the data source or the data ingestion process (e.g., an automatic measurement process, an online data stream, or manual data entry), or it serves a particular transformation, computation, or visualization.

One particular organization of data, called tidy, is typically considered the reference model: it is rational and well suited to further manipulation with computational or analytical tools. It has three main characteristics:

  • Each row represents a single observation of the phenomenon.
  • Each column represents a specific property (also called variable) of the phenomenon.
  • Each value represents a single piece of information rather than an aggregate.
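
The three characteristics can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical enrollment table in which each year has its own column, so one row mixes several observations; the course names, counts, and the helper `to_tidy` are invented for illustration and do not come from the book.

```python
# Hypothetical "wide" table: one row per course, one column per year.
# The year columns hold values of a single property (the year), so a row
# here bundles several observations together.
wide_rows = [
    {"course": "Statistics", "2022": 120, "2023": 135},
    {"course": "Databases", "2022": 80, "2023": 95},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Return one row per (identifier, variable) pair.

    After reshaping, each row is a single observation, each key is a
    single property, and each value is a single piece of information.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is copied into every output row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "course", "year", "enrolled")
for row in tidy_rows:
    print(row)
```

With a data frame library the same reshaping is usually a single call, for example `DataFrame.melt` in pandas; the explicit loop is used here only to make each step visible.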

For example, consider datasets with personal information on students enrolled in courses or employees working at a certain office. Each row would likely correspond to a single individual (observation), with columns representing the information (variables) relevant to the specific context for which the data were produced. Values would carry single pieces of information, such as the first name, the middle name or the surname, the place of birth, the birth date, and so on, each one associated with ...
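
As a rough sketch of the student example, the following Python snippet splits a record whose values aggregate several pieces of information into one value per variable. The record, the field names, and the helper `split_student` are hypothetical, and the parsing assumes exactly three name parts and a "place, ISO date" birth field, which real data would rarely guarantee.

```python
from datetime import date

# Hypothetical record: each value aggregates several pieces of information.
untidy_student = {
    "full_name": "Maria Elena Rossi",
    "birth": "Milan, 2001-05-14",
}


def split_student(record):
    """Split aggregated values so that each value is a single piece of information.

    Assumes exactly three space-separated name parts and a birth field of
    the form "<place>, <YYYY-MM-DD>"; other layouts raise ValueError.
    """
    first_name, middle_name, surname = record["full_name"].split(" ")
    place_of_birth, birth_date = record["birth"].split(", ")
    return {
        "first_name": first_name,
        "middle_name": middle_name,
        "surname": surname,
        "place_of_birth": place_of_birth,
        "birth_date": date.fromisoformat(birth_date),
    }


tidy_student = split_student(untidy_student)
print(tidy_student)
```

Each key of `tidy_student` now corresponds to one variable of the observation, so a later step can filter or sort by surname or birth date without parsing strings again.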

ISBN: 9781394213245