A.1. Data vocabulary

Let’s start with some basic vocabulary we’ll use to describe data. Data scientists and statisticians vary in how they use this terminology, so I’ll make clear which terms are equivalent and which ones I use throughout the book. In this section, we’ll discuss

  • The difference between a sample and a population
  • What we mean by rows, columns, cases, and variables
  • What the different types of variables are and how they differ

A.1.1. Sample vs. population

In data science and statistics, we’re usually trying to learn something about, or predict something in, the real world. Let’s say we’re interested in the tusk length of hippos. It would be impossible to measure the tusk length of every hippo ...

