3.3 DATA UNDERSTANDING

3.3.1 Data Tables

All disciplines collect data about things or objects. Medical researchers collect data on patients, the automotive industry collects data on cars, and retail companies collect data on transactions. Patients, cars, and transactions are all objects. A data set may contain many observations of a particular type of object. For example, a data set about cars may contain many observations, each on a different car. These observations can be described in a number of ways. For example, a car can be described by listing the vehicle identification number (VIN), the manufacturer's name, the weight, the number of cylinders, and the fuel efficiency. Each of these features describing a car is a variable. Each observation has a specific value for each variable. For example, a car may have:

VIN = IM8GD9A_KP042788

Manufacturer = Ford

Weight = 2984 pounds

Number of cylinders = 6

Fuel efficiency = 20 miles per gallon

Data sets used for data analysis or data mining are almost always represented as tables. An example of a table describing cars is shown in Table 3.1. Each row of the table describes an observation (a specific car). Each column describes a variable (a specific attribute of a car). In this example, there are two observations, and these observations are described using five variables: VIN, Manufacturer, Weight, Number of cylinders, and Fuel efficiency. Variables will be highlighted throughout the book in bold.
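The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration, not the book's own notation: the names VARIABLES, cars, column, and shape are assumptions introduced here. The first row uses the values of the example car given in the text. The second row is a hypothetical placeholder and does not reproduce the contents of Table 3.1.

```python
# The five variables (columns) named in the text.
VARIABLES = ["VIN", "Manufacturer", "Weight", "Number of cylinders", "Fuel efficiency"]

# Each dictionary is one observation (one row, one car).
# Weight is in pounds; Fuel efficiency is in miles per gallon.
cars = [
    # Values of the example car given in the text.
    {"VIN": "IM8GD9A_KP042788", "Manufacturer": "Ford", "Weight": 2984,
     "Number of cylinders": 6, "Fuel efficiency": 20},
    # HYPOTHETICAL placeholder row, invented for illustration only;
    # it is NOT taken from Table 3.1.
    {"VIN": "HYPOTHETICAL_VIN_0001", "Manufacturer": "ExampleMaker", "Weight": 3000,
     "Number of cylinders": 4, "Fuel efficiency": 30},
]


def column(table, variable):
    """Return every observation's value for one variable (one column)."""
    return [row[variable] for row in table]


def shape(table, variables):
    """Return (number of observations, number of variables)."""
    return len(table), len(variables)


n_observations, n_variables = shape(cars, VARIABLES)
weights = column(cars, "Weight")  # one value per observation
```

Reading across a dictionary gives all the values for one observation. The column function reads down the table and gives one variable's values across all observations.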

Table 3.1. Example of a table describing cars

A generalized ...
