All disciplines collect data about things or objects. Medical researchers collect data on patients, the automotive industry collects data on cars, and retail companies collect data on transactions. Patients, cars, and transactions are all objects. A data set may contain many observations of a particular type of object. For example, a data set about cars may contain observations on many different cars. Each observation can be described in a number of ways. For example, a car can be described by listing its vehicle identification number (VIN), manufacturer's name, weight, number of cylinders, and fuel efficiency. Each of these features describing a car is a variable, and each observation has a specific value for each variable. For example, a car may have:
VIN = IM8GD9A_KP042788
Manufacturer = Ford
Weight = 2984 pounds
Number of cylinders = 6
Fuel efficiency = 20 miles per gallon
Data sets used for data analysis or data mining are almost always organized as tables. An example of a table describing cars is shown in Table 3.1. Each row of the table describes an observation (a specific car). Each column describes a variable (a specific attribute of a car). In this example, there are two observations, and these observations are described using five variables: VIN, Manufacturer, Weight, Number of cylinders and Fuel efficiency. Variables are highlighted in bold throughout the book.
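To make the row and column structure concrete, the following is a minimal Python sketch of such a table, assuming each observation is stored as a dictionary keyed by variable name. The first observation uses the values listed earlier in this section. The second is a hypothetical placeholder added only so that the table has two rows; it is not taken from Table 3.1. The helper names column and shape are also illustrative assumptions, not part of any library.

```python
# Variables (columns) named in the text.
variables = ["VIN", "Manufacturer", "Weight",
             "Number of cylinders", "Fuel efficiency"]

# Observations (rows). Weight is in pounds, fuel efficiency in miles per gallon.
observations = [
    # Values listed in the text for the example car.
    {"VIN": "IM8GD9A_KP042788", "Manufacturer": "Ford", "Weight": 2984,
     "Number of cylinders": 6, "Fuel efficiency": 20},
    # Hypothetical placeholder row, invented for illustration only.
    {"VIN": "HYPOTHETICAL_0001", "Manufacturer": "ExampleMaker", "Weight": 3000,
     "Number of cylinders": 4, "Fuel efficiency": 25},
]


def column(name):
    """Return the values of one variable across all observations."""
    return [row[name] for row in observations]


def shape():
    """Return (number of observations, number of variables)."""
    return (len(observations), len(variables))


print(shape())           # number of rows and columns
print(column("Weight"))  # one value per observation
```

Reading across a dictionary gives one observation; reading the same key down the list gives one variable. This is the same distinction the table draws between rows and columns.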
A generalized ...