Machine learning works by taking a dataset and splitting it into a training section and a testing section. We use the training data to come up with a model. We can then test that model against the testing dataset to see how well it holds up on observations it has not seen.
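To make the split concrete, here is a minimal sketch in R. It does not use the housing data. The data frame `observations`, the 75/25 split ratio, the linear model, and the RMSE measure are all assumptions chosen for illustration, and the data is synthetic.

```r
# Synthetic stand-in data (hypothetical): 300 rows, one predictor, one response.
set.seed(42)                      # make the random split reproducible
n <- 300
observations <- data.frame(x = rnorm(n))
observations$y <- 2 * observations$x + rnorm(n, sd = 0.5)

# Randomly pick 75% of the row indices for training; the rest are for testing.
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
training <- observations[train_idx, ]
testing  <- observations[-train_idx, ]

# Come up with a model using the training data only.
model <- lm(y ~ x, data = training)

# Test the model on the held-out rows using root mean squared error.
predicted <- predict(model, newdata = testing)
rmse <- sqrt(mean((testing$y - predicted)^2))
print(rmse)
```

The same pattern should carry over to a real dataset by replacing `observations` with the loaded data frame and `y ~ x` with a formula that fits its columns.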
For a dataset to be usable in this way, we need at least a few hundred observations, so that both the training and testing sections are large enough to be meaningful. I am using the housing data from http://uci.edu. Let's load the dataset with the following command:
housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
The site documents the names of the variables as follows:
Variables | Description |
CRIM | Per capita crime rate |
ZN | Residential zone rate percentage |
INDUS | Proportion of non-retail business in town |
CHAS ... |