Dataset

Machine learning starts with a dataset that we split into a training section and a testing section. We use the training data to build a model. We can then evaluate that model against the testing data, which the model has not seen.
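The train/test workflow can be sketched in a few lines of R. This is a minimal illustration, not the method used later in the chapter: the data frame `df` is synthetic, the 75/25 split ratio and the seed are arbitrary assumptions, and `lm` stands in for whatever model is eventually chosen.

```r
# Reproducible random split; the seed value is arbitrary
set.seed(42)

# Stand-in data frame with synthetic values so the sketch runs on its own
df <- data.frame(x = rnorm(200), y = rnorm(200))

# Keep 75% of the rows for training (the ratio is an assumption)
train_idx <- sample(nrow(df), size = floor(0.75 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Fit on the training rows only, then score on the held-out rows
model <- lm(y ~ x, data = train)
pred  <- predict(model, newdata = test)
rmse  <- sqrt(mean((test$y - pred)^2))

print(c(train = nrow(train), test = nrow(test)))
print(rmse)
```

Because the test rows play no part in fitting, the error measured on them gives a fairer picture of how the model behaves on new data than the error on the training rows would.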

For a dataset to be usable, we need at least a few hundred observations. I am using the housing data from http://uci.edu. Let's load the dataset with the following command:

housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data") 
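The data file has no header row, so `read.table` assigns default column names. The sketch below shows that behavior on a tiny inline sample instead of the real file; the three rows are made-up values for illustration only, and the object name `toy` is hypothetical.

```r
# Toy whitespace-separated rows (made-up values, not taken from the housing file)
sample_text <- "
0.10 18.0 2.31
0.20  0.0 7.07
0.30  0.0 7.07
"

# The text argument lets read.table parse a string instead of a file or URL
toy <- read.table(text = sample_text)

# With no header row, the columns are named V1, V2, ...
print(names(toy))
print(dim(toy))

# Descriptive names can be assigned afterwards, one per column
colnames(toy) <- c("CRIM", "ZN", "INDUS")
str(toy)
```

The same calls (`dim`, `names`, `str`) can be run on `housing` after loading it to confirm the number of observations and columns.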

The site documents the names of the variables as follows:

Variables   Description
CRIM        Per capita crime rate
ZN          Residential zone rate percentage
INDUS       Proportion of non-retail business in town
CHAS ...
