Exploring the Housing Dataset

Before we implement our first linear regression model, we will introduce a new dataset, the Housing Dataset, which contains information about houses in the suburbs of Boston collected by D. Harrison and D.L. Rubinfeld in 1978. The Housing Dataset has been made freely available and can be downloaded from the UCI machine learning repository at https://archive.ics.uci.edu/ml/datasets/Housing.

The features of the 506 samples may be summarized as shown in the excerpt of the dataset description:

  • CRIM: This is the per capita crime rate by town
  • ZN: This is the proportion of residential land zoned for lots larger than 25,000 sq.ft.
  • INDUS: This is the proportion of non-retail business acres per town
  • CHAS: This is the Charles River ...

Get Python: Real-World Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.