How to do it...

  1. We use a housing dataset from the UCI Machine Learning Repository. You can download the entire dataset from the following URL:

https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

The dataset comprises 14 columns, with the first 13 columns being the independent variables (that is, features) that try to explain the median price (the last column) of an owner-occupied house in Boston, USA.
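As a minimal sketch of that layout, assume each row of the data file is a whitespace-separated line of 14 numbers. The first 13 values can then be split off as features and the 14th kept as the label. The helper name `parse_row` and the sample line are hypothetical, and this is plain Python rather than the recipe's own loading code:

```python
from typing import List, Tuple

NUM_COLUMNS = 14  # 13 features followed by the median price


def parse_row(line: str) -> Tuple[List[float], float]:
    """Split one whitespace-separated row into (features, label)."""
    values = [float(token) for token in line.split()]
    if len(values) != NUM_COLUMNS:
        raise ValueError(f"expected {NUM_COLUMNS} values, got {len(values)}")
    # The first 13 columns are the independent variables; the last is the target.
    return values[:-1], values[-1]


# Hypothetical placeholder row (values 1..14), not taken from the dataset.
features, label = parse_row(" ".join(str(i) for i in range(1, 15)))
print(len(features), label)  # 13 14.0
```

Splitting on arbitrary whitespace (`split()` with no argument) tolerates the variable spacing between columns that fixed-width numeric files often use.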

We selected the first eight columns as features and cleaned them. We use the first 200 rows to train the model and predict the median price.
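Here is a minimal sketch of that selection. It assumes the rows have already been parsed into lists of 14 floats with the median price last, and it keeps the first 200 rows and their first eight columns. The function name `select_training_data` and the synthetic rows are hypothetical, and the sketch uses plain Python with no external libraries:

```python
from typing import List, Tuple

NUM_FEATURES = 8  # first eight columns are used as features
NUM_ROWS = 200    # first 200 rows are used to train and predict


def select_training_data(rows: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Return (features, labels) for the first NUM_ROWS rows.

    Each row is assumed to hold 14 values with the median price last.
    """
    subset = rows[:NUM_ROWS]
    features = [row[:NUM_FEATURES] for row in subset]
    labels = [row[-1] for row in subset]
    return features, labels


# Hypothetical synthetic data: 300 rows of 14 values each.
rows = [[float(r * 100 + c) for c in range(14)] for r in range(300)]
X, y = select_training_data(rows)
print(len(X), len(X[0]), y[0])  # 200 8 13.0
```

The label is still read from the last column, so dropping columns 9 to 13 from the features does not affect the target.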

1   CRIM    Per capita crime rate by town
2   ZN      Proportion of residential land zoned for lots over 25,000 sq. ft.
3   INDUS   Proportion of non-retail business acres per town
4   CHAS    ...
