How to do it...

  1. We use a housing dataset from the UCI Machine Learning Repository. You can download the entire dataset from the following URL:

The dataset comprises 14 columns, with the first 13 columns being the independent variables (that is, features) that try to explain the median price (the last column) of an owner-occupied house in Boston, USA.

We have chosen and cleaned the first eight columns as features. We use the first 200 rows to train and predict the median price.



CRIM: Per capita crime rate by town

ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.

INDUS: Proportion of non-retail business acres per town


CHAS ...
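
The selection described above (first eight columns as features, last column as the label, first 200 rows only) can be sketched in plain Python. This is a minimal illustration, not the recipe's own code: the function name `load_housing`, the whitespace-separated 14-column layout, and the two sample rows are all assumptions made for the example.

```python
import io

FEATURE_COUNT = 8   # first eight columns are kept as features
MAX_ROWS = 200      # only the first 200 rows are used


def load_housing(lines, feature_count=FEATURE_COUNT, max_rows=MAX_ROWS):
    """Parse rows into (features, label) pairs.

    Assumes each row holds whitespace-separated numbers, with the
    median price in the last column (hypothetical layout).
    """
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [float(field) for field in fields]
        # features = first `feature_count` columns, label = last column
        samples.append((values[:feature_count], values[-1]))
        if len(samples) == max_rows:
            break  # stop once the row limit is reached
    return samples


# Two illustrative rows in the assumed 14-column layout
sample = io.StringIO(
    "0.00632 18.0 2.31 0 0.538 6.575 65.2 4.09 1 296.0 15.3 396.9 4.98 24.0\n"
    "0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242.0 17.8 396.9 9.14 21.6\n"
)
rows = load_housing(sample)
features, label = rows[0]
print(len(features), label)
```

To read a downloaded copy instead, pass an open file handle to `load_housing`; any iterable of text lines works.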
