How to do it...

  1. We use the admission dataset from UCLA IDRE. You can download the entire dataset from the following URLs:

The dataset comprises four columns, with the first column being the dependent variable (that is, the label - whether the student was admitted or not) and the next three columns being the explanatory variables (features that will explain the admission of a student).

We have chosen and cleaned the first eight columns as features. We use the first 200 rows to train and predict the median price.

Here is some sample data from the first three rows:

  1. Start a new project ...

Get Apache Spark 2.x Machine Learning Cookbook now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.