How to do it...

  1. We use the admission dataset from the UCLA Institute for Digital Research and Education (IDRE). You can download the entire dataset from the following URLs:

The dataset comprises four columns, with the first column being the dependent variable (label - whether the student was admitted or not) and the next three columns being the explanatory variables, that is, the features that will explain the admission of a student.

We have chosen and cleaned the three explanatory columns as features. We use the first 200 rows to train the model and predict admission:

    • Admission - 0, 1 indicating whether ...
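The column layout described above (label first, three features after it, first 200 rows used for training) can be sketched in plain Python. This is only an illustration under stated assumptions, not the recipe's own Spark code: the sample rows are invented, and the names `SAMPLE_CSV` and `parse_rows` are hypothetical.

```python
import csv
import io

# Invented sample rows for illustration only; these are NOT taken from the
# real admission dataset. Layout assumed from the text: label, then 3 features.
SAMPLE_CSV = "\n".join([
    "0,500,3.2,2",
    "1,700,3.9,1",
    "0,450,2.8,4",
    "1,620,3.5,2",
])


def parse_rows(text, limit=200):
    """Split each CSV row into (label, features).

    The first column is treated as the 0/1 admission label and the next
    three columns as numeric features. At most `limit` rows are kept,
    mirroring the "first 200 rows" used for training.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        label = int(record[0])
        features = [float(value) for value in record[1:4]]
        rows.append((label, features))
        if len(rows) == limit:
            break
    return rows


training = parse_rows(SAMPLE_CSV)
for label, features in training:
    print(label, features)
```

Keeping the label separate from the feature list mirrors the (label, features) shape that most classification APIs expect, so the same split should carry over when the rows are loaded with Spark instead.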
