How to do it...

  1. We use the admission dataset from UCLA IDRE. You can download the entire dataset from the following URLs:

The dataset comprises four columns, with the first column being the dependent variable (that is, the label - whether the student was admitted or not) and the next three columns being the explanatory variables (features that will explain the admission of a student).

We have chosen and cleaned the first eight columns as features. We use the first 200 rows to train and predict the median price.

Here is some sample data from the first three rows:

  1. Start a new project ...

Get Apache Spark 2.x Machine Learning Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.