Implementing a Spark ML clustering model

In this section, we will explain clustering with Spark ML. We will use a publicly available Dataset that describes students' knowledge status of a subject.

The Dataset is available for download from the UCI website at https://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling.

The attributes of the records contained in the Dataset are reproduced here, for reference, from the UCI website mentioned previously:

  • STG: The degree of study time for goal object materials (input value)
  • SCG: The degree of repetition number of users for goal object materials (input value)
  • STR: The degree of study time of users for related objects with the goal object (input value)
  • LPR: The exam performance of a user for related ...
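
As a rough illustration of where this is heading, the following is a minimal sketch of k-means clustering with Spark ML over the attributes listed above. It is not the book's own code. It assumes the data has been exported to a CSV file with a header row, and the file path, the object name KnowledgeClustering, the helper cluster, and the choice of k = 4 are all hypothetical. Only the four columns named in the list are used as features.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object KnowledgeClustering {
  // Assumption: only the four attributes named in the list above are used.
  val featureCols: Array[String] = Array("STG", "SCG", "STR", "LPR")

  // Assemble the numeric columns into a single vector column, fit k-means,
  // and return the input rows with an added "prediction" (cluster id) column.
  def cluster(df: DataFrame, k: Int): DataFrame = {
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
    val withFeatures = assembler.transform(df)

    val kmeans = new KMeans()
      .setK(k)
      .setSeed(1L) // fixed seed so repeated runs give the same clusters
      .setFeaturesCol("features")
      .setPredictionCol("prediction")

    kmeans.fit(withFeatures).transform(withFeatures)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KnowledgeClustering")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path: a CSV export of the UCI data with a header row.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/user_knowledge.csv")

    // k = 4 is an illustrative choice, not a value taken from the text.
    cluster(df, 4).groupBy("prediction").count().show()

    spark.stop()
  }
}
```

Keeping the clustering step in a function that takes a DataFrame makes it easy to try different values of k or to run it against a small in-memory sample before loading the full file.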