Implementing a Spark ML clustering model

In this section, we explain clustering with Spark ML. We use a publicly available Dataset that records students' knowledge status in a subject.

The Dataset is available for download from the UCI website at https://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling.

The attributes of the records contained in the Dataset are reproduced here, for reference, from the UCI website mentioned previously:

  • STG: The degree of study time for goal object materials (input value)
  • SCG: The degree of repetition number of users for goal object materials (input value)
  • STR: The degree of study time of users for related objects with the goal object (input value)
  • LPR: The exam performance of a user for related ...
