Analysis of the Absenteeism at Work dataset using DBSCAN

The Absenteeism at Work dataset (follow the instructions at the beginning of the chapter to download it) contains 740 records describing employees who took days off work. There are 20 attributes representing age, service time, education, habits, diseases, disciplinary failures, transportation expense, distance from home to office, and so on (a full description of the fields is available at https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work). Our goal is to preprocess the data and apply DBSCAN in order to discover dense regions that share a specific semantic content.
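
Before working with the real data, it may help to see the overall preprocess-then-cluster workflow on a toy example. The following is only a minimal sketch, assuming scikit-learn and NumPy are available. It does not use the Absenteeism at Work dataset: the points are synthetic, the two columns are hypothetical numeric features, and the values of eps and min_samples are illustrative choices rather than recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, NOT the Absenteeism dataset):
# two dense groups of 50 points each plus two isolated points.
rng = np.random.default_rng(1000)
group_a = rng.normal(loc=[30.0, 10.0], scale=[1.0, 1.0], size=(50, 2))
group_b = rng.normal(loc=[50.0, 40.0], scale=[1.0, 1.0], size=(50, 2))
isolated = np.array([[80.0, 100.0], [10.0, 90.0]])
X = np.vstack([group_a, group_b, isolated])

# Preprocessing: standardize each feature to zero mean and unit variance,
# so that no single feature dominates the Euclidean distances used by DBSCAN.
X_scaled = StandardScaler().fit_transform(X)

# Illustrative hyperparameters (assumptions): a point is a core point if at
# least min_samples points (itself included) lie within distance eps of it.
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X_scaled)

# DBSCAN marks points that belong to no dense region with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("Number of clusters:", n_clusters)
print("Number of noise points:", n_noise)
```

In this sketch the dense groups are recovered as clusters, while the isolated points are labeled as noise. With real data, eps and min_samples normally have to be tuned, because the density of the regions is not known in advance.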

The first step is loading the CSV file as follows (the placeholder <data_path> must ...
