Analysis of the Absenteeism at Work dataset using DBSCAN

The Absenteeism at Work dataset (follow the instructions at the beginning of the chapter to download it) contains 740 records describing employees who took days off work. There are 20 attributes covering age, service time, education, habits, diseases, disciplinary failures, transportation expense, distance from home to office, and so on (a full description of the fields is available at https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work). Our goal is to preprocess the data and apply DBSCAN to discover dense regions that carry a specific semantic meaning.
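As a rough preview of that goal, the following sketch standardizes a numeric matrix and runs scikit-learn's DBSCAN on it. It is only an illustration under stated assumptions: the helper name find_dense_regions, the use of StandardScaler as the preprocessing step, the values of eps and min_samples, and the toy data are all hypothetical and are not taken from the dataset or from the chapter's own code.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def find_dense_regions(X, eps=0.5, min_samples=3):
    """Standardize X and return one DBSCAN label per row (-1 marks noise)."""
    # Assumption: zero-mean, unit-variance scaling is an adequate preprocessing step.
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    # Assumption: eps and min_samples are illustrative and need tuning on real data.
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)


# Synthetic toy data (NOT the Absenteeism dataset): two tight groups and one outlier.
X_toy = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1],
    [5.0, -20.0],
])

labels = find_dense_regions(X_toy)
n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters)
print("noise points:", int(np.sum(labels == -1)))
```

Points labeled -1 are those that DBSCAN does not assign to any dense region. On real data, the two parameters usually have to be chosen by inspecting how the number of clusters and noise points changes as they vary.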

The first step is loading the CSV file as follows (the placeholder <data_path> must ...
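A minimal sketch of such a loading step with pandas is shown here. The helper name load_absenteeism is hypothetical, and the semicolon separator and header row are assumptions about the layout of the downloaded file; adjust them if your copy differs. The <data_path> placeholder is left exactly as it appears in the text.

```python
import pandas as pd


def load_absenteeism(data_path, sep=";"):
    """Read the dataset into a DataFrame (hypothetical helper).

    Assumptions: the file has a header row and uses `sep` as the
    field separator.
    """
    return pd.read_csv(data_path, sep=sep, header=0)


# Usage (replace the placeholder with the location of the downloaded file):
# df = load_absenteeism("<data_path>")
# print(df.shape)
```

After loading, checking df.shape and df.dtypes is a quick way to confirm that the separator was right: a wrong separator typically yields a single column containing the whole line.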
