The dataset

To begin with, let's open Command Prompt and execute the following commands:

cd tutorial
jupyter lab

The first command takes us to the tutorial folder, and the second opens JupyterLab from there. The folder is empty right now, but it is where we will be completing this tutorial.

The dataset we're going to use is the heart disease dataset, which you can download from the UCI repository. It has records for around 303 patients collected at the Cleveland Clinic Foundation. The repository also includes data from other places, but we are only going to look at the data from Cleveland for now. If you go over to the Data folder, you'll see that we've got lots of different options.
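Once a Cleveland file is saved in the tutorial folder, a quick way to check it is to parse it in a notebook cell. The following is a minimal sketch using only the standard library, not the method used later in this tutorial. It assumes the file is comma-separated with no header row, that missing values are marked with `?`, and that the 14 column names listed in `COLUMNS` apply; those names, the helper `load_heart_rows`, and the sample rows are all illustrative assumptions rather than part of the dataset itself.

```python
import csv

# Assumed names for the 14 attributes; the raw file is assumed to have no header row.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_heart_rows(lines):
    """Parse comma-separated text lines into dicts, mapping '?' to None."""
    rows = []
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        if len(record) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(record)}")
        rows.append({
            name: None if value.strip() == "?" else float(value)
            for name, value in zip(COLUMNS, record)
        })
    return rows


# Made-up rows in the assumed format, for illustration only (not real patient data).
SAMPLE = [
    "50.0,1.0,3.0,130.0,240.0,0.0,0.0,160.0,0.0,1.0,2.0,0.0,3.0,0",
    "61.0,0.0,4.0,140.0,260.0,0.0,2.0,120.0,1.0,2.5,2.0,?,7.0,2",
]

rows = load_heart_rows(SAMPLE)
# Keep only the rows with no missing values.
complete = [r for r in rows if None not in r.values()]
print(len(rows), len(complete))
```

To run this against a downloaded file instead of the sample, pass an open file object, for example `load_heart_rows(open(path, newline=""))`, where `path` is whatever name the file was saved under.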
