Splitting the dataset
Now we will split our dataset into training and testing sets. We're going to use sklearn's train_test_split function to generate a training set containing about 80% of the data and a testing set containing the remaining 20%. The class values in this dataset cover multiple types of heart disease, with values ranging from 0 (healthy) to 4 (severe heart disease). Because there are more than two classes, we will convert our class data into categorical labels.
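To make the conversion to categorical labels concrete, here is a minimal sketch of one-hot encoding done with NumPy alone. The sample class values are made up for illustration, and the text above does not say which helper the conversion uses, so treat this as one possible approach rather than the method used here.

```python
import numpy as np

# Hypothetical class values in the range described above: 0 (healthy) to 4 (severe).
y = np.array([0, 2, 4, 1, 0, 3])

# One-hot encode with NumPy: row i of the identity matrix is the encoding of class i.
num_classes = 5
Y = np.eye(num_classes, dtype=int)[y]

print(Y.shape)  # (6, 5)
print(Y[1])     # [0 0 1 0 0]
```

Each row of Y contains a single 1 in the column matching the original class value, so the integer label can be recovered with Y.argmax(axis=1).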
Let's create the X and y datasets for training. First, we want to separate our class label into its own y value. We will import the model_selection module from sklearn and convert the X DataFrame to a NumPy array, taking everything but the class attribute. ...
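The steps described so far might look roughly like the sketch below. It runs on a small synthetic DataFrame, not the real heart disease data; the column names (age, chol, class), the variable name data, and the random_state value are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import model_selection

# Synthetic stand-in for the heart disease DataFrame; column names are hypothetical.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "chol": rng.integers(150, 350, size=100),
    "class": rng.integers(0, 5, size=100),
})

# Features: every column except the class attribute, as a NumPy array.
X = np.array(data.drop(columns=["class"]))
# Labels: the class column on its own.
y = np.array(data["class"])

# Hold out 20% of the rows for testing; random_state is an arbitrary
# choice that makes the split reproducible.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```

With 100 rows and test_size=0.2, train_test_split places 80 rows in the training set and 20 in the testing set, matching the 80/20 split described earlier.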