Get full access to Mastering Machine Learning with R - Second Edition and 60K+ other titles, with a free 10-day trial of O'Reilly.

Data understanding and preparation

The data for the 532 women is split across two separate data frames. The variables of interest are as follows:

npreg: This is the number of pregnancies
glu: This is the plasma glucose concentration in an oral glucose tolerance test
bp: This is the diastolic blood pressure (mm Hg)
skin: This is triceps skin-fold thickness measured in mm
bmi: This is the body mass index
ped: This is the diabetes pedigree function
age: This is the age in years
type: This is whether the woman is diabetic, Yes or No
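
A minimal sketch for checking that the variables listed above match what is actually stored in one of the two data frames. It assumes a standard R installation, where MASS is available as a recommended package; the calls are base R plus `library(MASS)`.

```r
# MASS ships with standard R installations as a recommended package
library(MASS)

# str() prints each column's name and storage type; npreg through age
# should be numeric and 'type' should be a factor
str(Pima.tr)

# summary() gives per-column ranges, which helps spot implausible values early
summary(Pima.tr)

# Class balance of the response: counts of "No" and "Yes"
table(Pima.tr$type)
```

Running the same three calls on `Pima.te` should show the same columns and types.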

The datasets are contained in the R package MASS. One data frame is named Pima.tr and the other is named Pima.te. Instead of using these as separate train and test sets, we will combine them and create our own in order to discover how to ...
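
A sketch of combining the two data frames and drawing a fresh random train/test split. The 70/30 ratio, the seed, and the names `pima`, `train_idx`, `train`, and `test` are illustrative assumptions, not choices taken from the text.

```r
library(MASS)

# Stack the two data frames; they share the same eight columns,
# so rbind() simply appends the rows (200 + 332 = 532)
pima <- rbind(Pima.tr, Pima.te)
dim(pima)

# Hypothetical 70/30 split: the ratio and seed are assumptions for illustration
set.seed(123)
train_idx <- sample(seq_len(nrow(pima)), size = round(0.7 * nrow(pima)))
train <- pima[train_idx, ]
test  <- pima[-train_idx, ]

# Each row lands in exactly one of the two sets
nrow(train) + nrow(test) == nrow(pima)
```

Because `sample()` draws without replacement by default, no row appears in both sets; a different seed gives a different but equally valid split.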
