Mastering Machine Learning with R - Second Edition by Cory Lesmeister

Data understanding and preparation

The dataset for the 532 women is in two separate data frames. The variables of interest are as follows:

  • npreg: This is the number of pregnancies
  • glu: This is the plasma glucose concentration in an oral glucose tolerance test
  • bp: This is the diastolic blood pressure (mm Hg)
  • skin: This is triceps skin-fold thickness measured in mm
  • bmi: This is the body mass index
  • ped: This is the diabetes pedigree function
  • age: This is the age in years
  • type: This is the diabetes status, Yes or No

Both data frames are contained in the R package MASS. One is named Pima.tr and the other is named Pima.te. Instead of using these as separate train and test sets, we will combine them and create our own in order to discover how to ...
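
As a minimal sketch of the combining step described above, the two data frames can be loaded from MASS and stacked row-wise with rbind(). The object name pima is an assumption made for illustration, and this shows only the merge and a quick inspection, not any later splitting.

```r
# Load MASS, which ships the Pima.tr and Pima.te data frames
library(MASS)

data(Pima.tr)
data(Pima.te)

# Stack the two data frames row-wise; both share the same eight columns.
# `pima` is a hypothetical name chosen for this sketch.
pima <- rbind(Pima.tr, Pima.te)

# Inspect the structure of the combined data and the balance of the response
str(pima)
table(pima$type)
```

Because both data frames have identical column names and types, rbind() needs no extra arguments here; the combined data frame should contain all 532 observations and the eight variables listed above.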
