The dataset for the 532 women is in two separate data frames. The variables of interest are as follows:
- npreg: This is the number of pregnancies
- glu: This is the plasma glucose concentration in an oral glucose tolerance test
- bp: This is the diastolic blood pressure (mm Hg)
- skin: This is triceps skin-fold thickness measured in mm
- bmi: This is the body mass index
- ped: This is the diabetes pedigree function
- age: This is the age in years
- type: This indicates whether the woman is diabetic, Yes or No
The datasets are contained in the R package MASS. One data frame is named Pima.tr and the other is named Pima.te. Instead of using these as separate train and test sets, we will combine them and create our own train and test sets in order to discover how to ...
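As a minimal sketch of the combining step, the two data frames can be stacked row-wise with rbind(), since they share the same eight columns. The object name pima is an assumption made for illustration and does not come from the text above.

```r
# Load MASS, which ships the Pima.tr and Pima.te data frames
library(MASS)
data(Pima.tr)
data(Pima.te)

# Stack the two data frames row-wise; both have the same eight columns.
# The name `pima` is a hypothetical choice for the combined data frame.
pima <- rbind(Pima.tr, Pima.te)

# Inspect the structure of the combined data
str(pima)
```

A quick check with nrow(pima) should confirm that all 532 observations are present before any new train and test sets are created.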