April 2017
Beginner to intermediate
420 pages
9h 58m
English
For the classification problem, we will prepare the breast cancer data in the same fashion as we did in Chapter 3, Logistic Regression and Discriminant Analysis. After loading the data, you will delete the patient ID, rename the features, eliminate the few missing values, and then create the train/test datasets in the following way:
> library(MASS) #the biopsy data ships with the MASS package
> data(biopsy)
> biopsy <- biopsy[, -1] #delete ID
> names(biopsy) <- c("thick", "u.size", "u.shape", "adhsn", "s.size", "nucl", "chrom", "n.nuc", "mit", "class") #change the feature names
> biopsy.v2 <- na.omit(biopsy) #delete the observations with missing values
> set.seed(123) #random number generator
> ind <- sample(2, nrow(biopsy.v2), replace = TRUE, prob = c(0.7, 0.3))
> biop.train ...
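The listing is cut off before the split itself, so the following is only a minimal sketch of how an indicator vector like `ind` is commonly used to form the two datasets. The object names `dat`, `train_set`, and `test_set` are hypothetical and are not taken from the book. The sketch assumes that rows labelled 1 go to training and rows labelled 2 go to testing.

```r
# Sketch only: dat, train_set and test_set are hypothetical names.
library(MASS)                      # provides the biopsy data frame

data(biopsy)
dat <- na.omit(biopsy[, -1])       # drop the ID column, then rows with missing values

set.seed(123)                      # make the random assignment reproducible
# Draw a label (1 or 2) for every row with probabilities 0.7 and 0.3
ind <- sample(2, nrow(dat), replace = TRUE, prob = c(0.7, 0.3))

train_set <- dat[ind == 1, ]       # rows labelled 1 (assumed: training)
test_set  <- dat[ind == 2, ]       # rows labelled 2 (assumed: testing)

# Inspect the sizes and the class balance of each part
dim(train_set)
dim(test_set)
prop.table(table(train_set$class))
prop.table(table(test_set$class))
```

Because each row is assigned independently, the split is only approximately 70/30. Comparing the class proportions in the two parts is a quick way to confirm that the outcome is balanced across them.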