Mastering Machine Learning with R - Second Edition by Cory Lesmeister

Data understanding and preparation

This dataset consists of tissue samples from 699 patients. It is in a data frame with 11 variables, as follows:

  • ID: Sample code number
  • V1: Clump thickness
  • V2: Uniformity of the cell size
  • V3: Uniformity of the cell shape
  • V4: Marginal adhesion
  • V5: Single epithelial cell size
  • V6: Bare nuclei (16 values are missing)
  • V7: Bland chromatin
  • V8: Normal nucleolus
  • V9: Mitosis
  • class: Whether the tumor diagnosis is benign or malignant; this will be the outcome that we are trying to predict

The medical team has scored and coded each of the nine features on a scale of 1 to 10.

The data frame is available in the MASS package for R under the name biopsy. To prepare this data, we will load the data frame, confirm the structure, ...
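The first two steps named above, loading the data frame and confirming its structure, can be sketched as follows. This is a minimal sketch that assumes the MASS package is installed. The missing-value count and range check are extra sanity checks added for illustration, and the names missing_per_column and feature_range are hypothetical, not taken from the text.

```r
# Load MASS, which ships the biopsy data frame
library(MASS)
data(biopsy)

# Confirm the structure: the text describes 699 samples and 11 variables
str(biopsy)

# Illustrative check (an addition, not the author's step):
# count missing values in each column
missing_per_column <- colSums(is.na(biopsy))
print(missing_per_column)

# Illustrative check: range of the nine scored features V1 to V9,
# ignoring missing values
feature_range <- range(unlist(biopsy[, paste0("V", 1:9)]), na.rm = TRUE)
print(feature_range)
```

If the data matches the description above, str() should report 11 variables, the missing values should appear only in V6, and the feature range should fall within 1 to 10.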
