This dataset consists of tissue samples from 699 patients. It is in a data frame with 11 variables, as follows:
- ID: Sample code number
- V1: Clump thickness
- V2: Uniformity of the cell size
- V3: Uniformity of the cell shape
- V4: Marginal adhesion
- V5: Single epithelial cell size
- V6: Bare nuclei (16 observations are missing)
- V7: Bland chromatin
- V8: Normal nucleoli
- V9: Mitoses
- class: Whether the tumor diagnosis is benign or malignant; this will be the outcome that we are trying to predict
The medical team has scored and coded each of the nine features on a scale of 1 to 10.
The data frame is available in R's MASS package under the name biopsy. To prepare this data, we will load the data frame, confirm the structure, ...
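
As a minimal sketch of the first two preparation steps (loading the data frame and confirming its structure), something like the following should work. It assumes the MASS package is installed, which is normally the case because it ships with R as a recommended package. The name na_counts is introduced here for illustration only, and the range check over columns 2 to 10 is an added sanity check on the 1-to-10 scoring, not a step taken from the text.

```r
# Attach MASS, which provides the biopsy data frame
library(MASS)

# Load the biopsy data frame into the workspace
data(biopsy)

# Confirm the structure: expect 699 observations of 11 variables
str(biopsy)

# Count missing values per column; only V6 is expected to have any
# (na_counts is an illustrative name, not from the text)
na_counts <- colSums(is.na(biopsy))
print(na_counts)

# Sanity check: minimum and maximum of each of the nine scored features,
# ignoring the missing values in V6
print(sapply(biopsy[, 2:10], range, na.rm = TRUE))
```

The output of str() also shows how each column is stored, which is worth checking before any further preparation.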