Data understanding and preparation

The data set for the 97 men is in a data frame with 10 variables, as follows:

  • lcavol: This is the log of the cancer volume
  • lweight: This is the log of the prostate weight
  • age: This is the age of the patient in years
  • lbph: This is the log of the amount of Benign Prostatic Hyperplasia (BPH), which is the non-cancerous enlargement of the prostate
  • svi: This is seminal vesicle invasion, an indicator variable for whether the cancer cells have invaded the seminal vesicles outside the prostate wall (1 = yes, 0 = no)
  • lcp: This is the log of capsular penetration, a measure of how far the cancer cells have extended into the covering of the prostate
  • gleason: This is the patient's Gleason score; a score ...
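
Because lcavol, lweight, lbph, and lcp are stored on a log scale, their values have to be exponentiated to recover the original measurements when interpreting them. The following Python sketch illustrates this on a single record. It assumes natural logarithms (the base is not stated above), and the function back_transform, the LOG_COLUMNS tuple, and the original-scale key names it produces (such as cavol) are hypothetical illustrations, not part of the data set. It also checks that svi holds the 0/1 coding described above.

```python
import math

# Columns from the data dictionary that are stored on a log scale.
# Assumption: natural logarithms (base e) were used.
LOG_COLUMNS = ("lcavol", "lweight", "lbph", "lcp")


def back_transform(record: dict) -> dict:
    """Return a copy of `record` with the log-scale columns converted back
    to their original units, after checking that svi is a 0/1 indicator."""
    if "svi" in record and record["svi"] not in (0, 1):
        raise ValueError(f"svi must be 0 or 1, got {record['svi']!r}")
    out = dict(record)
    for col in LOG_COLUMNS:
        if col in out:
            # Hypothetical naming: drop the leading "l" to name the
            # original-scale quantity, e.g. lcavol -> cavol.
            out[col[1:]] = math.exp(out.pop(col))
    return out


# Made-up record for illustration only; not a row from the data set.
row = {"lcavol": math.log(2.0), "age": 65, "svi": 0}
print(back_transform(row))
```

In the printed result, cavol is 2.0 up to floating-point rounding, while age and svi pass through unchanged because they are not log-transformed.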
