June 2017
Beginner to intermediate
576 pages
15h 22m
English
For classification targets, you can use the random forest algorithm to determine variable importance.
For this example, a simulated sample was generated, with smoking and family history being key factors in determining heart disease among males:
set.seed(1020)
#construct a 50/50 sample of Males and Females
gender <- sample(c("M","F"), 100, replace=TRUE, prob=c(0.50,0.50))
#assign a higher probability of smoking to the Males (95%, WAY too high!)
smokes <- ifelse(gender=="M",
                 sample(c("N","Y"), 100, replace=TRUE, prob=c(0.05,0.95)),
                 sample(c("N","Y"), 100, replace=TRUE, prob=c(0.45,0.55))
)
#assume they also have a 60% chance of family history of heart disease
familyhistory <- ifelse(gender=="M",
                        sample(c("N","Y"), 100, replace=TRUE, prob=c(0.40,0.60)),
                        ...
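Since the listing above is cut off, here is a separate, minimal sketch of how random forest variable importance might be computed on a sample of this kind. It is not the book's own continuation. It assumes the randomForest package is installed, and the sample size, the noise column, the risk rule, and the names sim, fit, and imp are all hypothetical choices made for illustration.

```r
# Minimal sketch: random forest variable importance on simulated data.
# Assumes the randomForest package is installed; all data below is made up.
library(randomForest)

set.seed(1020)
n <- 200  # hypothetical sample size

# Simulated categorical predictors, stored as factors for classification
gender        <- factor(sample(c("M", "F"), n, replace = TRUE))
smokes        <- factor(sample(c("N", "Y"), n, replace = TRUE))
familyhistory <- factor(sample(c("N", "Y"), n, replace = TRUE))
noise         <- rnorm(n)  # hypothetical predictor unrelated to the outcome

# Hypothetical outcome rule: risk rises with smoking and family history
risk <- 0.10 + 0.40 * (smokes == "Y") + 0.35 * (familyhistory == "Y")
heartdisease <- factor(ifelse(runif(n) < risk, "Y", "N"))

sim <- data.frame(gender, smokes, familyhistory, noise, heartdisease)

# importance = TRUE also computes permutation-based importance
fit <- randomForest(heartdisease ~ ., data = sim,
                    ntree = 500, importance = TRUE)

# One row per predictor; columns include MeanDecreaseAccuracy and MeanDecreaseGini
imp <- importance(fit)
print(imp[order(-imp[, "MeanDecreaseGini"]), ])

# Dot chart of both importance measures
varImpPlot(fit)
```

With an outcome built this way, smokes and familyhistory would be expected to rank above gender and noise, although the exact ordering and values depend on the seed and the simulated risk rule.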