June 2017
Beginner to intermediate
576 pages
15h 22m
English
Now we can produce the variable importance plot. The resulting plot identifies high cholesterol and family history as the most important variables:
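require(randomForest)
# Fit a classification forest; factor() makes the outcome categorical
fit <- randomForest(factor(heartdisease) ~ ., data = heart, ntree = 1000)
# The outer parentheses print the importance table as well as storing it
(VI_F <- importance(fit))
# type = 2 plots the mean decrease in node impurity (Gini)
varImpPlot(fit, type = 2, main = "Random Forest Variable Importance Plot - Heart Disease Simulation")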

A downside of this method is that it scores each variable individually and does not account for correlation between variables.
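One way to see this limitation is to inspect pairwise correlations among the predictors before reading the importance table. The sketch below uses a small simulated data frame, not the heart data from the text; the names sim, x1, x2, x3, and fit_sim are hypothetical and chosen only for illustration. Here x2 is built as a near-copy of x1, so the two carry almost the same information, yet the importance table still reports a separate score for each.

```r
library(randomForest)

# Hypothetical simulated data (an assumption, not the book's heart data set)
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a duplicate of x1
x3 <- rnorm(n)                  # unrelated to the outcome
y  <- factor(ifelse(x1 + rnorm(n, sd = 0.5) > 0, "yes", "no"))
sim <- data.frame(y, x1, x2, x3)

# Pairwise correlations among the predictors
round(cor(sim[, c("x1", "x2", "x3")]), 2)

# Importance is reported per variable, with no indication
# that x1 and x2 overlap
fit_sim <- randomForest(y ~ ., data = sim, ntree = 500)
importance(fit_sim)
```

If the correlation matrix shows strongly related predictors, their individual importance scores are best read together rather than in isolation.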