Statistics for Machine Learning

by Pratap Dangeti
July 2017
Beginner to intermediate
442 pages
10h 8m
English
Packt Publishing

Example of random forest using German credit data

The same German credit data is used to illustrate the random forest model, so that its results can be compared with logistic regression on an apples-to-apples basis. One significant difference from logistic regression is that the effort spent on data preprocessing decreases drastically. The following differences are worth mentioning:

  • In RF, we have not removed variables one by one based on significance and VIF values, as significance tests are not applicable to ML models. However, five-fold cross-validation has been performed on the training data to ensure the model's robustness.
  • We have removed one extra dummy variable in the logistic regression procedure, whereas in RF we have not ...
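
The cross-validation step mentioned in the first point can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the book's own code: the data is a synthetic stand-in generated with make_classification (the real German credit data is not loaded here), and the 70/30 split, the 100 trees, the class weights, and the random seeds are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the credit data (assumption: NOT the real
# German credit dataset; sizes and class balance are illustrative only).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=8,
    weights=[0.7, 0.3],
    random_state=42,
)

# Hold out a test set; cross-validation is run on the training part only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# No variable elimination by p-values or VIF: all features go in as-is.
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Five-fold cross-validation on the training data to check robustness.
cv_scores = cross_val_score(rf, X_train, y_train, cv=5, scoring="accuracy")
print("Fold accuracies:", cv_scores.round(3))
print("Mean CV accuracy:", round(cv_scores.mean(), 3))

# Fit on the full training set and score once on the held-out test set.
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 3))
```

If the fold accuracies are close to one another and to the test accuracy, that is a reasonable sign the model is not overly sensitive to which rows it was trained on.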

ISBN: 9781788295758