Analysis
We are given M = 3 variables according to which a feature can be classified. In a random forest algorithm, we usually do not use all three variables to form tree branches at each node. Instead, we use only a subset of m variables out of M, choosing m so that m is less than or equal to M. The greater m is, the stronger each constructed tree is as a classifier. However, as mentioned earlier, more data leads to more bias. Because we use multiple trees (each with a lower m), their combined classification accuracy is strong even if each individual tree is a weak classifier. Since we want to reduce bias in a random forest, we may want to choose a value of m that is slightly less than M.
Hence, we choose the maximum number of ...
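The subset selection described above can be illustrated with a minimal sketch. It assumes that the m candidate variables are drawn uniformly at random, without replacement, each time a node is built. The variable names, the value m = 2, and the helper choose_split_candidates are hypothetical and are not taken from the text.

```python
import random


def choose_split_candidates(variables, m, rng):
    """Return m distinct variables drawn at random from the full list of M."""
    if not 1 <= m <= len(variables):
        raise ValueError("m must satisfy 1 <= m <= M")
    return rng.sample(variables, m)


# Hypothetical variable names; the text only states that M = 3.
variables = ["var_a", "var_b", "var_c"]
m = 2  # assumed value, slightly less than M = 3
rng = random.Random(0)  # fixed seed so that repeated runs give the same draws

# Each node considers its own random subset of m variables.
for node in range(3):
    print(node, choose_split_candidates(variables, m, rng))
```

With m = M, every node would consider all of the variables, so the randomness introduced by this step would disappear.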