Let's consider an input dataset X made up of n vectors:

$$X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$$
Every vector is made up of m features, so each feature is a candidate for creating a node based on a (feature, threshold) tuple.
The structure of the tree changes according to the chosen feature and threshold. Intuitively, we should pick the feature that best separates our data. In other words, a perfectly separating feature will appear in only one node, and the two subsequent branches won't need to be based on it anymore. In real problems, this is often impossible, so it's necessary to find ...
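To make the idea of a (feature, threshold) tuple concrete, here is a minimal sketch in Python with NumPy. It lists every candidate tuple for a tiny toy dataset and checks which of them separate the two classes perfectly. The helper names (`candidate_splits`, `split`, `is_perfect`), the toy data, and the choice of midpoints between consecutive feature values as thresholds are all assumptions made for illustration, not something prescribed by the text.

```python
import numpy as np

def candidate_splits(X):
    """Yield every (feature index, threshold) tuple.

    Assumption: thresholds are the midpoints between consecutive
    distinct values of each feature, so both branches are non-empty.
    """
    for i in range(X.shape[1]):
        values = np.unique(X[:, i])  # sorted distinct values of feature i
        for t in (values[:-1] + values[1:]) / 2.0:
            yield i, float(t)

def split(X, y, feature, threshold):
    """Partition the samples into the two branches of a node."""
    mask = X[:, feature] < threshold
    return (X[mask], y[mask]), (X[~mask], y[~mask])

def is_perfect(y_left, y_right):
    """A split is perfect when each branch contains a single class."""
    return len(np.unique(y_left)) == 1 and len(np.unique(y_right)) == 1

# Hypothetical toy dataset: 4 vectors, m = 2 features, 2 classes
X = np.array([[1.0, 5.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 2.0]])
y = np.array([0, 0, 1, 1])

perfect = []
for feature, threshold in candidate_splits(X):
    (_, y_left), (_, y_right) = split(X, y, feature, threshold)
    if is_perfect(y_left, y_right):
        perfect.append((feature, threshold))

print(perfect)  # [(0, 2.5)]
```

In this toy case only feature 0 with threshold 2.5 separates the classes in a single node. The other five candidates leave at least one mixed branch, which is the situation typically encountered with real data.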