Remember we discussed the mapping of categorical features? Use the categoricalFeaturesInfo parameter to specify which of the variables are truly categorical, so that their index values are treated as category labels rather than as ordered numbers.
Each entry has the form index:levels. For our example, Frisked has two levels, so it is specified as 0:2 (the target variable with 2 levels), and Race has 8 levels, so it is specified as 3:8 (the feature at index 3, with 8 levels):
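As a minimal sketch, assuming Python and a plain dictionary, the two entries above could be written as follows. The helper `invalid_categories` and the sample row are hypothetical and are not part of the original example; they only illustrate that a categorical feature with k levels is expected to hold the values 0 to k-1.

```python
# Feature index -> number of levels, using the entries quoted in the text.
# Keys are zero-based positions in the feature vector.
categorical_features_info = {0: 2, 3: 8}


def invalid_categories(features, info):
    """Hypothetical helper: list (index, value) pairs whose value is not
    a whole number in the range 0 .. levels-1 for a declared feature."""
    problems = []
    for index, levels in info.items():
        value = features[index]
        if value != int(value) or not 0 <= value < levels:
            problems.append((index, value))
    return problems


# Made-up row for illustration only; not taken from the dataset.
row = [1.0, 27.0, 0.0, 5.0]
print(invalid_categories(row, categorical_features_info))
```

If you are using PySpark's RDD-based API, a dictionary of this shape is what you would pass as the `categoricalFeaturesInfo` argument of `DecisionTree.trainClassifier`.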
Set other parameters appropriate to the decision tree model:
- numClasses: The number of outcome classes; in our case, it is 2.
- maxDepth: How deep would you like the decision tree to go? For illustrative purposes, we will set it to 2 levels deep, but for real-world problems, you should attempt to go a bit deeper. Remember, ...