January 2020
Beginner to intermediate
372 pages
10h
English
In the winning solution of the KDD Cup 2009 (http://www.mtome.com/Publications/CiML/CiML-v3-book.pdf), the authors limited one-hot encoding to the 10 most frequent categories of each variable. For more details, see the article Winning the KDD Cup Orange Challenge with Ensemble Selection. The number of top categories to encode is an arbitrary choice made by the user. In this recipe, we will encode the five most frequent categories.
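To make the idea concrete, here is a minimal sketch in plain Python (standard library only). It is not the recipe's own code, which may rely on pandas or a dedicated encoder library. The helper names `fit_top_categories` and `transform_top_categories` are hypothetical. The sketch learns the most frequent categories from training data and then creates one binary column per learned category. In this version, any value outside the top categories, including values never seen during training, is encoded as all zeros.

```python
from collections import Counter


def fit_top_categories(train_values, top_n=5):
    """Learn the top_n most frequent categories from training data.

    Hypothetical helper for illustration. Categories are returned from most
    to least frequent; ties keep their order of first appearance because
    Counter preserves insertion order and most_common() is stable.
    """
    counts = Counter(train_values)
    return [category for category, _ in counts.most_common(top_n)]


def transform_top_categories(values, categories):
    """One-hot encode values using only the learned top categories.

    Hypothetical helper for illustration. Each output row has one 0/1 flag
    per learned category; rare or unseen values become a row of zeros.
    """
    return [[1 if value == category else 0 for category in categories]
            for value in values]


if __name__ == "__main__":
    train = ["red", "blue", "red", "green", "blue", "red",
             "black", "white", "pink", "red", "blue", "green"]
    # Keep the three most frequent colors (the recipe itself uses five).
    top = fit_top_categories(train, top_n=3)
    print(top)  # ['red', 'blue', 'green']
    test = ["blue", "pink", "purple"]
    for value, row in zip(test, transform_top_categories(test, top)):
        print(value, row)
```

Fitting on the training set and reusing the learned list on new data keeps the number of columns fixed. Mapping every remaining category to all zeros is one common convention. An alternative is an explicit "other" column, which this sketch does not implement.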