Grouping categorical values

Data used for modeling frequently contains attributes with a large number of distinct categorical values. A typical example is a product code that identifies a product purchased by a customer.

An attribute with many distinct values can cause problems for data mining algorithms: it can make the algorithms run slowly, and it can make patterns in the data harder to find, leading to less accurate models. A useful step in data preparation is to simplify such an attribute by grouping the values of the categorical variable into a smaller set of categories, where the grouping is related to the problem to be solved.
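As a rough illustration of grouping that is related to the problem, the following Python sketch assigns each product code to a coarse group according to how often the target occurs for that code. It is not the method used in the recipe itself; the function name, the thresholds (`low`, `high`, `min_count`), the group labels, and the sample records are all assumptions made up for this example.

```python
from collections import defaultdict


def group_codes_by_target_rate(records, low=0.25, high=0.75, min_count=2):
    """Map each product code to a coarse group based on its observed target rate.

    records: iterable of (product_code, target) pairs, where target is 0 or 1.
    The thresholds and labels are illustrative assumptions, not fixed rules.
    """
    counts = defaultdict(int)     # number of records seen per code
    positives = defaultdict(int)  # number of records with target == 1 per code
    for code, target in records:
        counts[code] += 1
        positives[code] += target

    groups = {}
    for code, n in counts.items():
        if n < min_count:
            # Too few records to estimate a rate reliably
            groups[code] = "rare"
            continue
        rate = positives[code] / n
        if rate < low:
            groups[code] = "low"
        elif rate > high:
            groups[code] = "high"
        else:
            groups[code] = "medium"
    return groups


# Hypothetical purchase records: (product_code, target flag)
records = [
    ("P100", 1), ("P100", 1), ("P100", 1),
    ("P200", 0), ("P200", 0), ("P200", 0), ("P200", 0),
    ("P300", 1), ("P300", 0),
    ("P400", 1),
]
groups = group_codes_by_target_rate(records)
```

Here the four original codes collapse into at most four groups; with thousands of product codes the reduction is far larger. Codes with too few records are kept in a separate group because their observed rate is unreliable.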

This recipe shows how to group product codes by their relation to a target ...

