Practical Predictive Analytics by Ralph Winters

Eliminating some factors with a large number of levels

Because some features in the dataset, such as zip code and the GPS location variables, are very granular, we will first perform a dimension reduction by keeping only the features/variables that have at most 20 levels. This makes it easier to build models later, since it reduces the risk of overfitting caused by variables with very high dimensionality. However, the number 20 is not set in stone: you can change the default, and you can always keep any variable you feel is needed in the model regardless of its number of levels.
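
Before choosing a cut-off, it can help to count the distinct levels of each categorical column to see which variables a 20-level threshold would remove. The following is a minimal base-R sketch; the data frame df and its columns are hypothetical toy data, not the book's dataset.

```r
# Hypothetical toy data standing in for a granular dataset:
# 'zip' has one level per row, 'region' and 'status' have only a few.
df <- data.frame(
  zip    = sprintf("%05d", 1:50),
  region = rep(c("North", "South", "East", "West"), length.out = 50),
  status = rep(c("open", "closed"), length.out = 50),
  amount = seq(10, 500, by = 10),
  stringsAsFactors = TRUE
)

# Level counts only make sense for categorical columns (factors/characters)
is_categorical <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
level_counts <- vapply(df[is_categorical], function(x) length(unique(x)), integer(1))
print(level_counts)

# Columns that would exceed a 20-level threshold
high_card <- names(level_counts)[level_counts > 20]
print(high_card)   # [1] "zip"
```

The numeric column amount is deliberately left out of the count, since the number of distinct values of a continuous variable is not a meaningful "level" count.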

The OneR package has a function named maxlevels() that accomplishes this. Any variable whose number of levels exceeds the maximum (20 by default) is not included in the output.
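
The kind of filtering maxlevels() performs can be approximated in base R, which makes the rule explicit. This is a hypothetical sketch, not the OneR source: the helper drop_high_cardinality() and the toy data are assumptions for illustration, and missing values are simply not counted as a level. It also shows one way to add back a variable you consider essential regardless of its number of levels.

```r
# Hypothetical helper approximating the filtering done by OneR's maxlevels():
# drop every factor or character column with more than `levels` distinct
# non-missing values; numeric columns and small factors are kept as-is.
drop_high_cardinality <- function(data, levels = 20) {
  too_many <- vapply(data, function(x) {
    (is.factor(x) || is.character(x)) && length(unique(x[!is.na(x)])) > levels
  }, logical(1))
  data[, !too_many, drop = FALSE]
}

# Hypothetical toy data: 'zip' has 50 levels, the other factors are small.
df <- data.frame(
  zip    = sprintf("%05d", 1:50),
  region = rep(c("North", "South", "East", "West"), length.out = 50),
  status = rep(c("open", "closed"), length.out = 50),
  amount = seq(10, 500, by = 10),
  stringsAsFactors = TRUE
)

reduced <- drop_high_cardinality(df, levels = 20)
print(names(reduced))   # [1] "region" "status" "amount"

# This sketch never drops rows, so an essential variable can be re-attached
# directly, regardless of how many levels it has.
reduced$zip <- df$zip
```

With the OneR package installed, the corresponding call would be maxlevels(df, levels = 20); raising or lowering the levels argument is how the 20-level default is changed.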

After running the function, use the ...
