June 2017
Beginner to intermediate
576 pages
15h 22m
English
Because some features in the dataset, such as zip code and the GPS location variables, are very granular, we will first reduce dimensionality by keeping only features/variables with a maximum of 20 levels. That will help us formulate models later without worrying about overfitting caused by variables that have a very large number of levels. However, the number 20 is not set in stone; you can change the default, and you can always keep variables that you feel the model needs regardless of how many levels they have.
OneR has a function named maxlevels() that can accomplish this. Any variable whose number of levels exceeds the chosen maximum is excluded from the output.
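To make the filtering rule concrete, here is a minimal base R sketch of the same idea. The toy data frame, the helper keep_low_cardinality(), and its always_keep argument are hypothetical names made up for illustration; they are not part of OneR. The commented call at the end assumes the OneR signature maxlevels(data, maxlevels = 20), so check the package documentation before relying on it.

```r
# Hypothetical toy data: 'zip' has 30 distinct levels, 'segment' has 3.
set.seed(1)
df <- data.frame(
  zip     = factor(sprintf("%05d", 10001:10030)),
  segment = factor(rep(c("a", "b", "c"), 10)),
  amount  = runif(30)
)

# Hypothetical helper (not part of OneR): drop factor or character columns
# with more than `max_levels` distinct non-missing values, unless the column
# is named in `always_keep`. Numeric columns are always kept.
keep_low_cardinality <- function(data, max_levels = 20, always_keep = character()) {
  n_levels <- vapply(data, function(col) {
    if (is.factor(col) || is.character(col)) {
      length(unique(col[!is.na(col)]))
    } else {
      0L
    }
  }, integer(1))
  keep <- n_levels <= max_levels | names(data) %in% always_keep
  data[, keep, drop = FALSE]
}

reduced  <- keep_low_cardinality(df)                       # drops 'zip'
with_zip <- keep_low_cardinality(df, always_keep = "zip")  # keeps 'zip' anyway

# Assumed OneR equivalent of the first call (verify against the package docs):
# library(OneR)
# reduced <- maxlevels(df, maxlevels = 20)
```

The always_keep argument mirrors the point that 20 is only a default: a variable you consider essential can be retained even when it exceeds the limit.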
After running the function, use the ...