
Practical Predictive Analytics by Ralph Winters


Binning – numeric and character

Numeric variables are often binned into categories such as high, medium, and low, or high risk and low risk. Binning discards some information, but it can make a variable usable in a logistic regression, or simply make it easier to work with. There are different ways to define the cut points that segment a variable, but the simplest is to divide the variable into equal parts. Taking our sales_vertical data as an example, we will create a new categorical variable called sales_cat that uses the cut() function to segment sales into two parts, high and low:

 sales_vertical$sales_cat <- cut(sales_vertical$sales, ...
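Because the call above is truncated, the following is only a minimal sketch of one way such a two-way split could look, not necessarily the book's exact arguments. It assumes that "equal parts" means two equal-width intervals, and it uses a small made-up data frame as a stand-in for sales_vertical; the month column, the sales values, and the "low"/"high" labels are all hypothetical.

```r
# Hypothetical stand-in for the sales_vertical data frame; the values are made up.
sales_vertical <- data.frame(
  month = 1:6,
  sales = c(120, 340, 500, 210, 880, 1000)
)

# Passing a single number as breaks asks cut() to split the observed range
# of sales into that many equal-width intervals. The labels are assigned to
# the intervals in ascending order.
sales_vertical$sales_cat <- cut(
  sales_vertical$sales,
  breaks = 2,
  labels = c("low", "high")
)

# Count how many rows fall into each bin.
table(sales_vertical$sales_cat)
```

Note that equal-width bins split the range of the variable, not the number of observations, so the two categories can end up with very different counts. If balanced groups are needed, one alternative is to pass explicit cut points to breaks, for example the median of sales, together with include.lowest = TRUE.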
