June 2017
Beginner to intermediate
576 pages
15h 22m
English
Numeric variables are often binned into categories such as high, medium, and low, or high risk and low risk. Although binning discards information, it can make a variable usable in a logistic regression, or simply make it easier to work with. There are different ways to choose the cut points that segment a variable, but the simplest is to divide its range into equal-width intervals. Taking our sales_vertical data as an example, we can create a new categorical variable that splits sales into a high and a low category. We will create a new variable called sales_cat that segments sales into two parts using the cut() function:
sales_vertical$sales_cat <- cut(sales_vertical$sales, ...
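The sketch below shows one way such a call might look when complete, assuming the goal is two equal-width bins labelled "low" and "high". The small data frame is a hypothetical stand-in for the book's sales_vertical data (the month column and all values are invented for illustration). Passing a single number to the breaks argument of cut() makes it split the range of sales into that many equal-width intervals.

```r
# Hypothetical stand-in for sales_vertical; values are invented for illustration
sales_vertical <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  sales = c(120, 340, 560, 210, 480, 90)
)

# breaks = 2 splits the range of sales into two equal-width intervals;
# labels names the resulting factor levels from the lowest interval upward
sales_vertical$sales_cat <- cut(sales_vertical$sales,
                                breaks = 2,
                                labels = c("low", "high"))

# Inspect the new factor and count the observations in each level
print(sales_vertical)
print(table(sales_vertical$sales_cat))
```

If groups with equal counts are preferred over equal widths, breaks also accepts a vector of explicit cut points, for example quantile(sales_vertical$sales, c(0, 0.5, 1)) combined with include.lowest = TRUE.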