Chapter 5: Logistic Regression
In Chapter 4, we discussed how analysts can use linear regression to predict the value of a numeric variable based upon its relationship to one or more independent variables. Linear regression is a useful tool for these situations, but it isn't well-suited for every type of problem. In particular, linear regression does not work well when our problem requires that we predict a categorical variable. For example, we might want to predict which category a potential customer falls into: Big Spender, Repeat Customer, One-Time Customer, or Noncustomer. Similarly, we might want to predict whether a tumor detected in a medical imaging scan is benign or malignant. These problems, where we attempt to predict membership in a category, are known as classification problems.
In this chapter, we explore the first of several techniques that we will use to model classification problems: logistic regression. While linear regression seeks to predict a numeric response, logistic regression seeks to predict the probability of a categorical response. As you will see in this chapter, we can also extend logistic regression to handle cases where there are more than two possible outcomes.
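To make the idea of predicting a probability concrete, here is a minimal sketch of the two-outcome case. It assumes a single numeric predictor and uses made-up coefficients (INTERCEPT and SLOPE are hypothetical values chosen for illustration, not fitted to any data), and it borrows the benign/malignant labels from the tumor example above. The logistic function turns an unbounded linear score into a value between 0 and 1, which can be read as the probability of the positive class.

```python
import math


def logistic(z: float) -> float:
    """Map any real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration (not fitted to data).
INTERCEPT = -4.0
SLOPE = 0.8  # change in the log-odds per unit increase in the predictor


def predict_probability(x: float) -> float:
    """Estimated probability of the positive class for predictor value x."""
    return logistic(INTERCEPT + SLOPE * x)


def classify(x: float, threshold: float = 0.5) -> str:
    """Turn the probability into a category using an assumed cutoff."""
    return "malignant" if predict_probability(x) >= threshold else "benign"


for x in (1.0, 5.0, 9.0):
    print(f"x={x:.1f}  p={predict_probability(x):.3f}  class={classify(x)}")
```

The 0.5 cutoff in classify is also an assumption; in practice the threshold depends on the relative cost of each kind of misclassification.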
By the end of this chapter, you will have learned the following:
- The difference between regression and classification
- The underlying statistical principles and concepts behind logistic regression
- How logistic regression fits into the larger family of generalized linear models ...