CHAPTER 10 Logistic Regression
In this chapter, we describe logistic regression, a highly popular and powerful classification method. Like linear regression, it relies on a specific model relating the predictors to the outcome. The user must specify which predictors to include as well as their form (e.g., including any interaction terms). Because the model structure is specified in advance, even small datasets can be used to build logistic regression classifiers, and once the model is estimated, classifying even large samples of new records is computationally fast and cheap. We describe the formulation of the logistic regression model and its estimation from data. We also explain the concepts of “logit,” “odds,” and “probability” of an event that arise in the logistic model context, and the relations among the three. We discuss variable importance and coefficient interpretation, variable selection for dimension reduction, and extensions to multi-class classification.
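As a preview of how probability, odds, and logit relate, the following is a minimal sketch using the standard definitions: odds = p / (1 - p) and logit = log(odds). The helper function names are hypothetical and chosen here only for illustration; they do not come from the chapter's utility functions.

```python
import math

# Hypothetical helper names, used for illustration only.

def odds_from_prob(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Probability of an event given its odds (odds >= 0)."""
    return odds / (1 + odds)

def logit_from_prob(p):
    """Logit (natural log of the odds) for a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def prob_from_logit(logit):
    """Probability recovered from a logit via the logistic function."""
    return 1 / (1 + math.exp(-logit))

# Show the three quantities side by side for a few probabilities.
for p in (0.1, 0.5, 0.8):
    odds = odds_from_prob(p)
    logit = logit_from_prob(p)
    print(f"p={p:.2f}  odds={odds:.3f}  logit={logit:.3f}")
```

A probability of 0.5 corresponds to even odds and a logit of zero. Probabilities above 0.5 give positive logits and those below 0.5 give negative logits, so the logit can take any real value while the probability stays between 0 and 1.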
Python
In this chapter, we will use pandas for data handling, scikit-learn and statsmodels for the models, and matplotlib for visualization. We will also make use of the utility functions from the Python Utilities Functions Appendix. Use the following import statements for the Python code in this chapter.
import required functionality for this chapter
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, ...