December 2017
410 pages
Not every response variable is continuous, so linear regression is not the right model in every circumstance. Some outcomes are binary (e.g., sick or not sick), and others are counts (e.g., the number of heads in a series of coin flips). A general class of models called “generalized linear models” (GLMs) can account for these types of data while still using a linear combination of predictors.
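To make the idea of "still using a linear combination of predictors" concrete, here is a minimal sketch of how a single linear combination can be mapped onto a probability (for a binary outcome) or a positive expected count (for count data). The intercept, slope, and predictor values are hypothetical numbers chosen only for illustration; they do not come from any fitted model or from the ACS data.

```python
import math

# Hypothetical coefficients and predictor values, for illustration only.
intercept, slope = -1.0, 0.5
xs = [0.0, 2.0, 4.0]

# The linear combination of predictors that every GLM is built on.
etas = [intercept + slope * x for x in xs]

# Binary outcome: the inverse logit maps each value to a probability in (0, 1).
probs = [1.0 / (1.0 + math.exp(-eta)) for eta in etas]

# Count outcome: the exponential (inverse log) maps each value to a positive expected count.
expected_counts = [math.exp(eta) for eta in etas]

for x, eta, p, mu in zip(xs, etas, probs, expected_counts):
    print(f"x={x:.1f}  linear={eta:+.1f}  probability={p:.3f}  expected count={mu:.3f}")
```

The linear part is identical in both cases; only the function that converts it to the scale of the response changes. That is roughly what separates a logistic model from a count model such as Poisson regression.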
When you have a binary response variable, logistic regression is often used to model the data. Here’s some data from the American Community Survey (ACS) for New York.
import pandas as pd
acs = pd.read_csv('../data/acs_ny.csv')
print(acs.columns)