Modeling the data

Let's begin modeling with our dataset. We're going to examine the effect that the ZIP code and the number of bedrooms have on the rental price. We'll use two packages here. The first, statsmodels, we introduced in Chapter 1, The Python Machine Learning Ecosystem. The second, patsy, is a package that makes working with statsmodels easier: it allows you to use R-style formulas when running a regression. Let's do that now:

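import patsy
import statsmodels.api as sm

# R-style formula: rent is the response; zip and beds are the predictors
f = 'rent ~ zip + beds'
# Build the response vector y and the design matrix X from zdf
y, X = patsy.dmatrices(f, zdf, return_type='dataframe')
# Fit ordinary least squares and display the regression summary
results = sm.OLS(y, X).fit()
print(results.summary())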

The preceding code generates the following output:

Note that the preceding ...
