Statsmodels

The first library we'll cover is the statsmodels library (http://statsmodels.sourceforge.net/). Statsmodels is a Python package that is well documented and developed for exploring data, estimating models, and running statistical tests. Let's use it here to build a simple linear regression model of the relationship between sepal length and sepal width for the setosa species.

First, let's visually inspect the relationship with a scatterplot:

fig, ax = plt.subplots(figsize=(7,7)) 
ax.scatter(df['sepal width (cm)'][:50], df['sepal length (cm)'][:50]) 
ax.set_ylabel('Sepal Length') 
ax.set_xlabel('Sepal Width') 
ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02) 

The preceding code generates the following output: ...

Get Python Machine Learning Blueprints - Second Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.