Statsmodels

The first library we'll cover is the statsmodels library (http://statsmodels.sourceforge.net/). Statsmodels is a Python package that is well documented and developed for exploring data, estimating models, and running statistical tests. Let's use it here to build a simple linear regression model of the relationship between sepal length and sepal width for the setosa species.

First, let's visually inspect the relationship with a scatterplot:

fig, ax = plt.subplots(figsize=(7,7)) 
ax.scatter(df['sepal width (cm)'][:50], df['sepal length (cm)'][:50]) 
ax.set_ylabel('Sepal Length') 
ax.set_xlabel('Sepal Width') 
ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02) 

The preceding code generates the following output: ...

Get Python Machine Learning Blueprints - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.