We are going to begin with the simplest possible classification problem: two classes, setosa and versicolor, and just one independent variable or feature, the sepal_length. As is usually done, we are going to encode the categories setosa and versicolor with the numbers 0 and 1. Using pandas, we can do the following:
df = iris.query("species == ('setosa', 'versicolor')")
y_0 = pd.Categorical(df['species']).codes
x_n = 'sepal_length'
x_0 = df[x_n].values
x_c = x_0 - x_0.mean()
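To see what the encoding and centering steps produce, here is a minimal sketch on a tiny hand-made table. The DataFrame `toy`, its values, and the names `df_toy`, `y_toy`, `x_toy`, and `x_toy_c` are assumptions made up for illustration; they are not the iris data. The filter uses `isin` instead of `query` only to keep the sketch explicit.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the iris table; the values are made up for illustration.
toy = pd.DataFrame({
    "species": ["setosa", "versicolor", "setosa", "versicolor", "virginica"],
    "sepal_length": [5.0, 6.0, 4.8, 6.2, 6.5],
})

# Keep only the two classes of interest.
df_toy = toy[toy["species"].isin(["setosa", "versicolor"])]

# Categories inferred from strings are sorted, so setosa -> 0, versicolor -> 1.
y_toy = pd.Categorical(df_toy["species"]).codes

# Centering: subtract the mean so the predictor has mean zero.
x_toy = df_toy["sepal_length"].values
x_toy_c = x_toy - x_toy.mean()

print(y_toy)                            # integer class labels
print(np.isclose(x_toy_c.mean(), 0.0))  # centered data has mean ~0
```

Because the codes follow the sorted order of the category labels, it is worth checking which class ends up as 1 before interpreting the model's output.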
As with other linear models, centering the data can help with the sampling. Now that we have the data in the proper format, we can finally build the model with PyMC3.
Notice how the first part of model_0 resembles ...