To get an initial understanding of the way logistic regression works, let's first take a look at the following example, where we have artificial feature values, X, plotted with the corresponding classes, 0 or 1:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

np.random.seed(3)  # for reproducibility
NUM_PER_CLASS = 40
X_log = np.hstack((norm.rvs(2, size=NUM_PER_CLASS, scale=2),
                   norm.rvs(8, size=NUM_PER_CLASS, scale=3)))
y_log = np.hstack((np.zeros(NUM_PER_CLASS),
                   np.ones(NUM_PER_CLASS))).astype(int)
plt.xlim((-5, 20))
plt.scatter(X_log, y_log, c=np.array(['blue', 'red'])[y_log], s=10)
plt.xlabel("feature value")
plt.ylabel("class")
Refer to the following graph:

[Figure: scatter plot of the artificial data, with the feature value on the x-axis and the class label (0 or 1) on the y-axis]
As we can see, the data is so noisy that the classes overlap in the feature space: there is a range of feature values where samples from both classes occur.
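To see how logistic regression handles this overlap, the following is a minimal sketch, not part of the original text, that fits scikit-learn's LogisticRegression to the one-dimensional data above and overlays the learned probability of class 1 on the scatter plot. It reuses X_log and y_log from the preceding snippet; the names xs and p1 are illustrative.

from sklearn.linear_model import LogisticRegression

# scikit-learn expects a 2-D feature matrix, so reshape the 1-D data
clf = LogisticRegression()
clf.fit(X_log.reshape(-1, 1), y_log)

# Evaluate the learned probability of class 1 over the plotted range
xs = np.linspace(-5, 20, 200)
p1 = clf.predict_proba(xs.reshape(-1, 1))[:, 1]

plt.plot(xs, p1, 'g-', label="P(class=1)")  # S-shaped sigmoid curve
plt.legend()

The fitted curve is the characteristic sigmoid: it stays near 0 on the left, near 1 on the right, and transitions smoothly through the overlapping region, where the model assigns intermediate probabilities rather than a hard class.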