How to do it...

  1. First, we need to create some data. For example, let's simulate heights of both women and men. We'll use this example throughout this recipe. It's a simple one-dimensional example that is easy to visualize, but hopefully it will illustrate what we're trying to accomplish in an N-dimensional space:
import numpy as np
N = 1000
in_m = 72
in_w = 66
s_m = 2
s_w = s_m
m = np.random.normal(in_m, s_m, N)
w = np.random.normal(in_w, s_w, N)
from matplotlib import pyplot as plt
%matplotlib inline
f, ax = plt.subplots(figsize=(7, 5))
ax.set_title("Histogram of Heights")
ax.hist(m, alpha=.5, label="Men");
ax.hist(w, alpha=.5, label="Women");
ax.legend()

This is the output:

  2. Next, we might be interested in subsampling the group, fitting the distribution, ...
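
The rest of this step is cut off in the text above, so what follows is only a minimal sketch of one way the idea could go, not necessarily the recipe's own code. It assumes a 50/50 random split, a fixed seed, and a plain NumPy normal fit (sample mean and standard deviation); the names `train_mask`, `normal_logpdf`, and `guess_is_man` are hypothetical helpers introduced for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability

# Simulated heights with the same parameters as in step 1.
N = 1000
m = rng.normal(72, 2, N)
w = rng.normal(66, 2, N)

# Subsample: keep a random half of each group for fitting (assumed 50/50 split)
# and hold out the rest for checking the fit.
train_mask = rng.random(N) < 0.5
m_train, m_test = m[train_mask], m[~train_mask]
w_train, w_test = w[train_mask], w[~train_mask]

# Fit a normal distribution to each subsample. For a normal distribution the
# maximum-likelihood estimates are the sample mean and standard deviation.
m_mu, m_sigma = m_train.mean(), m_train.std()
w_mu, w_sigma = w_train.mean(), w_train.std()


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution with mean mu and std sigma."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def guess_is_man(heights):
    """True where the fitted 'men' distribution is the more likely source."""
    return normal_logpdf(heights, m_mu, m_sigma) > normal_logpdf(heights, w_mu, w_sigma)


# Fraction of held-out heights assigned to the correct group.
m_acc = guess_is_man(m_test).mean()
w_acc = (~guess_is_man(w_test)).mean()
print(f"men: {m_acc:.2f}, women: {w_acc:.2f}")
```

Because the two simulated groups overlap, some held-out heights will always be assigned to the wrong group; the point of the sketch is only to show the subsample, fit, and compare pattern on a single feature.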
