Skip to Content
Python Data Science Handbook, 2nd Edition
book

Python Data Science Handbook, 2nd Edition

by Jake VanderPlas
December 2022
Beginner to intermediate
588 pages
13h 43m
English
O'Reilly Media, Inc.
Content preview from Python Data Science Handbook, 2nd Edition

Chapter 49. In Depth: Kernel Density Estimation

In Chapter 48 we covered Gaussian mixture models, which are a kind of hybrid between a clustering estimator and a density estimator. Recall that a density estimator is an algorithm that takes a D -dimensional dataset and produces an estimate of the D -dimensional probability distribution that data is drawn from. The GMM algorithm accomplishes this by representing the density as a weighted sum of Gaussian distributions. Kernel density estimation (KDE) is in some senses an algorithm that takes the mixture-of-Gaussians idea to its logical extreme: it uses a mixture consisting of one Gaussian component per point, resulting in an essentially nonparametric estimator of density. In this chapter, we will explore the motivation and uses of KDE.

We begin with the standard imports:

In [1]: %matplotlib inline
        import matplotlib.pyplot as plt
        plt.style.use('seaborn-whitegrid')
        import numpy as np

Motivating Kernel Density Estimation: Histograms

As mentioned previously, a density estimator is an algorithm that seeks to model the probability distribution that generated a dataset. For one-dimensional data, you are probably already familiar with one simple density estimator: the histogram. A histogram divides the data into discrete bins, counts the number of points that fall in each bin, and then visualizes the results in an intuitive manner.

For example, let’s create some data that is drawn from two normal distributions:

In [2]: def make_data
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Data Science Handbook

Python Data Science Handbook

Jake VanderPlas

Publisher Resources

ISBN: 9781098121211Errata PageSupplemental Content