Performing discretization and quantiling of data

Discretization is a means of slicing up continuous data into a set of "bins." Each value is then assigned to the bin that contains it. The resulting count of values in each bin can then be used to understand the relative distribution of the data across the bins.
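As a minimal sketch of the idea, the snippet below bins ten illustrative values (made up for this example, not taken from the book) into three equal-width bins with `pd.cut()`, then counts how many values land in each bin:

```python
import pandas as pd

# Illustrative continuous measurements (hypothetical data)
values = [1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.5, 9.0, 10.0]

# Slice the observed range into 3 equal-width bins; each value is
# mapped to the interval (bin) that contains it
bins = pd.cut(values, bins=3)

# Count the values per bin, keeping the bins in ascending order
counts = pd.Series(bins).value_counts(sort=False)
print(counts)
```

The bins here are right-closed intervals, and pandas extends the lowest edge slightly below the minimum so that the smallest value is included.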

Discretization in pandas is performed using the pd.cut() function, which bins by value range, and the pd.qcut() function, which bins by sample quantiles. To demonstrate, let's start with the following set of 10000 random numbers drawn from a normal distribution:
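A comparable dataset can be sketched as follows. This assumes NumPy's `default_rng` with an arbitrary seed of 42, since the original generator call and seed are not shown here. The sketch also contrasts the two functions: `pd.cut()` produces equal-width bins, while `pd.qcut()` produces bins that each hold roughly the same number of values.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded NumPy Generator stands in for the original
# normal random number generator; the seed 42 is arbitrary
rng = np.random.default_rng(42)
dist = pd.Series(rng.normal(loc=0.0, scale=1.0, size=10000))

# Mean and standard deviation of the sample
print(dist.mean(), dist.std())

# pd.cut: 5 bins of equal width spanning the min..max range;
# counts pile up in the middle bins because the data is bell-shaped
equal_width = pd.cut(dist, bins=5)
print(equal_width.value_counts(sort=False))

# pd.qcut: 5 bins bounded by sample quantiles, so each bin
# holds about 20% of the values
equal_freq = pd.qcut(dist, q=5)
print(equal_freq.value_counts(sort=False))
```

Which function to use depends on the question. Use equal-width bins to see where values concentrate, and quantile bins to split the data into groups of equal size.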

This code shows us the mean and standard deviation of this dataset, which we expect to approach ...
