19 Data Binning

19.1 What is Binning and Why Use It

Histograms are a familiar example of “data binning” (also known as “discrete binning,” “bucketing,” or simply “binning”). They are used to visualize the underlying distribution of data for which only a limited number of observations is available.

Consider the following simple example where we start with data drawn from a known distribution and plot the histogram (the output of this code is in Figure 19.1 on page 344):
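A minimal sketch of such an example in R is given here. The seed, the sample size, the choice of a standard normal distribution, and the number of breaks are illustrative assumptions, not necessarily the values behind Figure 19.1.

```r
# Assumption: standard normal data with 150 observations; the source
# does not state which distribution or sample size was used.
set.seed(1890)
obs <- rnorm(150)

# hist() bins the observations and draws the bars. freq = FALSE puts
# densities (not counts) on the y-axis, so the curve below is comparable.
h <- hist(obs, breaks = 15, freq = FALSE,
          main = "Histogram of simulated observations", xlab = "obs")

# Overlay the known density that generated the data.
curve(dnorm(x), add = TRUE, lwd = 2)

# The binning itself is available in the returned object.
h$breaks  # bin boundaries
h$counts  # number of observations per bin
```

Note that `breaks = 15` is only a suggestion to `hist()`: R rounds the boundaries to "pretty" values, so the actual number of bins can differ.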

Data binning is also used in spectral analysis and image processing. In mass spectrometry (MS) or nuclear magnetic resonance (NMR), small shifts along the spectral dimension could be falsely interpreted as representing different components. Binning reduces the resolution of the spectrum just enough to ensure that a given peak remains in its bin despite small spectral shifts between analyses. Similarly, several digital camera systems use a pixel-binning function to improve image contrast and reduce noise.
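The spectral case can be illustrated with a small simulation. The sketch below uses a made-up spectrum with a single Gaussian peak; the grid, the peak positions, the peak width, and the bin width are all hypothetical values chosen only to show the effect.

```r
# Hypothetical spectrum: intensity on a fine grid of positions.
# All names and numbers here are illustrative assumptions.
position <- seq(0, 10, by = 0.01)
peak_at  <- function(centre) exp(-(position - centre)^2 / (2 * 0.02^2))

run_a <- peak_at(5.20)   # peak in the first analysis
run_b <- peak_at(5.23)   # same component, slightly shifted

# Reduce the resolution: sum the intensity inside bins of width 0.5.
breaks <- seq(0, 10, by = 0.5)
bin_id <- cut(position, breaks = breaks, include.lowest = TRUE)
binned_a <- tapply(run_a, bin_id, sum)
binned_b <- tapply(run_b, bin_id, sum)

# On the fine grid the two maxima sit at different positions ...
position[which.max(run_a)]
position[which.max(run_b)]

# ... but after binning both peaks fall into the same bin.
levels(bin_id)[which.max(binned_a)]
levels(bin_id)[which.max(binned_b)]
```

The bin width is the design choice that matters: it has to be larger than the expected shift, and a peak that happens to sit close to a bin boundary can still end up in a neighbouring bin.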

Binning reduces the effects of minor observation errors and brings more stability, especially when the observations are sparse. The original data values that fall in a given small interval, a bin, are replaced by a value representative of that interval, often the central value. It is a form of quantization. Statistical data binning is ...
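Replacing each value by the central value of its bin can be sketched in a few lines of R. The data and the bin width of 1 are assumptions made for illustration only.

```r
# Illustrative data; the values and the bin width are assumptions.
x <- c(0.3, 1.2, 1.9, 2.4, 3.7, 4.1, 4.8)

# Bins of width 1: [0,1), [1,2), ..., [4,5]
breaks <- 0:5
mids   <- head(breaks, -1) + diff(breaks) / 2   # central value of each bin

# findInterval() returns the index of the bin each value falls in.
bin <- findInterval(x, breaks, rightmost.closed = TRUE)

# Quantization: replace every value by the central value of its bin.
x_binned <- mids[bin]
data.frame(x, bin, x_binned)
```

With equal-width bins the replacement moves a value by at most half the bin width, which is the sense in which small observation errors are absorbed.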
