Chapter 22

Analysis of Large Data Sets

A Cautionary Tale of the Perils of Binning Data

R.W. Rumpf; J. Gonya; W.C. Ray
The Research Institute at Nationwide Children’s Hospital, Columbus, OH, United States

Abstract

Continuous data must frequently be binned for use with analysis algorithms that require discrete data. This binning can introduce ambiguity into the subsequent analysis and can affect both the statistical strength and the outcome of the analysis. Using two different binning schemes (a quartile distribution, in which the data range of each bin was identical but the number of data points differed, and a manual distribution, in which the data ranges of the bins differed but each bin contained the same number of data points) on the ...
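The contrast between the two schemes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes four bins, uses equal-width binning (`np.linspace` edges) for the scheme with identical bin ranges, and substitutes empirical quantiles (`np.quantile`) for the manual placement of bin edges that gives equal counts. The helper names `equal_width_bins` and `equal_count_bins` and the data values are hypothetical, invented for the example.

```python
import numpy as np

N_BINS = 4  # assumption: four bins, matching the "quartile" naming


def equal_width_bins(x, n_bins=N_BINS):
    """Each bin spans the same value range; bin counts may differ."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Interior edges only: labels run 0..n_bins-1 and x.max() falls in the last bin.
    return np.digitize(x, edges[1:-1]), edges


def equal_count_bins(x, n_bins=N_BINS):
    """Each bin holds about the same number of points; value ranges differ.
    Edges come from empirical quantiles, standing in for manual placement."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    return np.digitize(x, edges[1:-1]), edges


# Hypothetical right-skewed data, invented for illustration.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40, 80, 160, 320, 640], dtype=float)

width_labels, width_edges = equal_width_bins(x)
count_labels, count_edges = equal_count_bins(x)

print("equal-width counts:", np.bincount(width_labels, minlength=N_BINS))
print("equal-count counts:", np.bincount(count_labels, minlength=N_BINS))

# The same observation can land in a different discrete category under each scheme.
i = int(np.where(x == 20)[0][0])
print("value 20 -> bin", width_labels[i], "(equal-width) vs bin", count_labels[i], "(equal-count)")
```

With this skewed toy data, equal-width binning puts 14 of the 16 points in the first bin and leaves the third bin empty, while equal-count binning places 4 points in each bin; the value 20 falls in bin 0 under one scheme and bin 2 under the other. Any downstream algorithm that consumes these discrete labels therefore sees different inputs depending only on the binning choice.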

In: Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology.