
Bioinformatics with Python Cookbook

by Tiago Antao

June 2015

Intermediate to advanced

306 pages

6h 50m

English

Packt Publishing


Computing the median in a large dataset

As you saw in the first recipe, computing the exact median requires having all the values available at once. By contrast, a statistic such as the mean needs only an accumulator and a counter. The main point of this recipe is to introduce the idea of approximate computing: with big data, computing the precise value is not always the best strategy (of course, this should be evaluated on a case-by-case basis).
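To make the contrast concrete, here is a minimal sketch in plain Python. The function names `streaming_mean` and `exact_median` are hypothetical and are not taken from the book's notebook. The mean is computed in one pass with constant memory, while the exact median has to materialize every value first.

```python
import statistics


def streaming_mean(values):
    """Mean of an iterable using only a running sum and a count."""
    total = 0.0  # accumulator
    count = 0    # counter
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("cannot compute the mean of an empty stream")
    return total / count


def exact_median(values):
    """Exact median: every value must be held in memory at once."""
    data = list(values)  # materializes the whole stream
    return statistics.median(data)
```

With a generator as input, `streaming_mean` never stores more than two numbers, whereas `exact_median` grows with the size of the dataset.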

Getting ready

We will require the first recipe to have been fully run.

Here, we will use two different strategies to compute the median: approximating the data points in a way that allows the data to be compressed, and subsampling the data.

As usual, the code for this recipe is available in the 08_Advanced/Median.ipynb notebook.
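The two strategies named above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the function names, the `ndigits`, `fraction`, and `seed` parameters, and the choice of rounding as the approximation step are all hypothetical. The first function rounds each value so that only one count per distinct rounded value needs to be kept; the second keeps each value with a fixed probability and takes the median of what remains.

```python
import random
import statistics
from collections import Counter


def median_from_rounded_counts(values, ndigits=0):
    """Approximate median from counts of rounded values (assumed approach).

    Rounding collapses many data points onto few distinct values, so only
    a small table of counts is stored instead of the full dataset.
    """
    counts = Counter(round(value, ndigits) for value in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data")
    # 1-based position of the middle element (the upper one if total is even)
    target = total // 2 + 1
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= target:
            return value


def median_from_subsample(values, fraction, seed=None):
    """Approximate median from a random subsample (assumed approach).

    Each value is kept independently with probability `fraction`.
    """
    rng = random.Random(seed)
    sample = [value for value in values if rng.random() < fraction]
    if not sample:
        raise ValueError("subsample is empty; increase fraction")
    return statistics.median(sample)
```

Both functions trade precision for memory: coarser rounding or a smaller `fraction` stores less data but moves the result further from the exact median, so the acceptable error should be decided case by case.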