Chapter 10. How Perl Saved the Human Genome Project

Lincoln D. Stein

Author’s Note: It is now six years since I wrote this article, and though much has changed, a surprising amount has remained the same. The human genome was successfully sequenced about a year ago, thanks in no small part to thousands of Perl scripts large and small, and the human genome project has now spawned genome sequencing projects for such organisms as the mouse, the chicken, the cow, the mosquito, the honeybee, the chimp, and—believe it or not—the duck-billed platypus.

The BoulderIO system described in the body of the text has long since been supplanted by a powerful and flexible body of code called BioPerl (http://www.bioperl.org), the collective work of dozens of committed programmers and biologists.

Perl remains the savior of the genome project now more than ever. Just a few weeks ago I found myself sitting in an auditorium listening to Jim Mullikin of the Wellcome Trust Sanger Institute describe how he had solved a problem that was once thought insurmountable: to assemble an entire genome (the mouse, in this case) in a single shot, without the tedious experimental mapping and subcloning that was previously thought to be critical to make the problem soluble. His genome assembly software, named Phusion, is a pipeline of Perl scripts wrapped around a nugget of high-performance C code. As Jim put it, “Perl and 70 gigabytes of main memory is all you need!”

DATE: February 1996

LOCATION: Cambridge, England, in the ...
