Chapter 5 Data-Intensive Compute Infrastructures

Content contributed by Dijiang Huang, Yuli Deng, Jay Etchings, Zhiyuan Ma, and Guangchun Luo

The key element of computer design—software as well as hardware—is to manage the complexity from the lower levels of logical circuits to ever-higher levels that nest above one another. One may compare this to the number of neurons in the brain of animals from the flatworm, to a cat, to Homo sapiens, although the history of artificial intelligence research has shown that comparing a human brain to a computer can distort as much as clarify.

—P. E. Ceruzzi, Computing: A Concise History

According to many, we have now entered a new phase of science, known as fourth paradigm science. The term “fourth paradigm science” was popularized by the late Jim Gray, a Microsoft researcher who foresaw “a world of scholarly resources—text, databases, and other associated materials—that were seamlessly navigable and interoperable” [1]. Fourth paradigm science is data intensive and depends on advances in networking as well as compute power. A 2009 collection of essays published by Microsoft Research explores this new scientific methodology driven by data-intensive problems and surveys some of the exciting research in this area [2].

While the scope and nature of this paradigm shift (if indeed it is truly a paradigm shift) are open for debate, it is indisputable that the biosciences are engaging more and more deeply with big data. This trend is likely to continue ...
