The emergence of data science is radically transforming the biomedical knowledge generation paradigm. While modern biomedicine has been a pioneer in evidence-based science, its approach for decades has largely followed a well-worn path of experimental design, data collection, analysis, and interpretation. Data science introduces an alternative pathway—one that starts with the vast collections of diverse digital data increasingly accessible to the community.

Although the idea of generating evidence through data science has many originators, Jim Gray of Microsoft best described the unique opportunity afforded by this new paradigm. In a 2007 address to the U.S. National Research Council, Gray argued: “With an exaflood of unexamined data and teraflops of cheap computing power, we should be able to make many valuable discoveries simply by searching all that information for unexpected patterns” [1]. Gray coined the phrase “data-intensive scientific discovery.” Notably, he broke with the high-performance computing “high priests” and advocated the adoption of new models of computing. After Gray’s untimely death shortly after his address, his colleagues captured this concept in a collection of essays ultimately published as The Fourth Paradigm: Data-Intensive Scientific Discovery [2]. These essays introduced the term “big data.”

“Data science” and “big data” are now overburdened terms with many meanings. The most useful definitions are operational in nature. ...
