Chapter 1. Introduction to Healthcare Data
Healthcare data is an exciting vertical for data science, and there are many opportunities to have real impact, whether from a clinical or technical perspective. For patients and clinicians, there is the alluring promise of truly personalized care where patients get the right treatment at the right time, tailored to their genetics, environment, beliefs, and lifestyle—each requiring effective integration, harmonization, and analysis of highly complex data. For data scientists and computer scientists, there are many open problems for natural language processing, graphs, semantic web, and databases, among many others.
Additionally, there are “frontier” problems that arise given the specific combination of a specific technology and the nuances and complexities of healthcare. For example, there is nothing about healthcare data itself nor data science that requires “regulatory-grade” reproducibility. Data scientists know how to use version control tools such as Git, and IT people know how to create database snapshots and use Docker containers. However, with regulatory bodies such as the US Food and Drug Administration (FDA) or the European Medicines Agency (EMA), there are specific requirements to track and store metadata and other artifacts to “prove” the results of the analysis, including reproducibility. Similarly, there is increasing desire and pressure to ensure reproducibility of studies or the sharing of negative results among academics. ...