Hands-On Healthcare Data

Book description

Healthcare is the next frontier for data science. Using the latest in machine learning, deep learning, and natural language processing, you'll be able to solve healthcare's most pressing problems: reducing the cost of care, ensuring patients get the best treatment, and increasing accessibility for the underserved. But first, you have to learn how to access and make sense of all that data.

This book provides pragmatic and hands-on solutions for working with healthcare data, from data extraction to cleaning and harmonization to feature engineering. Author Andrew Nguyen covers specific ML and deep learning examples with a focus on producing high-quality data. You'll discover how graph technologies help you connect disparate data sources so you can solve healthcare's most challenging problems using advanced analytics.

You'll learn:

  • Different types of healthcare data: electronic health records, clinical registries and trials, digital health tools, and claims data
  • The challenges of working with healthcare data, especially when trying to aggregate data from multiple sources
  • Current options for extracting structured data from clinical text
  • How to make trade-offs when using tools and frameworks for normalizing structured healthcare data
  • How to harmonize healthcare data using terminologies, ontologies, mappings, and crosswalks

Table of contents

  1. Foreword
  2. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
  3. 1. Introduction to Healthcare Data
    1. The Enterprise Mindset
    2. The Complexity of Healthcare Data
    3. Sources of Healthcare Data
      1. Electronic Health Records
      2. Claims Data
      3. Clinical/Disease Registries
      4. Clinical Trials Data
    4. Data Collection and How That Affects Data Scientists
      1. Prospective studies
      2. Retrospective studies
    5. Conclusion
  4. 2. Technical Introduction
    1. Basic Introduction to Docker and Containers
      1. Installing and Testing Docker
    2. Conceptual Introduction to Databases
      1. ACID Compliance
      2. OLTP Systems
      3. OLAP Systems
      4. SQL Versus NoSQL
      5. SQL Databases
      6. (Labeled) Property Graph Databases
      7. Hypergraph Databases
      8. Resource Description Framework Databases
      9. Conclusion
  5. 3. Standardized Vocabularies in Healthcare
    1. Controlled Vocabularies, Terminologies, and Ontologies
    2. Key Considerations
      1. Pre-coordination Versus Post-coordination
    3. Case Study Example: EHR Data
    4. Common Terminologies
      1. CPT
      2. ICD-9 and ICD-10
      3. LOINC
      4. RxNorm
      5. SNOMED CT
      6. Key Takeaways
    5. Using the Unified Medical Language System
      1. Some Basic Definitions
      2. Concept Orientation
      3. Working with the UMLS
      4. UMLS and Relational Databases
      5. Preprocessing the UMLS
      6. UMLS and Property Graph Databases
      7. UMLS and Hypergraph Databases
      8. Review of the UMLS
    6. Conclusion
  6. 4. Deep Dive: Electronic Health Records Data
    1. Publicly Accessible Data
      1. Medical Information Mart for Intensive Care
      2. Synthea
    2. Data Models
      1. Goals
      2. Examples of Data Models
    3. Case Study: Medications
      1. The Medication Harmonization Problem
      2. Technical Deep Dive
      3. Connecting to the UMLS
    4. Difficulties Normalizing Structured Medical Data
    5. Conclusion
  7. 5. Deep Dive: Claims Data
    1. Publicly Accessible Data—SynPUF
    2. Data Models
      1. Choosing a Data Model
      2. Combining Claims and EHR Data
    3. Case Study: Combining Diagnoses and Medications
      1. OMOP Versus Graphs
      2. Considerations When Combining Different Sources of Healthcare Data
    4. Conclusion
  8. 6. Machine Learning and Analytics
    1. A Primer on Machine Learning
      1. What Is Feature Engineering?
      2. Graph-Based Deep Learning
    2. Extracting Data as a Table
      1. To SQL or Not to SQL
      2. Querying OMOP Data
      3. From Graphs to Dataframes
      4. Why Add the Complexity of Graphs?
    3. Machine Learning and Feature Engineering with Graphs
    4. Graph Embeddings
      1. node2vec
      2. cui2vec
      3. med2vec
      4. snomed2vec
      5. Some Final Thoughts About Embeddings
    5. Making the Case for Graph-Based Analysis
    6. Conclusion
  9. 7. Trends in Healthcare Analytics
    1. Federated Learning and Federated Analytics
      1. How Does Federated Learning Work?
      2. Why Federated Analytics/Learning?
      3. The Data Harmonization Challenge in a Federated Context
      4. Graphs and Federated Approaches
    2. Natural Language Processing
      1. Concept Extraction
      2. Beyond Concept Extraction
      3. Clinical NLP Tools
      4. Commercial Clinical NLP Solutions
      5. Key Differences Between Clinical NLP and Other Applications of NLP
    3. Conclusion
  10. 8. Graphs, Harmonization, and Some Final Thoughts
    1. Other Types of Healthcare RWD
    2. Data Normalization and Harmonization
      1. Merging Datasets
      2. Bridging IT and the Business
      3. It’s a Human, Not Technical, Problem
    3. Graphs Can Be Part of the Solution
    4. Graphs Are Not a Silver Bullet
    5. Conclusion
  11. Index
  12. About the Author

Product information

  • Title: Hands-On Healthcare Data
  • Author(s): Andrew Nguyen
  • Release date: August 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098112929