Chapter 6. Machine Learning and Analytics

Up until now, we have spent quite a bit of time focused on how to think about healthcare real-world data, and how we can start engineering it. Of course, we are engineering these data so that we can analyze them and extract insights! When most of us (myself included) started working with data, we probably wanted to dive straight into machine learning and build predictive models. Instead, we found ourselves constantly manipulating data, transforming it from one dataframe to another. Most data analysis and machine learning libraries expect their input as a dataframe, a tabular structure that maps naturally onto data from CSV files and relational databases but much less naturally onto data from document or graph databases.

So, we have spent all this time looking at the complexities of RWD and have talked about how well graphs suit them. How do we connect this to all of these analytics tools that want things in neat little tables? Or are there alternative approaches?

In this chapter, we start to discuss how we can connect RWD (especially RWD stored in a graph) to analytics, and to machine learning in particular. We start with a simple approach that extracts a subset of the graph (also known as a subgraph) into a table/dataframe. Following that, we look at the machine learning pipeline as a whole and at how graphs fit into exploratory data analysis and feature engineering (including feature stores). Finally, we finish with integrating the graph ...
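To make the subgraph-to-table idea concrete, here is a minimal sketch using only the Python standard library. It assumes a toy graph stored as hypothetical (source, relationship, target) triples; the node identifiers, relationship names, and the helper functions `one_hop_subgraph` and `subgraph_to_rows` are all illustrative and not taken from any particular graph database. The subgraph is simply the edges leaving a set of seed patients, and each distinct (relationship, target) pair becomes a 0/1 column, one row per patient.

```python
from collections import defaultdict

# Hypothetical toy graph as (source, relationship, target) triples.
EDGES = [
    ("patient:1", "HAS_DIAGNOSIS", "dx:E11.9"),
    ("patient:1", "HAS_MEDICATION", "rx:metformin"),
    ("patient:2", "HAS_DIAGNOSIS", "dx:I10"),
    ("patient:2", "HAS_DIAGNOSIS", "dx:E11.9"),
    ("dx:E11.9", "IS_A", "dx:E11"),  # ontology edge, not patient-level
]


def one_hop_subgraph(edges, seeds):
    """Keep only the edges whose source node is one of the seed nodes."""
    seed_set = set(seeds)
    return [edge for edge in edges if edge[0] in seed_set]


def subgraph_to_rows(edges):
    """Flatten a subgraph into one row per source node.

    Each distinct (relationship, target) pair becomes a 0/1 column,
    i.e. a simple bag-of-neighbours encoding.
    """
    features = defaultdict(set)
    for src, rel, dst in edges:
        features[src].add(f"{rel}={dst}")
    # Collect every feature seen on any node, in a stable order.
    columns = sorted(set().union(*features.values()))
    rows = []
    for node in sorted(features):
        row = {"node": node}
        row.update({col: int(col in features[node]) for col in columns})
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    sub = one_hop_subgraph(EDGES, ["patient:1", "patient:2"])
    columns, rows = subgraph_to_rows(sub)
    print("\t".join(["node"] + columns))
    for row in rows:
        print("\t".join([row["node"]] + [str(row[c]) for c in columns]))
```

The result is a plain list of dicts, so if pandas is available, `pandas.DataFrame(rows)` turns it into the kind of dataframe most analytics libraries expect. Note what the flattening discards: the `IS_A` edge from the diagnosis to its parent code never reaches the table, which is one reason the rest of the chapter looks beyond simple extraction.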
