Chapter 12. The Data Science Lifecycle

With the rise of data science as a business-critical capability, enterprises are creating and deploying data science models as applications that require regular upkeep, because both the input data and the insights gained from using a model change over time. Many organizations therefore build in feedback loops or quality measures that deliver real-time or near-real-time reports on a model’s efficacy, allowing them to observe when its outputs deteriorate. In this way, a handful of initial models can quickly grow, in the hands of Open Data Science teams, into “model factories” where tens to hundreds of deployed models may be “live” at any given time. Couple those models with the results they generate, and model management quickly becomes a critical requirement of the Open Data Science environment.
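
To make the idea of such a quality measure concrete, here is a minimal sketch of one widely used drift signal: the Population Stability Index (PSI), which compares the distribution of a model input or score seen in production against the distribution seen at training time. This example is not from the report itself; the function name, the 0.2 rule-of-thumb threshold, and the synthetic data are illustrative assumptions.

import numpy as np

def population_stability_index(baseline, current, n_bins=10):
    """Bin the baseline sample into quantiles and measure how the
    current sample redistributes across those bins. A PSI above
    roughly 0.2 is a common rule-of-thumb signal that the
    distribution has shifted enough to warrant review."""
    # Bin edges from the baseline's quantiles; widen the outer edges
    # so every current value falls into some bin.
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    baseline_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    current_pct = np.histogram(current, bins=edges)[0] / len(current)

    # Clip empty bins to avoid division by zero and log(0).
    baseline_pct = np.clip(baseline_pct, 1e-6, None)
    current_pct = np.clip(current_pct, 1e-6, None)

    return float(np.sum((current_pct - baseline_pct)
                        * np.log(current_pct / baseline_pct)))

# Hypothetical example: model scores at training time vs. in production.
rng = np.random.default_rng(0)
training_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)
production_scores = rng.normal(loc=0.4, scale=1.2, size=10_000)  # drifted
print(f"PSI: {population_stability_index(training_scores, production_scores):.3f}")

A check like this, run on a schedule against each deployed model’s recent inputs or outputs, is one simple way a “model factory” can flag which of its many live models need attention.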

In this final chapter, we will explore why models have to be continuously evaluated as part of a data science lifecycle and what can be done to combat “data model drift.”

Models As Living, Breathing Entities

In the course of day-to-day business, many quantitative models are created, often without clear visibility into their number, variations, and origins. Many, if not most, are good only for temporary or one-off scenarios; however, it is hard to predict in advance which will survive, be enhanced, and be promoted to wider use.

Imagine a scenario where an executive contacts the analytics ...
