With the rise of data science as a business-critical capability, enterprises are creating and deploying data science models as applications that require regular upkeep as the underlying data shifts over time. Both the changing data inputs and the insights gained from using a model feed back into its upkeep. Many organizations build feedback loops or quality measures that deliver real-time or near-real-time reports on a model’s efficacy, allowing them to observe when its outputs deteriorate. In this way, a handful of initial models built by Open Data Science teams can quickly grow into “model factories” in which tens to hundreds of deployed models may be “live” at any given time. When these models are coupled to the business results they generate, model management quickly becomes a critical requirement of the Open Data Science environment.
In this final chapter, we will explore why models have to be continuously evaluated as part of a data science lifecycle and what can be done to combat “data model drift.”
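To make “data model drift” concrete before diving in, the sketch below shows one common monitoring technique: comparing the distribution of a feature in live scoring traffic against its distribution at training time. The helper name `feature_drifted` is hypothetical, and NumPy and SciPy are assumed dependencies; a production model factory would track many features and quality metrics per model, but the principle is the same.

```python
# Illustrative drift check: flag a feature whose live distribution has
# shifted away from the training distribution, using a two-sample
# Kolmogorov-Smirnov test (scipy is an assumed dependency).
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha=0.05):
    """Return True if the live distribution differs significantly
    from the training distribution at significance level alpha."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return bool(p_value < alpha)

rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)    # feature at training time
stable = rng.normal(loc=0.0, scale=1.0, size=5_000)   # live traffic, same distribution
shifted = rng.normal(loc=0.8, scale=1.0, size=5_000)  # live traffic after the mean drifts

print(feature_drifted(train, stable))   # usually False (about a 5% false-alarm rate)
print(feature_drifted(train, shifted))  # True: the shift is easily detected
```

A check like this would run on a schedule against recent scoring logs, raising an alert that triggers model review or retraining rather than retraining blindly.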
In the course of day-to-day business, many quantitative models are created, often with little visibility into their number, variations, and origins. Many, if not most, are good only for temporary or one-off scenarios; however, it can be hard to predict in advance which will survive, be enhanced, and be promoted to wider use.
Imagine a scenario where an executive contacts the analytics ...