Chapter 1. Why Model Management?

90% of the effort in successful machine learning is not about the algorithm or the model or the learning. It’s about logistics.

Why is model management an issue for machine learning, and what do you need to know in order to do it successfully?

In this book, we explore the logistics of machine learning, lumping various aspects of successful logistics under the topic “model management.” This process must deal with data flow, handle multiple models, and collect and analyze metrics throughout the life cycle of models. Model management is not the exciting part of machine learning (that would be the cool new algorithms and machine learning tools), but unless it is done well, it is the part most likely to cause you to fail. Model management is an essential, ubiquitous, and critical need across all types of machine learning and deep learning projects. We describe what’s involved and what can make a difference to your success, and we propose a design, the rendezvous architecture, that makes it much easier for you to handle logistics for a whole range of machine learning use cases.

The increasing need to deal with machine learning logistics is a natural outgrowth of the big data movement, especially as machine learning provides a powerful way to meet the huge and, until recently, largely unmet demand for ways to extract value from data at scale. Machine learning is becoming a mainstream activity for a large and growing number of businesses and research organizations. Because of the growth rate in the field, in five years’ time, the majority of people doing machine learning will likely have less than five years of experience. The many newcomers to the field need practical, real-world advice.

The Best Tool for Machine Learning

One of the first questions that often arises with newcomers is, “What’s the best tool for machine learning?” It makes sense to ask, but we recently found that the answer is somewhat surprising. Organizations that successfully put machine learning to work generally don’t limit themselves to just one “best” tool. Among a sample group of large customers that we asked, the smallest number of machine learning packages in any toolbox was five, and some had as many as 12.

Why use so many machine learning tools? Many organizations have more than one machine learning project in play at any given time. Different projects have different goals, settings, and types of data, and they may be expected to work at different scales or under a wide range of Service-Level Agreements (SLAs). The tool that is optimal in one situation might not be the best in another, even similar, project. You can’t always predict which technology will give you the best results in a new situation. Plus, the world changes over time: even if a model is successful in production today, you must continue to evaluate it against new options.

A strong approach is to try out more than one tool as you build and evaluate models for any particular goal. Not all tools are of equal quality; you will find some to be generally much more effective than others, but among those you find to be good choices, likely you’ll keep several around.
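Whatever tools produced them, candidate models can be compared fairly by wrapping each one as a plain prediction function and scoring all of them on the same held-out examples. The sketch below assumes that wrapping has already been done; the model names are hypothetical, and accuracy is only one of many metrics you might choose.

```python
def accuracy(predict, examples):
    # Fraction of (input, label) pairs for which the model's prediction matches the label
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)


def compare(models, examples):
    # Score every candidate on the same held-out examples and return them best first
    scores = {name: accuracy(predict, examples) for name, predict in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    held_out = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]
    candidates = {
        "always_odd": lambda x: "odd",  # a deliberately weak baseline
        "parity": lambda x: "even" if x % 2 == 0 else "odd",
    }
    print(compare(candidates, held_out))
```

Evaluating every candidate against identical data is what makes the ranking meaningful; if each tool were tested on its own sample, differences in the data could masquerade as differences in the models.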

Tools for Deep Learning

Take deep learning, for example. Deep learning, a specialized subarea of machine learning, is getting a lot of attention lately, and for good reason. As an oversimplified description, deep learning is a method that learns in a hierarchy of layers: the output of one layer feeds the decisions of the next. The style of machine learning most commonly used in deep learning is the neural network, which is patterned on the connections within the human brain. Although a human-designed deep learning system has enormously fewer connections than the staggering number in the neural networks of a human brain, this style of decision-making can be similarly powerful for particular tasks.
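To make the idea of a layer hierarchy concrete, here is a minimal sketch in plain Python of a tiny two-layer feedforward network, where each layer’s output becomes the next layer’s input. The weights are made-up illustrative values; frameworks such as TensorFlow learn weights from data and use far more layers and units.

```python
def relu(values):
    # Rectified linear unit, applied element-wise: negative values become 0
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # One fully connected layer: each output is a weighted sum of all inputs plus a bias
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    # Feed the output of each layer into the next; apply ReLU to all but the last layer
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:
            activations = relu(activations)
    return activations


if __name__ == "__main__":
    # Illustrative weights only: 2 inputs -> 2 hidden units -> 1 output
    layers = [
        ([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0]),
        ([[1.0, 2.0]], [0.5]),
    ]
    print(forward([1.0, 2.0], layers))  # [6.5]
```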

Deep learning is useful in a variety of settings, but it’s especially good for image or speech recognition. The very sophisticated math behind deep learning approaches and tools can, in many cases, result in a surprisingly simple and accessible experience for the practitioner using these new tools. That’s part of the reason for their exploding popularity. New tools specialized for deep learning include TensorFlow (originally developed by Google), MXNet (a newly incubating Apache Software Foundation project with strong support from Amazon), and Caffe (which originated with work of a PhD student and others at the UC Berkeley Vision and Learning Center). Another widely used machine learning technology with broader applications, H2O, also has effective deep learning algorithms (it was developed by data scientist Arno Candel and others).

Although there is no single “best” specialized machine learning tool, it is important to have an overall technology that effectively handles data flow and model management for your project. In some ways, the best tool for machine learning is the data platform you use to deal with the logistics.

Fundamental Needs Cut Across Different Projects

Just because it’s common to work with multiple machine learning tools doesn’t mean you need to change the underlying technology you use to handle logistics for each different situation. There are some fundamental requirements that cut across projects; regardless of the tool or tools you use for machine learning or even what types of models you build, the problems of logistics are going to be nearly the same.

Many aspects of the logistics of data flow and model management can best be handled at the data-platform level rather than the application level, thus freeing up data scientists and data engineers to focus more on the goals of machine learning itself.


With the right capabilities, the underlying data platform can handle the logistics across a variety of machine learning systems in a unified way.

Machine learning model management is a serious business, but before we delve into the challenges and discover some practical solutions, first let’s have some fun.

Tensors in the Henhouse

Internet of Things (IoT) sensor data, deep learning image detection, and chickens—these are not three things you’d expect to find together. But a recent machine learning project designed and built by our friend and colleague, Ian Downard, put them together into what he described as “an over-engineered attempt” to detect blue jays in his hens’ nesting box and chase the jays away before they break any eggs. Here’s what happened.

The excitement and lure of deep learning using TensorFlow took hold for Ian when he heard a presentation at Strata San Jose by Google developer evangelists. In a recent blog post, Ian reported that this presentation was, to a machine learning novice such as himself, “... nothing less than jaw dropping.” He got the itch to try out TensorFlow himself. Ian is a skilled data engineer but relatively new to machine learning. Even so, he plunged in to build a predator detection system for his henhouse: a fun project, and a good way to do a proof of concept and get a little experience with tensor computation. It’s also a simple example that we can use to highlight some of the concerns you will face in more serious real-world projects.

The fact that Ian could do this himself shows the surprising accessibility of working with tensors and TensorFlow, despite the sophistication of how they work. This instance is, of course, a sort of toy project, but it does show the promise of these methods.

Defining the Problem and the Project Goal

The goal is to protect eggs against attack by blue jays. The specific goal for the machine learning step is to classify the images captured when motion activates the system, differentiating between chickens and jays, as shown in Figure 1-1. This project had a limited initial goal: just to be able to detect jays. How to act on that knowledge in order to protect eggs is left for later.

Figure 1-1. Image recognition using TensorFlow is at the heart of this henhouse-intruder detection project. Results are displayed via Twitter feed @TensorChicken (Tweets seem appropriate for a bird-based project.)


It’s important to recognize what data is available to be collected and how decisions can be structured, and to define a goal narrow enough that it is practical to carry out. Note that domain knowledge, such as knowing that the predator is a blue jay, is critical to the effectiveness of this project.

Planning and design

The system uses image classification triggered by motion detection. The deployed prototype works this way: an application called Motion detects movement in the feed from a camera connected to a Raspberry Pi. This triggers classification of the captured image by a TensorFlow model that has been deployed to the Pi. A Twitter feed (@TensorChicken) displays the top three scores; in the example shown in Figure 1-1, a Rhode Island Red chicken has been correctly identified.
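Posting the top three scores amounts to sorting the classifier’s output and keeping the first three entries. The sketch below assumes that output is available as a dictionary mapping each label to a score; it does not call TensorFlow or Twitter, and the labels and scores are illustrative, not results from the project.

```python
def top_k(scores, k=3):
    # Sort (label, score) pairs by score, highest first, and keep the first k
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def format_report(scores, k=3):
    # Build a short status line, e.g. one that could be posted to a feed
    return " | ".join(f"{label}: {score:.2f}" for label, score in top_k(scores, k))


if __name__ == "__main__":
    # Hypothetical classifier output for one captured image
    scores = {
        "Rhode Island Red": 0.90,
        "Plymouth Rock": 0.06,
        "blue jay": 0.03,
        "Buff Orpington": 0.01,
    }
    print(format_report(scores))
    # Rhode Island Red: 0.90 | Plymouth Rock: 0.06 | blue jay: 0.03
```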

For training during development, several thousand images captured from the webcam were manually saved as files in directories labeled according to the categories used by the classification model. For the model, Ian took advantage of a pre-built TensorFlow model called Inception-v3, which he customized using the henhouse training images. Figure 1-2 shows the overall project design.
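Saving images into one directory per category means the directory name itself serves as the label, a convention widely used by image-retraining tools. A minimal sketch of turning such a layout into (path, label) training pairs follows; the category names and the set of accepted file extensions are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of image file extensions to accept (compared case-insensitively)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def labeled_images(root):
    """Return (image_path, label) pairs, where each subdirectory of root is a
    category and the subdirectory's name is used as the label."""
    examples = []
    for category_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image in sorted(category_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                examples.append((str(image), category_dir.name))
    return examples


if __name__ == "__main__":
    # Hypothetical layout:
    #   henhouse_images/blue_jay/0001.jpg
    #   henhouse_images/rhode_island_red/0002.jpg
    for path, label in labeled_images("henhouse_images"):
        print(label, path)
```

Because labels live in the directory structure, fixing a mislabeled image is just a matter of moving the file to the right directory before retraining.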

Figure 1-2. Data flow for a prototype blue jay detection project using tensors in the henhouse. Details are available on the Big Endian Data blog (image courtesy of Ian Downard).


The design provides a reasonable way to collect training data, simplifies model development by using Inception-v3 (which is sufficient for the goals of this project), and produces a model that can be deployed to the IoT edge.


One issue with the design, however, is that the classification step takes 30 seconds on the Pi, which is probably too slow to detect the blue jay in time to stop it from destroying eggs. Ian is already planning to address this aspect of the design by running the model on a MapR edge cluster (a small-footprint cluster) that can classify images within 5 seconds.

Retrain/update the model

A strength of the prototype design for this toy project is that it takes into account the need to retrain or update models and to line up new models for deployment as time passes (see Figure 1-2). One potential way to do this is to make use of social responses to the Twitter feed @TensorChicken, although the details remain to be determined.


Retraining or updating models, as well as testing and rolling out entirely new models, is an important aspect of successful machine learning. This is another reason you will need to manage multiple models, even for a single project. Also note the importance of domain knowledge: after model deployment, Ian realized that some of his chickens were not the breed he thought. The model had been trained on mislabeled images to identify some chickens as Buff Orpingtons; as it turns out, they are Plymouth Rocks. Ian retrained the model, and this shift in results is used as an example in Chapter 7.
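Even this small project ends up with two versions of its classifier, and keeping the old version around is what makes it possible to compare the shift in results. Below is a minimal in-memory sketch of that bookkeeping; the ModelRegistry class and its methods are hypothetical, the “models” are stand-in functions, and a real system would persist models and metadata durably rather than in a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    # Hypothetical registry: model name -> list of versions, oldest first
    versions: dict = field(default_factory=dict)

    def register(self, name, model, note=""):
        # Append a new version instead of overwriting the previous one
        history = self.versions.setdefault(name, [])
        history.append({"version": len(history) + 1, "model": model, "note": note})
        return len(history)

    def latest(self, name):
        return self.versions[name][-1]

    def get(self, name, version):
        # Versions are numbered from 1
        return self.versions[name][version - 1]


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("henhouse", lambda image: "Buff Orpington", "original labels")
    registry.register("henhouse", lambda image: "Plymouth Rock", "retrained")
    for v in (1, 2):
        print(v, registry.get("henhouse", v)["model"]("capture.jpg"))
```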

Expanding project goals

Originally Ian planned only to classify images by type of bird (jay or type of chicken), but soon he wanted to expand the scope to detect whether the door was open and when the nest was empty.


The power of machine learning often leads to mission creep. After you see what you can do, you may begin to notice new ways that machine learning can produce useful results.

Real-World Considerations

This small tensor-in-the-henhouse project was useful as a way to get started with deep learning image detection and the requirements of building a machine learning project, but what would happen if you tried to scale this to a business-level chicken farm or a commercial enterprise that supplies eggs from a large group of farms to retail outlets? As Ian points out in his blog:

Imagine a high-tech chicken farm where potentially hundreds of chickens are continuously monitored by smart cameras looking for predators, animal sickness, and other environmental threats. In scenarios like this, you’ll quickly run into challenges...

Data scale, SLAs, a variety of IoT data sources and locations, and the need to store and share both raw data and outputs with multiple applications or teams (likely in different locations) all complicate the matter. The same issues arise in other industries. Machine learning in the real world requires capable management of logistics, a challenge for any DataOps team. (If you’re not familiar with the concept of DataOps, don’t worry; we describe it in Chapter 2.)

People new to machine learning may think of model management as just a need to assign versions to models, but it turns out to be much more than that. Model management in the real world is a powerful process that deals with large-scale, changing data and changing goals, and that provides ways to run models in isolation so they can be evaluated in specifically customized, controlled environments. This is a fluid process.

Myth of the Unitary Model

A persistent misperception in machine learning, particularly among software engineers, is that the project consists of building a single successful model, and after it is deployed, you’re done. The real situation is quite different. Machine learning involves working with many models, even after you’ve deployed a model into production; it’s common to have multiple models in production at any given time. In addition, you’ll have new models being readied to replace production models as situations change. These replacements will have to be done smoothly, without interruptions to service if possible. In development, you’ll work with more than one model as you experiment with multiple tools and compare models. That is just a single project; multiply it across the other projects in your organization, perhaps a hundredfold.

A major reason you need so many models is mission creep. This is an unavoidable cost of fielding a successful model; once you have a win, you will be expected to build on it and repeat it in new areas. Pretty soon, you have models depending on models in a much more complex system than you initially planned for.

The innovative architecture described in this book can be a key part of a solution that meets these challenges. This architecture must take into account the pragmatic business-driven concerns that motivate many aspects of model management.

What Should You Ask about Model Management?

As you read this book, think about the machine learning projects you currently have underway or that you plan to build, and then ask how a model management system would effectively handle the logistics, given the solutions we propose. Here’s a sample of questions you might ask:

  • Is there a way to save data in raw-ish form to use in training later models? You don’t always know what features will be valuable as you move forward. Saving raw data preserves data characteristics valuable for multiple projects.

  • Does your system adequately and conveniently support multitenancy, including sharing the same data without interference?

  • Do you have a way to efficiently deploy models and share data across data centers or edge processing in different locations, on premises, in cloud, or with a hybrid design?

  • Is there a way to monitor and evaluate performance in development as well as to compare models?

  • Can your system deploy models to production with ongoing validation of performance in this setting?

  • Can you stage models into the production system for testing without disturbing system operation?

  • Does your system easily handle hot hand-offs so new models can seamlessly replace a model in production?

  • Do you have automated fallback? For instance, if a model does not respond within a specified time, is there an automated step that will go to a secondary model instead?

  • Are your models functioning in a precisely specified and documented environment?
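The automated fallback question can be illustrated in a few lines. This is a minimal single-process sketch, not the rendezvous architecture this book proposes: it assumes each model is an ordinary Python callable, and the function name, parameter names, and pool size are hypothetical.

```python
import concurrent.futures

# Shared pool so that a slow primary model does not block the caller after the
# timeout expires (a with-block executor would wait for it on exit)
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def score_with_fallback(request, primary, secondary, timeout_s):
    """Return ("primary", result) if the primary model answers within timeout_s
    seconds; otherwise return ("secondary", result) from the secondary model.
    Exceptions raised by the primary model are not caught here."""
    future = _POOL.submit(primary, request)
    try:
        return "primary", future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "secondary", secondary(request)


if __name__ == "__main__":
    import time

    def slow_model(image):
        time.sleep(0.5)  # simulate a model that misses its deadline
        return "blue jay"

    def backup_model(image):
        return "unknown (fallback)"

    print(score_with_fallback("capture.jpg", slow_model, backup_model, timeout_s=0.05))
```

One design choice worth noting: the slow primary call is not cancelled; it keeps running on the shared pool and its late result is simply ignored. A production system would also need a policy for a primary that fails outright rather than timing out.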

The recipe for meeting these requirements is the rendezvous architecture. Chapter 2 looks at some of the ingredients that go into that recipe.
