Chapter 1. Machine Learning and Deep Learning Models in the Cloud

It doesn’t seem that long ago that artificial intelligence (AI) was a dream. The idea that a machine could simulate and even beat humans at games of skill, image recognition, and predictions was preposterous 20 years ago. Now, the average user brushes up against some form of machine learning every day and everywhere—from our cars to stores to doctors’ offices, and throughout our homes.

We are living at the dawn of thinking machines. But how do they think? What do they use to build models of the world? And how can we, as developers, use these tools to make our systems smarter, more responsive, and more lifelike?

Our goal in this book is to discuss the basics of machine learning and to show you, in a step-by-step introduction, how to implement and code machine learning into your projects using serverless systems and pretrained models. We can think of machine learning as a tool for interacting with an ever-changing world using models that change and grow as they experience more of that world. In other words, it’s how we teach computers new things without explicitly programming them to do anything.

An Introduction to Machine Learning

The AI discipline, of which machine learning is a part, was born in the 1950s, during the Cold War, as a promise to develop systems that could solve complex problems. At that time, computers were not powerful enough for the task. Over the years, AI began to encompass many different subdisciplines, from algorithms used to find a maze exit to systems that could recognize human emotions, drive cars, and predict future outcomes.

Machine learning, also often referred to as ML, is the study and creation of algorithms that become better at a specific task when they receive more data about that task. Most machine learning models begin with “training data” such as a large group of emails or a huge folder of images, which the machine processes and begins to “understand” statistically. Because machine learning assumes no active human programming, the machine algorithm itself changes and grows as it addresses the training data. After a while, the algorithm is ready for “real” data. It continues to evolve as it processes new information, ultimately leading to an answer or solution without outside intervention.

Machine learning is important for the simple reason that not all problems consist of a closed set of variables and routines. For example, a machine learning model could be tasked to analyze spots on the skin for cancer. Because every spot is different, the machine learning model will categorize each spot based on its own statistically chosen criteria and, when queried, will return its best guess as to whether a spot is cancerous; in many cases, that guess is as good as or better than a human's. Traditional algorithms solve very specific, bounded problems. Machine learning lets us train a model to solve something that we initially might not know how to do.

We must also remember the difference between algorithms and training. You can have an algorithm for facial recognition that uses a set of training data passed into the system by the developer. But if your goal is something else—to look for vehicle license plates in pictures of cars, for example—you might use the same simple pattern recognition algorithm with the new dataset. Depending on your use case, you could either look for a model that has the algorithm and present new data on which to train it, or look for a model that’s already trained for the problem that you are trying to solve.

Models aren’t always perfect, however. Researcher Victoria Krakovna created a popular list of machine learning “mistakes” in which the machine learned to achieve goals in ways that didn’t address the problems that humans were trying to solve. In a life simulation game, for example, “creatures bred for jumping were evaluated on the height of the block that was originally closest to the ground. The creatures developed a long vertical pole and flipped over instead of jumping.” Another foible appeared in the video game Sims, in which “creatures exploited physics simulation bugs by twitching, which accumulated simulator errors and allowed them to travel at unrealistic speeds.” Our absolute favorite, however, involves the cannibalistic parents of another life simulation. As Krakovna describes:

In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children).

These comical examples—edible children?—point to the sometimes weird conclusions that machine learning systems make when looking at data that humans would find opaque. But we can’t blame these machines for their will to “win.” After all, that’s all they’re programmed to do!

Machine learning algorithms are very diverse. Many different models are available for problems such as predicting outcomes, estimating values that fluctuate due to natural causes, and classifying elements (e.g., examining satellite pictures to determine which areas are urban, which are forests, and which are bodies of water).

Examples of machine learning algorithms include the following:

Anomaly detection

This can identify rare items, events, or observations that differ significantly from the majority of the data. It can be used to detect bank fraud, for example.

Classification

This is good for predicting the classes of given data points. It’s useful, for example, in spam detection or identifying tumors.

Clustering

This groups a set of objects in such a way that objects in the same group are more similar in some sense to one another than to those in other groups. It is useful in pattern recognition, image analysis, data compression, biological classification, and insurance, for example.

Recommendation

Using a dataset of user-item-rating triples, this model can generate recommendations and find related items. It is used by media streaming services to recommend movies or music and by online shops to recommend items to customers.

Regression

This is used to infer the expected quantity for an input related to a set of data points. Assuming that there is a linear, polynomial, logistic, or other mathematical relation between inputs and outputs, regression infers the best possible coefficients for that relation. Any type of experiment that records data uses this kind of analysis to establish a mathematical relation or correlation. In business, it's used to forecast revenue, and insurance companies rely on regression analysis to estimate the credit standing of policyholders and the likely number of claims in a given time period.

Statistical functions

Machine learning can compute mathematical operations over a large set of data. It can also calculate correlation and probability scores and compute z-scores, as well as statistical distributions such as Weibull, gamma, and beta. Statistics has many applications in understanding and modeling systems with a large number of elements. Governments and other organizations use it to understand data pertaining to wealth, income, crime, and so on. Businesses use it to decide what to produce and when. Social and natural scientists use it to study the demographic characteristics of a population.

Text analytics

This extracts information from text, such as the most likely language used or key phrases. Sentiment analysis is an example application.

Computer vision

This is used to read text in images or handwritten notes, recognize human faces or landmarks, and analyze video in real time.

What these machine learning techniques all have in common is that they rely on learning automatically from a very large set of data. The programmer can define the algorithm architecture and some initial parameters, and then the program learns on its own by studying the data.
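
To make this concrete, here is a minimal sketch (our own illustration, not code from this book) of that idea in Python with scikit-learn. The programmer only picks the model family (linear regression, as described in the list above) and supplies data (synthetic here); the coefficients are learned entirely from that data.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic example data: monthly ad spend (input) and revenue (output).
ad_spend = rng.uniform(1_000, 10_000, size=(200, 1))
revenue = 3.5 * ad_spend[:, 0] + 20_000 + rng.normal(0, 2_000, size=200)

# The programmer chooses the model family; the coefficients are learned.
model = LinearRegression()
model.fit(ad_spend, revenue)

print("learned coefficient:", model.coef_[0])    # should be close to 3.5
print("learned intercept:", model.intercept_)    # should be close to 20,000
print("forecast for $5,000 of spend:", model.predict([[5_000]])[0])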

If you want to know what you can achieve with these techniques, here are some great examples of success stories for machine learning projects:

JFK Files

In 2017, the United States government released more than 34,000 pages related to the assassination of John F. Kennedy, consisting of a mixture of typed and handwritten documents. Thanks to Azure Search and Cognitive Services, a set of composable cognitive skills was applied to this data to extract knowledge and organize it into an index. Not only can this system answer many interesting questions, but you can also see the answers and relationships in context with the original documents.

Snip Insights

This is an open source screen-capture desktop application that uses Azure Cognitive Services to gain instant insight into the image just captured. It can convert images to text and display information about the subject (for example, identifying celebrities and landmarks). If the image is of a product, the application will automatically search for similar products, providing information about how much each one costs and where to buy it.

Pix2Story

This is a web application that uses natural language processing (NLP) to teach an AI system, inspired by a picture, to write a machine-generated story. Captions obtained from the uploaded picture are fed to a recurrent neural network model to generate the narrative based on the genre and the contents of the picture.

Sketch2Code

This is an AI solution that converts hand-drawn renderings on paper or whiteboard to working HTML prototypes. After training the model with images of hand-drawn design elements like text boxes, buttons, or combo boxes, the Custom Vision Service performs object detection to generate the HTML snippets of each element.

To learn more about these and other examples, visit the Microsoft AI Lab website.

An Introduction to Deep Learning

Deep learning is a class of machine learning algorithms that does the following:

  • Uses a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.

  • Learns in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) ways.

  • Learns multiple levels of representations that correspond to different levels of abstraction. The levels form a hierarchy of concepts.

Deep learning’s use of this hierarchy of concepts and classifications contrasts with the often-brutish machine learning methods popular in the lab.

When we have many hidden layers in a model, we say that the network is a deep learning system because deep within the layers of the model it retains the knowledge it is gaining from the examples. We can use these kinds of models in a lot of groundbreaking applications thanks to the great processing power available in modern computers. But with each new layer of complexity, even more power is required to achieve new goals.
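
As a rough illustration (assuming TensorFlow and its Keras API are available; the layer sizes and shapes here are arbitrary choices of ours, not taken from this book), a network becomes "deep" simply by stacking several hidden layers between the input and the output:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),              # 64 input features
    tf.keras.layers.Dense(128, activation="relu"),   # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 3
    tf.keras.layers.Dense(10, activation="softmax"), # 10 output classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

Each additional layer adds parameters to learn, which is why deeper models demand more data and more processing power.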

Neural Networks

The next frontier in machine learning is the neural network. The idea of the computer-based neural network has been around for decades, but only recently has hardware arrived with sufficient horsepower to make it widely useful. The neural network is built on a model of how a human brain works. We begin with a simple data processor called a perceptron, which works much as a neuron does in our own brains. Like a neuron, each perceptron takes an input, checks it against a gate function, and produces an output that can, at the same time, control other perceptrons. This net of perceptrons is called an artificial neural network, which you can see in Figure 1-1.

Figure 1-1. A neural network with two hidden layers

But how does this system learn? Whenever a perceptron gets a strong enough signal, it reinforces the weights of its input to the activation function, so it is easier to fire again under the same conditions. A simple multilayer neural network has an input layer of perceptrons, some hidden layers that accept input from previous layers and feed the next ones, and a final output layer. There can also be intermediate layers whose outputs serve as inputs for previous layers. In this way, active feedback is built into the system.
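
The following toy example (our own sketch, not code from this book) shows a single perceptron learning the logical AND function: inputs are weighted, passed through a step-like gate function, and the weights are nudged whenever the output is wrong, a crude version of "reinforcing" a connection.

import numpy as np

def step(z):
    # The gate function: fire (1) only if the weighted sum is positive.
    return 1.0 if z > 0 else 0.0

# Learn the logical AND function from four examples.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(20):
    for x, target in zip(inputs, targets):
        output = step(np.dot(weights, x) + bias)
        error = target - output
        weights += learning_rate * error * x   # reinforce or weaken the inputs
        bias += learning_rate * error

print("weights:", weights, "bias:", bias)
print("AND(1, 1) ->", step(np.dot(weights, [1, 1]) + bias))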

Difficulties Defining Structure and Training Machine Learning Models

Figure 1-2 illustrates that a deep learning neural model can have a very complex structure that can be properly created only by an expert. Also, as shown previously, the list of different machine learning models can be extensive, and understanding how you must tune the settings of each one for the kind of problem that you want to solve is no small task. As we discuss shortly, this is why you can benefit from using premade and pretrained models.

Figure 1-2. A real-world example of a deep learning network; its structure is the product of a lot of work to fine-tune how it operates.

The data you have for training must include not only the input values, but also the output answer that you want the model to learn. When you have a fixed set of data to work on, it’s important to separate it into training, validation, and test sets. Only the first one is used to train the model, feeding all of its data to the learning process. Then you use the validation set, which the model has not seen, to ensure that it can make meaningful predictions. You use the trained model to process the validation data and compare the output that the data should yield with what the model inferred from it. If the model training has gone well, the output should be very accurate.

One of the problems that you must consider when working with machine learning models is overfitting to the training data. You want a model that is versatile and can make good predictions on new data that it hasn’t seen before. If you overtrain your model with data that is very similar, it will lose the capacity to extrapolate to other cases. That is why validation with separate data is so important. Lastly, the test set contains the real data for which you need predictions. For this set, you don’t know the real outcome until the model tells you what it is.
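
A minimal sketch of such a split with scikit-learn follows; the 60/20/20 proportions, the synthetic dataset, and the choice of logistic regression are our own illustrative assumptions, not a prescription from this book.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)

# First carve off 40%, then split that 40% in half: 60/20/20 overall.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42)

model = LogisticRegression(max_iter=1_000)
model.fit(X_train, y_train)                 # learn only from the training set

# Check generalization on data the model has never seen.
print("validation accuracy:", model.score(X_val, y_val))
print("test accuracy:", model.score(X_test, y_test))

A large gap between training accuracy and validation accuracy is the classic symptom of overfitting.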

When models finish training, they are effectively “black boxes.” It can be difficult or virtually impossible for humans to perceive the statistically based logic that a model is using to arrive at its answers. After a model has been trained and you get good output results, you can copy and use it again anywhere you want (provided the kind of data you are using is similar), but you can’t analyze how the algorithm works to try to infer some kind of knowledge. The model will have made a large number of connections within itself using the data provided, and trying to understand it would be as difficult as trying to make the prediction about the data you are analyzing.

So, having lots of varied data is crucial for a well-trained model, and this means that training is going to require a great deal of computing power and is going to be time-consuming. Also, finding the right combination of model type and initial setup for a specific kind of problem can be very tricky, and you should always check the state of the art of the machine learning community for details on how to use the latest models for different types of problems.

An Introduction to Serverless Machine Learning

In this book, we talk about how serverless architectures can support the implementation of machine learning and deep learning workloads in the cloud. By taking advantage of a serverless machine learning service and passing it a set of data, we are able to offload a great deal of the work associated with training and analysis to platforms perfectly suited to the job.

The first benefit of serverless machine learning is that each machine you use is ready to run when needed and can be shut down immediately when it isn’t needed. You can also very easily use models and algorithms that are already written and stored in a library instead of having to write code yourself to deploy on your machines. This saves both money and time.

In fact, using premade models makes even more sense given how many cloud-based models are available. This means that there is often no reason to reinvent the wheel in machine learning: there is a constantly growing number of algorithm optimizations for supporting parallel computing, for using less memory, and for starting up, running, and shutting down quickly. In almost every case, if we run an experiment with the serverless model and use prewritten models, we’re going to have a faster execution time. This is because the models are already optimized and fine-tuned by experts with the best training possible for their task, and the computing power a cloud provider can deliver yields greater performance than what we can achieve with our own infrastructure.

After we have decided on our model, another advantage of the serverless approach becomes apparent. If we were hosting a custom model on our own servers, we’d need to worry about scaling the system in a live environment. In a serverless architecture, this scaling is automatic. We will be charged for what we use, and the price is proportional to the workload.

Cloud providers like Amazon Web Services (AWS) and Microsoft Azure often allow serverless users to employ ready-made models for tasks like image recognition, speech recognition, and object classification. In this case, you barely need to know anything about machine learning. Instead, you simply use the model without having to think about how a certain image of a storefront returns metadata like the store’s name and the products in the picture. It just works. Using premade machine learning models is an easy way to begin doing things with machine learning: you need to understand only what you’re trying to infer or detect.
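
For example, the following sketch calls a pretrained image-analysis model over REST. The endpoint path, parameters, and response fields follow Azure's Computer Vision "analyze" API as we understand it at the time of writing; check the current documentation, and substitute your own resource name, subscription key, and image URL.

import requests

# Placeholders: replace with your own Computer Vision resource and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com/vision/v3.2/analyze"
KEY = "<your-subscription-key>"

response = requests.post(
    ENDPOINT,
    params={"visualFeatures": "Description,Tags"},   # what to infer
    headers={"Ocp-Apim-Subscription-Key": KEY,
             "Content-Type": "application/json"},
    json={"url": "https://example.com/storefront.jpg"},
)
response.raise_for_status()
analysis = response.json()

# The service returns metadata such as a caption and tags for the image.
print(analysis["description"]["captions"][0]["text"])
print([tag["name"] for tag in analysis["tags"]])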

Cloud providers have actually created a number of tools that solve common problems and don’t even advertise themselves as machine learning. For example, Azure offers a Speech to Text service that can swiftly transcribe input audio of someone speaking into text. It relies on a machine learning model already trained for speech recognition.

Equally importantly, cloud providers offer methods to enforce security and user access as well as billing and cost control. This means that you can chain models in ways that are unique to your use case and ensure a reduction of security failures or billing surprises.

Event-driven architectures are unique in that they run only when called. In a traditional server architecture, code sits idle on an idle machine until it is needed. The result is that you must pay for a server running constantly, even if the code it hosts isn’t being run. In serverless, if a function is never called, there’s no cost to you. This code, which is parceled into event-driven services, is very easy to write, and it’s even easier to glue these functions together to create new functionality. For example, you could build a function that takes as input the audio from an emergency phone call to an earthquake watch service, passes the audio to a machine learning model that transcribes it to text, and then forwards the text to an SMS sender service so that first-responder personnel are notified. When everything is tied together, until you receive your first call, you pay nothing except for a little server space to store your code.
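
A hedged sketch of that earthquake-watch pipeline as an HTTP-triggered Azure Function in Python might look like the following. The azure.functions types are real, but transcribe_audio() and send_sms() are hypothetical helpers standing in for the speech-to-text model and the SMS service, and the binding configuration (function.json) is omitted.

import azure.functions as func

def transcribe_audio(audio_bytes: bytes) -> str:
    """Hypothetical call to a pretrained speech-to-text model."""
    raise NotImplementedError

def send_sms(message: str) -> None:
    """Hypothetical call to an SMS gateway that notifies first responders."""
    raise NotImplementedError

def main(req: func.HttpRequest) -> func.HttpResponse:
    audio = req.get_body()                 # the recorded emergency call
    transcript = transcribe_audio(audio)   # machine learning model does the work
    send_sms(f"Emergency call transcript: {transcript}")
    return func.HttpResponse(transcript, status_code=200)

Until the first call arrives, none of this code runs, and none of it is billed.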

Using a serverless approach, the code is simple and clear because you don’t need to add parts necessary to manage virtual machines (VMs) and other resource-intensive tasks. All of the difficult “sysadmin” stuff—VM provisioning and autoscaling, stopping inactive VMs, even security and monitoring—is handled by the cloud provider. You also get analytic services that help you to inspect the performance of the machine learning models and any other cloud provider features that you are using. If you are running a critical service, they show you graphs with all the parameters that you want to measure without you having to insert this code into your functions. The analytics are baked in.

Although there is theoretically no limit to the kind of work that you can manage in a serverless application, there are some restrictions that you need to take into consideration. For example, Azure offers a maximum of 10 minutes per function call. If your function runs longer than that, Azure dumps the process. Azure assumes that functions should be very quick and that your function can call another function in series. This architectural philosophy assumes that if your function lasts more than 10 minutes, something is very wrong.

Using Containers with Machine Learning Models

Having pretrained machine learning models is very useful, and a serverless architecture ensures that you can scale effortlessly. But given that this area of technology advances quickly, you might find that you not only need to use custom models but also might need to design your own architecture.

In this case, you can work with containers, which are similar to VMs but a lot “lighter.” Using containers, any parameter that is not defined by you is still managed by the cloud provider. The behavior of containers is predictable, repeatable, and immutable. This means that there are no unexpected errors when you move them to a new machine, or between environments. You can then create a cluster of containers with a configuration suited to your machine learning requirements. Having a cloud provider that provides the means to easily coordinate (or “orchestrate”) these containers and to monitor and scale them in a way that is “as serverless as possible” is a great advantage. Working this way, you can have the best of both worlds: an almost serverless approach and, at the same time, customization of your containers to best suit your needs.
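
As an illustration of what might live inside such a container, here is a minimal scoring web service (our own sketch, not from this book); Flask and joblib are assumed to be installed, and model.pkl stands in for your own trained model file that produces numeric predictions.

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")   # your custom, pretrained model

@app.route("/score", methods=["POST"])
def score():
    features = request.get_json()["features"]    # e.g., a list of numbers
    prediction = model.predict([features])[0]    # assumes a numeric prediction
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    # Inside a container you would typically bind to 0.0.0.0.
    app.run(host="0.0.0.0", port=5000)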

Right now, the state of the art for container orchestration is Kubernetes, an open source solution that some cloud providers offer. It manages for you the resources of different nodes (physical or virtual machines). On each node it runs an agent called the kubelet, which manages pods: sets of containers that need to work together. Kubernetes then scales the number of pods up or down automatically to satisfy the requirements of the incoming load.

Figure 1-3 depicts an example architecture for managing a containerized application (one that uses containers) with Kubernetes. It is a little complex, but the magic of it is that, again, you don’t need to understand all of the details. You just prepare your containers on your own machine (which we explain in the examples), upload them to your cloud provider’s Kubernetes service, and leave the difficult work to it.

Figure 1-3. A containerized architecture orchestrated using Kubernetes

The Benefits of Serverless Computing for Machine Learning

To summarize our argument so far, serverless computing in machine learning has many benefits, including the following:

Infrastructure savings

This is a no-brainer. Without servers of your own, you pay nothing for server hardware. This doesn’t mean Functions-as-a-Service (FaaS) is free. Instead, suppliers can be far more granular with their billing and charge you only for the code that you run. In turn, providers save resources by ensuring functions fire only as needed, thereby reducing operating overhead. Further, the infrastructure is someone else’s problem: keeping it up and running is the provider’s responsibility, not yours.

Code runs independently

Each piece of code in a serverless product is run independently. The spell-checking service doesn’t need to interact with the math service; likewise, the address lookup service doesn’t need to interact with the input services. We like to imagine this environment as a set of firecrackers shooting off in different parts of an empty room. The only entangling aspect is the initial match.

Scalability

As we noted before, this model ensures that VMs are powered down until needed. Because we are using an event-driven paradigm, events trigger active code only as needed, removing the need for constant polling (a problem that can grow as the project grows) and the interdependencies that can slow down traditional programming projects. Functions can react to anything: vast amounts of telemetry, logs, sensor data, real-time streams, and so on. The system processes each chunk of data in parallel, with no interaction with other parts of the machine.

Ease of training

Trigger-based functions are excellent for training machine learning applications. By sending massive amounts of training data to a set of virtualized functions, we reduce the need for always-on training and can instead train and consume using virtual functions. In other words, FaaS makes for a solid set of machine learning tools.

In the next chapters, we explore how to build serverless architectures and use premade machine learning models in our projects. We also begin to demonstrate the depth and breadth of these robust models. Get ready to build!
