Chapter 1. Data Acquisition and Machine-Learning Models
Editor’s Note: At Strata + Hadoop World in Singapore, in December 2015, Danielle Dean (Senior Data Scientist Lead at Microsoft) presented a talk focused on the landscape and challenges of predictive maintenance applications. In her talk, she concentrated on the importance of data acquisition in creating effective predictive maintenance applications. She also discussed how to formulate a predictive maintenance problem into three different machine-learning models.
Modeling Machine Failure
The term predictive maintenance has been around for a long time and could mean many different things. You could think of predictive maintenance as predicting when you need an oil change in your car, for example—this is a case where you go every six months, or every certain amount of miles before taking your car in for maintenance.
But that is not very predictive, as you’re only using two variables: how much time has elapsed, or how much mileage you’ve accumulated. With the IoT and streaming data, and with all of the new data we have available, we have a lot more information we can leverage to make better decisions, and many more variables to consider when predicting maintenance. We also have many more opportunities in terms of what you can actually predict. For example, with all the data available today, you can predict not just when you need an oil change, but when your brakes or transmission will fail.
Root Cause Analysis
We can even go beyond just predicting when something will fail, to also predicting why it will fail. So predictive maintenance includes root cause analysis.
In aerospace, for example, airline companies as well as airline engine manufacturers can predict the likelihood of flight delay due to mechanical issues. This is something everyone can relate to: sitting in an airport because of mechanical problems is a very frustrating experience for customers—and is easily avoided with the IoT.
You can do this on the component level, too—asking, for example, when a particular aircraft component is likely to fail next.
Application Across Industries
Predictive maintenance has applications throughout a number of industries. In the utility industry, when is my solar panel or wind turbine going to fail? How about the circuit breakers in my network? And, of course, all the machines in consumers’ daily lives. Is my local ATM going to dispense the next five bills correctly, or is it going to malfunction? What maintenance tasks should I perform on my elevator? And when the elevator breaks, what should I do to fix it?
Manufacturing is another obvious use case. It has a huge need for predictive maintenance. For example, doing predictive maintenance at the component level to ensure that it passes all the safety checks is essential. You don’t want to assemble a product only to find out at the very end that something down the line went wrong. If you can be predictive and rework things as they come along, that would be really helpful.
A Demonstration: Microsoft Cortana Analytics Suite
We used the Cortana Analytics Suite to solve a real-world predictive maintenance problem. It helps you go from data, to intelligence, to actually acting upon it.
The Power BI dashboard, for example, is a visualization tool that enables you to see your data. For example, you could look at a scenario to predict which aircraft engines are likely to fail soon. The dashboard might show information of interest to a flight controller, such as how many flights are arriving during a certain period, how many aircrafts are sending data, and the average sensor values coming in.
The dashboard may also contain insights that can help you answer questions like “Can we predict the remaining useful life of the different aircraft engines?”or “How many more flights will the engines be able to withstand before they start failing?” These types of questions are where the machine learning comes in.
Data Needed to Model Machine Failure
In our flight example, how does all of that data come together to make a visually attractive dashboard?
Let’s imagine a guy named Kyle. He maintains a team that manages aircrafts. He wants to make sure that all these aircrafts are running properly, to eliminate flight delays due to mechanical issues.
Unfortunately, airplane engines often show signs of wear, and they all need to be proactively maintained. What’s the best way to optimize Kyle’s resources? He wants to maintain engines before they start failing. But at the same time, he doesn’t want to maintain things if he doesn’t have to.
So he does three different things:
He looks over the historical information: how long did engines run in the past?
He looks at the present information: which engines are showing signs of failure today?
He looks to the future: he wants to use analytics and machine learning to say which engines are likely to fail.
Training a Machine-Learning Model
We took publicly available data that NASA publishes on engine run-to-failure data from aircraft, and we trained a machine-learning model. Using the dataset, we built a model that looks at the relationship between all of the sensor values, and whether an engine is going to fail. We built that machine-learning model, and then we used Azure ML Studio to turn it into an API. As a standard web service, we can then integrate it into a production system that calls out on a regular schedule to get new predictions every 15 minutes, and we can put that data back into the visualization.
To simulate what would happen in the real world, we take the NASA data, and use a data generator that sends the data in real time, to the cloud. This means that every second, new data is coming in from the aircrafts, and all of the different sensor values, as the aircrafts are running. We now need to process that data, but we don’t want to use every single little sensor value that comes in every second, or even subsecond. In this case, we don’t need that level of information to get good insights. What we need to do is create some aggregations on the data, and then use the aggregations to call out to the machine-learning model.
To do that, let’s look at numbers like the average sensor values, or the rolling standard deviation; we want to then predict how many cycles are left. We ingest that data through Azure Event Hub and use Azure Stream Analytics, which lets you do simple SQL queries on that real-time data. You can then do things like select the average over the last two seconds, and output that to Power BI. We then do some SQL-like real-time queries in order to get insights, and show that right to Power BI.
We then take the aggregated data and execute a second batch, which uses Azure Data Factory to create a pipeline of services. In this example, we’re scheduling an aggregation of the data to a flight level, calling out to the machine-learning API, and putting the results back in SQL database so we can visualize them. So we have information about the aircrafts and the flights, and then we have lots of different sensor information about it, and this training data is actually run-to-failure data, meaning we have data points until the engine actually fails.
Getting Started with Predictive Maintenance
You might be thinking, “This sounds great, but how do I know if I’m ready to do machine learning?” Here are five things to consider before you begin doing predictive maintenance:
- What kind of data do you need?
First, you must have a very “sharp” question. You might say, “We have a lot of data. Can we just feed the data in and get insights out?” And while you can do lots of cool things with visualization tools and dashboards, to really build a useful and impactful machine-learning model, you must have that question first. You need to ask something specific like: “I want to know whether this component will fail in the next X days.”
- You must have data that measures what you care about
This sounds obvious, but at the same time, this is often not the case. If you want to predict things such as failure at the component level, then you have to have component-level information. If you want to predict a door failure within a car, you need door-level sensors. It’s essential to measure the data that you care about.
- You must have accurate data
It’s very common in predictive maintenance that you want to predict a failure occurring, but what you’re actually predicting in your data is not a real failure. For example, predicting fault. If you have faults in your dataset, those might sometimes be failures, but sometimes not. So you have to think carefully about what you’re modeling, and make sure that that is what you want to model. Sometimes modeling a proxy of failure works. But if sometimes the faults are failures, and sometimes they aren’t, then you have to think carefully about that.
- You must have connected data
If you have lots of usage information—say maintenance logs—but you don’t have identifiers that can connect those different datasets together, that’s not nearly as useful.
- You must have enough data
In predictive maintenance in particular, if you’re modeling machine failure, you must have enough examples of those machines failing, to be able to do this. Common sense will tell you that if you only have a couple of examples of things failing, you’re not going to learn very well; having enough raw examples is essential.
Feature Engineering Is Key
Feature engineering is where you create extra features that you can bring into a model. In our example using NASA data, we don’t want to just use that raw information, or aggregated information—we actually want to create extra features, such as change from the initial value, velocity of change, and frequency count. We do this because we don’t want to know simply what the sensor values are at a certain point in time—we want to look back in the past, and look at features. In this case, any kinds of features that can capture degradation over time are very important to include in the model.
Three Different Modeling Techniques
You’ve got a number of modeling techniques you can choose from. Here are three we recommend:
- Binary classification
Use binary classification if you want to do things like predict whether a failure will occur in a certain period of time. For example, will a failure occur in the next 30 days or not?
- Multi-class classification
This is for when you want to predict buckets. So you’re asking if an engine will fail in the next 30 days, next 15 days, and so forth.
- Anomaly detection
This can be useful if you actually don’t have failures. You can do things like smart thresholding. For example, say that a door’s closing time goes above a certain threshold. You want an alert to tell you that something’s changed, and you also want the model to learn what the new threshold is for that indicator.
These are relatively simplistic, but effective techniques.
Start Collecting the Right Data
A lot of IoT data is not used currently. The data that is used is mostly for anomaly detection and control, not prediction, which is what can provide us with the greatest value. So it’s important to think about what you will want to do in the future. It’s important to collect good quality data over a long enough period of time to enable your predictive analytics in the future. The analytics that you’re going to be doing in two or five years is going to be using today’s data.