Predicting Machine Failures

Problem Definition

Solving the predictive maintenance problem in most organizations comprises two parts. First, the maintenance team needs a list of machines that are expected to fail; second, the root cause specific to each machine and failure must be identified. Without both pieces, an organization cannot effectively increase operational performance by staying ahead of misconfigurations, failures, normal wear, or defective components.

When Machines Fail

To solve the first half of the problem, technicians need the proper window of time to assess and address any machine's failure. For example, if a prediction provides only a ten-minute window before a machine will fail, but it takes an average of thirty minutes for a technician to get to the machine, then the prediction is useless. Conversely, if a prediction says a machine will fail in the next six months, then an organization cannot adequately prioritize which machines to address because that timeframe is too broad. Emcien addresses both of these problems by automatically detecting failure windows, which provides the flexibility and power necessary to address failure in the real world.

Root Cause of Failure

What good is it to tell a technician that a machine will fail without identifying why? Root cause diagnosis is very time consuming. If the root cause is not discovered and addressed, the organization may lose money on both a wasted maintenance cycle and a recurring failure. Emcien solves this problem by isolating the root cause, moving from predictive to prescriptive maintenance. Emcien’s prediction provides "remedies" that identify the specific cause of failure, per machine, allowing an organization to send the right people and the right parts to the right machine at the right time - before the equipment fails.

Finally, the deployment of these machines can be geographically global. Emcien is designed to operate embedded in remote, low power, edge devices so that detecting machine failure doesn't require transmitting the data back to a central server for analysis and decision making. 

Scope of this Article

In this article we will cover the Emcien capabilities described above by walking through the NASA Turbofan public dataset. This source data is provided by NASA and is publicly available for benchmarking. This article also compares the Emcien approach and results with Microsoft’s Azure AI. Because of its automated analysis engine, Emcien eliminates the need for coding and data science and provides predictions with supporting root causes per machine - all with fewer manual steps and faster time to results.



Step 1: Data Preparation

Sample of Training Data from NASA

The source data is provided by NASA. It consists of simulated turbofan engines and includes many real-life variables, such as initial wear and system noise. NASA has provided a training file and a test file for the benchmark.

For your convenience, Emcien has prepared the NASA files with the derived columns. Please download here

Preparing the Test File

Preparing the test file consists of the following steps:

  1. Adding Moving Averages
  2. Adding Standard Deviations
  3. Adding Time-to-Live (TTL) measure
  4. Automated Model Generation

Moving Averages & Standard Deviation

For this data, we will add a moving average and a standard deviation for each sensor. In this example, the sensor data is purely numeric. Use your favorite tool to add these two metrics for each sensor column.
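If you do not have a preferred tool, the two derived columns can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `rolling_stats`, the window size, and the sample readings are assumptions made for the example and are not taken from the NASA files or from Emcien. In practice the calculation would be run separately for each engine so that one engine's readings do not leak into another's window.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(values, window=5):
    """Return (moving_averages, moving_std_devs) for one sensor's readings.

    Early rows use however many readings are available so far, so the
    output lists have the same length as the input.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    averages, deviations = [], []
    for value in values:
        recent.append(value)
        snapshot = list(recent)
        averages.append(mean(snapshot))
        deviations.append(pstdev(snapshot))  # population std dev; 0.0 for one reading
    return averages, deviations


# Made-up readings for a single sensor on a single engine.
sensor = [10.0, 12.0, 14.0, 16.0]
avg, std = rolling_stats(sensor, window=2)
print(avg)
print(std)
```

The population standard deviation is used here so that the first row of each engine gets a value of 0.0 rather than an error; a sample standard deviation is an equally reasonable choice once the window holds at least two readings.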

A note about Emcien’s handling of dirty data: most analysis tools cannot ingest dirty data and require data cleansing. Emcien is designed to ingest dirty data, missing values, empty cells, etc. with no cleansing required. In addition, Emcien is very flexible and can ingest numerical data, categorical data, or any blend thereof.

A little Background on Time Series and TTL Measures

Typically, machine failure problems are solved with one of two approaches. The most common approach uses the concept of a time series "Time To Live" (TTL) measure. With this method, Emcien automatically detects the best ranges of time and presents accurate predictions of when machines will fail.

The second method relies on a subject matter expert who specifies the timeframe required to respond to and remedy the situation. For example, you may know that in your business you need ample time to react to a failure prediction. You may need to ‘roll trucks’ or physically access the device. In these scenarios, Emcien allows you to specify the time frames used by predictions that are the most valuable to your business.

Bottom line, Emcien offers the automated TTL method, in which the software selects the best time ranges based on the data, or allows the user to specify time ranges. 
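To make the TTL measure concrete, here is a minimal sketch of how such a column could be derived for training data. It assumes that each row carries a unit (engine) identifier and a cycle counter, and that each unit in the training file is recorded up to its failure, so the last recorded cycle is treated as the failure point. The function name `add_ttl` and the sample rows are hypothetical.

```python
from collections import defaultdict


def add_ttl(rows):
    """rows: list of (unit_id, cycle) tuples.

    Returns a list of (unit_id, cycle, ttl), where ttl is the number of
    cycles remaining until that unit's last recorded cycle.
    """
    last_cycle = defaultdict(int)
    for unit_id, cycle in rows:
        last_cycle[unit_id] = max(last_cycle[unit_id], cycle)
    return [(unit_id, cycle, last_cycle[unit_id] - cycle) for unit_id, cycle in rows]


# Made-up example: unit 1 runs for 3 cycles, unit 2 for 2 cycles.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
with_ttl = add_ttl(rows)
print(with_ttl)
```

For a test file in which engines have not yet failed, the last recorded cycle is not the failure point, so the TTL would have to come from separately supplied ground truth rather than from this calculation.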

Automated Feature Selection 

Traditionally, feature selection is a very labor-intensive, iterative process, prone to human bias and limited by dirty data. Several products try to simplify feature selection with more user-friendly desktop applications, but these approaches do not address the core problem--which features to use to get the best results.

Emcien, using algorithms rooted in information theory, detects which features are the most valuable to achieving your specific agenda. Simply drag-n-drop your training data onto Emcien and select the outcome you want to predict.
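For readers unfamiliar with information-theoretic feature scoring, the sketch below ranks already-binned features by their mutual information with the outcome. This is a generic textbook illustration and should not be read as Emcien's algorithm, which is not described in this article. The function names, column names, and data are all made up for the example.

```python
from collections import Counter
from math import log2


def mutual_information(feature, outcome):
    """Mutual information (in bits) between two equal-length categorical lists."""
    n = len(feature)
    joint = Counter(zip(feature, outcome))
    feature_counts = Counter(feature)
    outcome_counts = Counter(outcome)
    mi = 0.0
    for (f, o), count in joint.items():
        p_joint = count / n
        p_independent = (feature_counts[f] / n) * (outcome_counts[o] / n)
        mi += p_joint * log2(p_joint / p_independent)
    return mi


def rank_features(columns, outcome):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, outcome) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up binned sensor columns and a binary outcome.
outcome = [0, 0, 1, 1]
columns = {
    "sensor_a_bin": ["low", "low", "high", "high"],  # tracks the outcome exactly
    "sensor_b_bin": ["low", "high", "low", "high"],  # unrelated to the outcome
}
ranking = rank_features(columns, outcome)
print(ranking)
```

A feature that fully determines the outcome scores highest and an unrelated feature scores zero, which is the intuition behind selecting features by how much they tell you about the outcome you want to predict.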

In the NASA example, please first download the NASA training data to which moving averages, standard deviations, and a TTL measure have been added.

Next go to the Auto-binning component within Emcien Patterns (Bandit) and specify the following options:

  1. Drag the training file onto the drop target
  2. Set “Name of Outcome” to “label 2”
  3. Set “Capture” to “80%”
  4. Enter “2” in the Item fields

At this point a file will be generated for you with the best features selected and targeted to predicting machine failures.

Note: “label 2” and the number of cycles are driven by the NASA Turbofan problem statement. Your use case and data will have different labels and failure time frames.



Step 2: Emcien's Software Builds the Model

A core capability of Emcien is that predictive rules are automatically generated by extracting strong and mixed signal patterns in your data. With a targeted set of rules, Emcien is capable of predicting future machine failures. These rules do not require any human review, validation, or curation because Emcien’s automated analysis removes the need for human intervention.

This article describes Emcien's functionality through the web user interface. In an operational deployment, the process of predicting machine failure would be driven by an automation factory using the Emcien APIs.

To build rules, follow the steps below:

  1. Go to the "New Analysis" page
  2. Select your Training File
  3. Select your Outcome Category
  4. Click the Green 'Analyze' Button

Optional: Viewing Rules

The rules are human-readable. If you are curious, the prediction rules are available for you to view. However, Emcien requires no curation or validation of the rules.  



Step 3: Making Predictions

Emcien is architected as a two-tier system. The “Analysis Engine” uses historic data to generate a targeted set of highly predictive rules. The “Prediction Engine” uses these rules to make predictions. The Prediction Engine can be run in three different modes:

  1. Batch Predictions - Up to a million predictions per request
  2. Real-time Predictions - HTTP API handling up to 10,000 predictions per second
  3. Embedded Predictions - Emcien is architected to be run embedded and can be deployed at the edge

In this walkthrough, we will rely on the batch predictions mode. 

To start, simply:

  1. Go to the New Predictions Page
  2. Select your test file
  3. Click the "Predict" button

At this point Emcien will import the test data, the prediction engine will apply the rules, and the results will be presented to you in the web UI or through the JSON API.

When you see a green button that says "View Predictions", the predictions are done. You have successfully taken data from the NASA Turbofan challenge, prepared it in a few easy steps, and built rules and predictions by simply pressing the big green button.




The Results

In this section we will review the results from Emcien, take a look at the prescriptive components of the predictions, and compare the results to Microsoft Azure AI. 

Predicting When a Machine Will Fail

The first part of our objective is to identify which machines will fail in the time windows we specified. According to the NASA challenge, the machines' TTL values are labeled (“label 2”) as:

  • 2 - This is 15 cycles or less
  • 1 - Between 16 and 30 cycles
  • 0 - Greater than 30 cycles
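The labeling above can be expressed as a small function. The thresholds come directly from the list; the function name `ttl_to_label` and the sample TTL values are chosen only for illustration.

```python
def ttl_to_label(ttl):
    """Map a TTL value (in cycles) to the three-class "label 2" outcome."""
    if ttl <= 15:
        return 2  # 15 cycles or less
    if ttl <= 30:
        return 1  # between 16 and 30 cycles
    return 0      # greater than 30 cycles


# Made-up TTL values around each boundary.
labels = [ttl_to_label(t) for t in (0, 15, 16, 30, 31, 120)]
print(labels)
```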

NASA does not define what a cycle is, other than a consistent unit of time. In the real world, these units can be seconds, minutes, hours, or days.

NASA does specify that class ‘2’ is the most important in their use case; therefore, our focus is on it.

By viewing the Emcien confusion matrix, you will see that Emcien successfully predicted 75% of Class 2 with a capture rate of 90%. This is an excellent result and the level of effort to attain these results was three push-button steps.
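If you want to reproduce this kind of summary from your own prediction output, a confusion matrix and per-class rates can be computed as sketched below. This article does not define "capture rate" formally; the sketch assumes it means the share of actual members of a class that were predicted as that class (commonly called recall), and also shows precision for comparison. The labels in the example are made up and are not the NASA results.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes=(0, 1, 2)):
    """Return matrix[actual_class][predicted_class] = row count."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in classes} for a in classes}


def capture_rate(matrix, cls):
    """Share of rows that truly belong to `cls` and were predicted as `cls`."""
    actual_total = sum(matrix[cls].values())
    return matrix[cls][cls] / actual_total if actual_total else 0.0


def precision(matrix, cls):
    """Share of rows predicted as `cls` that truly belong to `cls`."""
    predicted_total = sum(row[cls] for row in matrix.values())
    return matrix[cls][cls] / predicted_total if predicted_total else 0.0


# Made-up labels for ten machines.
actual = [2, 2, 2, 2, 2, 1, 1, 0, 0, 0]
predicted = [2, 2, 2, 2, 1, 2, 1, 0, 0, 0]
matrix = confusion_matrix(actual, predicted)
print(matrix[2])
print(capture_rate(matrix, 2), precision(matrix, 2))
```

Reporting both numbers for the class you care about most makes it clear whether a model finds most of the failing machines, avoids false alarms, or both.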

Supporting Reasons - Why a Machine Will Fail

Understanding why machines will fail can be viewed at two levels - machine-specific or systemic issues across many machines. 

For each prediction at a machine level, the reasons are presented in the Emcien user interface or HTTP API. You can inspect the reasons for the NASA data, but NASA labels each sensor only as “sensor N”, which makes them hard to interpret. To help illustrate this feature, we display root cause reasons from a different dataset--server failures. To see more about the use case related to the screenshots, view this video here.

Below we see another example of a prediction with supporting reasons: the machine is powering off because it is getting too hot. The technician can see that the average up time and voltages are responsible for causing the high temperatures. Once on site, the technician can replace the power supply or move the server to a cooler part of the facility.

At a systemic level, we see the general causes of machine failure across all machines. This can be used to answer the question - "Why are the machines failing?". This is useful when creating policy or changing processes. An organization can roll out adjustments to its overall operations based on the systemic causes of failure faster than it can incorporate predictions at the individual machine level into a workflow. Both views provide a depth of value to directly improve operations.

Comparing Emcien and Microsoft Azure AI Results

When starting this comparison it should not be overlooked that:

  • The Emcien process is a series of clicks, versus the coding and data science required for the Azure process.
  • Emcien automates feature selection, eliminating the time and effort of manual feature selection.
  • Emcien provides predictions with root causes and systemic drivers. The Azure product does not, according to their NASA Turbofan writeup.
  • Emcien can be deployed in real-time or quickly embedded into existing business applications. Azure is a stand-alone data science tool.

Since Azure can only address one part of what is required to solve the machine failure problem, we will focus our comparison on that part - predicting when machines will fail.

Comparison of Capture Rates

Below we see the capture rates for each predicted class for Emcien and Azure. Emcien captures more failed machines for Class 1 which is useful in the field. Emcien is better at both capture rate and overall prediction accuracy.

Note: The Microsoft article also provides results obtained using a neural network. While the accuracy achieved with Emcien is comparable, we did not include it in this benchmark report because the number of nodes, layers, parameter tuning, etc. was not explained in the Azure paper. Neural nets are a black box and cannot provide supporting reasons for predictions. They also typically have a very high data science overhead and are therefore not equivalent to the simplified, automated Emcien process.



Conclusion

Prescriptive predictions provide an organization with the two critical pieces for improving overall operations--when machines will fail and why. Emcien is positioned to deliver on both of these dimensions and also provides excellent accuracy and capture rates.

While a big team of data scientists with a cluster of high-end servers and Hadoop clusters may provide competitive models given enough time, Emcien provides a highly accurate, automated solution out of the box. The winning proposition is that Emcien can operate completely embedded inside an existing business process or application - helping make every workflow predictive. This means that every organization can benefit from the value of real-time predictions without the overhead of data science.

For more information or questions please contact support@emcien.com