
When Will That Happen? A Novel Approach to Time Series Prediction

Knowing when something will happen – not simply that it will happen – lets you take the right action at the right time in response to a prediction. As a result, you can have a bigger impact on the outcome you want to improve.

That’s why it’s so valuable to analyze time series data and make time series predictions.

Time Series Prediction Methodologies

“Forecasting” is a loosely defined term, but it typically refers to a type of time series prediction in which you estimate or project the value of one or more variables at various points in the future. For example, many businesses forecast sales (variable) figures (value) each week for the quarter (time), and governments forecast the GDP (variable) number (value) for the next five years (time).

But what if you want to know when a specific event or outcome will occur? For example, what if you want to know when each piece of equipment will fail? Or when each network will go down? Or when each employee will churn?

Traditional forecasts cannot answer these granular questions about outcomes.

Current statistical methods can predict an outcome from a time series using a single variable, but almost all business problems are multivariate. Unraveling multivariate prediction problems becomes quite complex with these more traditional, linear methods.

EmcienPatterns delivers multivariate time series predictions about outcomes quickly and easily using its powerful engine and a time series prediction format known as “time to live” or TTL.

TTL communicates how much time is left until the next event occurs, where the event is often a failure or negative outcome of some kind.

An Example of TTL: Machine Downtime

In the case of machine downtime, Emcien uses TTL to identify how much time each machine has until it experiences the next failure event. This is how much time the machine has left “to live.”

What follows is an example use case illustrating how Emcien uses TTL.

Imagine a company has several critical machines – like oil drills – located at several sites across a state. Each machine has 3 different sensors that capture data about the machine’s status and performance.

In reality, sensors capture and transmit data from mission-critical machines like oil drills very frequently – as often as every second or minute – because close monitoring is necessary to prevent and mitigate risks.

But in this simple example, the sensors capture data just once each day and then transmit that data to corporate headquarters for review.

Occasionally a machine will unexpectedly fail, causing significant loss for the company. This failure event is captured in the data that the machine’s sensors collect, shown below:

In this machine’s data set, a row contains all the sensor readings collected and transmitted on a particular day, and whether or not the machine failed. Failure is marked with a “1.”

The company wants to know when each machine is going to fail before it does, so they can attempt to prevent the failure with proactive maintenance.

A Simple Data Prep Step: Adding a TTL Column

Emcien can deliver this multivariate time series prediction easily by adding only a simple data preparation step to the standard analysis and prediction process with EmcienPatterns.

The company must first add a special column to their data set – the historical data set containing past sensor readings, failure events, and timestamps that EmcienPatterns will analyze.

The column can be named anything, but is named Time to Live in the above example. Its purpose is to indicate the time until the next failure event – expressed in days in this case.

To do this, the column holds, for every row (rows being daily sensor readings and failure events), the difference between that row’s timestamp and the timestamp of the next failure event that follows it in time.

For example, the row for the daily sensor reading on 10/16/2017 has a “1” in the Time to Live column. This is because the closest failure event after that date occurs on 10/17/2017, and the difference between those two timestamps is exactly 1 day.

The company is able to add this column easily and use a macro to quickly make the calculation for each row.
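The Time to Live calculation can be sketched in a few lines of Python. This is only an illustration, not the company's macro or anything EmcienPatterns provides: the function name add_time_to_live, the field names date, failure, and time_to_live, and the sample rows are all hypothetical. It also assumes that a failure-day row gets a value of 0 and that rows after the last recorded failure are left empty, since no later failure is known for them.

```python
from datetime import date


def add_time_to_live(rows):
    """Return copies of rows with a 'time_to_live' value in days.

    rows: list of dicts with a 'date' (datetime.date) and a 'failure' flag (0 or 1).
    Assumption: failure-day rows get 0; rows after the last failure get None.
    """
    ordered = sorted(rows, key=lambda r: r["date"])
    result = []
    next_failure = None
    # Walk backward in time so the most recently seen failure is always
    # the next failure that follows the current row.
    for row in reversed(ordered):
        if row["failure"] == 1:
            next_failure = row["date"]
        if next_failure is None:
            ttl = None
        else:
            ttl = (next_failure - row["date"]).days
        result.append({**row, "time_to_live": ttl})
    result.reverse()
    return result


# Hypothetical daily rows around the 10/17/2017 failure mentioned in the text.
readings = [
    {"date": date(2017, 10, 14), "failure": 0},
    {"date": date(2017, 10, 15), "failure": 0},
    {"date": date(2017, 10, 16), "failure": 0},
    {"date": date(2017, 10, 17), "failure": 1},
    {"date": date(2017, 10, 18), "failure": 0},
]
ttl_values = [r["time_to_live"] for r in add_time_to_live(readings)]
print(ttl_values)
```

With these rows, the reading on 10/16/2017 gets a Time to Live of 1, matching the example above.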

Now, instead of predicting the failure using the Failure column, EmcienPatterns will predict time until next failure using the Time to Live column. In this way, the company has effectively added a time dimension into what would otherwise be a basic failure prediction.

Boosting Predictive Power with Data Binning

Before EmcienPatterns analyzes the data, it automatically converts values in the Time to Live outcome column into value ranges, or “bins,” shown below:

In this example, “1” day before failure did not change. But “2” days and “3” days before failure were converted to a new “2-3” days before failure range. The “4,” “5,” “6,” and “7” values were converted to a single combined “4-7” days before failure range.

This conversion of values into certain value ranges – the data binning process – may appear random, but it is not.

Rather, for every unique data set, EmcienPatterns automatically uncovers the value ranges that will boost that data set’s predictive power, heightening prediction accuracy.

It then bins the outcomes accordingly. The “1” day before failure value did not change because EmcienPatterns determined that the “1” day value was already optimally predictive, and binning it into a range with other values would dilute its predictive power.

The “2” and “3” day values were combined into a “2-3” day range because combining them improved prediction accuracy.
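To make the conversion concrete, here is a minimal Python sketch that applies the ranges from this example to a Time to Live value. It does not show how EmcienPatterns chooses the ranges, which the article does not describe. The bin edges are simply copied from the example, and the names EXAMPLE_BINS and bin_label are hypothetical.

```python
# Bin edges copied from the example in the text. In practice the ranges
# are chosen per data set; these fixed values are an assumption.
EXAMPLE_BINS = [(1, 1), (2, 3), (4, 7)]


def bin_label(ttl_days, bins=EXAMPLE_BINS):
    """Map a Time to Live value in days to its range label, or None."""
    if ttl_days is None:
        return None
    for low, high in bins:
        if low <= ttl_days <= high:
            # A single-value bin keeps its original label, e.g. "1".
            return str(low) if low == high else f"{low}-{high}"
    # Values outside every range (such as 0 on the failure day) stay unbinned.
    return None


labels = [bin_label(days) for days in range(0, 9)]
print(labels)
```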

Grouping & Analyzing Data

When EmcienPatterns analyzes the data, it groups together all the rows associated with each non-zero value or value range in the Time to Live column, shown below:

The two rows associated with “1” day before a failure event are grouped together. The four rows with “2-3” days before a failure event are grouped together, and the six rows associated with “4-7” days before failure are grouped together.

The sensor readings on days the machine failed are outlier data points that typically cannot produce reliable insight about the machine’s failure. However, the sensor readings taken prior to the failure event are helpful and predictive.

Therefore, EmcienPatterns learns the predictive patterns in each group prior to failure and uses the patterns to generate a model that predicts machine failure.
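The grouping step can be pictured with a short Python sketch. It is an illustration under stated assumptions rather than EmcienPatterns' internal logic: the function name group_by_ttl_bin and the ttl_bin field are hypothetical, and failure-day rows are assumed to carry the label "0" so they can be left out.

```python
from collections import defaultdict


def group_by_ttl_bin(rows):
    """Group rows by their 'ttl_bin' label, skipping failure-day rows."""
    groups = defaultdict(list)
    for row in rows:
        label = row["ttl_bin"]
        # Assumption: "0" marks the failure day itself and None marks rows
        # with no known upcoming failure; neither joins a group.
        if label in (None, "0"):
            continue
        groups[label].append(row)
    return dict(groups)


# Hypothetical rows whose group sizes mirror the example in the text.
example_rows = (
    [{"ttl_bin": "1"}] * 2
    + [{"ttl_bin": "2-3"}] * 4
    + [{"ttl_bin": "4-7"}] * 6
    + [{"ttl_bin": "0"}] * 2
)
group_sizes = {label: len(rows) for label, rows in group_by_ttl_bin(example_rows).items()}
print(group_sizes)
```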

This data analysis process is performed on the master data set that combines data from all of the company’s machines, not simply on the small data set for a single machine.

Result: Predicting When Machines Will Fail

When Emcien accesses new data from the machines’ sensors, it compares all the data to the predictive model. It then identifies which machines are displaying patterns associated with a particular time window to failure (like 1 day before failure), and expresses any matches as predictions – each delivered with a likelihood number.

For example, EmcienPatterns may predict that machine #1 in Laird has an 87% likelihood of failure in 2-3 days.
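To show what a likelihood attached to a time window can mean, here is a deliberately simple stand-in in Python. It counts how often each historical sensor pattern was followed by each Time to Live range and reports the most frequent range with its share of occurrences. This is a plain frequency count, not the EmcienPatterns model, whose internals the article does not describe. The function names, the pattern tuples, and the sample history are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_pattern_counts(history):
    """Count TTL ranges per pattern.

    history: iterable of (pattern, ttl_bin) pairs, where pattern is any
    hashable summary of a row's sensor readings (an assumption here).
    """
    counts = defaultdict(Counter)
    for pattern, ttl_bin in history:
        counts[pattern][ttl_bin] += 1
    return counts


def predict(counts, pattern):
    """Return (ttl_bin, likelihood) for a pattern, or None if never seen."""
    seen = counts.get(pattern)
    if not seen:
        return None
    ttl_bin, hits = seen.most_common(1)[0]
    return ttl_bin, hits / sum(seen.values())


# Hypothetical history: one pattern seen four times before a failure.
pattern = ("high", "low", "high")
history = [(pattern, "2-3")] * 3 + [(pattern, "4-7")]
model = fit_pattern_counts(history)
prediction = predict(model, pattern)
print(prediction)
```

In this toy history the pattern preceded a failure in 2-3 days on three of four occasions, so the sketch reports the “2-3” range with a likelihood of 0.75.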

EmcienPatterns also provides remedies with every prediction so the company knows what parts of each machine to address, and how, so that proactive maintenance efforts to prevent failure are most effective.

Emcien also sends predictions and remedies to the company’s enterprise applications, so the company has this critical information when and where it needs it in order to act quickly.