Use-Case Setup Overview Steps

Note: The following steps assume OSI PI as the data source. Any data source (e.g., a SQL database) can be used with minor changes to how the data is extracted and derived on that platform.

Setting up the EmcienPatterns solution has five phases, covering ten steps in total:

Configure

  1. Choose a use-case with an outcome to solve for an asset or device
    1. Locate the historical data that includes the outcome of interest (e.g., “Fail/Not Fail”)
    2. At least a few columns of attributes and at least a few hundred rows will produce reasonable results for most use-cases. In general, more attributes/columns add accuracy, and more rows add confidence to the solution.
  2. Is the use-case about predicting “WHEN” something will happen? Example: “I want to be alerted 10 minutes before the machine fails” rather than just “reasons for machine failures”
    1. NO: Jump to Step #4
    2. YES: Follow Step #3 to add a TTL (Time To Live) metric as the target for prediction
  3. Add the TTL metric to predict “WHEN” something will happen
    1. The TTL column is a simple measure: for each row, the time remaining until the next event of interest (e.g., “Failure”), computed by counting backward from each event.
    2. This new TTL column (or whatever name you choose) becomes the outcome column for Patterns.
    3. We will create this derived column during the extract (or in a virtual data view, depending on your data store); a sketch follows this list.
    4. For more information, see the TTL time metric concept overview.
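
  As one way to build that column, here is a minimal pandas sketch, assuming a CSV extract with a “Timestamp” column and a “Status” column whose value “Failure” marks the event of interest (all file and column names are illustrative):

      import pandas as pd

      # Hypothetical file and column names; adjust to your extract.
      df = pd.read_csv("asset_history.csv", parse_dates=["Timestamp"])
      df = df.sort_values("Timestamp")

      # Timestamp of the next "Failure" at or after each row, filled
      # backward from each failure to the rows that precede it.
      next_failure = df["Timestamp"].where(df["Status"] == "Failure").bfill()

      # TTL = minutes remaining until the next failure.
      df["TTL"] = (next_failure - df["Timestamp"]).dt.total_seconds() / 60

  If the file mixes devices, group by Device ID first so a failure on one device does not set the TTL for rows from another.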

Extract

  4. Set up the extract from the PI Server using PI Integrator
    1. Using PI Integrator, set up an extract of all attributes for the asset, including:
      • Timestamp
      • Device ID
      • Measures (e.g., voltages, temperatures, etc.)
      • IF adding TTL from Step #3, then create a computed column for TTL using PI Integrator
    2. Set PI Integrator to extract the file to CSV (for other data stores, see the sketch after this step)
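
  PI Integrator handles the extract through its UI. For other data stores (per the note at the top), here is a minimal sketch of an equivalent CSV extract from a SQL database; the table and column names are hypothetical and should mirror the attribute list above:

      import sqlite3  # any DB-API driver works; sqlite3 is only for illustration

      import pandas as pd

      conn = sqlite3.connect("plant.db")
      query = """
          SELECT Timestamp, DeviceID, Voltage, Temperature
          FROM asset_measurements
          ORDER BY DeviceID, Timestamp
      """
      pd.read_sql(query, conn).to_csv("asset_history.csv", index=False)
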
  5. Set up Learn and Test data files so we can measure accuracy
    1. Drag the data file from PI Integrator onto the Bandit screen
    2. Choose the outcome
      • Did you follow Step #3 above (TTL)?
        • NO: Choose the outcome column for your use-case
        • YES: Choose the “TTL” column you created, as this targets “WHEN” something will happen
    3. In the ‘Override Defaults’ section, set the ‘Test File’ dropdown to 20%. This creates two files: a Learning File with 80% of the data and a Test File with the other 20% randomly sampled out (a hand-rolled equivalent is sketched after this list).
    4. IF you came from Step #9 and have over 100 columns:
      • In the ‘Advanced Defaults’ subsection of ‘Override Defaults’, set the ‘Capture’ value to 80% and type in the outcome item of most interest (e.g., “Failure in 10 minutes”)
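
  Bandit creates both files for you; as a rough hand-rolled equivalent of the 80/20 split (file names illustrative):

      import pandas as pd

      df = pd.read_csv("asset_history.csv")

      # Randomly sample 20% of rows as the hold-out test file;
      # the remaining 80% become the learning file.
      test = df.sample(frac=0.20, random_state=42)
      learn = df.drop(test.index)

      learn.to_csv("asset_history-learn.csv", index=False)
      test.to_csv("asset_history-test.csv", index=False)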

Analyze

  6. Run the Patterns Analysis Engine
    1. If not already selected from the Bandit step above:
      • Choose the file created by PI Integrator with the historic data
      • Choose the outcome (see Step #5 for the choosing method)
    2. Click ‘Analyze’
    3. Review the rules that are created
      • You should have hundreds or thousands of rules that together represent the model for predicting your chosen outcome
      • Rules with numeric values should have ranges for those values, created automatically by the bandit function
      • High-frequency rules should give the asset’s subject-matter expert some confidence that the rules are accurate. Low-frequency rules are considered “weak signal”: they are harder for people to detect, but they contribute greatly to the overall model.
  7. Run the Prediction Engine to test the results (click the ‘Predict’ link from the Rules screen)
    1. Using a hold-out sample of the data, we will test the accuracy of the results.
    2. Select the other file created by Bandit in Step #5 (its name includes ‘-test’ before the timestamp)
    3. Review the Accuracy & Capture results for the outcome value of interest (see the sketch after this list for what the two numbers mean)
      • A strong green color down the diagonal is ideal, but the cell that matters most is your key outcome value (e.g., “Failure in 10 minutes”); the others are less important
      • The results in the matrix assume all predictions are acted upon. You can improve accuracy (though not capture) by acting only on high-confidence predictions, which reduces false positives.
    4. Are the Accuracy and Capture values sufficient for your use-case?
      • YES: Go to Step #10 to automate the process and put it into production
      • NO: Continue to Step #8 to add derived information to your source data and increase predictive accuracy
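
  Patterns computes Accuracy and Capture for you on the Predict screen; as a sketch of what the two numbers mean for a single outcome value (our assumption: “accuracy” behaves like precision and “capture” like recall):

      def accuracy_and_capture(actual, predicted, outcome):
          predicted_as = sum(p == outcome for p in predicted)
          actually_is = sum(a == outcome for a in actual)
          correct = sum(a == p == outcome for a, p in zip(actual, predicted))
          accuracy = correct / predicted_as if predicted_as else 0.0
          capture = correct / actually_is if actually_is else 0.0
          return accuracy, capture

      # Example: 2 of 3 "Fail" predictions were right (accuracy 67%),
      # and 2 of the 4 actual "Fail" rows were caught (capture 50%).
      acc, cap = accuracy_and_capture(
          actual=["Fail", "Fail", "OK", "Fail", "Fail"],
          predicted=["Fail", "OK", "Fail", "Fail", "OK"],
          outcome="Fail",
      )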

Enhance

  8. Add additional derived data to increase predictive accuracy
    1. “Deriving” data means computing a new value column from an existing column, for the purpose of creating more “repeatable patterns” that the engine can detect.
    2. Here are a few common derived elements that usually increase predictability (a pandas sketch follows this step):
      • Derive “Day of Week” from your timestamp
      • Derive “Hour of Day” from your timestamp
      • Derive “Month of Year” from your timestamp
      • Derive “Season” from your timestamp (this can be natural seasons such as ‘Spring’ and ‘Fall’, or business-related ones such as “Tax Season” or “Football Season”)
      • Derive “Shift” for plant-floor or manufacturing data to see whether different teams of people are related to the issue
    3. Is your outcome a TTL type (“WHEN” will the event happen)?
      • NO: Return to Step #4 to extract your source data with the derived columns included
      • YES: Continue to Step #9
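
  A minimal pandas sketch of the timestamp derivations above (the “Shift” boundaries are illustrative; “Season” would be a similar lookup on the month):

      import pandas as pd

      df = pd.read_csv("asset_history.csv", parse_dates=["Timestamp"])

      df["DayOfWeek"] = df["Timestamp"].dt.day_name()
      df["HourOfDay"] = df["Timestamp"].dt.hour
      df["MonthOfYear"] = df["Timestamp"].dt.month_name()

      # Illustrative 3-shift schedule; use the plant's real shift hours.
      def shift(hour):
          if 6 <= hour < 14:
              return "First"
          if 14 <= hour < 22:
              return "Second"
          return "Third"

      df["Shift"] = df["HourOfDay"].map(shift)
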
  9. Time-series TTL outcomes can sometimes be dramatically improved by including derived columns that measure “change over time” metrics. Because Patterns will choose only the columns it needs, we can add them without knowing whether they will be useful.
    NOTE: You only need to consider adding these if you need to increase the predictability of your results from Step #7.
    1. The two most common metrics that can make the biggest improvement to time-series problems are:
      • Difference - The change in value between successive rows (e.g., how much a voltage changed from one measurement to the next)
      • Moving Average - Usually a 5-, 10-, or 30-measurement window that smooths the values over a period of time; you can also create all three
    2. OSI PI Integrator can easily derive these for us when we export our data; other databases can compute them just as easily as part of a database view (a pandas sketch follows this step).
    3. Which columns should this apply to?
      • Complete solution - Have OSI PI Integrator create the derived metrics for ALL of the asset’s numeric measures
      • Minimalist solution - Apply this only to a limited number of the numeric columns, for example:
        • The top 10 columns whose values are most volatile from measurement to measurement
        • The top 10 numeric categories with the most clusters (on the Categories screen, sort descending by Num Clusters)
    4. Return to Step #4 to create a new data set with the derived values
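
  If you are deriving these outside PI Integrator, here is a minimal pandas sketch for one measure (column names illustrative; group by device so windows do not span devices):

      import pandas as pd

      df = pd.read_csv("asset_history.csv", parse_dates=["Timestamp"])
      df = df.sort_values(["DeviceID", "Timestamp"])

      grouped = df.groupby("DeviceID")["Voltage"]

      # Difference: change from one measurement to the next.
      df["Voltage_Diff"] = grouped.diff()

      # Moving averages over 5-, 10-, and 30-measurement windows.
      for w in (5, 10, 30):
          df[f"Voltage_MA{w}"] = grouped.transform(lambda s, w=w: s.rolling(w).mean())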

Deploy

  10. Automate the process by scheduling repeating API calls
    1. Either from standard automation platforms or from basic scripts run as CRON jobs, you can easily make the previous steps repeat for an automated analysis flow (a sketch follows).
    2. The following steps should be automated:
      • Step #4 - Extracting historical data. The query should use a moving window of time so the data set is always up to date and old data drops off the end.
      • Step #6 - Run the Patterns Engine
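
  A minimal automation sketch, assuming a nightly CRON entry driving a Python script. The host, endpoints, and token below are placeholders, not the real EmcienPatterns API; take the actual paths from your Patterns API documentation:

      # Example CRON entry (nightly at 2:00 AM):
      #   0 2 * * * /usr/bin/python3 /opt/patterns/refresh.py

      import datetime
      import subprocess

      import requests  # endpoints and token below are placeholders

      API = "https://patterns.example.com/api"  # hypothetical host
      TOKEN = "YOUR_API_TOKEN"                  # hypothetical credential
      WINDOW_DAYS = 90

      # Step #4: re-extract a moving window so old rows drop off the end.
      since = (datetime.date.today() - datetime.timedelta(days=WINDOW_DAYS)).isoformat()
      subprocess.run(["python3", "extract.py", "--since", since], check=True)

      # Step #6: upload the fresh file and re-run the Patterns engine.
      with open("asset_history.csv", "rb") as f:
          requests.post(f"{API}/datasets", headers={"Authorization": TOKEN}, files={"file": f})
      requests.post(f"{API}/analyses", headers={"Authorization": TOKEN})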