Automated Feature Engineering

When working with time series data, it is important to use not just the information in each row or transaction, but also how that data changes over time. Measures such as moving averages, lagged values and differences are useful for detecting early indications of upcoming events.
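To make these three measures concrete, here is a minimal Python sketch that computes a trailing moving average, a lag and a first difference over a single column of readings. It illustrates the general technique only and is not Bandit's internal implementation. The function names and the sample readings are hypothetical.

```python
from typing import List, Optional

def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average over `window` rows; None until a full window exists."""
    return [
        sum(values[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]

def lag(values: List[float], periods: int) -> List[Optional[float]]:
    """Value from `periods` rows earlier; None where no earlier row exists."""
    return [values[i - periods] if i >= periods else None for i in range(len(values))]

def difference(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """Change from `periods` rows earlier (current value minus lagged value)."""
    return [
        values[i] - values[i - periods] if i >= periods else None
        for i in range(len(values))
    ]

# Hypothetical speed readings, one per period, for illustration only.
speeds = [10, 12, 11, 13, 9]
print(moving_average(speeds, 3))  # [None, None, 11.0, 12.0, 11.0]
print(lag(speeds, 1))             # [None, 10, 12, 11, 13]
print(difference(speeds, 1))      # [None, 2, -1, 2, -4]
```

Choosing the window (3 here, or 24 for a 24-period average) is exactly the tuning question that Automated Feature Engineering is meant to take off your hands.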

While this type of derived data is useful, creating it is not easy. Worse, it is hard to know which measures would benefit from it. Even more difficult is choosing how much time to look across when lagging or averaging data. Should you use a 24-hour moving average, a 7-day moving average, or both?

To automate the analysis of time series data, Emcien now includes a new Automated Feature Engineering capability (BETA). This feature takes a time series dataset and generates new time-based columns, derived from the original columns, that increase the overall predictability of the data.

Using Automated Feature Engineering

Note: Currently this feature is only available in the command line version of Bandit (the web UI version is expected in late July 2018).

Automated Feature Engineering is simply a new step that you run on your historical data before binning, analysis and predictions.

If you have the latest version of Emcien Bandit (currently v60.0) and run the help command, you will see the new switches:

./bandit -h

Example Command Line Execution:

Here is an example command line execution for a time series data file containing metrics for an assembly line robot. The objective of this run is to improve the predictability of when the robot's speed will drop below expectations. The goal is to detect a dramatic speed drop at least 3 days in advance, so that range is included in the TTL breaks file:

./bandit -e 1 -d "RBOT_ASSMBL_A3_SPEED" -b "user_feng.csv" -t RBOT_ASSMBL_A3_OSI_DATA.csv


debug dep=(RBOT_ASSMBL_A3_SPEED) table=(RBOT_ASSMBL_A3_OSI_DATA.csv)

start Emcien Bandit (BETA version 60) (www.emcien.com)

finish Emcien Bandit

The switches used for this new pre-processing step are:

  • -e 1
    • This turns on Feature Engineering
  • -d "{variable_to_base_TTL_countdown_on}"
    • This is the column in your data that you want to base your TTL countdown column on. (Learn more about TTL and Time Series)
    • It should be 0 when everything is fine and 1 when a failure occurs.
    • You can derive these values from any other column using any formula you prefer. For example, if you want to treat any water pressure below 10 as a failure, create a new derived column called Failure that is 1 when the pressure column is below 10 and 0 otherwise.
    • example:     -d "Failure"
  • -b "{your_feature_engineering_TTL_breaks_file.csv}"
    • This is a file with the numeric breaks that divide the values of your chosen TTL countdown variable into labeled ranges
    • example: -b "user_feature_engineering.csv"
      which could contain something like this:
      • "TTL",user,raw,0.000000,1.000000,15,Failure
        "TTL",user,raw,1.000000,72.000000,262,Less than 3 Days
        "TTL",user,raw,72.000000,336.000000,168,3 to 14 Days
        "TTL",user,raw,336.000000,2000.000000,357,> 14 Days
  • -t "your_original_source_data_file.csv"
    • This is the same switch used to tell Bandit which file to process
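The derived Failure column described under -d, and the way a breaks file divides TTL values into labeled ranges, can be sketched in Python. This is a hedged illustration, not a documented Bandit file specification: the threshold of 10, reading each breaks row as name, type, mode, lower bound, upper bound, count, label, and treating each range as lower-inclusive and upper-exclusive are all assumptions inferred from the example rows above. The function names are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple

Break = Tuple[float, float, str]  # (lower bound, upper bound, label)

def derive_failure(pressures: List[float], threshold: float = 10.0) -> List[int]:
    """Hypothetical derived column: 1 (failure) when pressure < threshold, else 0."""
    return [1 if p < threshold else 0 for p in pressures]

def read_breaks(text: str) -> List[Break]:
    """Parse breaks rows, ASSUMING columns: name, type, mode, lower, upper, count, label."""
    breaks: List[Break] = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        breaks.append((float(row[3]), float(row[4]), row[6]))
    return breaks

def label_ttl(ttl: float, breaks: List[Break]) -> Optional[str]:
    """Label of the range containing ttl, ASSUMING lower <= ttl < upper; None if no match."""
    for lower, upper, label in breaks:
        if lower <= ttl < upper:
            return label
    return None

# The example breaks rows shown above.
BREAKS = '''"TTL",user,raw,0.000000,1.000000,15,Failure
"TTL",user,raw,1.000000,72.000000,262,Less than 3 Days
"TTL",user,raw,72.000000,336.000000,168,3 to 14 Days
"TTL",user,raw,336.000000,2000.000000,357,> 14 Days
'''

print(derive_failure([12.5, 9.8, 10.0, 7.2]))  # [0, 1, 0, 1]
breaks = read_breaks(BREAKS)
print(label_ttl(48, breaks))   # Less than 3 Days
print(label_ttl(100, breaks))  # 3 to 14 Days
```

Note that a pressure of exactly 10 is not a failure under the "below 10" rule, which is why the third reading maps to 0.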

Upon completion of this process, Bandit produces a new data file whose name begins with the prefix "feng_". This file contains:

  1. the original source data columns
  2. a new TTL (Time To Live) column that counts down to each "failure" type event
  3. a new TSF (Time Since last Failure) column that counts up from the most recent failure
  4. new moving averages of your source data columns (e.g. "__MAV24" is a 24-period moving average)
  5. new Time Since last Failure difference columns (e.g. "__DTSF" is a difference column that tracks changes between the named column and the time since last failure)
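To show what the TTL and TSF columns contain, the sketch below computes both from a 0/1 failure column. It is an illustration under stated assumptions, not Bandit's actual implementation: it assumes one row per period (for example hourly rows, which would be consistent with the 72- and 336-unit breaks in the example breaks file), leaves rows with no failure ahead (TTL) or behind (TSF) as None, and omits the "__DTSF" columns because their exact formula is not specified here. The function names are hypothetical.

```python
from typing import List, Optional

def time_to_live(failure: List[int]) -> List[Optional[int]]:
    """Rows remaining until the next failure (0 on a failure row); None if none follows."""
    ttl: List[Optional[int]] = [None] * len(failure)
    next_fail: Optional[int] = None
    # Walk backwards so each row knows where the next failure is.
    for i in range(len(failure) - 1, -1, -1):
        if failure[i] == 1:
            next_fail = i
        ttl[i] = None if next_fail is None else next_fail - i
    return ttl

def time_since_failure(failure: List[int]) -> List[Optional[int]]:
    """Rows elapsed since the most recent failure (0 on a failure row); None before the first."""
    tsf: List[Optional[int]] = []
    last_fail: Optional[int] = None
    for i, f in enumerate(failure):
        if f == 1:
            last_fail = i
        tsf.append(None if last_fail is None else i - last_fail)
    return tsf

# Hypothetical failure column, one row per period.
failure = [0, 0, 1, 0, 0, 0, 1, 0]
print(time_to_live(failure))        # [2, 1, 0, 3, 2, 1, 0, None]
print(time_since_failure(failure))  # [None, None, 0, 1, 2, 3, 0, 1]
```

The TTL column counts down to each failure, which is what the breaks file then divides into ranges such as "Less than 3 Days".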

This new file is ready to be used like any other data file. You can give it to Web Bandit to have it binned for analysis. Learn more about using Bandit for automated binning.