2019-01-14

November 2017

Volume 32 Number 11

[Machine Learning]

Azure Machine Learning Time Series Analysis for Anomaly Detection

Anomaly detection is one of the most important features of Internet of Things (IoT) solutions that collect and analyze temporal changes of data from various sensors. In many scenarios, sensor data doesn’t change significantly over time. However, when it does, it usually means that your system has encountered an anomaly, and this anomaly can lead to a specific malfunction. In this article I’ll show you how to use Azure Machine Learning Time Series Anomaly Detection to identify anomalous sensor readings. To this end I’ll extend the RemoteCamera Universal Windows Platform (UWP) app I developed in my previous article (msdn.com/magazine/mt809116) by adding a list that displays anomalous values (see Figure 1). The RemoteCamera app acquires images from the webcam and calculates their brightness, which fluctuates around some specific value unless the camera image changes significantly. Because you can easily induce large brightness changes (by covering the camera, for example), leading to irregularities, this app provides good input for time-series anomaly detection.

Figure 1 Detecting Anomalous Brightness Values with Azure Machine Learning

Anomaly Detection

As explained in a recent article by James McCaffrey (msdn.com/magazine/mt826350), one common way of detecting abnormalities is through time-series regression. By fitting a model to your data, you can predict trends and then see if all sequence values follow them by calculating the difference between actual and predicted values. A large divergence from expected values indicates outliers or anomalous values. Here, I’ll first demonstrate how to detect such outliers by analyzing the so-called z-scores. The larger the absolute value of the z-score, the higher the probability that the actual value is an outlier. So, to find abnormalities, you specify the range of z-scores that are treated as “normal.” All z-scores outside that range indicate abnormalities. However, this approach uses a fixed threshold, so it might produce a large number of false positives. To solve this issue, more complex algorithms are employed. Specifically, the Azure Time Series Anomaly Detection module is based on exchangeability martingales (bit.ly/2wjBYUU), which analyze whether a sequence of values can be arbitrarily reordered without changing the probability of finding a given value in that sequence (or, in other words, that each value is equally likely to be found in a dataset). When the dataset has this exchangeability property, anomaly scores stay small. When exchangeability is broken, large anomaly scores are generated, indicating abnormal values.

In this article I’ll demonstrate how to create such machine learning (ML) algorithms. I’ll use the Microsoft Azure Machine Learning Studio (studio.azureml.net), which was also described by McCaffrey in the September 2014 issue (msdn.com/magazine/dn781358). Here, I’ll go beyond that article and, as well as creating ML experiments, I’ll show how to deploy the resulting solution as a Web service, then how to use such a service in the RemoteCamera app.

Training the Data Set

The first step is to extend the RemoteCamera app by adding another tab, which lets you acquire the training dataset and enable or disable anomaly detection using a checkbox (Figure 2).

Figure 2 Anomaly Detection Tab of the RemoteCamera App

The button, Acquire training dataset, becomes enabled after you start the camera preview (using controls from the first tab). When you tap this button, the app starts acquiring the training dataset. This works in the background and is indicated with a progress ring. The resulting training dataset comprises 100 data points, each of which is represented by an instance of the BrightnessDataPoint structure:

public struct BrightnessDataPoint
{
  public DateTime Time { get; private set; }
  public byte Brightness { get; private set; }

  public BrightnessDataPoint(byte brightness)
  {
    Time = DateTime.Now;
    Brightness = brightness;
  }
}

The BrightnessDataPoint struct stores a brightness value along with the time the brightness was determined. A collection of such values is then exported to the BrightnessData.csv file, which looks like the following:

Time,Brightness
9/8/2017 11:30:00,103
9/8/2017 11:30:01,103
9/8/2017 11:30:02,102
9/8/2017 11:30:03,42
9/8/2017 11:30:04,46
9/8/2017 11:30:05,149
9/8/2017 11:30:06,99
9/8/2017 11:30:07,101
9/8/2017 11:30:08,104

The location of the training dataset is then displayed in the textbox. I use a comma-separated values (CSV) file so it can be easily uploaded to the Machine Learning Studio.

To implement this functionality, I wrote two classes: BrightnessFileStorage and AnomalyDetector. The first class, BrightnessFileStorage, is defined in the BrightnessFileStorage.cs file in the AnomalyDetection subfolder of the companion code. BrightnessFileStorage saves a collection of BrightnessDataPoint objects to the CSV file using the DataWriter class (bit.ly/2wS31dq).
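
The CSV layout shown earlier can be produced with a few lines of string formatting. The following is only a minimal sketch of that formatting step, not the companion code: it builds the file content in memory rather than writing it with DataWriter, and the Sample struct, the CsvSketch class and the timestamp format string are assumptions introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical stand-in for BrightnessDataPoint that lets the caller supply the timestamp.
public readonly struct Sample
{
    public DateTime Time { get; }
    public byte Brightness { get; }

    public Sample(DateTime time, byte brightness)
    {
        Time = time;
        Brightness = brightness;
    }
}

public static class CsvSketch
{
    // Builds the CSV text: a header row followed by one "Time,Brightness" row per sample.
    public static string ToCsv(IEnumerable<Sample> samples)
    {
        var builder = new StringBuilder();
        builder.Append("Time,Brightness\n");

        foreach (var sample in samples)
        {
            // Assumed format; the invariant culture keeps the output identical on every locale.
            builder.Append(sample.Time.ToString("M/d/yyyy HH:mm:ss", CultureInfo.InvariantCulture));
            builder.Append(',');
            builder.Append(sample.Brightness.ToString(CultureInfo.InvariantCulture));
            builder.Append('\n');
        }

        return builder.ToString();
    }
}
```

Formatting with the invariant culture is a deliberate choice in this sketch: a locale-dependent date format could change the column layout of the uploaded file from one machine to another.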

The second class, AnomalyDetector, handles the logic related to anomaly detection. In particular, it has a public method, AddTrainingValue, shown in Figure 3, which is invoked right after the image brightness is calculated (see the ImageProcessor_ProcessingDone event handler in MainPage.xaml.cs in the accompanying code). AddTrainingValue proceeds as follows: First, I create an instance of the BrightnessDataPoint, which is then added to the collection. When this collection has 100 items, I save it to the CSV file. I then fire the TrainingDataReady event, which is handled in MainPage to stop training-dataset acquisition and display the file location in the UI:

private async void AnomalyDetector_TrainingDataReady(
  object sender, TrainingDataReadyEventArgs e)
{
  await ThreadHelper.InvokeOnMainThread(() =>
  {
    remoteCameraViewModel.IsTrainingActive = false;
    remoteCameraViewModel.TrainingDataSetFilePath = e.FilePath;
  });
}

Figure 3 Acquiring Training Dataset

private const int trainingDataSetLength = 100;
private List<BrightnessDataPoint> trainingDataSet = 
  new List<BrightnessDataPoint>();

public event EventHandler<TrainingDataReadyEventArgs> TrainingDataReady;

public async Task AddTrainingValue(byte brightness)
{
  trainingDataSet.Add(new BrightnessDataPoint(brightness));

  // Check if all data points were acquired
  if (trainingDataSet.Count == trainingDataSetLength)
  {
    // If so, save them to csv file
    var brightnessFileStorage = 
      await BrightnessFileStorage.CreateAsync();
    await brightnessFileStorage.WriteData(trainingDataSet);

    // ... and inform listeners that the training data set is ready
    TrainingDataReady?.Invoke(this,
      new TrainingDataReadyEventArgs(
      brightnessFileStorage.FilePath));
  }
}

The location of the training dataset is displayed in the textbox, so you can easily copy it and paste it in Windows Explorer to see the resulting data.

The z-Score Analysis

With the training dataset ready, I prepare the first experiment in Machine Learning Studio, following the instructions in McCaffrey’s 2014 article. I first upload the BrightnessData.csv file, and then design the experiment using the visual designer, as shown in Figure 4. Briefly, all the components are in the menu on the left-hand side of the Machine Learning Studio. To place an element in your experiment, you simply drag it onto the experiment pane (the center part of the Machine Learning Studio). Each component has specific inputs and outputs. You connect compatible nodes to control the data flow between modules. Components can have additional configuration, which you set using the properties window (it appears on the right of the Machine Learning Studio).

Figure 4 Anomaly Detection Using z-Score Analysis

The ML algorithm depicted in Figure 4 works in two modes: experiment and Web service. They differ only in the input. In experiment mode, the input is the uploaded training dataset (BrightnessData), which is replaced in Web service mode by the Web service input. Independent of the mode, the input is converted to a dataset, then the values from the brightness column are normalized using the z-score transformation (bit.ly/2eWwHAa). The transformation converts brightness values to z-scores, which tell you how far the current value is from the mean. This distance is measured in standard deviations. The larger the distance, the higher the probability that the current value is an outlier. I apply the z-score normalization because, in general, the base or normal brightness level varies depending on what the camera sees. Thus, the z-score transformation ensures that the normal brightness level, after normalization, is close to 0. The raw brightness values vary from approximately 40 to 150. After normalization, all brightness values fall between approximately -4.0 and +4.0, as shown in Figure 5. Consequently, to find anomalous values all I need to do is apply the threshold filter. Here, I use the Azure Machine Learning Threshold Filter of type OutOfRange with lower and upper boundaries set to -2 and 1.5. I chose these values based on the z-scores plot in Figure 5 and set them using the properties window of the Threshold Filter in Machine Learning Studio.
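
To make the two steps concrete, here is a rough local equivalent of the normalization and the OutOfRange filter. It is a sketch under stated assumptions rather than a description of what the Studio modules do internally: the ZScoreSketch class and its method names are hypothetical, and I assume the population form of the standard deviation, which the article doesn’t specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZScoreSketch
{
    // Converts raw values to z-scores: (value - mean) / standard deviation.
    public static double[] Normalize(IReadOnlyList<double> values)
    {
        if (values.Count == 0) return Array.Empty<double>();

        double mean = values.Average();

        // Population standard deviation (an assumption of this sketch).
        double variance = values.Sum(v => (v - mean) * (v - mean)) / values.Count;
        double stdDev = Math.Sqrt(variance);

        // A constant series has no spread, so every z-score is reported as 0.
        if (stdDev == 0) return new double[values.Count];

        return values.Select(v => (v - mean) / stdDev).ToArray();
    }

    // Returns the indices whose z-score lies outside the closed range [lower, upper].
    public static List<int> OutOfRange(IReadOnlyList<double> zScores, double lower, double upper)
    {
        var indices = new List<int>();
        for (int i = 0; i < zScores.Count; i++)
        {
            if (zScores[i] < lower || zScores[i] > upper) indices.Add(i);
        }
        return indices;
    }
}
```

Calling ZScoreSketch.OutOfRange(ZScoreSketch.Normalize(values), -2, 1.5) mirrors the boundaries used in the experiment. Because the mean and standard deviation are computed from the same window that contains the outliers, a few extreme readings in a short window can widen the spread enough to hide one another, which is one reason a fixed threshold is fragile.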

Figure 5 The Training Dataset After Normalization

After thresholding, the dataset contains one Boolean column, specifying whether a given time point is outside the specified range. To supplement this information with actual brightness values that are identified as outliers, I combine this column with the original dataset and then split the resulting dataset into two subsets: one containing anomalous values only and the other with normal values (see the bottom part of Figure 4). I change the column datatype before splitting because the Split Data module doesn’t accept Boolean values. Then, the first subset is returned by the experiment. In the Web service view, this result is transferred to the client. Note that to see values from any dataset you use the Results dataset | Visualize option from the dataset context menu in Machine Learning Studio. This option works provided you’ve previously run the experiment. Figure 6 depicts an example of such visualization of the last dataset from the experiment shown in Figure 4.

Figure 6 Anomalous Values Detected with z-Score Analysis

Machine Learning Time-Series Analysis

Let’s now see how to use the Azure Time Series Anomaly Detection (ATSAD) module to identify outliers. As shown in Figure 7, the flow of the experiment is quite similar to the previous one. The initial dataset is normalized with z-score transformation and transferred to the ATSAD module (you can find it under the Time Series node of Machine Learning Studio). This module requires you to provide several inputs, which you configure with the properties window (bit.ly/2xUGTg2). First, you specify the data and time columns, then you configure the martingale type. Here, I use the Power martingale. This activates another textbox, Epsilon, in which you can type any value from 0 to 1 to specify the sensitivity of the detector. Then, you choose a strangeness function, using one of three options:

RangePercentile: Use this option to identify values that are clearly outside the range, like spikes or dips. I use this option in my experiment so it will work analogously to the previous experiment, but with a more comprehensive analysis.
SlowPos- and SlowNegTrend: Use these options to identify positive and negative trend changes in your dataset. This is useful when your solution looks for increases or decreases in observed values.
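
To give a feel for what epsilon controls, the following sketch computes the textbook power martingale, in which each observation multiplies the running value by epsilon * p^(epsilon - 1), where p is the p-value derived from the strangeness scores seen so far. This is not the ATSAD implementation, whose internals the article doesn’t describe. The PowerMartingaleSketch class is hypothetical, the strangeness values are assumed to be supplied by the caller, and ties are resolved deterministically (the usual formulation breaks them at random) so the result is repeatable.

```csharp
using System;
using System.Collections.Generic;

public static class PowerMartingaleSketch
{
    // Returns log(M_n) after each observation, where
    // M_n is the product over i of epsilon * p_i^(epsilon - 1).
    public static double[] LogMartingale(IReadOnlyList<double> strangeness, double epsilon)
    {
        if (epsilon <= 0 || epsilon >= 1)
            throw new ArgumentOutOfRangeException(nameof(epsilon),
                "This sketch requires epsilon strictly between 0 and 1.");

        var logValues = new double[strangeness.Count];
        double logM = 0.0;

        for (int i = 0; i < strangeness.Count; i++)
        {
            // Non-randomized p-value: the fraction of values seen so far
            // (including the current one) that are at least as strange.
            int atLeastAsStrange = 0;
            for (int j = 0; j <= i; j++)
            {
                if (strangeness[j] >= strangeness[i]) atLeastAsStrange++;
            }
            double p = (double)atLeastAsStrange / (i + 1);

            // Accumulate in log space so a long product neither overflows nor underflows.
            logM += Math.Log(epsilon) + (epsilon - 1) * Math.Log(p);
            logValues[i] = logM;
        }

        return logValues;
    }
}
```

A small p-value (the current point is stranger than nearly everything before it) makes the factor larger than 1, so the martingale grows; unremarkable points make the factor smaller than 1, so it shrinks. Smaller values of epsilon reward small p-values more strongly.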

Figure 7 An Experiment Utilizing the Azure Time Series Anomaly Detection Module

Next, you specify the length of the martingale and strangeness values history. You can choose any integer between 10 and 1000. After a bit of trial-and-error experimentation, I settled on the following parameters for my detector:

Epsilon = 0.4
Length of martingale and strangeness values history = 50

The last parameter of the detector is an alert threshold, which specifies the minimum value of the anomaly score that marks the given value as an outlier. By default, the alert threshold is set to 3.5. For my experiment, I changed this to 2.

If you visualize the output of the ATSAD module, you’ll see that it supplements the input dataset with two columns: the anomaly score, which measures abnormalities, and the alert indicator, which contains a binary value (0 or 1) indicating if a value is anomalous. I use the latter to split the dataset into two subsets: normal and abnormal. Only the abnormal subset is returned by the experiment. The other elements of the experiment are the same as before so I won’t discuss them again. I’ll only note that a very important aspect of the experiment required for a Web service is the name of the input and output. I set these values to Data (Web service input) and AnomalyDetectionResult (Web service output).

Web Service Client

With the experiments set up I can now publish them as Web services so they can be accessed by the RemoteCamera app to identify any image brightness abnormalities. To set up a Web service you need to run the experiment and then press the Deploy Web Service icon on the bottom pane of Machine Learning Studio (see the highlighted item in Figure 8). If you don’t add Web service input and output modules to the experiment, this pane will show Set Up Web Service. If you click it, Web service input and output modules will be added to the experiment, and the button label will change to Deploy Web Service.

Figure 8 The Azure Machine Learning Studio Action Pane

When the Web service deployment completes, you’ll be redirected to the Web service dashboard, shown in Figure 9. The dashboard presents a summary of the published Web service, along with the API key and instructions on how to send requests and handle responses (API Help Page). Specifically, after you click the request/response hyperlink, a Web service URL and a detailed request and response structure in JSON format will be presented.

Figure 9 The Web Service Dashboard

To proceed further, I store the API key and create JSON-to-C# mappings using the JSONUtils service (jsonutils.com). I then save the resulting classes in corresponding files, AnomalyDetectionRequest.cs and AnomalyDetectionResponse.cs, in the AnomalyDetection subfolder. Their structure is similar: both files contain classes that, as usual, are composed mostly of auto-implemented properties. AnomalyDetectionRequest and AnomalyDetectionResponse represent the corresponding JSON objects transmitted between a client and a Web service. As an example, the definition of the AnomalyDetectionRequest class and dependent objects is given in Figure 10. Note that to convert a collection of brightness data points to an input accepted by the Machine Learning Studio Web service (a two-dimensional array of strings), I use a helper class, ConversionHelper. The latter, whose full definition is in the companion code, has two public methods. They either convert the collection of brightness data points to string[,] (BrightnessDataToStringTable) or vice versa (AnomalyDetectionResponseToBrightnessData).

Figure 10 A Definition of the AnomalyDetectionRequest Class and Dependent Objects

public class AnomalyDetectionRequest
{
  public Inputs Inputs { get; set; }
  public GlobalParameters GlobalParameters { get; set; }

  public AnomalyDetectionRequest(
    IList<BrightnessDataPoint> brightnessData)
  {
    Inputs = new Inputs()
    {
      Data = new Data()
      {
        ColumnNames = new string[]
        {
          "Time",
          "Brightness"
        },

        Values = ConversionHelper.
          BrightnessDataToStringTable(brightnessData)
      }
    };
  }
}

public class Inputs
{ 
  public Data Data { get; set; }
}

public class Data
{
  public string[] ColumnNames { get; set; }
  public string[,] Values { get; set; }
}

public class GlobalParameters { }
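
The conversion to a two-dimensional string array might look roughly like the sketch below. The actual ConversionHelper lives in the companion code and may differ; the StringTableSketch class, the tuple-based input and the timestamp format are assumptions made so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class StringTableSketch
{
    // Produces one row per data point and two columns, in the same order
    // as the "Time" and "Brightness" entries of ColumnNames.
    public static string[,] ToStringTable(IList<(DateTime Time, byte Brightness)> points)
    {
        var table = new string[points.Count, 2];

        for (int row = 0; row < points.Count; row++)
        {
            // Assumed timestamp format; the invariant culture avoids locale-specific output.
            table[row, 0] = points[row].Time.ToString(
                "M/d/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
            table[row, 1] = points[row].Brightness.ToString(CultureInfo.InvariantCulture);
        }

        return table;
    }
}
```

The column order matters in this sketch: each row must line up with the ColumnNames array so that the service reads the first cell as the time and the second as the brightness.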

Once the JSON-to-C# object mapping is established, I can write the actual Web service client. To that end I first install the Microsoft.AspNet.WebApi.Client NuGet package and then use it to define the AnomalyDetectionClient class (see the corresponding file in the companion code). This class has three private fields: baseAddress, apiKey and httpClient. The first field stores the URL of the Machine Learning Studio Web service, while the second contains the API key. These two values are used to configure an instance of the HttpClient class (System.Net.Http namespace); the installed NuGet package supplies the PostAsJsonAsync and ReadAsAsync extension methods used later:

public AnomalyDetectionClient()
{
  httpClient = new HttpClient()
  {
    BaseAddress = new Uri(baseAddress),
  };

  httpClient.DefaultRequestHeaders.Authorization = 
    new AuthenticationHeaderValue("Bearer", apiKey);
}

After creating the client, I can start sending requests to the Machine Learning Studio Web service with the AnomalyDetectionClient.DetectAnomalyAsync method from Figure 11. This method accepts a collection of brightness data points, representing test data. This test data replaces the CSV file I used previously for experimenting and is used to instantiate AnomalyDetectionRequest. An instance of this class is then posted to the Web service for analysis with the PostAsJsonAsync extension method. The resulting JSON response is converted to an AnomalyDetectionResponse instance, which is in turn converted to the collection of brightness data points returned by DetectAnomalyAsync. I also check the response status and throw an exception if the request fails.

Figure 11 Sending Requests to the Azure Machine Learning Studio Web Service

public async Task<IList<BrightnessDataPoint>> 
  DetectAnomalyAsync(IList<BrightnessDataPoint> brightnessData)
{
  var request = new AnomalyDetectionRequest(brightnessData);

  var response = await httpClient.PostAsJsonAsync(string.Empty, request);

  IList<BrightnessDataPoint> result; 

  if (response.IsSuccessStatusCode)
  {
    var anomalyDetectionResponse = await 
      response.Content.ReadAsAsync<AnomalyDetectionResponse>();

    result = ConversionHelper.
      AnomalyDetectionResponseToBrightnessData(anomalyDetectionResponse);
  }
  else
  {
    throw new Exception(response.ReasonPhrase);
  }

  return result;
}

The AnomalyDetectionClient is utilized in the AddTestValue method of the AnomalyDetector class (Figure 12). Like AddTrainingValue, AddTestValue is also invoked in the ImageProcessor_ProcessingDone event handler (see MainPage.xaml.cs in the companion code). However, AddTestValue proceeds in a slightly different manner than the AddTrainingValue method. In AddTestValue I add brightness data points to an instance of the BrightnessDataset class, which internally uses the generic List class to implement a rolling window. This window, like the one in James McCaffrey’s October article, is used to store test values. By default, the size of the rolling window is set to 30 elements, but you can control this value using a constructor of the BrightnessDataset. As shown in Figure 12, I don’t send data for analysis until the window is full, then I check whether the collection of anomalous values returned by the Web service contains any elements. If so, I invoke the AnomalyDetected event, which is also used to pass abnormalities to listeners.

Figure 12 Detecting Anomalies

public event EventHandler<AnomalyDetectedEventArgs> AnomalyDetected;
private BrightnessDataset dataSet = new BrightnessDataset();

public async Task AddTestValue(byte brightness)
{
  dataSet.Add(new BrightnessDataPoint(brightness));

  if (dataSet.IsFull)
  {
    try
    {
      var anomalousValues = await anomalyDetectionClient.
        DetectAnomalyAsync(dataSet.Data);

      if (anomalousValues.Count > 0)
      {
        AnomalyDetected?.Invoke(this,
          new AnomalyDetectedEventArgs(anomalousValues));
      }
    }
    catch (Exception ex)
    {
      Debug.WriteLine(ex);
    }
  }
}
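
The rolling window itself isn’t listed in the article, so here is one way such a List-based window could be written. Treat it as a sketch, not as the BrightnessDataset class from the companion code: the RollingWindow name and the generic type parameter are assumptions, and only the default size of 30 and the IsFull and Data members come from the text and Figure 12.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical List-based rolling window; the companion code's BrightnessDataset may differ.
public class RollingWindow<T>
{
    private readonly List<T> items = new List<T>();
    private readonly int capacity;

    public RollingWindow(int capacity = 30)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    // True once the window holds exactly 'capacity' elements.
    public bool IsFull => items.Count == capacity;

    // Returns a snapshot so callers can enumerate it while new values keep arriving.
    public IList<T> Data => new List<T>(items);

    public void Add(T item)
    {
        // When full, drop the oldest element so the count never exceeds the capacity.
        if (items.Count == capacity) items.RemoveAt(0);
        items.Add(item);
    }
}
```

Returning a copy from Data is a design choice of this sketch: the window keeps changing while an asynchronous Web service call is in flight, and a snapshot keeps the request content stable. RemoveAt(0) shifts the remaining elements, which is negligible for 30 items; a queue or ring buffer would be a better fit for large windows.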

To display anomalous values in the UI, I handle the AnomalyDetected event in the MainPage class as follows:

private async void AnomalyDetector_AnomalyDetected(
  object sender, AnomalyDetectedEventArgs e)
{
  await ThreadHelper.InvokeOnMainThread(() =>
  {
    foreach (var anomalousValue in e.AnomalousValues)
    {
      if (!remoteCameraViewModel.AnomalousValues.Contains(anomalousValue))
      {
        remoteCameraViewModel.AnomalousValues.Add(anomalousValue);
      }
    }
  });
}

Specifically, I iterate over the collection of obtained values to check whether they were already added to the local datastore (the AnomalousValues property of the view model). If not, I add them to the observable collection. As a result, only new abnormal values will appear in the list shown previously in Figure 1. I do this additional check because of the rolling window, in which only one element is changed between successive calls to the Web service.

To test my solution, you’ll need to run the RemoteCamera app, start the camera preview and enable anomaly detection using the checkbox on the Anomaly Detection tab. Once this is done you can generate abnormal values by covering your camera. These values should be quickly recognized by the remote ML detector as anomalous and displayed in the listbox (as in Figure 1).

Wrapping Up

I demonstrated here how to design two different anomaly detection experiments in the Azure Machine Learning Studio. Both experiments were also deployed as Web services and combined with the client app, RemoteCamera, which sends locally acquired time-series data for machine learning analysis to identify abnormalities. Here, I used a Web service in the UWP app. However, you can use the same code to access a Web service from an ASP.NET Web app, where you handle the ML logic at the back end rather than on the endpoint, which in the case of IoT can be just a simple sensor.

Dawid Borycki is a software engineer and biomedical researcher, author and conference speaker. He enjoys learning new technologies for software experimenting and prototyping.

Thanks to the following Microsoft technical expert for reviewing this article: Dr. James McCaffrey
