Important
Starting on 20 September 2023, you won't be able to create new Anomaly Detector resources. The Anomaly Detector service will be retired on 1 October 2026.
This article provides guidance on recommended practices to follow when using the multivariate Anomaly Detector (MVAD) APIs.
Follow the instructions in this section to avoid errors while using MVAD. If you still get errors, refer to the full list of error codes for explanations and actions to take.
These three parameters are required in training and inference API requests:

- `source` - The link to your zip file located in Azure Blob Storage with Shared Access Signatures (SAS).
- `startTime` - The start time of the data used for training or inference. If it's earlier than the actual earliest timestamp in the data, the actual earliest timestamp will be used as the starting point.
- `endTime` - The end time of the data used for training or inference, which must be later than or equal to `startTime`. If `endTime` is later than the actual latest timestamp in the data, the actual latest timestamp will be used as the ending point. If `endTime` equals `startTime`, it means inference of one single data point, which is often used in streaming scenarios.

Other parameters for the training API are optional:
- `slidingWindow` - How many data points are used to determine anomalies. An integer between 28 and 2,880. The default value is 300. If `slidingWindow` is `k` for model training, then at least `k` points should be accessible from the source file during inference to get valid results.

  MVAD takes a segment of data points to decide if the next data point is an anomaly. The length of the segment is `slidingWindow`.

  Please keep two things in mind when choosing a `slidingWindow` value:
  - The properties of your data: when your data is periodic, you could set the length of 1 - 3 cycles as the `slidingWindow`; when your data is at a high frequency (small granularity) like minute-level or second-level, you could set a relatively higher value of `slidingWindow`.
  - The trade-off between training/inference time and potential performance impact: a larger `slidingWindow` may cause longer training/inference time, and there is no guarantee that a larger `slidingWindow` will lead to accuracy gains. A small `slidingWindow` may make it difficult for the model to converge to an optimal solution. For example, it's hard to detect anomalies when `slidingWindow` has only two points.
- `alignMode` - How to align multiple variables (time series) on timestamps. There are two options for this parameter, `Inner` and `Outer`, and the default value is `Outer`.

  This parameter is critical when there is misalignment between the timestamp sequences of the variables. The model needs to align the variables onto the same timestamp sequence before further processing.

  `Inner` means the model will report detection results only on timestamps on which every variable has a value, that is, the intersection of all variables. `Outer` means the model will report detection results on timestamps on which any variable has a value, that is, the union of all variables.

  Here is an example to explain different `alignMode` values; a pandas sketch after the tables makes the two joins concrete.
Variable-1
timestamp | value |
---|---|
2020-11-01 | 1 |
2020-11-02 | 2 |
2020-11-04 | 4 |
2020-11-05 | 5 |
Variable-2
timestamp | value |
---|---|
2020-11-01 | 1 |
2020-11-02 | 2 |
2020-11-03 | 3 |
2020-11-04 | 4 |
`Inner` join two variables
timestamp | Variable-1 | Variable-2 |
---|---|---|
2020-11-01 | 1 | 1 |
2020-11-02 | 2 | 2 |
2020-11-04 | 4 | 4 |
`Outer` join two variables
timestamp | Variable-1 | Variable-2 |
---|---|---|
2020-11-01 | 1 | 1 |
2020-11-02 | 2 | 2 |
2020-11-03 | nan | 3 |
2020-11-04 | 4 | 4 |
2020-11-05 | 5 | nan |
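To make the two `alignMode` options concrete, here's a small pandas sketch that reproduces the joins above. This illustrates the behavior only; it isn't the service's internal implementation.

```python
import pandas as pd

# The two variables from the example above.
v1 = pd.DataFrame({"timestamp": ["2020-11-01", "2020-11-02", "2020-11-04", "2020-11-05"],
                   "Variable-1": [1, 2, 4, 5]})
v2 = pd.DataFrame({"timestamp": ["2020-11-01", "2020-11-02", "2020-11-03", "2020-11-04"],
                   "Variable-2": [1, 2, 3, 4]})

inner = v1.merge(v2, on="timestamp", how="inner")  # intersection of timestamps
outer = v1.merge(v2, on="timestamp", how="outer").sort_values("timestamp")  # union
print(inner)
print(outer)
```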
- `fillNAMethod` - How to fill `nan` values in the merged table. There might be missing values in the merged table and they should be properly handled. We provide several methods to fill them up. The options are `Linear`, `Previous`, `Subsequent`, `Zero`, and `Fixed`, and the default value is `Linear` (see the pandas sketch after this parameter list).
Option | Method |
---|---|
Linear | Fill nan values by linear interpolation |
Previous | Propagate last valid value to fill gaps. Example: [1, 2, nan, 3, nan, 4] -> [1, 2, 2, 3, 3, 4] |
Subsequent | Use next valid value to fill gaps. Example: [1, 2, nan, 3, nan, 4] -> [1, 2, 3, 3, 4, 4] |
Zero | Fill nan values with 0. |
Fixed | Fill nan values with a specified valid value that should be provided in paddingValue. |
- `paddingValue` - Padding value is used to fill `nan` when `fillNAMethod` is `Fixed`, and must be provided in that case. In other cases it's optional.
- `displayName` - An optional parameter used to identify models. For example, you can use it to mark parameters, data sources, and any other metadata about the model and its input data. The default value is an empty string.
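To make the fill options concrete, here's a minimal pandas sketch showing equivalents of the five `fillNAMethod` options. It illustrates the behavior only, not how the service fills values internally.

```python
import pandas as pd
import numpy as np

# A column with gaps, such as one produced by an Outer alignment.
s = pd.Series([1, 2, np.nan, 3, np.nan, 4], dtype=float)

print(s.interpolate().tolist())  # Linear     -> [1.0, 2.0, 2.5, 3.0, 3.5, 4.0]
print(s.ffill().tolist())        # Previous   -> [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
print(s.bfill().tolist())        # Subsequent -> [1.0, 2.0, 3.0, 3.0, 4.0, 4.0]
print(s.fillna(0).tolist())      # Zero       -> [1.0, 2.0, 0.0, 3.0, 0.0, 4.0]
print(s.fillna(9.9).tolist())    # Fixed, with a paddingValue of 9.9
```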
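Putting the parameters together, here's a hedged sketch of a training request. The endpoint route and the `alignPolicy` nesting follow the v1.1-preview API shape and may differ in your API version; the endpoint, key, and SAS URL are placeholders you must supply, so verify the details against the API reference.

```python
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
API_KEY = "<your-key>"  # placeholder

body = {
    "source": "<SAS URL to the zip file in Azure Blob Storage>",  # placeholder
    "startTime": "2021-01-01T00:00:00Z",
    "endTime": "2021-01-02T12:00:00Z",
    "slidingWindow": 300,
    "alignPolicy": {  # nesting assumed from the v1.1-preview API shape
        "alignMode": "Outer",
        "fillNAMethod": "Linear",
        "paddingValue": 0,
    },
    "displayName": "sample-model",
}

resp = requests.post(
    f"{ENDPOINT}/anomalydetector/v1.1-preview/multivariate/models",
    headers={"Ocp-Apim-Subscription-Key": API_KEY},
    json=body,
)
resp.raise_for_status()
# The new model's ID is typically returned in the Location header.
print(resp.headers.get("Location"))
```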
MVAD detects anomalies from a group of metrics, and we call each metric a variable or a time series.

You can download the sample data file from Microsoft to check the accepted schema: https://aka.ms/AnomalyDetector/MVADSampleData
- Each variable must have two and only two fields, `timestamp` and `value`, and should be stored in a comma-separated values (CSV) file.
- The column names of the CSV file should be precisely `timestamp` and `value`, case-sensitive.
- The `timestamp` values should conform to ISO 8601; the `value` field could be integers or decimals with any number of decimal places.
A good example of the content of a CSV file:
timestamp | value |
---|---|
2019-04-01T00:00:00Z | 5 |
2019-04-01T00:01:00Z | 3.6 |
2019-04-01T00:02:00Z | 4 |
... | ... |
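As a sketch, here's one way to produce a compliant CSV with pandas; the minute granularity and the file name are illustrative assumptions.

```python
import pandas as pd

# Column names must be exactly "timestamp" and "value" (case-sensitive),
# and the timestamps must be ISO 8601.
df = pd.DataFrame({
    "timestamp": pd.date_range("2019-04-01T00:00:00Z", periods=3, freq="min")
                   .strftime("%Y-%m-%dT%H:%M:%SZ"),
    "value": [5, 3.6, 4],
})
df.to_csv("temperature.csv", index=False)  # the file name becomes the variable name
```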
Note
If your timestamps have hours, minutes, and/or seconds, ensure that they're properly rounded up before calling the APIs.
For example, if your data frequency is supposed to be one data point every 30 seconds, but you're seeing timestamps like "12:00:01" and "12:00:28", it's a strong signal that you should pre-process the timestamps to new values like "12:00:00" and "12:00:30".
For details, please refer to the "Timestamp round-up" section below.
The name of the CSV file will be used as the variable name and should be unique. For example, "temperature.csv" and "humidity.csv".
Variables for training and variables for inference should be consistent. For example, if you're using `series_1`, `series_2`, `series_3`, `series_4`, and `series_5` for training, you should provide exactly the same variables for inference.
CSV files should be compressed into a zip file and uploaded to an Azure blob container. The zip file can have whatever name you want.

A common mistake in data preparation is extra folders in the zip file. For example, assume the name of the zip file is `series.zip`. Then after decompressing the files to a new folder `./series`, the correct path to the CSV files is `./series/series_1.csv`, and a wrong path could be `./series/foo/bar/series_1.csv`.
The correct example of the directory tree after decompressing the zip file in Windows:

```
.
└── series
    ├── series_1.csv
    ├── series_2.csv
    ├── series_3.csv
    ├── series_4.csv
    └── series_5.csv
```
An incorrect example of the directory tree after decompressing the zip file in Windows:

```
.
└── series
    └── series
        ├── series_1.csv
        ├── series_2.csv
        ├── series_3.csv
        ├── series_4.csv
        └── series_5.csv
```
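A short sketch of building a correctly structured zip file with Python's standard library; the local folder name is hypothetical.

```python
import zipfile
from pathlib import Path

csv_dir = Path("./series")  # hypothetical local folder holding the CSV files

# Write each CSV at the archive root (arcname strips the folder path),
# so no sub-folders end up inside the zip.
with zipfile.ZipFile("series.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for csv_file in sorted(csv_dir.glob("*.csv")):
        zf.write(csv_file, arcname=csv_file.name)
```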
Now you're able to run your code with MVAD APIs without any error. What could be done to improve your model accuracy?
The underlying model of MVAD has millions of parameters. It needs a minimum number of data points to learn an optimal set of parameters. The empirical rule is that you need to provide 5,000 or more data points (timestamps) per variable to train the model for good accuracy. In general, the more training data, the better the accuracy. However, when you're not able to accrue that much data, we still encourage you to experiment with less data and see if the compromised accuracy is still acceptable.
Every time you call the inference API, you need to ensure that the source data file contains just enough data points. That is normally `slidingWindow` + the number of data points that actually need inference results. For example, in a streaming case where each call runs inference on ONE new timestamp, the data file could contain only the leading `slidingWindow` plus ONE data point; then you could create another zip file with the same number of data points (`slidingWindow` + 1) but shifted ONE step to the "right", and submit it for another inference job.

Anything beyond that, or "before" the leading sliding window, won't impact the inference result at all and may only cause performance downgrade. Anything below that may lead to a `NotEnoughInput` error.
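For instance, here's a minimal sketch of preparing the streaming case's data file, assuming a trained `slidingWindow` of 300 and a hypothetical local CSV:

```python
import pandas as pd

SLIDING_WINDOW = 300  # must match the value used at training time

# Keep only the rows needed to infer on the single newest timestamp:
# the leading slidingWindow points plus the ONE target point.
df = pd.read_csv("temperature.csv")
df.tail(SLIDING_WINDOW + 1).to_csv("temperature_inference.csv", index=False)
```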
In a group of variables (time series), each variable may be collected from an independent source. The timestamps of different variables may be inconsistent with each other and with the known frequencies. Here's a simple example.
Variable-1
timestamp | value |
---|---|
12:00:01 | 1.0 |
12:00:35 | 1.5 |
12:01:02 | 0.9 |
12:01:31 | 2.2 |
12:02:08 | 1.3 |
Variable-2
timestamp | value |
---|---|
12:00:03 | 2.2 |
12:00:37 | 2.6 |
12:01:09 | 1.4 |
12:01:34 | 1.7 |
12:02:04 | 2.0 |
We have two variables collected from two sensors which send one data point every 30 seconds. However, the sensors aren't sending data points at a strict even frequency, but sometimes earlier and sometimes later. Because MVAD takes into consideration correlations between different variables, timestamps must be properly aligned so that the metrics can correctly reflect the condition of the system. In the above example, timestamps of variable 1 and variable 2 must be properly 'rounded' to their frequency before alignment.
Let's see what happens if they're not pre-processed. If we set `alignMode` to `Outer` (which means the union of the two sets), the merged table is:
timestamp | Variable-1 | Variable-2 |
---|---|---|
12:00:01 | 1.0 | nan |
12:00:03 | nan | 2.2 |
12:00:35 | 1.5 | nan |
12:00:37 | nan | 2.6 |
12:01:02 | 0.9 | nan |
12:01:09 | nan | 1.4 |
12:01:31 | 2.2 | nan |
12:01:34 | nan | 1.7 |
12:02:04 | nan | 2.0 |
12:02:08 | 1.3 | nan |
`nan` indicates missing values. Obviously, the merged table isn't what you might have expected. Variable 1 and variable 2 interleave, and the MVAD model can't extract information about correlations between them. If we set `alignMode` to `Inner`, the merged table is empty, as there's no common timestamp in variable 1 and variable 2.
Therefore, the timestamps of variable 1 and variable 2 should be pre-processed (rounded to the nearest 30-second timestamps) and the new time series are:
Variable-1
timestamp | value |
---|---|
12:00:00 | 1.0 |
12:00:30 | 1.5 |
12:01:00 | 0.9 |
12:01:30 | 2.2 |
12:02:00 | 1.3 |
Variable-2
timestamp | value |
---|---|
12:00:00 | 2.2 |
12:00:30 | 2.6 |
12:01:00 | 1.4 |
12:01:30 | 1.7 |
12:02:00 | 2.0 |
Now the merged table is more reasonable.
timestamp | Variable-1 | Variable-2 |
---|---|---|
12:00:00 | 1.0 | 2.2 |
12:00:30 | 1.5 | 2.6 |
12:01:00 | 0.9 | 1.4 |
12:01:30 | 2.2 | 1.7 |
12:02:00 | 1.3 | 2.0 |
Values of different variables at close timestamps are well aligned, and the MVAD model can now extract correlation information.
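Here's a sketch of this round-then-align preprocessing with pandas, using the sensor readings above (today's date is implied by the bare clock times):

```python
import pandas as pd

# Raw readings from the two sensors in the example above.
v1 = pd.DataFrame({
    "timestamp": pd.to_datetime(["12:00:01", "12:00:35", "12:01:02", "12:01:31", "12:02:08"]),
    "Variable-1": [1.0, 1.5, 0.9, 2.2, 1.3],
})
v2 = pd.DataFrame({
    "timestamp": pd.to_datetime(["12:00:03", "12:00:37", "12:01:09", "12:01:34", "12:02:04"]),
    "Variable-2": [2.2, 2.6, 1.4, 1.7, 2.0],
})

# Round each timestamp to the nearest 30-second boundary before alignment.
for df in (v1, v2):
    df["timestamp"] = df["timestamp"].dt.round("30s")

# Outer alignment now produces the well-aligned table shown above.
merged = v1.merge(v2, on="timestamp", how="outer").sort_values("timestamp")
print(merged)
```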
There are some limitations in both the training and inference APIs; you should be aware of these limitations to avoid errors. Among them is a granularity limit: the finest granularity supported is `per_second`.
We provide a severity field that indicates the significance of anomalies. False positives may be filtered out by setting a threshold on the severity. Sometimes too many false positives appear when there are pattern shifts in the inference data; in such cases, the model may need to be retrained on new data. If the training data contains too many anomalies, there could be false negatives in the detection results, because the model learns patterns from the training data and anomalies may bias the model. Thus, proper data cleaning may help reduce false negatives.
Generally speaking, it's hard to decide which model is best without a labeled dataset. However, we can leverage the training and validation losses to get a rough estimate and discard bad models. First, observe whether the training losses converge; divergent losses often indicate poor model quality. Second, loss values may help identify whether underfitting or overfitting occurs; models that underfit or overfit may not have the desired performance. Third, although the definition of the loss function doesn't reflect detection performance directly, loss values may be an auxiliary tool for estimating model quality: a low loss value is a necessary condition for a good model, so we may discard models with high loss values.
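As a rough sketch of such a sanity check: the response-field path below (`modelInfo.diagnosticsInfo.modelState.trainLosses`) is an assumption based on the preview API's model-status response, so verify it against the reference for your API version.

```python
def losses_look_converged(model_info: dict, tail: int = 5, tol: float = 0.05) -> bool:
    """Heuristic: the last few training losses should be nearly flat."""
    losses = (model_info.get("modelInfo", {})      # field path is an assumption
                        .get("diagnosticsInfo", {})
                        .get("modelState", {})
                        .get("trainLosses", []))
    if len(losses) < tail + 1:
        return False  # too few epochs to judge convergence
    recent = losses[-tail:]
    mean = sum(recent) / len(recent)
    return (max(recent) - min(recent)) <= tol * mean
```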
Apart from the error code table, we've learned from customers some common pitfalls while using MVAD APIs. This table will help you avoid these issues.
Pitfall | Consequence | Explanation and solution |
---|---|---|
Timestamps in training data and/or inference data weren't rounded up to align with the respective data frequency of each variable. | The timestamps of the inference results aren't as expected: either too few timestamps or too many timestamps. | Please refer to Timestamp round-up. |
Too many anomalous data points in the training data | Model accuracy is impacted negatively because it treats anomalous data points as normal patterns during training. | Empirically, keeping the abnormal rate at or below 1% helps. |
Too little training data | Model accuracy is compromised. | Empirically, training an MVAD model requires 5,000 or more data points (timestamps) per variable to keep good accuracy; see the Data quantity section above. |
Taking all data points with isAnomaly = true as anomalies | Too many false positives | You should use both isAnomaly and severity (or score) to sift out anomalies that aren't severe, and (optionally) use grouping to check the duration of the anomalies to suppress random noise. Please refer to the FAQ section below for the difference between severity and score. |
Sub-folders are zipped into the data file for training or inference. | The CSV data files inside sub-folders are ignored during training and/or inference. | No sub-folders are allowed in the zip file. Please refer to Folder structure for details. |
Too much data in the inference data file: for example, compressing all historical data in the inference data zip file | You may not see any errors but you'll experience degraded performance when you try to upload the zip file to Azure Blob as well as when you try to run inference. | Please refer to Data quantity for details. |
Creating Anomaly Detector resources on Azure regions that don't support MVAD yet and calling MVAD APIs | You'll get a "resource not found" error while calling the MVAD APIs. | During the preview stage, MVAD is available in limited regions only. Please bookmark What's new in Anomaly Detector to keep up to date with MVAD region rollouts. You could also file a GitHub issue or contact us at AnomalyDetector@microsoft.com to request specific regions. |
Let's use two examples to learn how MVAD's sliding window works. Suppose you have set `slidingWindow` = 1,440, and your input data is at one-minute granularity.
Streaming scenario: You want to predict whether the ONE data point at "2021-01-02T00:00:00Z" is anomalous. Your `startTime` and `endTime` will be the same value ("2021-01-02T00:00:00Z"). Your inference data source, however, must contain at least 1,440 + 1 timestamps, because MVAD takes the leading data before the target data point ("2021-01-02T00:00:00Z") to decide whether the target is an anomaly. The length of the needed leading data is `slidingWindow`, or 1,440 in this case. 1,440 = 60 * 24, so your input data must start from "2021-01-01T00:00:00Z" at the latest.
Batch scenario: You have multiple target data points to predict. Your `endTime` will be greater than your `startTime`. Inference in such scenarios is performed in a "moving window" manner. For example, MVAD will use data from `2021-01-01T00:00:00Z` to `2021-01-01T23:59:00Z` (inclusive) to determine whether data at `2021-01-02T00:00:00Z` is anomalous. Then it moves forward and uses data from `2021-01-01T00:01:00Z` to `2021-01-02T00:00:00Z` (inclusive) to determine whether data at `2021-01-02T00:01:00Z` is anomalous. It moves on in the same manner (taking 1,440 data points to compare) until the last timestamp specified by `endTime` (or the actual latest timestamp). Therefore, your inference data source must contain data starting from `startTime` - `slidingWindow`, and ideally a total of `slidingWindow` + (`endTime` - `startTime`) data points.
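The required data range is simple arithmetic; here's a sketch for the batch example above. The one-minute granularity is taken from the example, the `endTime` is a hypothetical value, and the endpoints are inclusive as described.

```python
from datetime import datetime, timedelta

SLIDING_WINDOW = 1440
GRANULARITY = timedelta(minutes=1)  # one-minute data, per the example

start_time = datetime.fromisoformat("2021-01-02T00:00:00+00:00")
end_time = datetime.fromisoformat("2021-01-02T12:00:00+00:00")  # hypothetical endTime

# Earliest timestamp the inference data file must contain.
earliest_needed = start_time - SLIDING_WINDOW * GRANULARITY

# Ideal total: the window plus one point per target timestamp in [startTime, endTime].
target_points = (end_time - start_time) // GRANULARITY + 1
total_points = SLIDING_WINDOW + target_points

print(earliest_needed)  # 2021-01-01 00:00:00+00:00
print(total_points)     # 1440 + 721 = 2161
```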
Normally we recommend using `severity` as the filter to sift out 'anomalies' that aren't important to your business. Depending on your scenario and data pattern, less important anomalies often have relatively lower `severity` values, or standalone (discontinuous) high `severity` values like random spikes.

If you need more sophisticated rules than thresholds against `severity` or the duration of continuous high `severity` values, you may want to use `score` to build more powerful filters. Understanding how MVAD uses `score` to determine anomalies may help:
We consider whether a data point is anomalous from both global and local perspectives. If the `score` at a timestamp is higher than a certain threshold, the timestamp is marked as an anomaly. If the `score` is lower than the threshold but is relatively high within a segment, it's also marked as an anomaly.
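For example, here's a hedged sketch of a severity-based filter with a duration rule, assuming you've flattened the per-timestamp results into a pandas DataFrame with `isAnomaly` and `severity` columns; the threshold and minimum duration are hypothetical values to tune for your scenario.

```python
import pandas as pd

SEVERITY_THRESHOLD = 0.3  # hypothetical cut-off; tune for your business
MIN_DURATION = 3          # suppress runs shorter than 3 consecutive points

def filter_anomalies(results: pd.DataFrame) -> pd.DataFrame:
    """Keep anomalies that are both severe enough and long enough."""
    flagged = results["isAnomaly"] & (results["severity"] >= SEVERITY_THRESHOLD)
    # Label each run of consecutive equal flags, then measure run lengths.
    run_id = (flagged != flagged.shift()).cumsum()
    run_len = flagged.groupby(run_id).transform("size")
    return results[flagged & (run_len >= MIN_DURATION)]
```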