Azure Multivariate Anomaly Detection: Which start and end times, model, and dataset size are correct?

Question

Hello,

I am experimenting with Azure Anomaly Detection (Multivariate) and used https://github.com/Azure-Samples/AnomalyDetector/blob/master/quickstarts-multivariate/Java/MultivariateSample.java as a starting point.

I want to batch-infer data points in 24-hour intervals, e.g. between 2022-02-16T00:00:00 (inclusive) and 2022-02-16T23:59:00 (inclusive), at minute-level granularity.
I assume anomaly detection should be based on a sliding window of 1440 data points, so I would also have to provide the data points in the interval 2022-02-15T00:00:00 (inclusive) to 2022-02-15T23:59:00 (inclusive), again at minute-level granularity.

So, for example:

to infer 2022-02-16T00:00:00, datapoints in interval 2022-02-15T00:00:00 (inclusive) to 2022-02-15T23:59:00 (inclusive) are used
to infer 2022-02-16T00:01:00, datapoints in interval 2022-02-15T00:01:00 (inclusive) to 2022-02-16T00:00:00 (inclusive) are used
...
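
The window arithmetic in the two examples above can be sketched in Java as follows. This is only an illustration of the convention assumed in the question (the window is the 1440 one-minute points immediately before the target timestamp); the class and method names are hypothetical, and the service may define its window differently.

```java
import java.time.LocalDateTime;

public class SlidingWindowRange {
    // Assumption (from the question, not from the service documentation):
    // the window consists of the `windowSize` one-minute points that
    // immediately precede the target timestamp.
    static LocalDateTime windowStart(LocalDateTime target, int windowSize) {
        return target.minusMinutes(windowSize);
    }

    // Last point of the window: one minute before the target.
    static LocalDateTime windowEnd(LocalDateTime target) {
        return target.minusMinutes(1);
    }

    public static void main(String[] args) {
        LocalDateTime target = LocalDateTime.parse("2022-02-16T00:01:00");
        // Prints the first and last timestamp of the window for this target.
        System.out.println(windowStart(target, 1440) + " .. " + windowEnd(target));
    }
}
```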

Question: What start and end times should be used for training (https://github.com/Azure-Samples/AnomalyDetector/blob/master/quickstarts-multivariate/Java/MultivariateSample.java#L140) and inference (https://github.com/Azure-Samples/AnomalyDetector/blob/master/quickstarts-multivariate/Java/MultivariateSample.java#L175)?

Currently I provide a dataset of size 1440*2, from 2022-02-15T00:00:00 to 2022-02-16T23:59:00, for both training and inference. I use:

for training: startTime=2022-02-15T00:00:00, endTime=2022-02-16T23:59:00, window size 1440
for inference: startTime=2022-02-16T00:00:00, endTime=2022-02-16T23:59:00, window size 1440

Is that correct?

When I run inference with startTime=2022-02-15T00:00:00 and endTime=2022-02-16T23:59:00, I get different results than when I call the model with startTime=2022-02-16T00:00:00 and endTime=2022-02-16T23:59:00.
Do I actually have to provide a dataset of size 1440*2 if I only want to batch-infer 1440 data points in a 24-hour interval?
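
To make the dataset-size arithmetic concrete, here is a small Java sketch that counts the one-minute points between the earliest timestamp the first inferred point would need and the end of the inference range. It assumes the same convention as above (a window of 1440 points preceding each inferred timestamp), which is my assumption and not something confirmed by the service; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class RequiredDataRange {
    // Assumption: the first inferred point needs `windowSize` one-minute
    // points before it, so the data must start that many minutes earlier.
    static LocalDateTime earliestNeeded(LocalDateTime inferStart, int windowSize) {
        return inferStart.minusMinutes(windowSize);
    }

    // Number of one-minute points from `from` to `to`, both inclusive.
    static long pointCount(LocalDateTime from, LocalDateTime to) {
        return ChronoUnit.MINUTES.between(from, to) + 1;
    }

    public static void main(String[] args) {
        LocalDateTime inferStart = LocalDateTime.parse("2022-02-16T00:00:00");
        LocalDateTime inferEnd = LocalDateTime.parse("2022-02-16T23:59:00");
        LocalDateTime dataStart = earliestNeeded(inferStart, 1440);
        // Points to infer, and points the dataset must contain in total.
        System.out.println("to infer: " + pointCount(inferStart, inferEnd));
        System.out.println("to provide: " + pointCount(dataStart, inferEnd));
    }
}
```

Under that assumption, inferring 1440 points with a window of 1440 requires 1440 + 1440 = 2880 points in the dataset.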

Thanks

Answer

Hi, according to the documentation, each variable must have two and only two fields: timestamp and value. For training data, the maximum number of timestamps is 1,000,000, and the recommended minimum is 15,000 timestamps. For the sliding window, you specify how many data points are used to determine anomalies (an integer between 28 and 2,880; the default value is 300). Please review the documentation for more details on the input schema, input parameters, model training, etc. Let us know if you need further clarification after reviewing the documentation in its entirety. Thanks.
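
As a rough illustration, the limits quoted above can be checked locally before a request is sent. The following Java sketch hard-codes only the numbers mentioned in this answer; the class and method names are hypothetical and are not part of the Anomaly Detector SDK.

```java
public class MvadInputChecks {
    // Limits as quoted in the answer above.
    static final int MIN_WINDOW = 28;
    static final int MAX_WINDOW = 2_880;
    static final int DEFAULT_WINDOW = 300;
    static final int RECOMMENDED_MIN_TIMESTAMPS = 15_000;
    static final int MAX_TIMESTAMPS = 1_000_000;

    // True if the sliding window lies in the allowed range [28, 2880].
    static boolean isValidWindow(int slidingWindow) {
        return slidingWindow >= MIN_WINDOW && slidingWindow <= MAX_WINDOW;
    }

    // True if the training set is smaller than the recommended minimum.
    static boolean belowRecommendedTrainingSize(int timestamps) {
        return timestamps < RECOMMENDED_MIN_TIMESTAMPS;
    }

    // True if the training set exceeds the stated maximum.
    static boolean exceedsMaxTrainingSize(int timestamps) {
        return timestamps > MAX_TIMESTAMPS;
    }

    public static void main(String[] args) {
        int window = 1440;            // window size from the question
        int trainingPoints = 1440 * 2; // training set size from the question
        System.out.println("window valid: " + isValidWindow(window));
        System.out.println("below recommended minimum: "
                + belowRecommendedTrainingSize(trainingPoints));
    }
}
```

With the values from the question, a window of 1440 is inside the allowed range, while a training set of 2880 timestamps is below the recommended minimum of 15,000.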
