SsaChangePointDetector Class
This transform detects the change-points in a seasonal time-series using Singular Spectrum Analysis (SSA).
Inheritance
SsaChangePointDetector derives from:
- nimbusml.internal.core.timeseries._ssachangepointdetector.SsaChangePointDetector
- nimbusml.base_transform.BaseTransform
- sklearn.base.TransformerMixin
Constructor
SsaChangePointDetector(training_window_size=100, confidence=95.0, seasonal_window_size=10, change_history_length=20, error_function='SignedDifference', martingale='Power', power_martingale_epsilon=0.1, columns=None, **params)
Parameters
- columns
see Columns.
- training_window_size
The number of points, N, from the beginning of the sequence used to train the SSA model.
- confidence
The confidence for change point detection in the range [0, 100].
- seasonal_window_size
An upper bound, L, on the largest relevant seasonality in the input time-series, which also determines the order of the autoregression of SSA. It must satisfy 2 < L < N/2.
- change_history_length
The length of the sliding window over p-values used to compute the martingale score.
- error_function
The function used to compute the error between the expected and the observed value. Possible values are: {SignedDifference, AbsoluteDifference, SignedProportion, AbsoluteProportion, SquaredDifference}.
- martingale
The type of martingale betting function used for computing the martingale score. Available options are {Power, Mixture}.
- power_martingale_epsilon
The epsilon parameter for the Power martingale, used when martingale is set to Power.
- params
Additional arguments sent to the compute engine.
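The documented constraints on these parameters can be checked before constructing the transform. The helper below, check_ssa_params, is hypothetical and not part of nimbusml; it only encodes the rules stated above (2 < L < N/2, confidence in [0, 100], and the listed option strings), and the library itself may validate differently.

```python
# Hypothetical helper, NOT part of nimbusml: encodes only the documented rules.
VALID_ERROR_FUNCTIONS = {"SignedDifference", "AbsoluteDifference", "SignedProportion",
                         "AbsoluteProportion", "SquaredDifference"}
VALID_MARTINGALES = {"Power", "Mixture"}


def check_ssa_params(training_window_size=100, confidence=95.0, seasonal_window_size=10,
                     error_function="SignedDifference", martingale="Power"):
    """Raise ValueError if a parameter violates the documented constraints."""
    n, l = training_window_size, seasonal_window_size
    # seasonal_window_size (L) must satisfy 2 < L < N/2, N = training_window_size.
    if not (2 < l < n / 2):
        raise ValueError(f"seasonal_window_size={l} must satisfy 2 < L < {n / 2}")
    # confidence is a percentage in [0, 100].
    if not (0.0 <= confidence <= 100.0):
        raise ValueError(f"confidence={confidence} must be in [0, 100]")
    if error_function not in VALID_ERROR_FUNCTIONS:
        raise ValueError(f"unknown error_function {error_function!r}")
    if martingale not in VALID_MARTINGALES:
        raise ValueError(f"unknown martingale {martingale!r}")
```

The defaults pass (2 < 10 < 50). The values used in the example (training_window_size=15, seasonal_window_size=6) also pass because 6 < 7.5, while seasonal_window_size=8 with the same training window would be rejected.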
Examples
###############################################################################
# SsaChangePointDetector
import numpy as np
import pandas as pd
from nimbusml.timeseries import SsaChangePointDetector
# This example creates a time series (a list of data with the
# i-th element corresponding to the i-th time slot).
# The estimator is applied to identify points where the data distribution changed.
# This estimator can account for temporal seasonality in the data.
# Generate sample series data with a recurring
# pattern followed by a change in distribution
seasonality_size = 5
seasonal_data = np.arange(seasonality_size)
data = np.tile(seasonal_data, 3)
data = np.append(data, [0, 100, 200, 300, 400]) # change distribution
X_train = pd.Series(data, name="ts")
# X_train looks like this
# 0 0
# 1 1
# 2 2
# 3 3
# 4 4
# 5 0
# 6 1
# 7 2
# 8 3
# 9 4
# 10 0
# 11 1
# 12 2
# 13 3
# 14 4
# 15 0
# 16 100
# 17 200
# 18 300
# 19 400
training_seasons = 3
training_size = seasonality_size * training_seasons
cpd = SsaChangePointDetector(confidence=95,
change_history_length=8,
training_window_size=training_size,
seasonal_window_size=seasonality_size + 1) << {'result': 'ts'}
cpd.fit(X_train, verbose=1)
data = cpd.transform(X_train)
print(data)
# ts result.Alert result.Raw Score result.P-Value Score result.Martingale Score
# 0 0 0.0 -2.531824 5.000000e-01 1.470334e-06
# 1 1 0.0 -0.008832 5.818072e-03 8.094459e-05
# 2 2 0.0 0.763040 1.374071e-01 2.588526e-04
# 3 3 0.0 0.693811 2.797713e-01 4.365186e-04
# 4 4 0.0 1.442079 1.838294e-01 1.074242e-03
# 5 0 0.0 -1.844414 1.707238e-01 2.825599e-03
# 6 1 0.0 0.219578 4.364025e-01 3.193633e-03
# 7 2 0.0 0.201708 4.505472e-01 3.507451e-03
# 8 3 0.0 0.157089 4.684456e-01 3.719387e-03
# 9 4 0.0 1.329494 1.773046e-01 1.717610e-04
# 10 0 0.0 -1.792391 7.353794e-02 3.014897e-04
# 11 1 0.0 0.161634 4.999295e-01 1.788041e-04
# 12 2 0.0 0.092626 4.953789e-01 7.326680e-05
# 13 3 0.0 0.084648 4.514174e-01 3.053876e-05
# 14 4 0.0 1.305554 1.202619e-01 9.741702e-05
# 15 0 0.0 -1.792391 7.264402e-02 5.034093e-04
# 16 100 1.0 99.161634 1.000000e-08 4.031944e+03 <-- alert is on, predicted change point
# 17 200 0.0 185.229474 5.485437e-04 7.312609e+05
# 18 300 0.0 270.403543 1.259683e-02 3.578470e+06
# 19 400 0.0 357.113747 2.978766e-02 4.529837e+07
Remarks
Singular Spectrum Analysis (SSA) is a powerful framework for decomposing a time-series into trend, seasonality and noise components, as well as for forecasting future values of the time-series. To remove the effect of such components on anomaly detection, this transform adds SSA as a time-series modeler component in the detection pipeline.
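To make the modeling step concrete, here is a minimal sketch of textbook (basic) SSA forecasting in NumPy, assuming a fixed, hand-picked rank. The series is embedded in a trajectory (Hankel) matrix whose window length plays the role of L, the leading left singular vectors span the signal subspace, and the next value is forecast with the linear recurrent formula derived from those vectors. The name ssa_forecast_next and its rank argument are illustrative only; nimbusml's internal SSA model (rank selection, online updates) is not reproduced here and may differ.

```python
import numpy as np


def ssa_forecast_next(series, window, rank):
    """Forecast the next value of `series` with basic SSA (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    k = len(x) - window + 1
    # Trajectory (Hankel) matrix: column j is the lagged vector x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # Leading left singular vectors span the signal subspace.
    u, _, _ = np.linalg.svd(traj, full_matrices=False)
    u_r = u[:, :rank]
    pi = u_r[-1, :]            # last coordinate of each retained vector
    nu2 = float(pi @ pi)       # verticality coefficient; must be < 1
    if nu2 >= 1.0:
        raise ValueError("verticality coefficient >= 1; try a smaller rank")
    # Linear recurrence coefficients, ordered from oldest to newest lag.
    a = (u_r[:-1, :] @ pi) / (1.0 - nu2)
    # Apply the recurrence to the last window - 1 observations.
    return float(a @ x[-(window - 1):])
```

On the periodic training segment from the example, np.tile(np.arange(5), 3), a window of 6 with rank 5 forecasts the next value as 0 up to floating-point error. A period-5 sequence spans exactly a 5-dimensional subspace of the lagged vectors, so the recurrence it yields simply repeats the value from five steps back.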
The SSA component is trained to predict the next expected value of the time-series under normal conditions; this expected value is then used to calculate the amount of deviation from normal behavior at that timestamp. The distribution of these deviations is modeled using adaptive kernel density estimation.
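The deviation and its p-value can be sketched as follows. The error functions assume each score is computed as observed minus expected, which matches the example output (observing 100 at row 16 gives a raw score of about 99.16). The proportion variants are assumed to divide by the expected value and, as an assumption of this sketch, to return 0.0 when it is 0. The p-value uses a Gaussian KDE with a fixed Silverman rule-of-thumb bandwidth and a two-sided tail. This is a simpler stand-in for the adaptive KDE described above and will not reproduce the library's p-values. All names here are hypothetical.

```python
import math
from statistics import pstdev


def _ratio(observed, expected):
    # Assumption: relative error w.r.t. the expected value; 0.0 if expected == 0.
    return 0.0 if expected == 0 else (observed - expected) / expected


# Hypothetical table of the five documented error functions.
ERROR_FUNCTIONS = {
    "SignedDifference": lambda obs, exp: obs - exp,
    "AbsoluteDifference": lambda obs, exp: abs(obs - exp),
    "SignedProportion": _ratio,
    "AbsoluteProportion": lambda obs, exp: abs(_ratio(obs, exp)),
    "SquaredDifference": lambda obs, exp: (obs - exp) ** 2,
}


def kde_p_value(history, score):
    """Two-sided p-value of `score` under a Gaussian KDE of past scores.

    Simplification: fixed Silverman bandwidth, not an adaptive KDE.
    """
    n = len(history)
    sigma = pstdev(history)
    h = 1.06 * sigma * n ** (-0.2) if sigma > 0 else 1.0
    # KDE CDF = average of Gaussian CDFs centred on each past score.
    cdf = sum(0.5 * (1.0 + math.erf((score - x) / (h * math.sqrt(2.0))))
              for x in history) / n
    return 2.0 * min(cdf, 1.0 - cdf)
```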
This transform detects change points by calculating the martingale score over a sliding window of p-values, which are derived from the estimated distribution of deviations. The idea is based on exchangeability martingales, which detect a change of distribution over a stream of i.i.d. values. In short, the martingale score starts increasing significantly when a sequence of small p-values is detected in a row; this indicates a change in the distribution of the underlying data-generating process.
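The martingale step can be sketched as follows. Under the Power option, each p-value p contributes a bet ε·p^(ε-1), and the score is the product of these bets over the last change_history_length p-values. With ε = 0.1 (the default power_martingale_epsilon) and a window of 8, feeding this formula the example's p-values for rows 8 to 15 and rows 9 to 16 reproduces the printed Martingale Scores (about 5.03e-04 and 4.03e+03) to within rounding, which suggests this is how the score is formed. The Mixture function is an assumption: it averages the power martingale over ε drawn uniformly from [0, 1], approximated with the midpoint rule, and the library's exact computation may differ. Function names are hypothetical.

```python
import numpy as np


def power_martingale(p_values, epsilon=0.1):
    """Product of the power bets epsilon * p**(epsilon - 1) over the window."""
    p = np.asarray(p_values, dtype=float)
    return float(np.prod(epsilon * p ** (epsilon - 1.0)))


def mixture_martingale(p_values, grid_size=1000):
    """Assumption: power martingale averaged over epsilon ~ Uniform[0, 1].

    The integral over epsilon is approximated with the midpoint rule.
    """
    p = np.asarray(p_values, dtype=float)
    eps = (np.arange(grid_size) + 0.5) / grid_size            # midpoints in [0, 1]
    bets = eps[:, None] * p[None, :] ** (eps[:, None] - 1.0)  # shape (grid, window)
    return float(np.mean(np.prod(bets, axis=1)))


def sliding_martingale_scores(p_values, window, epsilon=0.1):
    """Power-martingale score over the last `window` p-values at each step."""
    return [power_martingale(p_values[max(0, t - window + 1): t + 1], epsilon)
            for t in range(len(p_values))]
```

A single p-value of 1e-08, as at row 16 of the example, multiplies the score by 0.1 × 10^7.2 ≈ 1.6e+06. That is why one very surprising point can push the martingale score up by several orders of magnitude.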
Methods
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep