detect_anomalous_spike_fl()

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel

Detect the appearance of anomalous spikes in numeric variables in timestamped data.

The function detect_anomalous_spike_fl() is a UDF (user-defined function) that detects the appearance of anomalous spikes in numeric variables - such as the amount of exfiltrated data or the number of failed sign-in attempts - in timestamped data, such as traffic logs. In a cybersecurity context, such events might be suspicious and indicate a potential attack or compromise.

The anomaly model is based on a combination of two scores: a Z-score (the number of standard deviations above the average) and a Q-score (the number of interquantile ranges above a high quantile). The Z-score is a straightforward and common outlier metric; the Q-score is based on Tukey's fences, with the definition extended to arbitrary quantiles for more control. Choosing different quantiles (by default, the 0.9 and 0.25 quantiles are used) allows you to detect more significant outliers, thus improving precision. The model is built on top of some numeric variable (such as the number of login attempts or the amount of exfiltrated data) and is calculated per scope (such as subscription or account) and per entity (such as user or device).
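
For intuition, the two scores can be illustrated with the following minimal sketch on hypothetical toy data. This snippet isn't part of the function; note that it passes percentile arguments on the 0-100 scale of the percentile() aggregation, and that the function itself also adds +1 smoothing to the denominators, as shown in its definition below.

// Toy illustration of the two outlier scores (hypothetical data).
range i from 1 to 100 step 1
| extend x = iff(i == 100, 500.0, 10.0 + 2.0*rand())        // inject a spike at the last point
| summarize avgX = avg(x), sdX = stdev(x)
        , lowPrc = percentile(x, 25), highPrc = percentile(x, 90)
        , lastX = max(x)
| extend zScore = (lastX - avgX)/sdX                        // standard deviations above the average
       , qScore = (lastX - highPrc)/(highPrc - lowPrc)      // interquantile ranges above the high quantile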

After calculating the scores for a univariate numeric datapoint and checking other requirements (for example, that the number of active days in the training period on the scope is above a predefined threshold), we check whether each of the scores is above its predefined threshold. If so, a spike is detected and the datapoint is flagged as anomalous. Two models are built: one at the entity level (defined by the entityColumnName parameter, such as user or device), calculated per scope (defined by the scopeColumnName parameter, such as account or subscription); and a second one for the whole scope. The anomaly detection logic is executed for each model, and if an anomaly is detected by either of them, it is shown. By default, upward spikes are detected; downward spikes ('dips') can also be interesting in some contexts and can be detected by adapting the logic, as sketched below.
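
For example, a minimal sketch for detecting dips is to negate the variable before invoking the function, so that downward spikes become upward ones. This adaptation is an assumption on our part, not part of the published function; the input table T and its columns (eventCount, userName, accountName, timeSlice) are illustrative. Because the negated values are negative, the minimum-value thresholds, which default to 0, must be lowered for the numVec >= minNumValueThresh checks to pass:

// Hypothetical sketch: detect dips by negating the numeric variable.
T
| extend negEventCount = -toreal(eventCount)
| invoke detect_anomalous_spike_fl(numericColumnName        = 'negEventCount'
                                , entityColumnName          = 'userName'
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = datetime(2022-03-01)
                                , startDetection            = datetime(2022-04-30)
                                , endDetection              = datetime(2022-05-01)
                                , minNumValueThreshEntity   = long(-10000000)
                                , minNumValueThreshScope    = long(-10000000))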

The model's direct output is an anomaly score based on the two scores. The score is monotonic in the range [0, 1], with 1 representing the most anomalous. In addition to the anomaly score, there's a binary flag for a detected anomaly (controlled by a minimal threshold parameter), and other explanatory fields.
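
The mapping from the raw scores to the anomaly score can be seen in the function body below: anomalyScore = 1.0 - 0.25/max(Z-score, Q-score). As a quick worked check, using the scores from the example output later in this article:

// 0.25/185.46 = 0.0013, so the anomaly score approaches 1 for large scores.
print anomalyScore = round(1.0 - 0.25/max_of(13.84, 185.46), 4)    // 0.9987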

Note that the function disregards the temporal structure of the variable (mainly for scalability and explainability). If the variable has significant temporal components, such as trend or seasonality, we suggest either considering the series_decompose_anomalies() function, or using series_decompose() to calculate the residual and executing detect_anomalous_spike_fl() on top of it, as sketched below.
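
A minimal sketch of the residual approach follows. The input table T, its columns (eventCount, userName, accountName, timeSlice), and the hourly bin size are illustrative assumptions:

// Hypothetical sketch: remove trend and seasonality, then detect spikes on the residual.
T
| make-series eventCount = sum(eventCount) on timeSlice step 1h by accountName, userName
| extend (baseline, seasonal, trend, residual) = series_decompose(eventCount)
| mv-expand timeSlice to typeof(datetime), residual to typeof(double)
| invoke detect_anomalous_spike_fl(numericColumnName        = 'residual'
                                , entityColumnName          = 'userName'
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = datetime(2022-03-01)
                                , startDetection            = datetime(2022-04-30)
                                , endDetection              = datetime(2022-05-01))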

Syntax

detect_anomalous_spike_fl(numericColumnName, entityColumnName, scopeColumnName, timeColumnName, startTraining, startDetection, endDetection, [minTrainingDaysThresh], [lowPercentileForQscore], [highPercentileForQscore], [minSlicesPerEntity], [zScoreThreshEntity], [qScoreThreshEntity], [minNumValueThreshEntity], [minSlicesPerScope], [zScoreThreshScope], [qScoreThreshScope], [minNumValueThreshScope])

Learn more about syntax conventions.

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| numericColumnName | string | ✔️ | The name of the input table column containing the numeric variable for which anomaly models are calculated. |
| entityColumnName | string | ✔️ | The name of the input table column containing the names or IDs of the entities for which the anomaly model is calculated. |
| scopeColumnName | string | ✔️ | The name of the input table column containing the partition or scope, so that a different anomaly model is built for each scope. |
| timeColumnName | string | ✔️ | The name of the input table column containing the timestamps, which are used to define the training and detection periods. |
| startTraining | datetime | ✔️ | The beginning of the training period for the anomaly model. Its end is defined by the beginning of the detection period. |
| startDetection | datetime | ✔️ | The beginning of the detection period for anomaly detection. |
| endDetection | datetime | ✔️ | The end of the detection period for anomaly detection. |
| minTrainingDaysThresh | int | | The minimum number of days in the training period that a scope must exist in order to calculate anomalies. If the number is below the threshold, the scope is considered too new and unknown, so anomalies aren't calculated. The default value is 14. |
| lowPercentileForQscore | real | | A number in the range [0.0,1.0] representing the percentile to be calculated as the low limit for the Q-score. In Tukey's fences, 0.25 is used. The default value is 0.25. Choosing a lower percentile improves precision, as more significant anomalies are detected. |
| highPercentileForQscore | real | | A number in the range [0.0,1.0] representing the percentile to be calculated as the high limit for the Q-score. In Tukey's fences, 0.75 is used. The default value is 0.9. Choosing a higher percentile improves precision, as more significant anomalies are detected. |
| minSlicesPerEntity | int | | The minimum threshold of 'slices' (for example, days) that must exist for an entity before an anomaly model is built for it. If the number is below the threshold, the entity is considered too new and unstable. The default value is 20. |
| zScoreThreshEntity | real | | The minimum threshold for the entity-level Z-score (number of standard deviations above the average) to be flagged as an anomaly. When choosing higher values, only more significant anomalies are detected. The default value is 3.0. |
| qScoreThreshEntity | real | | The minimum threshold for the entity-level Q-score (number of interquantile ranges above the high quantile) to be flagged as an anomaly. When choosing higher values, only more significant anomalies are detected. The default value is 2.0. |
| minNumValueThreshEntity | long | | The minimum threshold for the numeric variable to be flagged as an anomaly for an entity. This is useful for filtering cases in which a value is statistically anomalous (high Z-score and Q-score), but the value itself is too small to be interesting. The default value is 0. |
| minSlicesPerScope | int | | The minimum threshold of 'slices' (for example, days) that must exist for a scope before an anomaly model is built for it. If the number is below the threshold, the scope is considered too new and unstable. The default value is 20. |
| zScoreThreshScope | real | | The minimum threshold for the scope-level Z-score (number of standard deviations above the average) to be flagged as an anomaly. When choosing higher values, only more significant anomalies are detected. The default value is 3.0. |
| qScoreThreshScope | real | | The minimum threshold for the scope-level Q-score (number of interquantile ranges above the high quantile) to be flagged as an anomaly. When choosing higher values, only more significant anomalies are detected. The default value is 2.0. |
| minNumValueThreshScope | long | | The minimum threshold for the numeric variable to be flagged as an anomaly for a scope. This is useful for filtering cases in which a value is statistically anomalous (high Z-score and Q-score), but the value itself is too small to be interesting. The default value is 0. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Define the function using the following let statement. No permissions are required.

Important

A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of detect_anomalous_spike_fl(), see Example.

let detect_anomalous_spike_fl = (T:(*), numericColumnName:string, entityColumnName:string, scopeColumnName:string
                            , timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime, minTrainingDaysThresh:int = 14
                            , lowPercentileForQscore:real = 0.25, highPercentileForQscore:real = 0.9
                            , minSlicesPerEntity:int = 20, zScoreThreshEntity:real = 3.0, qScoreThreshEntity:real = 2.0, minNumValueThreshEntity:long = 0
                            , minSlicesPerScope:int = 20, zScoreThreshScope:real = 3.0, qScoreThreshScope:real = 2.0, minNumValueThreshScope:long = 0)
{
// pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day';      // we assume a reasonable bin for time is day
let processedData = (
    T
    | extend scope      = column_ifexists(scopeColumnName, '')
    | extend entity     = column_ifexists(entityColumnName, '')
    | extend numVec     = tolong(column_ifexists(numericColumnName, 0))
    | extend sliceTime  = todatetime(column_ifexists(timeColumnName, ''))
    | where isnotempty(scope) and isnotempty(sliceTime)
    | extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
                           , sliceTime >= startDetection and sliceTime <= endDetection,  'detectSet'
                                                                                       , 'other')
    | where dataSet in ('trainSet', 'detectSet')
);
let aggregatedCandidateScopeData = (
    processedData
    | summarize firstSeenScope = min(sliceTime), lastSeenScope = max(sliceTime) by scope
    | extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
    | where slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection
);
let entityModelData = (
    processedData
    | join kind = inner (aggregatedCandidateScopeData) on scope
    | where dataSet == 'trainSet'
    | summarize countSlicesEntity = dcount(sliceTime), avgNumEntity = avg(numVec), sdNumEntity = stdev(numVec)
            , lowPrcNumEntity = percentile(numVec, lowPercentileForQscore), highPrcNumEntity = percentile(numVec, highPercentileForQscore)
            , firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime)
        by scope, entity
    | extend slicesInTrainingEntity = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
);
let scopeModelData = (
    processedData
    | join kind = inner (aggregatedCandidateScopeData) on scope
    | where dataSet == 'trainSet'
    | summarize countSlicesScope = dcount(sliceTime), avgNumScope = avg(numVec), sdNumScope = stdev(numVec)
            , lowPrcNumScope = percentile(numVec, lowPercentileForQscore), highPrcNumScope = percentile(numVec, highPercentileForQscore)
        by scope
);
let resultsData = (
    processedData
    | where dataSet == 'detectSet'
    | join kind = inner (aggregatedCandidateScopeData) on scope 
    | join kind = leftouter (entityModelData) on scope, entity 
    | join kind = leftouter (scopeModelData) on scope
    | extend zScoreEntity       = iff(countSlicesEntity >= minSlicesPerEntity, round((toreal(numVec) - avgNumEntity)/(sdNumEntity + 1), 2), 0.0)
            , qScoreEntity      = iff(countSlicesEntity >= minSlicesPerEntity, round((toreal(numVec) - highPrcNumEntity)/(highPrcNumEntity - lowPrcNumEntity + 1), 2), 0.0)
            , zScoreScope       = iff(countSlicesScope >= minSlicesPerScope, round((toreal(numVec) - avgNumScope)/(sdNumScope + 1), 2), 0.0)
            , qScoreScope       = iff(countSlicesScope >= minSlicesPerScope, round((toreal(numVec) - highPrcNumScope)/(highPrcNumScope - lowPrcNumScope + 1), 2), 0.0)
    | extend isSpikeOnEntity    = iff((slicesInTrainingEntity >= minTrainingDaysThresh and zScoreEntity > zScoreThreshEntity and qScoreEntity > qScoreThreshEntity and numVec >= minNumValueThreshEntity), 1, 0)
            , entityHighBaseline= round(max_of((avgNumEntity + sdNumEntity), highPrcNumEntity), 2)
            , isSpikeOnScope    = iff((slicesInTrainingScope >= minTrainingDaysThresh and zScoreScope > zScoreThreshScope and qScoreScope > qScoreThreshScope and numVec >= minNumValueThreshScope), 1, 0)
            , scopeHighBaseline = round(max_of((avgNumScope + 2 * sdNumScope), highPrcNumScope), 2)
    | extend entitySpikeAnomalyScore = iff(isSpikeOnEntity  == 1, round(1.0 - 0.25/(max_of(zScoreEntity, qScoreEntity)),4), 0.00)
            , scopeSpikeAnomalyScore = iff(isSpikeOnScope == 1, round(1.0 - 0.25/(max_of(zScoreScope, qScoreScope)), 4), 0.00)
    | where isSpikeOnEntity == 1 or isSpikeOnScope == 1
    | extend avgNumEntity   = round(avgNumEntity, 2), sdNumEntity = round(sdNumEntity, 2)
            , avgNumScope   = round(avgNumScope, 2), sdNumScope = round(sdNumScope, 2)
   | project-away entity1, scope1, scope2, scope3
   | extend anomalyType = iff(isSpikeOnEntity == 1, strcat('spike_', entityColumnName), strcat('spike_', scopeColumnName)), anomalyScore = max_of(entitySpikeAnomalyScore, scopeSpikeAnomalyScore)
   | extend anomalyExplainability = iff(isSpikeOnEntity == 1
        , strcat('The value of numeric variable ', numericColumnName, ' for ', entityColumnName, ' ', entity, ' is ', numVec, ', which is abnormally high for this '
            , entityColumnName, ' at this ', scopeColumnName
            , '. Based on observations from last ' , slicesInTrainingEntity, ' ', timePeriodBinSize, 's, the expected baseline value is below ', entityHighBaseline, '.')
        , strcat('The value of numeric variable ', numericColumnName, ' on ', scopeColumnName, ' ', scope, ' is ', numVec, ', which is abnormally high for this '
            , scopeColumnName, '. Based on observations from last ' , slicesInTrainingScope, ' ', timePeriodBinSize, 's, the expected baseline value is below ', scopeHighBaseline, '.'))
   | extend anomalyState = iff(isSpikeOnEntity == 1
        , bag_pack('avg', avgNumEntity, 'stdev', sdNumEntity, strcat('percentile_', lowPercentileForQscore), lowPrcNumEntity, strcat('percentile_', highPercentileForQscore), highPrcNumEntity)
        , bag_pack('avg', avgNumScope, 'stdev', sdNumScope, strcat('percentile_', lowPercentileForQscore), lowPrcNumScope, strcat('percentile_', highPercentileForQscore), highPrcNumScope))
   | project-away lowPrcNumEntity, highPrcNumEntity, lowPrcNumScope, highPrcNumScope
);
resultsData
};
// Write your query to use the function here.
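
Alternatively, you can create the function as a stored function with a .create-or-alter command; database-level permissions are required. The folder and docstring values below are illustrative, and the function body, elided here for brevity, is identical to the let statement above:

.create-or-alter function with (folder = "Packages\\ML", docstring = "Detect anomalous spikes in numeric variables")
detect_anomalous_spike_fl(T:(*), numericColumnName:string, entityColumnName:string, scopeColumnName:string
                            , timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime, minTrainingDaysThresh:int = 14
                            , lowPercentileForQscore:real = 0.25, highPercentileForQscore:real = 0.9
                            , minSlicesPerEntity:int = 20, zScoreThreshEntity:real = 3.0, qScoreThreshEntity:real = 2.0, minNumValueThreshEntity:long = 0
                            , minSlicesPerScope:int = 20, zScoreThreshScope:real = 3.0, qScoreThreshScope:real = 2.0, minNumValueThreshScope:long = 0)
{
    // ... function body as in the let statement above ...
}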

Example

The following example uses the invoke operator to run the function.

To use a query-defined function, invoke it after the embedded function definition.

let detect_anomalous_spike_fl = (T:(*), numericColumnName:string, entityColumnName:string, scopeColumnName:string
                            , timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime, minTrainingDaysThresh:int = 14
                            , lowPercentileForQscore:real = 0.25, highPercentileForQscore:real = 0.9
                            , minSlicesPerEntity:int = 20, zScoreThreshEntity:real = 3.0, qScoreThreshEntity:real = 2.0, minNumValueThreshEntity:long = 0
                            , minSlicesPerScope:int = 20, zScoreThreshScope:real = 3.0, qScoreThreshScope:real = 2.0, minNumValueThreshScope:long = 0)
{
// pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day';      // we assume a reasonable bin for time is day
let processedData = (
    T
    | extend scope      = column_ifexists(scopeColumnName, '')
    | extend entity     = column_ifexists(entityColumnName, '')
    | extend numVec     = tolong(column_ifexists(numericColumnName, 0))
    | extend sliceTime  = todatetime(column_ifexists(timeColumnName, ''))
    | where isnotempty(scope) and isnotempty(sliceTime)
    | extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
                           , sliceTime >= startDetection and sliceTime <= endDetection,  'detectSet'
                                                                                       , 'other')
    | where dataSet in ('trainSet', 'detectSet')
);
let aggregatedCandidateScopeData = (
    processedData
    | summarize firstSeenScope = min(sliceTime), lastSeenScope = max(sliceTime) by scope
    | extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
    | where slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection
);
let entityModelData = (
    processedData
    | join kind = inner (aggregatedCandidateScopeData) on scope
    | where dataSet == 'trainSet'
    | summarize countSlicesEntity = dcount(sliceTime), avgNumEntity = avg(numVec), sdNumEntity = stdev(numVec)
            , lowPrcNumEntity = percentile(numVec, lowPercentileForQscore), highPrcNumEntity = percentile(numVec, highPercentileForQscore)
            , firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime)
        by scope, entity
    | extend slicesInTrainingEntity = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
);
let scopeModelData = (
    processedData
    | join kind = inner (aggregatedCandidateScopeData) on scope
    | where dataSet == 'trainSet'
    | summarize countSlicesScope = dcount(sliceTime), avgNumScope = avg(numVec), sdNumScope = stdev(numVec)
            , lowPrcNumScope = percentile(numVec, lowPercentileForQscore), highPrcNumScope = percentile(numVec, highPercentileForQscore)
        by scope
);
let resultsData = (
    processedData
    | where dataSet == 'detectSet'
    | join kind = inner (aggregatedCandidateScopeData) on scope 
    | join kind = leftouter (entityModelData) on scope, entity 
    | join kind = leftouter (scopeModelData) on scope
    | extend zScoreEntity       = iff(countSlicesEntity >= minSlicesPerEntity, round((toreal(numVec) - avgNumEntity)/(sdNumEntity + 1), 2), 0.0)
            , qScoreEntity      = iff(countSlicesEntity >= minSlicesPerEntity, round((toreal(numVec) - highPrcNumEntity)/(highPrcNumEntity - lowPrcNumEntity + 1), 2), 0.0)
            , zScoreScope       = iff(countSlicesScope >= minSlicesPerScope, round((toreal(numVec) - avgNumScope)/(sdNumScope + 1), 2), 0.0)
            , qScoreScope       = iff(countSlicesScope >= minSlicesPerScope, round((toreal(numVec) - highPrcNumScope)/(highPrcNumScope - lowPrcNumScope + 1), 2), 0.0)
    | extend isSpikeOnEntity    = iff((slicesInTrainingEntity >= minTrainingDaysThresh and zScoreEntity > zScoreThreshEntity and qScoreEntity > qScoreThreshEntity and numVec >= minNumValueThreshEntity), 1, 0)
            , entityHighBaseline= round(max_of((avgNumEntity + sdNumEntity), highPrcNumEntity), 2)
            , isSpikeOnScope    = iff((slicesInTrainingScope >= minTrainingDaysThresh and zScoreScope > zScoreThreshScope and qScoreScope > qScoreThreshScope and numVec >= minNumValueThreshScope), 1, 0)
            , scopeHighBaseline = round(max_of((avgNumScope + 2 * sdNumScope), highPrcNumScope), 2)
    | extend entitySpikeAnomalyScore = iff(isSpikeOnEntity  == 1, round(1.0 - 0.25/(max_of(zScoreEntity, qScoreEntity)),4), 0.00)
            , scopeSpikeAnomalyScore = iff(isSpikeOnScope == 1, round(1.0 - 0.25/(max_of(zScoreScope, qScoreScope)), 4), 0.00)
    | where isSpikeOnEntity == 1 or isSpikeOnScope == 1
    | extend avgNumEntity   = round(avgNumEntity, 2), sdNumEntity = round(sdNumEntity, 2)
            , avgNumScope   = round(avgNumScope, 2), sdNumScope = round(sdNumScope, 2)
   | project-away entity1, scope1, scope2, scope3
   | extend anomalyType = iff(isSpikeOnEntity == 1, strcat('spike_', entityColumnName), strcat('spike_', scopeColumnName)), anomalyScore = max_of(entitySpikeAnomalyScore, scopeSpikeAnomalyScore)
   | extend anomalyExplainability = iff(isSpikeOnEntity == 1
        , strcat('The value of numeric variable ', numericColumnName, ' for ', entityColumnName, ' ', entity, ' is ', numVec, ', which is abnormally high for this '
            , entityColumnName, ' at this ', scopeColumnName
            , '. Based on observations from last ' , slicesInTrainingEntity, ' ', timePeriodBinSize, 's, the expected baseline value is below ', entityHighBaseline, '.')
        , strcat('The value of numeric variable ', numericColumnName, ' on ', scopeColumnName, ' ', scope, ' is ', numVec, ', which is abnormally high for this '
            , scopeColumnName, '. Based on observations from last ' , slicesInTrainingScope, ' ', timePeriodBinSize, 's, the expected baseline value is below ', scopeHighBaseline, '.'))
   | extend anomalyState = iff(isSpikeOnEntity == 1
        , bag_pack('avg', avgNumEntity, 'stdev', sdNumEntity, strcat('percentile_', lowPercentileForQscore), lowPrcNumEntity, strcat('percentile_', highPercentileForQscore), highPrcNumEntity)
        , bag_pack('avg', avgNumScope, 'stdev', sdNumScope, strcat('percentile_', lowPercentileForQscore), lowPrcNumScope, strcat('percentile_', highPercentileForQscore), highPrcNumScope))
   | project-away lowPrcNumEntity, highPrcNumEntity, lowPrcNumScope, highPrcNumScope
);
resultsData
};
let detectPeriodStart   = datetime(2022-04-30 05:00:00.0000000);
let trainPeriodStart    = datetime(2022-03-01 05:00);
let names               = pack_array("Admin", "Dev1", "Dev2", "IT-support");
let countNames          = array_length(names);
let testData            = range t from 1 to 24*60 step 1                // 60 days of hourly slices
    | extend timeSlice      = trainPeriodStart + 1h * t
    | extend countEvents    = round(2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)), 2) * 100   // weekly and daily patterns plus noise
    | extend userName       = tostring(names[toint(rand(countNames))])
    | extend deviceId       = hash_md5(rand())
    | extend accountName    = iff(((rand() < 0.2) and (timeSlice < detectPeriodStart)), 'testEnvironment', 'prodEnvironment')
    | extend userName       = iff(timeSlice == detectPeriodStart, 'H4ck3r', userName)           // inject a new user at detection time
    | extend countEvents    = iff(timeSlice == detectPeriodStart, 3*countEvents, countEvents)   // inject a spike at detection time
    | sort by timeSlice desc
;
testData
| invoke detect_anomalous_spike_fl(numericColumnName        = 'countEvents'
                                , entityColumnName          = 'userName'
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = trainPeriodStart
                                , startDetection            = detectPeriodStart
                                , endDetection              = detectPeriodStart
                            )

Output

The function returns a single anomalous row, shown here with one field per line for readability:

| Field | Value |
|--|--|
| t | 1440 |
| timeSlice | 2022-04-30 05:00:00.0000000 |
| countEvents | 5079 |
| userName | H4ck3r |
| deviceId | 9e8e151aced5a64938b93ee0c13fe940 |
| accountName | prodEnvironment |
| scope | prodEnvironment |
| entity | H4ck3r |
| numVec | 5079 |
| sliceTime | 2022-04-30 05:00:00.0000000 |
| dataSet | detectSet |
| firstSeenScope | 2022-03-01 08:00:00.0000000 |
| lastSeenScope | 2022-04-30 05:00:00.0000000 |
| slicesInTrainingScope | 60 |
| countSlicesEntity | |
| avgNumEntity | |
| sdNumEntity | |
| firstSeenEntity | |
| lastSeenEntity | |
| slicesInTrainingEntity | |
| countSlicesScope | 1155 |
| avgNumScope | 1363.22 |
| sdNumScope | 267.51 |
| zScoreEntity | 0 |
| qScoreEntity | 0 |
| zScoreScope | 13.84 |
| qScoreScope | 185.46 |
| isSpikeOnEntity | 0 |
| entityHighBaseline | |
| isSpikeOnScope | 1 |
| scopeHighBaseline | 1898.24 |
| entitySpikeAnomalyScore | 0 |
| scopeSpikeAnomalyScore | 0.9987 |
| anomalyType | spike_accountName |
| anomalyScore | 0.9987 |
| anomalyExplainability | The value of numeric variable countEvents on accountName prodEnvironment is 5079, which is abnormally high for this accountName. Based on observations from last 60 days, the expected baseline value is below 1898.24. |
| anomalyState | {"avg": 1363.22, "stdev": 267.51, "percentile_0.25": 605, "percentile_0.9": 628} |

The output of running the function is the rows in the detection dataset that were tagged as anomalous spikes at either the scope or the entity level. Some other fields are added for clarity:

  • dataSet: the current dataset (always detectSet).
  • firstSeenScope: timestamp when the scope was first seen.
  • lastSeenScope: timestamp when the scope was last seen.
  • slicesInTrainingScope: number of slices (for example, days) that the scope exists in training dataset.
  • countSlicesEntity: number of slices (for example, days) that the entity exists on scope.
  • avgNumEntity: average of the numeric variable in training set per entity on scope.
  • sdNumEntity: standard deviation of the numeric variable in training set per entity on scope.
  • firstSeenEntity: timestamp when the entity was first seen on scope.
  • lastSeenEntity: timestamp when the entity was last seen on scope.
  • slicesInTrainingEntity: number of slices (for example, days) that the entity exists on scope in training dataset.
  • countSlicesScope: number of slices (for example, days) that the scope exists.
  • avgNumScope: average of the numeric variable in training set per scope.
  • sdNumScope: standard deviation of the numeric variable in training set per scope.
  • zScoreEntity: Z-score for the current value of numeric variable based on entity model.
  • qScoreEntity: Q-score for the current value of numeric variable based on entity model.
  • zScoreScope: Z-score for the current value of numeric variable based on scope model.
  • qScoreScope: Q-score for the current value of numeric variable based on scope model.
  • isSpikeOnEntity: binary flag for anomalous spike based on entity model.
  • entityHighBaseline: expected high baseline for numeric variable values based on entity model.
  • isSpikeOnScope: binary flag for anomalous spike based on scope model.
  • scopeHighBaseline: expected high baseline for numeric variable values based on scope model.
  • entitySpikeAnomalyScore: anomaly score for the spike based on the entity model; a number in the range [0,1], where higher values indicate a more significant anomaly.
  • scopeSpikeAnomalyScore: anomaly score for the spike based on the scope model; a number in the range [0,1], where higher values indicate a more significant anomaly.
  • anomalyType: shows the type of anomaly (helpful when running several anomaly detection logics together).
  • anomalyScore: anomaly score for the spike based on the chosen model.
  • anomalyExplainability: textual wrapper for generated anomaly and its explanation.
  • anomalyState: bag of metrics from the chosen model (average, standard deviation and percentiles) describing the model.

In the example above, running the function on the countEvents variable, using user as entity and account as scope with default parameters, detects a spike at the scope level. Since the user 'H4ck3r' doesn't have enough data in the training period, the anomaly isn't calculated at the entity level, and all the relevant fields are empty. The scope-level anomaly has an anomaly score of 0.9987, meaning that this spike is anomalous for the scope.

If any of the minimum thresholds is raised high enough, no anomaly is detected, since the requirements become too strict, as shown below.
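
For example, setting the minimum-value threshold for the scope above the observed spike value of 5079 suppresses the detection. This is a hypothetical variation of the invocation above, run in the same context as the example:

testData
| invoke detect_anomalous_spike_fl(numericColumnName        = 'countEvents'
                                , entityColumnName          = 'userName'
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = trainPeriodStart
                                , startDetection            = detectPeriodStart
                                , endDetection              = detectPeriodStart
                                , minNumValueThreshScope    = 10000     // the spike of 5079 no longer qualifies, so no rows are returned
                            )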

The output shows the rows with anomalous spikes, together with explanation fields in a standardized format. These fields are useful for investigating the anomaly, and for running anomalous spike detection on several numeric variables or running other anomaly detection algorithms together.

The suggested usage in a cybersecurity context is running the function on meaningful numeric variables (such as amounts of downloaded data, counts of uploaded files, or numbers of failed sign-in attempts) per meaningful scopes (such as subscriptions or accounts) and entities (such as users or devices), as sketched below. A detected anomalous spike means that the numeric value is higher than what is expected on that scope or entity, and might be suspicious.
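
For example, here's a hypothetical sketch over sign-in logs. The table and column names (SigninLogs, ResultType, TimeGenerated, UserPrincipalName, ResourceTenantId) are illustrative assumptions, not a prescribed schema:

// Hypothetical sketch: daily failed sign-in counts per user, scoped by tenant.
SigninLogs
| where ResultType != "0"                   // keep failed sign-ins only (illustrative filter)
| summarize failedCount = count() by UserPrincipalName, ResourceTenantId, day = bin(TimeGenerated, 1d)
| invoke detect_anomalous_spike_fl(numericColumnName        = 'failedCount'
                                , entityColumnName          = 'UserPrincipalName'
                                , scopeColumnName           = 'ResourceTenantId'
                                , timeColumnName            = 'day'
                                , startTraining             = ago(60d)
                                , startDetection            = ago(1d)
                                , endDetection              = now())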