The basket plugin finds frequent patterns of attributes in the data and returns the patterns that pass a frequency threshold in that data. A pattern represents a subset of the rows that have the same value across one or more columns. The basket plugin is based on the Apriori algorithm, originally developed for basket analysis data mining.
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
All arguments are optional, but they must be given in the order shown above. To use the default value for an argument, pass the tilde string '~' in its place. See the examples below.
Threshold - 0.015 < double < 1 [default: 0.05]
Sets the minimum ratio of rows that a pattern must cover to be considered frequent. Patterns with a smaller ratio are not returned.
T | evaluate basket(0.02)
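To make the threshold concrete, here is a minimal Python sketch of what "ratio of rows" means for a pattern. It is a brute-force illustration, not the plugin's implementation (which is Apriori-based and uses sampling). The function name `frequent_patterns`, the sample rows, and the assumption that the comparison against the threshold is inclusive are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(rows, threshold=0.05):
    """Return {pattern: ratio} for every column/value combination whose
    share of rows is at least `threshold`.

    Brute force, for illustration only. Treating the threshold as
    inclusive (>=) is an assumption, not documented plugin behavior.
    """
    total = len(rows)
    counts = Counter()
    for row in rows:
        items = sorted(row.items())
        # Count every non-empty subset of this row's (column, value) pairs.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {p: c / total for p, c in counts.items() if c / total >= threshold}

# Hypothetical sample data, loosely modeled on the StormEvents columns.
rows = [
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Flood"},
    {"State": "OHIO", "EventType": "Hail"},
]
patterns = frequent_patterns(rows, threshold=0.5)
for pattern, ratio in sorted(patterns.items()):
    print(dict(pattern), ratio)
```

With a threshold of 0.5, only patterns covering at least two of the four rows survive. Raising the threshold shrinks the result set, and lowering it lets rarer patterns through.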
WeightColumn - column_name
Considers each row in the input according to the specified weight. By default, each row has a weight of '1'. The argument must be the name of a numeric column, such as int, long, or real. A common use of a weight column is to account for sampling or bucketing/aggregation of the data that is already embedded in each row.
T | evaluate basket('~', sample_Count)
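The effect of a weight column can be sketched in Python as follows. This is an illustrative model of the description above, assuming that a pattern's ratio is the matched weight divided by the total weight. The helper `weighted_ratio` and the sample rows are hypothetical and not part of the plugin.

```python
def weighted_ratio(rows, pattern, weight_column=None):
    """Share of the total weight carried by rows that match `pattern`.

    With no weight column every row counts as 1, mirroring the default.
    Dividing matched weight by total weight is an assumption about how
    the weights enter the ratio.
    """
    def weight(row):
        return row[weight_column] if weight_column else 1

    total = sum(weight(r) for r in rows)
    matched = sum(
        weight(r)
        for r in rows
        if all(r[col] == val for col, val in pattern.items())
    )
    return matched / total

# Hypothetical pre-aggregated data: each row stands for sample_Count events.
rows = [
    {"EventType": "Hail", "sample_Count": 10},
    {"EventType": "Flood", "sample_Count": 30},
]
unweighted = weighted_ratio(rows, {"EventType": "Hail"})               # 1 of 2 rows
weighted = weighted_ratio(rows, {"EventType": "Hail"}, "sample_Count")  # 10 of 40
print(unweighted, weighted)
```

In this sketch the same pattern drops from half of the rows to a quarter of the weight, which is why a weight column matters when rows represent differently sized buckets.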
MaxDimensions - 1 < int [default: 5]
Sets the maximum number of uncorrelated dimensions per basket. The number is limited by default to keep the query runtime down.
T | evaluate basket('~', '~', 3)
CustomWildcard - "any_value_per_type"
Sets the wildcard value for a specific type in the result table. The wildcard indicates that the current pattern places no restriction on that column. The default is null, except for string columns, where the default is an empty string. If the default can occur as a legitimate value in the data, use a different wildcard value, such as '*' for strings.
T | evaluate basket('~', '~', '~', '*', int(-1), double(-1), long(0), datetime(1900-1-1))
The basket plugin returns frequent patterns that pass a ratio threshold. The default threshold is 0.05.
Each pattern is represented by a row in the results. The first column is the segment ID. The next two columns are the count and percentage of rows from the original query that match the pattern. The remaining columns come from the original query. Each holds either a specific value from that column or a wildcard value (null by default), meaning the pattern places no restriction on that column.
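The following Python sketch shows one way to read a result row: a wildcard in a column matches any value, and the count and percentage are taken over the rows of the original query. The helpers `matches` and `describe`, the sample rows, and the use of Python's `None` for the null wildcard are assumptions made for illustration.

```python
def matches(row, pattern, wildcard=None):
    """True if `row` agrees with `pattern` on every non-wildcard column."""
    return all(
        value == wildcard or row[column] == value
        for column, value in pattern.items()
    )

def describe(rows, pattern, wildcard=None):
    """Return (count, percent) of rows matching `pattern`."""
    count = sum(1 for r in rows if matches(r, pattern, wildcard))
    return count, 100.0 * count / len(rows)

# Hypothetical input rows.
rows = [
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Flood"},
    {"State": "OHIO", "EventType": "Hail"},
]

# Default wildcard (None here): State is unrestricted, EventType must be Hail.
default_result = describe(rows, {"State": None, "EventType": "Hail"})

# Custom string wildcard '*', as in basket('~', '~', '~', '*').
custom_result = describe(rows, {"State": "*", "EventType": "Hail"}, wildcard="*")
print(default_result, custom_result)
```

Both calls describe the same pattern. Only the marker used for "any value" differs, which is the reason to pick a custom wildcard when the default could be confused with real data.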
Note: The algorithm uses sampling to determine the initial frequent values. Therefore, the results can differ slightly between runs for patterns whose frequency is close to the threshold.
StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State, EventType, Damage, DamageCrops | evaluate basket(0.2)
Example with custom wildcards
StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State, EventType, Damage, DamageCrops | evaluate basket(0.2, '~', '~', '*', int(-1))