scan operator
Scans data, matches, and builds sequences based on the predicates.
Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator's steps.
Steps are evaluated from last to first, according to the scan logic.
T
| sort by Timestamp asc
| scan with
(
step s1 output=last: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Timestamp - s1.Timestamp <= 5m;
step s3: Event == "Stop" and Ts - s1.Timestamp <= 5m;
)
Syntax
T | scan
[ with_match_id
=
MatchIdColumnName ] [ declare
(
ColumnDeclarations )
] with
(
StepDefinitions )
ColumnDeclarations syntax
ColumnName :
ColumnType[=
DefaultValue ] [,
... ]
StepDefinition syntax
step
StepName [ output
= all
| last
| none
] :
Condition [ =>
Column =
Assignment [,
... ] ] ;
Parameters
Name | Type | Required | Description |
---|---|---|---|
MatchIdColumnName | string | The name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the row. |
|
ColumnDeclarations | string | Declares an extension to the schema of the operator’s source. Additional columns are assigned in the steps or DefaultValue if not assigned. DefaultValue is null if not specified. |
|
StepName | string | ✓ | Used to reference values in the state of scan for conditions and assignments. The step name must be unique. |
Condition | string | ✓ | An expression that evaluates to a bool , true or false , that defines which records from the input matches the step. A record matches the step when the condition is true with the step’s state or with the previous step’s state. |
Assignment | string | A scalar expression that is assigned to the corresponding column when a record matches a step. | |
output |
string | Controls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none does not output records matching the step. The default is all . |
Returns
A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the column in the declare
clause.
Examples
Cumulative sum
Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().
range x from 1 to 5 step 1
| scan declare (cumulative_x:long=0) with
(
step s1: true => cumulative_x = x + s1.cumulative_x;
)
Output
x | cumulative_x |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
Cumulative sum on multiple columns with a reset condition
Calculate the cumulative sum for two input column, reset the sum value to the current row value whenever the cumulative sum reached 10 or more.
range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with
(
step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x),
cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)
Output
x | y | cumulative_x | cumulative_y |
---|---|---|---|
1 | 2 | 1 | 2 |
2 | 4 | 3 | 6 |
3 | 6 | 6 | 12 |
4 | 8 | 10 | 8 |
5 | 10 | 5 | 18 |
Fill forward a column
Fill forward a string column. Each empty value is assigned the last seen non-empty value.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "",
2m, "B",
3m, "",
4m, "",
6m, "C",
8m, "",
11m, "D",
12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with
(
step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)
Output
Ts | Event | Event_filled |
---|---|---|
00:00:00 | A | A |
00:01:00 | A | |
00:02:00 | B | B |
00:03:00 | B | |
00:04:00 | B | |
00:06:00 | C | C |
00:08:00 | C | |
00:11:00 | D | D |
00:12:00 | D |
Sessions tagging
Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of with_match_id
flag which assigns a unique value for each distinct match (session) of scan. Also note the special use of two steps in this example, inSession
has true
as condition so it captures and outputs all the records from the input while endSession
captures records that happen more than 30m from the sessionStart
value for the current match. The endSession
step has output=none
meaning it doesn't produce output records. The endSession
step is used to advance the state of the current match from inSession
to endSession
, allowing a new match (session) to begin, starting from the current record.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "A",
2m, "B",
3m, "D",
32m, "B",
36m, "C",
38m, "D",
41m, "E",
75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with
(
step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
step endSession output=none: Ts - inSession.sessionStart > 30m;
)
Output
Ts | Event | sessionStart | session_id |
---|---|---|---|
00:00:00 | A | 00:00:00 | 0 |
00:01:00 | A | 00:00:00 | 0 |
00:02:00 | B | 00:00:00 | 0 |
00:03:00 | D | 00:00:00 | 0 |
00:32:00 | B | 00:32:00 | 1 |
00:36:00 | C | 00:32:00 | 1 |
00:38:00 | D | 00:32:00 | 1 |
00:41:00 | E | 00:32:00 | 1 |
01:15:00 | A | 01:15:00 | 2 |
Events between Start and Stop
Find all sequences of events between the event Start
and the event Stop
that occur within 5 minutes. Assign a match ID for each sequence.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
Output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
Calculate a custom funnel of events
Calculate a funnel completion of the sequence Hail
-> Tornado
-> Thunderstorm Wind
by State
with custom thresholds on the times between the events (Tornado
within 1h
and Thunderstorm Wind
within 2h
). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.
StormEvents
| partition hint.strategy=native by State
(
sort by StartTime asc
| scan with
(
step hail: EventType == "Hail";
step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
)
)
| summarize dcount(State) by EventType
Output
EventType | dcount_State |
---|---|
Hail | 50 |
Tornado | 34 |
Thunderstorm Wind | 32 |
Scan logic
scan
goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.
Scan's state
The state that is used behind the scenes by scan
is a set of records, with the same schema of the output, including source and declared columns.
Each step has its own state, the state of step k has k records in it, where each record in the step’s state corresponds to a step up to k.
For example, if a scan operator has n steps named s_1, s_2, ..., s_n then step s_k would have k records in its state corresponding to s_1, s_2, ..., s_k.
Referencing a value in the state is done in the form StepName.ColumnName. For example, s_2.col1
references column col1
that belongs to step s_2 in the state of s_k.
Matching logic
Each record from the input is evaluated against all of scan’s steps, starting from last to first. When a record r is considered against some step s_k, the following logic is applied:
- If the state of the previous step isn't empty and the record r satisfies the condition of s_k using the state of the previous step s_(k-1), then the following happens:
- The state of s_k is deleted.
- The state of s_(k-1) becomes ("promoted" to be) the state of s_k, and the state of s_(k-1) becomes empty.
- All the assignments of s_k are calculated and extend r.
- The extended r is added to the output (if s_k is defined as
output=all
) and to the state of s_k.
- If r doesn't satisfy the condition of s_k with the state of s_(k-1), r is then checked with the state of s_k. If r satisfies the condition of s_k with the state of s_k, the following happens:
- The record r is extended with the assignments of s_k.
- If s_k is defined as
output=all
, the extended record r is added to the output. - The last record in the state of s_k (which represents s_k itself in the state) is replaced by the extended record r.
- Whenever the first step is matched while its state is empty, a new match begins and the match ID is increased by
1
. This only affects the output whenwith_match_id
is used.
- If r doesn't satisfy the condition s_k with the state s_k, evaluate r against condition s_k-1 and repeat the logic above.