Redigera

Dela via


scan operator

Applies to: ✅ Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

Scans data, matches, and builds sequences based on the predicates.

Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator's steps.

Syntax

T | scan [ with_match_id = MatchIdColumnName ] [ declare ( ColumnDeclarations ) ] with ( StepDefinitions )

ColumnDeclarations syntax

ColumnName : ColumnType[= DefaultValue ] [, ... ]

StepDefinition syntax

step StepName [ output = all | last | none] : Condition [ => Column = Assignment [, ... ] ] ;

Learn more about syntax conventions.

Parameters

Name Type Required Description
T string ✔️ The input tabular source.
MatchIdColumnName string The name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the record.
ColumnDeclarations string Declares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null.
StepName string ✔️ Used to reference values in the state of scan for conditions and assignments. The step name must be unique.
Condition string ✔️ An expression that evaluates to true or false that defines which records from the input match the step. A record matches the step when the condition is true with the step’s state or with the previous step’s state.
Assignment string A scalar expression that is assigned to the corresponding column when a record matches a step.
output string Controls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none doesn't output records matching the step. The default is all.

Returns

A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the column in the declare clause.

Scan logic

scan goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.

State

The underlying state of the scan operator can be thought of as a table with a row for each step. Each step maintains its own state with the latest values of the columns and declared variables from all of the previous steps and the current step. If relevant, it also holds the match ID for the ongoing sequence.

If a scan operator has n steps named s_1, s_2, ..., s_n then step s_k would have k records in its state corresponding to s_1, s_2, ..., s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1 would reference column col1 that belongs to step s_2 in the state of s_k. For a detailed example, see the scan logic walkthrough.

The state starts empty and updates whenever a scanned input record matches a step. When the state of the current step is nonempty, the step is referred to as having an active sequence.

Matching logic

Each input record is evaluated against all of the steps in reverse order, from the last step to the first. When a record r is evaluated against some step s_k, the following logic is applied:

  • Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:

    1. The state of s_k is cleared.
    2. The state of s_k-1 is promoted to become the state of s_k.
    3. The assignments of s_k are calculated and extend r.
    4. The extended r is added to the output and to the state of s_k.

    Note

    If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.

  • Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:

    1. The assignments of s_k are calculated and extend r.
    2. The values that represent s_k in the state of s_k are replaced with the values of the extended r.
    3. If s_k is defined as output=all, the extended r is added to the output.
    4. If s_k is the first step, a new sequence begins and the match ID increases by 1. This only affects the output when with_match_id is used.

Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.

For a detailed example of this logic, see the scan logic walkthrough.

Examples

Cumulative sum

Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().

range x from 1 to 5 step 1 
| scan declare (cumulative_x:long=0) with 
(
    step s1: true => cumulative_x = x + s1.cumulative_x;
)

Output

x cumulative_x
1 1
2 3
3 6
4 10
5 15

Cumulative sum on multiple columns with a reset condition

Calculate the cumulative sum for two input columns, reset the sum value to the current record value whenever the cumulative sum reached 10 or more.

range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with 
(
    step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x), 
                     cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)

Output

x y cumulative_x cumulative_y
1 2 1 2
2 4 3 6
3 6 6 12
4 8 10 8
5 10 5 18

Fill forward a column

Fill forward a string column. Each empty value is assigned the last seen nonempty value.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "",
    2m, "B",
    3m, "",
    4m, "",
    6m, "C",
    8m, "",
    11m, "D",
    12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with 
(
    step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)

Output

Ts Event Event_filled
00:00:00 A A
00:01:00 A
00:02:00 B B
00:03:00 B
00:04:00 B
00:06:00 C C
00:08:00 C
00:11:00 D D
00:12:00 D

Sessions tagging

Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of with_match_id flag, which assigns a unique value for each distinct match (session) of scan. Also note the special use of two steps in this example, inSession has true as condition so it captures and outputs all the records from the input while endSession captures records that happen more than 30m from the sessionStart value for the current match. The endSession step has output=none meaning it doesn't produce output records. The endSession step is used to advance the state of the current match from inSession to endSession, allowing a new match (session) to begin, starting from the current record.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "A",
    2m, "B",
    3m, "D",
    32m, "B",
    36m, "C",
    38m, "D",
    41m, "E",
    75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with 
(
    step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
    step endSession output=none: Ts - inSession.sessionStart > 30m;
)

Output

Ts Event sessionStart session_id
00:00:00 A 00:00:00 0
00:01:00 A 00:00:00 0
00:02:00 B 00:00:00 0
00:03:00 D 00:00:00 0
00:32:00 B 00:32:00 1
00:36:00 C 00:32:00 1
00:38:00 D 00:32:00 1
00:41:00 E 00:32:00 1
01:15:00 A 01:15:00 2

Events between Start and Stop

Find all sequences of events between the event Start and the event Stop that occur within 5 minutes. Assign a match ID for each sequence.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

Ts Event m_id
00:01:00 Start 0
00:02:00 B 0
00:03:00 D 0
00:04:00 Stop 0
00:08:00 Start 1
00:11:00 E 1
00:12:00 Stop 1

Calculate a custom funnel of events

Calculate a funnel completion of the sequence Hail -> Tornado -> Thunderstorm Wind by State with custom thresholds on the times between the events (Tornado within 1h and Thunderstorm Wind within 2h). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.

StormEvents
| partition hint.strategy=native by State 
    (
    sort by StartTime asc
    | scan with 
    (
        step hail: EventType == "Hail";
        step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
        step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
    )
    )
| summarize dcount(State) by EventType

Output

EventType dcount_State
Hail 50
Tornado 34
Thunderstorm Wind 32

Scan logic walkthrough

This section demonstrates the scan logic using a step-by-step walkthrough of the Events between start and stop example:

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

The state

Think of the state of the scan operator as a table with a row for each step, in which each step has its own state. This state contains the latest values of the columns and declared variables from all of the previous steps and the current step. To learn more, see State.

For this example, the state can be represented with the following table:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3

The "X" indicates that a specific field is irrelevant for that step.

The matching logic

This section follows the matching logic through each record of the Events table, explaining the transformation of the state and output at each step.

Note

An input record is evaluated against the steps in reverse order, from the last step (s3) to the first step (s1).

Record 1

Ts Event
0m "A"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition of Event == "Start". Record 1 is discarded without affecting the state or output.

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3

Record 2

Ts Event
1m "Start"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there's no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence, and the m_id is assigned. Record 2 and its m_id (0) are added to the state and the output.

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 0 00:01:00 "Start" X X X X
s2 X X
s3

Record 3

Ts Event
2m "B"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts < 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 3 and its m_id (0) are added to the state and the output.
  • s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 0 00:01:00 "Start" 00:02:00 "B" X X
s3

Record 4

Ts Event
3m "D"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the record doesn't meet the condition of Event == "Stop", and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty. it passes Check 2 because it meets the condition of Ts - s1.Ts < 5m. Record 4 and its m_id (0) are added to the state and the output. The values from this record overwrite the previous state values for s2.Ts and s2.Event.
  • s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 0 00:01:00 "Start" 00:03:00 "D" X X
s3

Record 5

Ts Event
4m "Stop"

Record evaluation at each step:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 5 and its m_id (0) are added to the state and the output.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Record 6

Ts Event
6m "C"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 doesn't meet the s3 condition of Event == "Stop".
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 isn't passed because there's no previous step, and Check 2 isn't passed because it doesn't meet the condition of Event == "Start". Record 6 is discarded without affecting the state or output.

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Record 7

Ts Event
8m "Start"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because it doesn't meet the condition of Event == "Stop".
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 isn't passed because there's no previous step. it passes Check 2 because it meets the condition of Event == "Start". This match initiates a new sequence in s1 with a new m_id. Record 7 and its m_id (1) are added to the state and the output.

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 1 00:08:00 "Start" X X X X
s2 X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Note

There are now two active sequences in the state.

Record 8

Ts Event
11m "E"

Record evaluation at each step:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because it doesn't meet the s3 condition of Event == "Stop".
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts < 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 8 and its m_id (1) are added to the state and the output.
  • s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 1 00:08:00 "Start" 00:11:00 "E" X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Record 9

Ts Event
12m "Stop"

Record evaluation at each step:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 9 and its m_id (1) are added to the state and the output.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 isn't passed because there's no previous step. it passes Check 2 because it meets the condition of Event == "Start". This match initiates a new sequence in s1 with a new m_id.

State:

step m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3 1 00:08:00 "Start" 00:11:00 "E" 00:12:00 "Stop"

Final output

Ts Event m_id
00:01:00 Start 0
00:02:00 B 0
00:03:00 D 0
00:04:00 Stop 0
00:08:00 Start 1
00:11:00 E 1
00:12:00 Stop 1