Process event data with the event processor editor

The event processor in a Lakehouse destination allows you to process your data before it's ingested into your lakehouse. The event processor editor is a no-code experience that allows you to drag and drop to design the event data processing logic. This article describes how to use the editor to design your processing logic.

Prerequisites

Before you start, you must complete the following prerequisites:

  • Get access to a premium workspace with Contributor or above permissions where your eventstream is located.
  • Get access to a premium workspace with Contributor or above permissions where your lakehouse is located.

Design the event processing with the editor

To design your event processing with the event processor editor:

  1. Add a Lakehouse destination and enter the necessary parameters in the right pane. (See Add and manage a destination in an eventstream for detailed instructions.)

  2. Select Open event processor. The Event processing editor screen appears.

    Screenshot showing where to select Open event processor in the Lakehouse destination configuration screen.

  3. In the Event processing editor canvas, select the eventstream node. You can preview the data schema or change the data type in the right Eventstream pane.

    Screenshot showing the data schema in the right pane of the Event processing editor screen.

  4. To insert an event processing operator between this eventstream and destination in the event processor editor, you can use one of the following two methods:

    1. Insert the operator directly from the connection line. Hover on the connection line and then select the "+" button. A drop-down menu appears on the connection line, and you can select an operator from this menu.

      Screenshot showing where to hover on connection line to insert a node.

    2. Insert the operator from ribbon menu or canvas.

      1. You can select an operator from the Operations menu in the ribbon. Alternatively, you can hover on one of the nodes and then select the "+" button if you have deleted the connection line. A drop-down menu appears next to that node, and you can select an operator from this menu.

        Screenshot showing where to select an operator from the Operations menu.

        Screenshot showing where to hover on nodes to insert a node.

      2. Finally, reconnect the nodes. Hover on the left edge of the eventstream node, and then select and drag the green circle to connect it to the Manage fields operator node. Follow the same process to connect the Manage fields operator node to the lakehouse node.

        Screenshot showing where to connect the nodes.

  5. Select the Manage fields operator node. In the Manage fields configuration panel, select the fields you want to output. If you want to add all fields, select Add all fields. You can also add a new field with the built-in functions to aggregate the data from upstream. (Currently, the built-in functions we support are some functions in String Functions, Date and Time Functions, and Mathematical Functions. To find them, search on "built-in.")

    Screenshot showing how to configure the operator.

  6. After you have configured the Manage fields operator, select Refresh static preview to preview the data this operator produces.

    Screenshot showing how to preview data in the event processor editor.

  7. If you have any configuration errors, they appear in the Authoring error tab in the bottom pane.

    Screenshot showing the authoring error tab in event processor editor.

  8. If your previewed data looks correct, select Done to save the event processing logic and return to the Lakehouse destination configuration screen.

  9. Select Add to complete the creation of your lakehouse destination.

Event processor editor

The Event processor enables you to transform the data that you're ingesting into a lakehouse destination. When you configure your lakehouse destination, you find the Open event processor option in the middle of the Lakehouse destination configuration screen.

Screenshot showing where to open the event processor editor.

Selecting Open event processor launches the Event processing editor screen, where you can define your data transformation logic.

The event processor editor includes a canvas and lower pane where you can:

  • Build the event data transformation logic with drag and drop.
  • Preview the data in each of the processing nodes from beginning to end.
  • Discover any authoring errors within the processing nodes.

The screen layout is like the main editor. It consists of three sections, shown in the following image:

Screenshot of the Event processing editor screen, indicating the three main sections.

  1. Canvas with diagram view: In this pane, you can design your data transformation logic by selecting an operator (from the Operations menu) and connecting the eventstream and the destination nodes via the newly created operator node. You can drag and drop connecting lines or select and delete connections.

  2. Right editing pane: This pane allows you to configure the selected operation node or view the schema of the eventstream and destination.

  3. Bottom pane with data preview and authoring error tabs: In this pane, preview the data in a selected node with the Data preview tab. The Authoring errors tab lists any incomplete or incorrect configuration in the operation nodes.

Authoring errors

Authoring errors are errors that occur in the Event processor editor because of incomplete or incorrect configuration of the operation nodes. They help you find and fix potential problems in your event processor.

You can view authoring errors in the bottom panel of the Event processor editor. The bottom panel lists all the authoring errors. Each authoring error has four columns:

  • Node ID: Indicates the ID of the operation node where the Authoring error occurred.
  • Node type: Indicates the type of the operation node where the Authoring error occurred.
  • Level: Indicates the severity of the authoring error. There are two levels, Fatal and Information. A Fatal level authoring error means that your event processor has serious problems and can't be saved or run. An Information level authoring error means that your event processor has some tips or suggestions that can help you optimize or improve it.
  • Error: Indicates the specific information of the authoring error, briefly describing the cause and impact of the authoring error. You can select the Show details tab to see details.

Since Eventstream and KQL Database support different data types, the process of data type conversion may generate authoring errors.

The following table shows the results of data type conversion from Eventstream to KQL Database. The rows represent the data types supported by Eventstream, and the columns represent the data types supported by KQL Database. The cells indicate the conversion results, which can be one of the following three:

✔️ Indicates a successful conversion. No errors or warnings are generated.

❌ Indicates an impossible conversion. A fatal authoring error is generated. The error message is similar to: The data type "{1}" for the column "{0}" does not match the expected type "{2}" in the selected KQL table and cannot be auto-converted.

⚠️ Indicates a possible but inaccurate conversion. An information authoring error is generated. The error message is similar to: The data type "{1}" for the column "{0}" does not exactly match the expected type "{2}" in the selected KQL table. It will be auto-converted to "{2}".

string bool datetime dynamic guid int long real timespan decimal
Int64 ✔️ ⚠️ ✔️ ⚠️ ✔️
Double ✔️ ⚠️ ⚠️
String ✔️ ✔️
Datetime ⚠️ ✔️ ✔️
Record ⚠️ ✔️
Array ⚠️ ✔️

As you can see from the table, some data type conversions are successful, such as string to string. These conversions do not generate any authoring errors, and do not affect the operation of your event processor.

Some data type conversions are impossible, such as int to string. These conversions generate fatal level authoring errors, which prevent your event processor from being saved. To avoid these errors, change the data type either in your eventstream or in the KQL table.

Some data type conversions are possible but not precise, such as int to real. These conversions generate information level authoring errors, which indicate the mismatch between data types and the result of the automatic conversion. These conversions may cause your data to lose precision or structure. You can ignore these errors, or you can modify the data type either in your eventstream or in the KQL table to optimize your event processor.
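The three outcomes described above can be sketched in Python. This is a minimal illustration, not the editor's implementation: the lookup holds only the three example conversions named in the text, the function and variable names are hypothetical, and treating "int" in the examples as the Eventstream Int64 type is an assumption.

```python
# Message templates modeled on the examples in the text: {0} is the column name,
# {1} the incoming data type, {2} the type expected by the KQL table.
FATAL_TEMPLATE = (
    'The data type "{1}" for the column "{0}" does not match the expected '
    'type "{2}" in the selected KQL table and cannot be auto-converted.'
)
INFO_TEMPLATE = (
    'The data type "{1}" for the column "{0}" does not exactly match the expected '
    'type "{2}" in the selected KQL table. It will be auto-converted to "{2}".'
)

# Hypothetical lookup: (Eventstream type, KQL type) -> conversion result.
# Only the three conversions named as examples in the text are listed.
CONVERSIONS = {
    ("String", "string"): "ok",
    ("Int64", "string"): "fatal",       # assumes "int" in the text means Int64
    ("Int64", "real"): "information",
}


def check_column(column, source_type, target_type):
    """Return (level, message), or (None, None) when the conversion is clean."""
    result = CONVERSIONS[(source_type, target_type)]
    if result == "ok":
        return (None, None)
    template = FATAL_TEMPLATE if result == "fatal" else INFO_TEMPLATE
    return (result.capitalize(), template.format(column, source_type, target_type))


for col, src, dst in [("name", "String", "string"),
                      ("id", "Int64", "string"),
                      ("temp", "Int64", "real")]:
    print(col, check_column(col, src, dst))
```

A Fatal result corresponds to an event processor that can't be saved, while an Information result can be ignored at the cost of possible precision loss.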

Transformation operators

The event processor provides six operators, which you can use to transform your event data according to your business needs.

Screenshot showing the operators available in the Operations menu.

Aggregate

Use the Aggregate transformation to calculate an aggregation (Sum, Minimum, Maximum, or Average) every time a new event occurs over a period of time. This operation also allows for the renaming of these calculated columns, as well as filtering or slicing the aggregation based on other dimensions in your data. You can have one or more aggregations in the same transformation.
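As a rough mental model, the behavior described above can be sketched in Python as a running aggregation that is recomputed each time an event arrives and looks back over a fixed period. The function name, the `ts` field (seconds), and the rule that an event exactly one period old falls outside the window are all assumptions made for illustration, not documented behavior of the operator.

```python
from collections import deque


def aggregate_on_each_event(events, value_field, period_seconds):
    """For every incoming event, return the Sum of value_field over the
    events seen in the last period_seconds (events must be sorted by "ts")."""
    window = deque()
    results = []
    for event in events:
        window.append(event)
        # Drop events that are a full period (or more) older than the new event.
        while window[0]["ts"] <= event["ts"] - period_seconds:
            window.popleft()
        results.append(sum(e[value_field] for e in window))
    return results


events = [
    {"ts": 0, "value": 1},
    {"ts": 5, "value": 2},
    {"ts": 12, "value": 3},
]
print(aggregate_on_each_event(events, "value", period_seconds=10))  # [1, 3, 5]
```

Swapping `sum` for `min`, `max`, or an average gives the other aggregations the operator lists.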

Expand

Use the Expand array transformation to create a new row for each value within an array.
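The effect of expanding an array can be illustrated with a short Python sketch. The function name and the sample event are hypothetical, and replacing the array field with each individual value (rather than writing the value to a new field) is an assumption about the output shape.

```python
def expand_array(event, array_field):
    """Yield one copy of the event per element of event[array_field]."""
    for value in event.get(array_field, []):
        row = dict(event)          # copy the other fields unchanged
        row[array_field] = value   # assumption: the element replaces the array
        yield row


event = {"device": "d1", "readings": [1, 2, 3]}
for row in expand_array(event, "readings"):
    print(row)
```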

Filter

Use the Filter transformation to filter events based on the value of a field in the input. Depending on the data type (number or text), the transformation keeps the values that match the selected condition, such as is null or is not null.
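A minimal Python sketch of the two conditions named above follows. The function name and sample events are hypothetical, and treating a missing field the same as a null value is an assumption.

```python
def filter_events(events, field, condition):
    """Keep the events whose field satisfies the selected condition."""
    predicates = {
        "is null": lambda value: value is None,
        "is not null": lambda value: value is not None,
    }
    keep = predicates[condition]
    # Assumption: a field that is absent from the event counts as null.
    return [event for event in events if keep(event.get(field))]


events = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": None}, {"id": 3}]
print(filter_events(events, "temp", "is not null"))
print(filter_events(events, "temp", "is null"))
```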

Group by

Use the Group by transformation to calculate aggregations across all events within a certain time window. You can group by the values in one or more fields. Like the Aggregate transformation, it allows for the renaming of columns, but it provides more options for aggregation and more complex options for time windows. As with Aggregate, you can add more than one aggregation per transformation.

The aggregations available in the transformation are:

  • Average
  • Count
  • Maximum
  • Minimum
  • Percentile (continuous and discrete)
  • Standard Deviation
  • Sum
  • Variance

In time-streaming scenarios, performing operations on the data contained in temporal windows is a common pattern. The event processor supports windowing functions, which are integrated with the Group by operator. You can define them in the settings of this operator.

Screenshot showing the Group by operator available in the event processor editor.
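To make the idea of grouping within a time window concrete, the following Python sketch computes an Average per group over fixed, non-overlapping (tumbling) windows. The tumbling window is chosen here only as one common windowing scheme; this article does not list which window types the operator offers. The function name, the `ts` field (seconds), and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_tumbling(events, key_field, value_field, window_seconds):
    """Average value_field per (window start, key) over fixed-size windows."""
    buckets = defaultdict(list)
    for event in events:
        # Align the timestamp to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        buckets[(window_start, event[key_field])].append(event[value_field])
    return {key: sum(values) / len(values) for key, values in buckets.items()}


events = [
    {"ts": 0, "device": "a", "temp": 10},
    {"ts": 5, "device": "a", "temp": 20},
    {"ts": 12, "device": "a", "temp": 30},
    {"ts": 3, "device": "b", "temp": 5},
]
print(group_by_tumbling(events, "device", "temp", window_seconds=10))
```

Each key in the result pairs a window start time with a group value, which mirrors how the operator emits one aggregated row per group per window.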

Manage fields

The Manage fields transformation allows you to add, remove, rename, or change the data type of fields coming in from an input or another transformation. The side pane settings give you the option of adding a new field by selecting Add field, adding multiple fields, or adding all fields at once.

Screenshot showing the Manage fields operator available in the event processor editor.

Furthermore, you can add a new field with the built-in functions to aggregate the data from upstream. (Currently, the built-in functions we support are some functions in String Functions, Date and Time Functions, and Mathematical Functions. To find them, search on "built-in.")

Screenshot showing the Manage fields built-in functions.

The following table shows the results of changing the data type using Manage fields. The columns represent the original data type, and the rows represent the target data type.

  • A ✔️ in the cell means that the type can be converted directly, and the target data type option is shown in the dropdown list.
  • A ❌ in the cell means that the type can't be converted, and the target data type option isn't shown in the dropdown list.
  • A ⚠️ in the cell means that the type can be converted only if certain conditions are met. For example, when converting from string to int, the string must be a valid integer form, such as “123”, not “abc”.
Int64 Double String Datetime Record Array
Int64 ✔️ ✔️ ✔️
Double ✔️ ✔️ ✔️
String ⚠️ ⚠️ ✔️ ⚠️
Datetime ✔️ ✔️
Record ✔️ ✔️
Array ✔️ ✔️
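
The conditional (⚠️) case can be illustrated with the string-to-int example from the list above. This Python sketch uses Python's own `int()` parsing rules, which are not necessarily the rules the event processor applies, and returning `None` for an invalid string is an assumption; this article does not state what the operator does with values that fail conversion. The function name is hypothetical.

```python
def string_to_int64(text):
    """Return the integer value, or None if text is not a valid integer form."""
    try:
        return int(text)
    except ValueError:
        # Assumption: an unconvertible value is represented as None here.
        return None


print(string_to_int64("123"))  # 123
print(string_to_int64("abc"))  # None
```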

Union

Use the Union transformation to connect two or more nodes and add events that have shared fields (with the same name and data type) into one table. Fields that don't match are dropped and not included in the output.
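
The matching rule described above can be sketched in Python. Each input is modeled as a schema (field name to type name) plus a list of events; this representation, the function name, and the sample data are hypothetical and only illustrate the rule that fields must share both name and data type to survive.

```python
def union(*streams):
    """Combine streams, keeping only fields shared by name and data type.

    Each stream is a (schema, events) pair, where schema maps a field name
    to a type name.
    """
    schemas = [schema for schema, _ in streams]
    shared = {
        name
        for name, type_name in schemas[0].items()
        if all(other.get(name) == type_name for other in schemas[1:])
    }
    combined = []
    for _, events in streams:
        for event in events:
            # Fields that don't match across all inputs are dropped.
            combined.append({k: v for k, v in event.items() if k in shared})
    return shared, combined


stream_a = ({"id": "Int64", "temp": "Double"}, [{"id": 1, "temp": 2.5}])
stream_b = ({"id": "Int64", "temp": "String", "loc": "String"},
            [{"id": 2, "temp": "hot", "loc": "x"}])
print(union(stream_a, stream_b))
```

In this example `temp` is dropped because its type differs between the two inputs, and `loc` is dropped because it exists in only one of them.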