Configure dataflows in Azure IoT Operations

Important

Azure IoT Operations Preview – enabled by Azure Arc is currently in preview. You shouldn't use this preview software in production environments.

You'll need to deploy a new Azure IoT Operations installation when the product reaches general availability. You won't be able to upgrade a preview installation.

See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

A dataflow is the path that data takes from the source to the destination, with optional transformations along the way. You can configure a dataflow by creating a Dataflow custom resource or by using the operations experience portal. A dataflow is made up of three parts: the source, the transformation, and the destination.

Diagram of a dataflow showing flow from source to transform then destination.

To define the source and destination, you need to configure the dataflow endpoints. The transformation is optional and can include operations like enriching the data, filtering the data, and mapping the data to another field.

This article shows you how to create a dataflow with an example, including the source, transformation, and destination.
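
This article focuses on the operations experience portal, but you can also define the same configuration as a Dataflow custom resource in the cluster. The following sketch shows the overall shape of that resource; the API version, operation type values, endpoint names, and topics are assumptions based on the preview API, so check the Dataflow custom resource reference for your installed version.

```yaml
# Sketch of a Dataflow custom resource (preview API; names and values are illustrative).
apiVersion: connectivity.iotoperations.azure.com/v1beta1
kind: Dataflow
metadata:
  name: my-dataflow
  namespace: azure-iot-operations
spec:
  profileRef: default                      # dataflow profile that runs this dataflow (assumed name)
  operations:
    # 1. Source: where the data comes from.
    - operationType: Source
      sourceSettings:
        endpointRef: mq-source             # a dataflow endpoint you created earlier
        dataSources:
          - "thermostats/+/telemetry/#"    # MQTT topic filter to subscribe to
    # 2. Transformation (optional): enrich, filter, and map stages.
    - operationType: BuiltInTransformation
      builtInTransformationSettings:
        map:
          - inputs:
              - '*'
            output: '*'                    # pass all fields through unchanged
    # 3. Destination: where the data goes.
    - operationType: Destination
      destinationSettings:
        endpointRef: kafka-destination
        dataDestination: factory-telemetry # topic on the destination endpoint
```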

Prerequisites

Before you create a dataflow, you need:

  * A deployed Azure IoT Operations Preview instance.
  * Dataflow endpoints configured for the source and destination that you want to use.

Create dataflow

Once you have dataflow endpoints, you can use them to create a dataflow. Recall that a dataflow is made up of three parts: the source, the transformation, and the destination.

To create a dataflow in the operations experience portal, select Dataflow > Create dataflow.

Screenshot using operations experience portal to create a dataflow.

Review the following sections to learn how to configure the operation types of the dataflow.

Configure a source with a dataflow endpoint to get data

To configure a source for the dataflow, specify the endpoint reference and one or more data sources for that endpoint.
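
If you're defining the dataflow as a custom resource instead of in the portal, the source operation might look like the following sketch. The endpoint name and topic filters are hypothetical, and the field names follow the preview API.

```yaml
# Source operation sketch (preview API; endpoint name and topics are examples).
- operationType: Source
  sourceSettings:
    endpointRef: mq-source                        # dataflow endpoint to read from
    dataSources:                                  # one or more data sources on that endpoint
      - "thermostats/+/telemetry/temperature/#"
      - "humidifiers/+/telemetry/humidity/#"
```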

Use Asset as a source

You can use an asset as the source for the dataflow. Using an asset as the source is only available in the operations experience portal.

  1. Under Source details, select Asset.

  2. Select the asset you want to use as the source endpoint.

  3. Select Proceed.

    A list of datapoints for the selected asset is displayed.

    Screenshot using operations experience portal to select an asset as the source endpoint.

  4. Select Apply to use the asset as the source endpoint.

Use MQTT as a source

  1. Under Source details, select MQTT.

  2. Enter the MQTT Topic that you want to listen to for incoming messages.

  3. Choose a Message schema from the dropdown list or upload a new schema. If the source data has optional fields or fields with different types, specify a deserialization schema to ensure consistency. For example, the data might have fields that aren't present in all messages. Without a schema, the transformation can't handle these fields because they would have empty values. With a schema, you can specify default values or ignore the fields.

    Screenshot using operations experience portal to select MQTT as the source endpoint.

  4. Select Apply.

Configure transformation to process data

The transformation operation is where you can transform the data from the source before you send it to the destination. Transformations are optional. If you don't need to make changes to the data, don't include the transformation operation in the dataflow configuration. Multiple transformations are chained together in stages regardless of the order in which they're specified in the configuration. The order of the stages is always:

  1. Enrich: Add reference data to the source data, based on a dataset and a condition to match.
  2. Filter: Filter the data based on a condition.
  3. Map: Move data from one field to another with an optional conversion.

In the operations experience portal, select Dataflow > Add transform (optional).

Screenshot using operations experience portal to add a transform to a dataflow.

Enrich: Add reference data

To enrich the data, you can use a reference dataset in the Azure IoT Operations distributed state store (DSS). The dataset is used to add extra data to the source data based on a condition. The condition is specified as a field in the source data that matches a field in the dataset.

Key names in the distributed state store correspond to a dataset in the dataflow configuration.

Currently, the enrich operation isn't available in the operations experience portal.

You can load sample data into the DSS by using the DSS set tool sample.
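
In the Dataflow custom resource, an enrich stage references the DSS dataset by key and defines the match condition as an expression, along the lines of the following sketch. The dataset key, input fields, and expression are illustrative.

```yaml
# Enrich stage sketch (preview API; dataset key and fields are illustrative).
builtInTransformationSettings:
  datasets:
    - key: assetDataset                  # key of the reference data stored in the DSS
      inputs:
        - $source.deviceId               # $1 - field from the source data
        - $context(assetDataset).asset   # $2 - field from the DSS dataset
      expression: '$1 == $2'             # condition that matches source data to the dataset
```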

For more information about condition syntax, see Enrich data by using dataflows and Convert data by using dataflows.

Filter: Filter data based on a condition

To filter the data on a condition, you can use the filter stage. The condition is specified as a field in the source data that matches a value.

  1. Under Transform (optional), select Filter > Add.

  2. Choose the datapoints to include in the dataset.

  3. Add a filter condition and description.

    Screenshot using operations experience portal to add a filter transform.

  4. Select Apply.
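
If you're defining the dataflow as a custom resource, a filter stage might look like the following sketch; the input field and condition are illustrative.

```yaml
# Filter stage sketch (preview API; field name and condition are examples).
builtInTransformationSettings:
  filter:
    - inputs:
        - temperature.Value     # $1 - field from the source data
      expression: '$1 > 20'     # keep only messages where the condition is true
```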

Map: Move data from one field to another

To map the data to another field with optional conversion, you can use the map operation. The conversion is specified as a formula that uses the fields in the source data.

In the operations experience portal, mapping is currently supported by using the Compute transform.

  1. Under Transform (optional), select Compute > Add.

  2. Enter the required fields and expressions.

    Screenshot using operations experience portal to add a compute transform.

  3. Select Apply.
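
In the Dataflow custom resource, the equivalent map stage might look like the following sketch; the field names and conversion formula are illustrative.

```yaml
# Map stage sketch (preview API; fields and formula are examples).
builtInTransformationSettings:
  map:
    - inputs:
        - temperature.Value             # $1 - source field
      output: temperatureCelsius        # field to write in the output
      expression: '($1 - 32) * 5 / 9'   # optional conversion formula
```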

To learn more, see Map data by using dataflows and Convert data by using dataflows.

Serialize data according to a schema

If you want to serialize the data before sending it to the destination, you need to specify a schema and serialization format. Otherwise, the data is serialized in JSON with the types inferred. Remember that storage endpoints like Microsoft Fabric or Azure Data Lake require a schema to ensure data consistency.

Specify the Output schema when you add the destination dataflow endpoint.

Supported serialization formats are JSON, Parquet, and Delta.
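
In the Dataflow custom resource, the output schema and serialization format are typically set on the transformation operation, as in the following sketch. The schema reference is hypothetical, and the exact placement of these fields can differ between API versions.

```yaml
# Serialization sketch (preview API; the schema reference is hypothetical).
builtInTransformationSettings:
  serializationFormat: Parquet                                # JSON, Parquet, or Delta
  schemaRef: "aio-sr://exampleNamespace/exampleSchema:1.0.0"  # schema registered in the schema registry
```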

Configure destination with a dataflow endpoint to send data

To configure a destination for the dataflow, specify the endpoint reference and the data destination. You can specify a list of data destinations for the endpoint, which are MQTT or Kafka topics.

  1. Select the dataflow endpoint to use as the destination.

    Screenshot using operations experience portal to select Event Hubs destination endpoint.

  2. Select Proceed to configure the destination.

  3. Add the mapping details based on the type of destination.
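
If you're defining the dataflow as a custom resource, the destination operation might look like the following sketch; the endpoint name and topic are hypothetical.

```yaml
# Destination operation sketch (preview API; endpoint name and topic are examples).
- operationType: Destination
  destinationSettings:
    endpointRef: event-hubs-destination   # dataflow endpoint to send to
    dataDestination: factory-telemetry    # Kafka or MQTT topic on that endpoint
```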

Verify a dataflow is working

Follow Tutorial: Bi-directional MQTT bridge to Azure Event Grid to verify the dataflow is working.

Export dataflow configuration

To export the dataflow configuration, you can use the operations experience portal or export the Dataflow custom resource.

Select the dataflow you want to export and select Export from the toolbar.

Screenshot using operations experience portal to export a dataflow.