Serialization and deserialization formats in data processor pipelines

09/07/2023

Important

Azure IoT Operations Preview – enabled by Azure Arc is currently in PREVIEW. You shouldn't use this preview software in production environments.

When Azure IoT Operations becomes generally available, you'll need to deploy a new installation. You won't be able to upgrade a preview installation.

See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

The data processor is a data-agnostic platform: it can ingest, process, and write out data in any format.

However, to use jq path expressions in some pipeline stages, the data must be in a structured format within a pipeline. You may need to deserialize your data to get it into a suitable structured format.

Some pipeline destinations or call outs from stages may require the data to be in a specific format. You may need to serialize your data to a suitable format for the destination.
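The following Python sketch shows why structure matters for path expressions. The `get_path` helper is hypothetical and handles only dotted object keys, which is a tiny subset of jq. The message shape, with data under a `payload` field, is also an assumption made for illustration.

```python
import json


def get_path(message, path: str):
    """Resolve a simple jq-style path such as ".payload.temperature".

    Hypothetical helper: only dotted object keys are supported.
    """
    if path == ".":
        return message  # "." selects the entire message
    value = message
    for key in path[1:].split("."):
        value = value[key]
    return value


# Bytes as they might arrive from a source: not yet structured.
raw_payload = b'{"assetId": "oven-1", "temperature": 87.5}'

# Before deserialization, the payload is opaque bytes.
raw_message = {"payload": raw_payload}

# After JSON deserialization, the payload is a structured object.
structured_message = {"payload": json.loads(raw_payload)}
print(get_path(structured_message, ".payload.temperature"))  # 87.5
```

Calling `get_path(raw_message, ".payload.temperature")` instead raises a `TypeError`, because bytes have no keys to navigate.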

Deserialize messages

The data processor natively supports deserialization of various formats at both the data source stage and the call out stages where the pipeline reads external data:

The source stage can deserialize incoming data.
The call out stages can deserialize the API response.

You may not need to deserialize incoming data if:

You're not using the stages that require deserialized data.
You're processing metadata only.
The incoming data is already in a format that's consistent with the stages being used.

The following table lists the formats for which deserialization is supported and the corresponding stages.

Format	Data source	Call out
Raw	Supported	HTTP
JSON	Supported	HTTP
Protobuf	Supported	All (HTTP and gRPC)
CSV	Supported	HTTP
MessagePack	Supported	HTTP
CBOR	Supported	HTTP

Tip

Select Raw when you don't require deserialization. The Raw option passes the data through in its current format.
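The following Python sketch gives a rough model of what the Raw and JSON deserialization options do. The `deserialize` function is hypothetical. It deliberately omits MessagePack and CBOR, which would need third-party libraries, and it also omits Protobuf and CSV, which need the extra schema configuration described later in this article.

```python
import json
from typing import Any


def deserialize(data: bytes, fmt_type: str) -> Any:
    """Hypothetical deserializer mirroring the Raw and JSON options."""
    fmt = fmt_type.lower()
    if fmt == "raw":
        # Raw: pass the bytes through unchanged.
        return data
    if fmt == "json":
        # JSON: parse UTF-8 text into dicts, lists, strings, and numbers.
        return json.loads(data.decode("utf-8"))
    raise ValueError(f"unsupported format in this sketch: {fmt_type}")


incoming = b'{"temperature": 21.5}'
print(deserialize(incoming, "Raw"))   # b'{"temperature": 21.5}'
print(deserialize(incoming, "JSON"))  # {'temperature': 21.5}
```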

Serialize messages

The data processor natively supports serialization to various formats at both the destination and call out stages where the pipeline writes external data:

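The destination stage can serialize outgoing data to a suitable format.
Call out stages can serialize the data sent in an API request.

The following table lists the formats for which serialization is supported and the corresponding stages.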

Format	Call out	Output stage
`Raw`	HTTP	All except Microsoft Fabric
`JSON`	HTTP	All except Microsoft Fabric
`Parquet`	Not supported	Microsoft Fabric
`Protobuf`	All	All except Microsoft Fabric
`CSV`	HTTP	All except Microsoft Fabric
`MessagePack`	HTTP	All except Microsoft Fabric
`CBOR`	HTTP	All except Microsoft Fabric

Tip

Select Raw when no serialization is required. The Raw option passes the data through in its current format.
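The serialization direction can be sketched the same way. The `serialize` function below is hypothetical and is not the data processor's implementation. Its Raw branch assumes the data is already bytes and returns it unchanged. Its JSON branch encodes structured data as compact UTF-8 JSON.

```python
import json
from typing import Any


def serialize(value: Any, fmt_type: str) -> bytes:
    """Hypothetical serializer mirroring the Raw and JSON options."""
    fmt = fmt_type.lower()
    if fmt == "raw":
        # Raw (assumption): the data must already be bytes; pass it through.
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("Raw expects data that is already bytes")
        return bytes(value)
    if fmt == "json":
        # JSON: encode structured data as compact UTF-8 JSON text.
        return json.dumps(value, separators=(",", ":")).encode("utf-8")
    raise ValueError(f"unsupported format in this sketch: {fmt_type}")


print(serialize({"temperature": 21.5}, "JSON"))  # b'{"temperature":21.5}'
print(serialize(b"\x01\x02", "Raw"))             # b'\x01\x02'
```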

Raw/JSON/MessagePack/CBOR data formats

Raw is the option to use when you don't need to deserialize or serialize data. Raw is the default in most stages where deserialization or serialization isn't enforced.

The Raw, JSON, MessagePack, and CBOR formats share the same serialization and deserialization configuration options.

Use the following configuration options to deserialize data:

Field	Type	Description	Required?	Default	Example
`type`	`string enum`	The format for deserialization	No	-	`JSON`
`path`	Path	The path to the portion of the data processor message where the deserialized data is written.	(see following note)	`.payload`	`.payload.response`

Note

You don't need to specify path when you deserialize data in the source stage. The deserialized data is automatically placed in the .payload section of the message.
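The `path` option behaves like an assignment target. The following Python sketch uses a hypothetical `set_path` helper, limited to dotted object keys, to show where a deserialized call out response would land with the example path `.payload.response`. It's an illustration, not the data processor's implementation.

```python
from typing import Any


def set_path(message: dict, path: str, value: Any) -> None:
    """Write value at a dotted path such as ".payload.response".

    Missing intermediate objects are created. "." is rejected because
    this sketch doesn't replace the whole message.
    """
    if path == ".":
        raise ValueError("this sketch does not replace the whole message")
    keys = path[1:].split(".")
    target = message
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


message = {"payload": {"request": "GET /status"}}

# Write a deserialized API response next to the original request.
set_path(message, ".payload.response", {"status": "ok"})
print(message)
# {'payload': {'request': 'GET /status', 'response': {'status': 'ok'}}}
```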

Use the following configuration options to serialize data:

Field	Type	Description	Required?	Default	Example
`type`	`string enum`	The format for serialization	Yes	-	`JSON`
`path`	Path	The path to the portion of the data processor message that should be serialized.	(see following note)	`.payload`	`.payload.response`

Note

You don't need to specify path when you serialize batched data. The default path is ., which represents the entire message. For unbatched data, you must specify path.
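The difference between batched and unbatched serialization comes down to which part of the message the path selects. The following Python sketch assumes that a batch is represented as a list of records and that an unbatched message keeps its data under `payload`. The `select` and `serialize_json` helpers are hypothetical.

```python
import json
from typing import Any


def select(message: Any, path: str = ".") -> Any:
    """Return the part of the message that a dotted path points at.

    "." (the default for batched data) selects the entire message.
    """
    if path == ".":
        return message
    value = message
    for key in path[1:].split("."):
        value = value[key]
    return value


def serialize_json(message: Any, path: str = ".") -> bytes:
    # Serialize only the selected portion as compact JSON.
    return json.dumps(select(message, path), separators=(",", ":")).encode()


# Unbatched: serialize just the payload, so the path is given explicitly.
single = {"payload": {"temperature": 87.5}}
print(serialize_json(single, ".payload"))  # b'{"temperature":87.5}'

# Batched (assumed to be a list): the default path "." takes everything.
batch = [{"temperature": 87.5}, {"temperature": 88.0}]
print(serialize_json(batch))  # b'[{"temperature":87.5},{"temperature":88.0}]'
```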

The following example shows the configuration for serializing or deserializing unbatched JSON data:

{
    "format": {
        "type": "json",
        "path": ".payload"
    }
}

The following example shows the configuration for deserializing JSON data in the source stage or serializing batched JSON data:

{
    "format": {
        "type": "json"
    }
}
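If you generate pipeline configuration from code, a small helper can keep the two shapes consistent. The Python `format_config` function below is hypothetical. It only produces dictionaries that match the two JSON examples above.

```python
import json
from typing import Optional


def format_config(fmt_type: str, path: Optional[str] = None) -> dict:
    """Build a "format" object for Raw, JSON, MessagePack, or CBOR.

    Pass path for unbatched data. Leave it out for the source stage or
    for batched serialization, where the defaults apply.
    """
    fmt = {"type": fmt_type}
    if path is not None:
        fmt["path"] = path
    return {"format": fmt}


print(json.dumps(format_config("json", ".payload"), indent=4))  # unbatched
print(json.dumps(format_config("json"), indent=4))  # source stage or batched
```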

Protocol Buffers data format

Use the following configuration options to deserialize Protocol Buffers (protobuf) data:

Field	Type	Description	Required?	Default	Example
`type`	`string enum`	The format for deserialization	Yes	-	`protobuf`
`descriptor`	`string`	The base64-encoded descriptor for the protobuf definition file(s).	Yes	-	`Zm9v..`
`package`	`string`	The name of the package in the descriptor where the type is defined.	Yes	-	`package1..`
`message`	`string`	The name of the message type that's used to format the data.	Yes	-	`message1..`
`path`	Path	The path to the portion of the data processor message where the deserialized data should be written.	(see following note)	`.payload`	`.payload.gRPCResponse`

Note

You don't need to specify path when you deserialize data in the source stage. The deserialized data is automatically placed in the .payload section of the message.
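This article doesn't show how to produce the base64 descriptor. One plausible approach is to compile your `.proto` files into a binary descriptor set with `protoc --include_imports --descriptor_set_out=<file>` and then base64-encode the resulting file. This approach is an assumption and isn't confirmed here. The following Python sketch uses hypothetical helper names.

```python
import base64
from pathlib import Path


def encode_descriptor(descriptor_bytes: bytes) -> str:
    """Return base64 text suitable for the "descriptor" field."""
    return base64.b64encode(descriptor_bytes).decode("ascii")


def encode_descriptor_file(path: str) -> str:
    # Assumption: the file is a binary descriptor set, for example one
    # written by: protoc --include_imports --descriptor_set_out=desc.pb x.proto
    return encode_descriptor(Path(path).read_bytes())


print(encode_descriptor(b"foo"))  # Zm9v
```

The `Zm9v` output is the base64 encoding of the bytes `foo`, which is also how the `Zm9v..` placeholder in the tables begins.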

Use the following configuration options to serialize protobuf data:

Field	Type	Description	Required?	Default	Example
`type`	`string enum`	The format for serialization	Yes	-	`protobuf`
`descriptor`	`string`	The base64-encoded descriptor for the protobuf definition file(s).	Yes	-	`Zm9v..`
`package`	`string`	The name of the package in the descriptor where the type is defined.	Yes	-	`package1..`
`message`	`string`	The name of the message type that's used to format the data.	Yes	-	`message1..`
`path`	Path	The path to the portion of the data processor message where data to be serialized is read from.	(see following note)	-	`.payload.gRPCRequest`

Note

You don't need to specify path when you serialize batched data. The default path is ., which represents the entire message.

The following example shows the configuration for serializing or deserializing unbatched protobuf data:

{
    "format": {
        "type": "protobuf",
        "descriptor": "Zm9v..",
        "package": "package1",
        "message": "message1",
        "path": ".payload"
    }
}

The following example shows the configuration for deserializing protobuf data in the source stage or serializing batched protobuf data:

{
    "format": {
        "type": "protobuf",
        "descriptor": "Zm9v...", // The full descriptor
        "package": "package1",
        "message": "message1"
    }
}
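The `descriptor`, `package`, and `message` fields are all required for protobuf, so a quick pre-check can catch an incomplete configuration before deployment. The following Python function is hypothetical and applies only the rules from the tables above. The `Zm9v` descriptor is a placeholder, not a real descriptor.

```python
REQUIRED_FIELDS = ("type", "descriptor", "package", "message")


def protobuf_format_problems(config: dict) -> list:
    """List problems in a protobuf "format" object (empty list means OK)."""
    fmt = config.get("format", {})
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not fmt.get(name)]
    fmt_type = fmt.get("type")
    if fmt_type and fmt_type.lower() != "protobuf":
        problems.append(f"type is {fmt_type!r}, expected 'protobuf'")
    return problems


source_config = {
    "format": {
        "type": "protobuf",
        "descriptor": "Zm9v",  # placeholder only
        "package": "package1",
        "message": "message1",
    }
}
print(protobuf_format_problems(source_config))  # []

broken = {"format": {"type": "protobuf", "descriptor": "Zm9v"}}
print(protobuf_format_problems(broken))  # ['missing package', 'missing message']
```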

CSV data format

Use the following configuration options to deserialize CSV data:

Field	Type	Description	Required?	Default	Example
`type`	`string enum`	The format for deserialization	Yes	-	`CSV`
`header`	`boolean`	This field indicates whether the input data has a CSV header row.	Yes	-	`true`
`columns`	`array`	The schema definition of the CSV to read.	Yes	-	(see following table)
`path`	Path	The path to the portion of the data processor message where the deserialized data should be written.	(see following note)	-	`.payload`

Note

You don't need to specify path when you deserialize data in the source stage. The deserialized data is automatically placed in the .payload section of the message.

Each element in the columns array is an object with the following schema:

Field	Type	Description	Required?	Default	Example
`name`	`string`	The name of the column as it appears in the CSV header.	Yes	-	`temperature`
`type`	`string enum`	The data processor data type held in the column that's used to determine how to parse the data.	No	string	`integer`
`path`	Path	The location within each deserialized record where the value of the column is written.	No	`.{{name}}`	`.temperature`
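The following Python sketch shows how the `columns` schema could drive CSV deserialization. It's an approximation that rests on several assumptions:

- Only the `string`, `integer`, and `float` types are handled.
- Columns are matched by header name when `header` is true and by position otherwise.
- Paths are limited to dotted keys.

The function names are hypothetical.

```python
import csv
import io

# Converters for a few column types; this subset is an assumption.
CONVERTERS = {"string": str, "integer": int, "float": float}


def set_path(record: dict, path: str, value) -> None:
    """Write value at a dotted path such as ".eventTime" or ".a.b"."""
    keys = path[1:].split(".")
    for key in keys[:-1]:
        record = record.setdefault(key, {})
    record[keys[-1]] = value


def deserialize_csv(text: str, header: bool, columns: list) -> list:
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        # Match each configured column to its position in the header row.
        names = rows[0]
        indexes = [names.index(col["name"]) for col in columns]
        rows = rows[1:]
    else:
        # Without a header, columns are taken in the configured order.
        indexes = list(range(len(columns)))
    records = []
    for row in rows:
        record = {}
        for col, i in zip(columns, indexes):
            convert = CONVERTERS[col.get("type", "string")]  # defaults to string
            path = col.get("path", "." + col["name"])  # defaults to .{{name}}
            set_path(record, path, convert(row[i]))
        records.append(record)
    return records


columns = [
    {"name": "assetId", "type": "string", "path": ".assetId"},
    {"name": "timestamp", "path": ".eventTime"},
    {"name": "temperature", "type": "float"},
]
data = "oven-1,2023-09-07T10:00:00Z,87.5\noven-2,2023-09-07T10:00:05Z,91.0\n"
print(deserialize_csv(data, header=False, columns=columns))
```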

Use the following configuration options to serialize CSV data:

Field	Type	Description	Required?	Default	Example
`type`	`string enum`	The format for serialization	Yes	-	`CSV`
`header`	`boolean`	This field indicates whether to include the header line with column names in the serialized CSV.	Yes	-	`true`
`columns`	`array`	The schema definition of the CSV to write.	Yes	-	(see following table)
`path`	Path	The path to the portion of the data processor message from which the data to be serialized is read.	(see following note)	-	`.payload`

Note

You don't need to specify path when you serialize batched data. The default path is ., which represents the entire message.

Each element in the columns array is an object with the following schema:

Field	Type	Description	Required?	Default	Example
`name`	`string`	The name of the column as it would appear in a CSV header.	Yes	-	`temperature`
`path`	Path	The location within each record from which the value of the column is read.	No	`.{{name}}`	`.temperature`
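The serialization schema works in reverse: each column's value is read from its path in the record and written under the column name. The following Python sketch produces CSV text from a list of records. Its function names are hypothetical, and it supports dotted-key paths only.

```python
import csv
import io


def get_path(record: dict, path: str):
    """Read the value at a dotted path such as ".eventTime"."""
    value = record
    for key in path[1:].split("."):
        value = value[key]
    return value


def serialize_csv(records: list, header: bool, columns: list) -> str:
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:
        # Header line uses the configured column names.
        writer.writerow([col["name"] for col in columns])
    for record in records:
        # path defaults to "." + the column name when it's omitted.
        writer.writerow(
            [get_path(record, col.get("path", "." + col["name"])) for col in columns]
        )
    return out.getvalue()


columns = [
    {"name": "assetId", "path": ".assetId"},
    {"name": "timestamp", "path": ".eventTime"},
    {"name": "temperature"},
]
records = [{"assetId": "oven-1", "eventTime": "2023-09-07T10:00:00Z", "temperature": 87.5}]
print(serialize_csv(records, header=True, columns=columns), end="")
# assetId,timestamp,temperature
# oven-1,2023-09-07T10:00:00Z,87.5
```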

The following example shows the configuration for serializing unbatched CSV data:

{
    "format": {
        "type": "csv",
        "header": true,
        "columns": [
            {
                "name": "assetId",
                "path": ".assetId"
            },
            {
                "name": "timestamp",
                "path": ".eventTime"
            },
            {
                "name": "temperature",
                // Path is optional, defaults to .temperature
            }
        ],
        "path": ".payload"
    }
}

The following example shows the configuration for serializing batched CSV data. Omit the top-level path for batched data:

{
    "format": {
        "type": "csv",
        "header": true,
        "columns": [
            {
                "name": "assetId",
                "path": ".assetId"
            },
            {
                "name": "timestamp",
                "path": ".eventTime"
            },
            {
                "name": "temperature",
                // Path is optional, defaults to .temperature
            }
        ]
    }
}

The following example shows the configuration for deserializing unbatched CSV data:

{
    "format": {
        "type": "csv",
        "header": false,
        "columns": [
            {
                "name": "assetId",
                "type": "string",
                "path": ".assetId"
            },
            {
                "name": "timestamp",
                // Type is optional, defaults to string
                "path": ".eventTime"
            },
            {
                "name": "temperature",
                "type": "float"
                // Path is optional, defaults to .temperature
            }
        ],
        "path": ".payload"
    }
}

The following example shows the configuration for deserializing batched CSV data in the source stage:

{
    "format": {
        "type": "csv",
        "header": false,
        "columns": [
            {
                "name": "assetId",
                "type": "string",
                "path": ".assetId"
            },
            {
                "name": "timestamp",
                // Type is optional, defaults to string
                "path": ".eventTime"
            },
            {
                "name": "temperature",
                "type": "float"
                // Path is optional, defaults to .temperature
            }
        ]
    }
}
