Serialization and deserialization formats in data processor pipelines
Important
Azure IoT Operations Preview – enabled by Azure Arc is currently in PREVIEW. You shouldn't use this preview software in production environments.
You'll need to deploy a new Azure IoT Operations installation when a generally available release is made available; you won't be able to upgrade a preview installation.
See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
The data processor is a data-agnostic platform that can ingest, process, and write out data in any format.
However, to use jq path expressions in some pipeline stages, the data must be in a structured format within a pipeline. You may need to deserialize your data to get it into a suitable structured format.
Some pipeline destinations or call outs from stages may require the data to be in a specific format. You may need to serialize your data to a suitable format for the destination.
Deserialize messages
The data processor natively supports deserialization of various formats at both the data source stage and the call out stages where the pipeline reads external data:
- The source stage can deserialize incoming data.
- The call out stages can deserialize the API response.
You may not need to deserialize incoming data if:
- You're not using the stages that require deserialized data.
- You're processing metadata only.
- The incoming data is already in a format that's consistent with the stages being used.
The following table lists the formats for which deserialization is supported and the corresponding stages:
Format | Data source | Call out |
---|---|---|
Raw | Supported | HTTP |
JSON | Supported | HTTP |
Protobuf | Supported | All (HTTP and gRPC) |
CSV | Supported | HTTP |
MessagePack | Supported | HTTP |
CBOR | Supported | HTTP |
Tip
Select Raw when you don't require deserialization. The Raw option passes the data through in its current format.
Serialize messages
The data processor natively supports serialization to various formats at both the destination and call out stages where the pipeline writes external data:
- The destination stage can serialize outgoing data to a suitable format.
- Call out stages can serialize the data sent in an API request.
Format | Call out | Output stage |
---|---|---|
Raw | HTTP | All except Microsoft Fabric |
JSON | HTTP | All except Microsoft Fabric |
Parquet | Not supported | Microsoft Fabric |
Protobuf | All | All except Microsoft Fabric |
CSV | HTTP | All except Microsoft Fabric |
MessagePack | HTTP | All except Microsoft Fabric |
CBOR | HTTP | All except Microsoft Fabric |
Tip
Select Raw when no serialization is required. The Raw option passes the data through in its current format.
Raw/JSON/MessagePack/CBOR data formats
Raw is the option to use when you don't need to deserialize or serialize data. Raw is the default in most stages where deserialization or serialization isn't enforced.
The serialization or deserialization configuration is common for the Raw, JSON, MessagePack, and CBOR formats. For these formats, use the following configuration options.
Use the following configuration options to deserialize data:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
type | string enum | The format for deserialization | No | - | JSON |
path | Path | The path to the portion of the data processor message where the deserialized data is written to. | (see following note) | .payload | .payload.response |
Note
You don't need to specify `path` when you deserialize data in the source stage. The deserialized data is automatically placed in the `.payload` section of the message.
Use the following configuration options to serialize data:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
type | string enum | The format for serialization | Yes | - | JSON |
path | Path | The path to the portion of the data processor message that should be serialized. | (see following note) | .payload | .payload.response |
Note
You don't need to specify `path` when you serialize batched data. The default path is `.`, which represents the entire message. For unbatched data, you must specify `path`.
The following example shows the configuration for serializing or deserializing unbatched JSON data:
```json
{
  "format": {
    "type": "json",
    "path": ".payload"
  }
}
```
The following example shows the configuration for deserializing JSON data in the source stage or serializing batched JSON data:
```json
{
  "format": {
    "type": "json"
  }
}
```
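To make the path behavior concrete, the following sketch mimics, in simplified form, what source-stage JSON deserialization amounts to: the incoming bytes are parsed and the result is placed in the .payload section of the message. The message shape and field values shown here are illustrative assumptions, not the exact runtime structure of a data processor message.

```python
import json

# Illustration only: a simplified model of source-stage JSON deserialization.
# The real data processor message has more sections; only .payload is modeled here.
incoming_bytes = b'{"assetId": "asset-01", "temperature": 22.5}'

message = {
    "payload": json.loads(incoming_bytes)  # deserialized data lands at .payload
}

# The data is now structured and addressable with jq-style paths such as .payload.temperature.
print(message["payload"]["temperature"])  # 22.5
```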
Protocol Buffers data format
Use the following configuration options to deserialize Protocol Buffers (protobuf) data:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
type | string enum | The format for deserialization | Yes | - | protobuf |
descriptor | string | The base64 encoded descriptor for the protobuf definition file(s). | Yes | - | Zm9v.. |
package | string | The name of the package in the descriptor where the type is defined. | Yes | - | package1.. |
message | string | The name of the message type that's used to format the data. | Yes | - | message1.. |
path | Path | The path to the portion of the data processor message where the deserialized data should be written. | (see following note) | .payload | .payload.gRPCResponse |
Note
You don't need to specify `path` when you deserialize data in the source stage. The deserialized data is automatically placed in the `.payload` section of the message.
Use the following configuration options to serialize protobuf data:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
type | string enum | The format for serialization | Yes | - | protobuf |
descriptor | string | The base64 encoded descriptor for the protobuf definition file(s). | Yes | - | Zm9v.. |
package | string | The name of the package in the descriptor where the type is defined. | Yes | - | package1.. |
message | string | The name of the message type that's used to format the data. | Yes | - | message1.. |
path | Path | The path to the portion of the data processor message where data to be serialized is read from. | (see following note) | - | .payload.gRPCRequest |
Note
You don't need to specify `path` when you serialize batched data. The default path is `.`, which represents the entire message.
The following example shows the configuration for serializing or deserializing unbatched protobuf data:
```json
{
  "format": {
    "type": "protobuf",
    "descriptor": "Zm9v..",
    "package": "package1",
    "message": "message1",
    "path": ".payload"
  }
}
```
The following example shows the configuration for deserializing protobuf data in the source stage or serializing batched protobuf data:
```json
{
  "format": {
    "type": "protobuf",
    "descriptor": "Zm9v...", // The full descriptor
    "package": "package1",
    "message": "message1"
  }
}
```
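The descriptor value is typically the base64 encoding of a compiled descriptor set produced from your .proto file(s). As a minimal sketch, assuming a schema file named schema.proto compiled with protoc --include_imports --descriptor_set_out=schema.pb schema.proto (the file names here are examples, not requirements), you could produce the string for the descriptor field like this:

```python
import base64
import pathlib

# Sketch: encode a compiled protobuf descriptor set for the "descriptor" field.
# Assumes the descriptor set was generated beforehand, for example with:
#   protoc --include_imports --descriptor_set_out=schema.pb schema.proto
descriptor_set = pathlib.Path("schema.pb").read_bytes()

encoded = base64.b64encode(descriptor_set).decode("ascii")
print(encoded)  # paste this value into the "descriptor" configuration field
```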
CSV data format
Use the following configuration options to deserialize CSV data:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
type | string enum | The format for deserialization | Yes | - | CSV |
header | boolean | This field indicates whether the input data has a CSV header row. | Yes | - | true |
columns | array | The schema definition of the CSV to read. | Yes | - | (see following table) |
path | Path | The path to the portion of the data processor message where the deserialized data should be written. | (see following note) | - | .payload |
Note
You don't need to specify `path` when you deserialize data in the source stage. The deserialized data is automatically placed in the `.payload` section of the message.
Each element in the columns array is an object with the following schema:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
name | string | The name of the column as it appears in the CSV header. | Yes | - | temperature |
type | string enum | The data processor data type held in the column that's used to determine how to parse the data. | No | string | integer |
path | Path | The location within each record of the data where the value of the column should be read from. | No | .{{name}} | .temperature |
Use the following configuration options to serialize CSV data:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
type | string enum | The format for serialization | Yes | - | CSV |
header | boolean | This field indicates whether to include the header line with column names in the serialized CSV. | Yes | - | true |
columns | array | The schema definition of the CSV to write. | Yes | - | (see following table) |
path | Path | The path to the portion of the data processor message where data to be serialized is written. | (see following note) | - | .payload |
Note
You don't need to specify `path` when you serialize batched data. The default path is `.`, which represents the entire message.
Each element in the columns array is an object with the following schema:
Field | Type | Description | Required? | Default | Example |
---|---|---|---|---|---|
name | string | The name of the column as it would appear in a CSV header. | Yes | - | temperature |
path | Path | The location within each record of the data where the value of the column should be written to. | No | .{{name}} | .temperature |
The following example shows the configuration for serializing unbatched CSV data:
```json
{
  "format": {
    "type": "csv",
    "header": true,
    "columns": [
      {
        "name": "assetId",
        "path": ".assetId"
      },
      {
        "name": "timestamp",
        "path": ".eventTime"
      },
      {
        "name": "temperature"
        // Path is optional, defaults to the name
      }
    ],
    "path": ".payload"
  }
}
```
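To illustrate what the column definitions do, the following sketch mimics the mapping from a single record to a CSV header and row. It isn't the data processor's implementation: the record values are invented for the example, and only single-level paths are handled.

```python
# Illustration only: how the preceding column definitions map record fields to a CSV row.
record = {"assetId": "asset-01", "eventTime": "2024-01-02T03:04:05Z", "temperature": 22.5}

columns = [
    {"name": "assetId", "path": ".assetId"},
    {"name": "timestamp", "path": ".eventTime"},
    {"name": "temperature"},  # path omitted, so it defaults to ".temperature"
]

def column_value(column, record):
    path = column.get("path", "." + column["name"])
    return record[path.lstrip(".")]  # handles single-level paths only, for brevity

header = ",".join(column["name"] for column in columns)
row = ",".join(str(column_value(column, record)) for column in columns)
print(header)  # assetId,timestamp,temperature
print(row)     # asset-01,2024-01-02T03:04:05Z,22.5
```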
The following example shows the configuration for serializing batched CSV data. Omit the top-level `path` for batched data:
```json
{
  "format": {
    "type": "csv",
    "header": true,
    "columns": [
      {
        "name": "assetId",
        "path": ".assetId"
      },
      {
        "name": "timestamp",
        "path": ".eventTime"
      },
      {
        "name": "temperature"
        // Path is optional, defaults to .temperature
      }
    ]
  }
}
```
The following example shows the configuration for deserializing unbatched CSV data:
```json
{
  "format": {
    "type": "csv",
    "header": false,
    "columns": [
      {
        "name": "assetId",
        "type": "string",
        "path": ".assetId"
      },
      {
        "name": "timestamp",
        // Type is optional, defaults to string
        "path": ".eventTime"
      },
      {
        "name": "temperature",
        "type": "float"
        // Path is optional, defaults to .temperature
      }
    ],
    "path": ".payload"
  }
}
```
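As a companion illustration, this sketch shows the kind of structured record the preceding configuration would produce from one headerless CSV line, with the temperature column parsed as a float. Again, it only mimics the behavior, and the input line is invented for the example.

```python
# Illustration only: parsing one headerless CSV line according to the column definitions above.
line = "asset-01,2024-01-02T03:04:05Z,22.5"

columns = [
    {"name": "assetId", "type": "string", "path": ".assetId"},
    {"name": "timestamp", "path": ".eventTime"},   # type defaults to string
    {"name": "temperature", "type": "float"},      # path defaults to ".temperature"
]

parsers = {"string": str, "float": float, "integer": int}

record = {}
for column, raw_value in zip(columns, line.split(",")):
    path = column.get("path", "." + column["name"]).lstrip(".")
    record[path] = parsers[column.get("type", "string")](raw_value)

# In the pipeline, the resulting structure is written to the .payload section of the message.
print(record)  # {'assetId': 'asset-01', 'eventTime': '2024-01-02T03:04:05Z', 'temperature': 22.5}
```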
The following example shows the configuration for deserializing batched CSV data in the source stage:
```json
{
  "format": {
    "type": "csv",
    "header": false,
    "columns": [
      {
        "name": "assetId",
        "type": "string",
        "path": ".assetId"
      },
      {
        "name": "timestamp",
        // Type is optional, defaults to string
        "path": ".eventTime"
      },
      {
        "name": "temperature",
        "type": "float"
        // Path is optional, defaults to .temperature
      }
    ]
  }
}
```