What are jq path expressions in the data processor?
Important
Azure IoT Operations Preview – enabled by Azure Arc is currently in PREVIEW. You shouldn't use this preview software in production environments.
You'll need to deploy a new Azure IoT Operations installation when a generally available release becomes available. You won't be able to upgrade a preview installation.
See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
Many pipeline stages in the data processor make use of jq path expressions. Whenever you need to retrieve information from a message or to place some information into a message, you use a path. jq paths let you:
- Locate a piece of information in a message.
- Identify where to place a piece of information into a message.
Both cases use the same syntax and specify locations relative to the root of the message structure.
The jq paths supported by the data processor are syntactically correct for jq, but have simplified semantics to make them easier to use and to help reduce errors in the data processor pipeline. In particular, the data processor doesn't use the ? syntax to suppress errors for misaligned data structures. Those errors are automatically suppressed for you when working with paths.
Examples of data access within a data processor pipeline include the inputPath in the aggregate and last known value stages. Use the data access pattern whenever you need to access some data within a data processor message.
Data update uses the same syntax as data access, but there are some special behaviors in specific update scenarios. Examples of data update within a data processor pipeline include the outputPath in the aggregate and last known value pipeline stages. Use the data update pattern whenever you need to place the result of an operation into the data processor message.
Note
A data processor message contains more than just the body of your message. A data processor message includes any properties and metadata that you sent and other relevant system information. The primary payload containing the data sent into the processing pipeline is placed in a payload field at the root of the message. This is why many of the examples in this guide include paths that start with .payload.
Syntax
Every jq path consists of a sequence of one or more of the following segments:
- The root path: .
- A field in a map or object that uses one of:
  - .<identifier> for alphanumeric object keys. For example, .temperature
  - ."<identifier>" for arbitrary object keys. For example, ."asset-id"
  - .["<identifier>"] for arbitrary object keys. For example, .["asset-id"]
- An array index: [<index>]. For example, [2]
Paths must always start with a . character. Even if you have an array or complex map key at the beginning of your path, there must be a . that precedes it. The paths .["complex-key"] and .[1].value are valid. The paths ["complex-key"] and [1].value are invalid.
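The segment rules above can be sketched as a small tokenizer. This is an illustrative Python sketch, not the data processor's own parser: the name parse_path and the regular expression are assumptions made for this example, and escape sequences inside quoted keys are not handled.

```python
import re

# One alternative per segment form listed above. Escape sequences inside
# quoted keys are not handled in this sketch.
_SEGMENT = re.compile(
    r'''
      \.?\[\s*"(?P<bracket_key>[^"]*)"\s*\]      # .["key"]
    | \.?\[\s*(?P<index>-?\d+)\s*\]              # [2]
    | \."(?P<quoted_key>[^"]*)"                  # ."key"
    | \.(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)   # .key
    ''',
    re.VERBOSE,
)


def parse_path(path):
    """Split a path into object keys (str) and array indexes (int)."""
    if not path.startswith("."):
        raise ValueError(f"a path must start with '.': {path!r}")
    if path == ".":
        return []  # the root path has no further segments
    segments = []
    position = 0
    while position < len(path):
        match = _SEGMENT.match(path, position)
        if match is None:
            raise ValueError(f"invalid segment at offset {position}: {path!r}")
        # Exactly one named group matches per segment; lastgroup names it.
        text = match.group(match.lastgroup)
        segments.append(int(text) if match.lastgroup == "index" else text)
        position = match.end()
    return segments


print(parse_path('.[1].value'))        # [1, 'value']
print(parse_path('.["complex-key"]'))  # ['complex-key']
```

In this sketch, a path that lacks the leading . character, such as ["complex-key"] or [1].value, raises ValueError.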
Example message
The data access and data update examples that follow use this message to illustrate the different path expressions:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": {
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, 5, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": 46
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092
}
}
Root path for data access
The most basic path is the root path, which points to the root of the message and returns the entire message. Given the following jq path:
.
The result is:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": {
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, 5, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": 46
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092
}
}
Simple identifier for data access
The next simplest path involves a single identifier, in this case the payload field. Given the following jq path:
.payload
Tip
."payload" and .["payload"] are also valid ways to specify this path. However, identifiers that only contain a-z, A-Z, 0-9, and _ don't require the more complex syntax.
The result is:
{
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, 5, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": 46
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092
}
Nested fields for data access
You can combine path segments to retrieve data nested deeply within the message, such as a single leaf value. Given either of the following two jq paths:
.payload.Payload.["dtmi:com:prod1:slicer3345:temperature"].value
.payload.Payload."dtmi:com:prod1:slicer3345:temperature".value
The result is:
46
Array elements for data access
Array elements work in the same way as map keys, except that you use a number in place of a string inside the []. Given either of the following two jq paths:
.payload.Payload.["dtmi:com:prod1:slicer3345:lineStatus"].value[1]
.payload.Payload."dtmi:com:prod1:slicer3345:lineStatus".value[1]
The result is:
5
Nonexistent and invalid paths in data access
If a jq path identifies a location that doesn't exist or is incompatible with the structure of the message, then no value is returned.
Important
Some processing stages require some value to be present and may fail if no value is found. Others are designed to continue processing normally and either skip the requested operation or perform a different action if no value is found at the path.
Given the following jq path:
.payload[1].temperature
The result is:
No value
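The data access behavior described in the preceding sections can be sketched in Python as follows. This is an assumption-laden illustration, not the data processor's implementation: get_path and NO_VALUE are hypothetical names, the path is passed as an already-parsed list of keys and indexes, and negative indexes are assumed to count from the end of the array as they do in jq.

```python
NO_VALUE = object()  # sentinel that distinguishes "no value" from a JSON null


def get_path(message, segments):
    """Follow segments (str keys, int indexes) from the root of message."""
    current = message
    for segment in segments:
        if isinstance(segment, int):
            # An index only applies to an array that is long enough.
            if not isinstance(current, list):
                return NO_VALUE
            if not -len(current) <= segment < len(current):
                return NO_VALUE
        else:
            # A key only applies to an object that contains it.
            if not isinstance(current, dict) or segment not in current:
                return NO_VALUE
        current = current[segment]
    return current


# Abbreviated copy of the example message.
message = {
    "payload": {
        "Payload": {
            "dtmi:com:prod1:slicer3345:lineStatus": {"value": [1, 5, 2]},
            "dtmi:com:prod1:slicer3345:temperature": {"value": 46},
        }
    }
}

# .payload.Payload."dtmi:com:prod1:slicer3345:lineStatus".value[1]
line_status = ["payload", "Payload", "dtmi:com:prod1:slicer3345:lineStatus", "value", 1]
print(get_path(message, line_status))  # 5

# .payload[1].temperature doesn't fit the message structure, so nothing is returned.
print(get_path(message, ["payload", 1, "temperature"]) is NO_VALUE)  # True
```

Mismatches return the sentinel rather than raising an error, which mirrors how the data processor suppresses these errors for you.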
Root path for data update
The most basic path is the root path, which points to the root of the message and replaces the entire message. Given the following new value to insert and jq path:
{ "update": "data" }
.
The result is:
{ "update": "data" }
Updates aren't deep merged with the previous data, but instead replace the data at the level where the update happens. To avoid overwriting data, scope your update to the finest-grained path you want to change or update a separate field to the side of your primary data.
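To illustrate the difference between replacing and merging, the following Python sketch models a message as a plain dictionary. It only mimics the behavior described above with ordinary assignments; the abbreviated message and the variable names are assumptions for this example.

```python
import copy

message = {"qos": 1, "payload": {"Timestamp": 1681926048, "SequenceNumber": 461092}}

# Update at .payload: the whole payload is replaced, nothing is merged.
replaced = copy.deepcopy(message)
replaced["payload"] = {"update": "data"}
print(replaced["payload"])  # {'update': 'data'}

# Update at the narrower path .payload.update: sibling fields are kept.
scoped = copy.deepcopy(message)
scoped["payload"]["update"] = "data"
print(sorted(scoped["payload"]))  # ['SequenceNumber', 'Timestamp', 'update']
```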
Simple identifier for data update
The next simplest path involves a single identifier, in this case the payload field. Given the following new value to insert and jq path:
{ "update": "data" }
.payload
Tip
."payload" and .["payload"] are also valid ways to specify this path. However, identifiers that only contain a-z, A-Z, 0-9, and _ don't require the more complex syntax.
The result is:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": { "update": "data" }
}
Nested fields for data update
You can combine path segments to update data nested deeply within the message, such as a single leaf value. Given the following new value to insert and either of the following two jq paths:
{ "update": "data" }
.payload.Payload.["dtmi:com:prod1:slicer3345:temperature"].value
.payload.Payload."dtmi:com:prod1:slicer3345:temperature".value
The result is:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": {
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, 5, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": { "update": "data" }
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092
}
}
Array elements for data update
Array elements work in the same way as map keys, except that you use a number in place of a string inside the []. Given the following new value to insert and either of the following two jq paths:
{ "update": "data" }
.payload.Payload.["dtmi:com:prod1:slicer3345:lineStatus"].value[1]
.payload.Payload."dtmi:com:prod1:slicer3345:lineStatus".value[1]
The result is:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": {
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, { "update": "data" }, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": 46
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092
}
}
Nonexistent and type-mismatched paths in data update
If a jq path identifies a location that doesn't exist or is incompatible with the structure of the message, then the following semantics apply:
- If any segments of the path don't exist, they're created:
  - For object keys, the key is added to the object.
  - For array indexes, the array is lengthened with null values to make it long enough to have the required index, then the index is updated.
  - For negative array indexes, the same lengthening procedure happens, but then the first element is replaced.
- If a path segment has a different type than what it needs, the expression changes the type and discards any existing data at that path location.
- If a path segment has a different type than what it needs, the expression changes the type and discards any existing data at that path location.
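The rules above can be sketched as a short recursive function. This is a minimal Python illustration under stated assumptions, not the data processor's implementation: set_path is a hypothetical name, the path is passed as an already-parsed list of keys and indexes, and a negative index on an array that is already long enough is assumed to count from the end.

```python
def set_path(message, segments, value):
    """Return a copy of message with value placed at segments."""
    if not segments:
        return value  # root path: the new value replaces everything
    head, rest = segments[0], segments[1:]
    if isinstance(head, int):
        # A missing or non-array location becomes an empty array (old data is discarded).
        items = list(message) if isinstance(message, list) else []
        needed = head + 1 if head >= 0 else -head
        items.extend([None] * (needed - len(items)))  # lengthen with nulls
        items[head] = set_path(items[head], rest, value)
        return items
    # A missing or non-object location becomes an empty object (old data is discarded).
    fields = dict(message) if isinstance(message, dict) else {}
    fields[head] = set_path(fields.get(head), rest, value)
    return fields


# Abbreviated copy of the example message.
message = {"systemProperties": {"partitionKey": "slicer-3345", "partitionId": 5}}
update = {"update": "data"}

# .systemProperties.partitionKey[-4]: the string is discarded and replaced by an array.
result = set_path(message, ["systemProperties", "partitionKey", -4], update)
print(result["systemProperties"]["partitionKey"])
# [{'update': 'data'}, None, None, None]
```

The function copies each object or array along the path instead of mutating it, so the input message is left unchanged. That is a choice made for this sketch, not a statement about how the data processor handles messages internally.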
The following examples use the same input message as the previous examples and insert the following new value:
{ "update": "data" }
Given the following jq path:
.payload[1].temperature
The result is:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": [null, { "temperature": { "update": "data" } }]
}
Given the following jq path:
.payload.nested.additional.data
The result is:
{
"systemProperties": {
"partitionKey": "slicer-3345",
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": {
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, 5, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": 46
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092,
"nested": {
"additional": {
"data": { "update": "data" }
}
}
}
}
Given the following jq path:
.systemProperties.partitionKey[-4]
The result is:
{
"systemProperties": {
"partitionKey": [{"update": "data"}, null, null, null],
"partitionId": 5,
"timestamp": "2023-01-11T10:02:07Z"
},
"qos": 1,
"topic": "assets/slicer-3345",
"properties": {
"responseTopic": "assets/slicer-3345/output",
"contentType": "application/json"
},
"payload": {
"Timestamp": 1681926048,
"Payload": {
"dtmi:com:prod1:slicer3345:humidity": {
"sourceTimestamp": 1681926048,
"value": 10
},
"dtmi:com:prod1:slicer3345:lineStatus": {
"sourceTimestamp": 1681926048,
"value": [1, 5, 2]
},
"dtmi:com:prod1:slicer3345:speed": {
"sourceTimestamp": 1681926048,
"value": 85
},
"dtmi:com:prod1:slicer3345:temperature": {
"sourceTimestamp": 1681926048,
"value": 46
}
},
"DataSetWriterName": "slicer-3345",
"SequenceNumber": 461092
}