Configure an HTTP endpoint source stage in a data processor pipeline
Important
Azure IoT Operations Preview – enabled by Azure Arc is currently in PREVIEW. You shouldn't use this preview software in production environments.
You'll need to deploy a new Azure IoT Operations installation when a generally available release becomes available; you won't be able to upgrade a preview installation.
See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
The source stage is the first and required stage in a data processor pipeline. The source stage gets data into the data processing pipeline and prepares it for further processing. The HTTP endpoint source stage lets you read data from an HTTP endpoint at a user-defined interval. The stage has an optional request body and receives a response from the endpoint.
In the source stage, you define:
- Connection details to the HTTP endpoint.
- The interval at which to call the HTTP endpoint. The stage waits for a response before it resets the interval timer.
- A partitioning configuration based on your specific data processing requirements.
Prerequisites
- A deployed instance of Azure IoT Operations Preview that includes the optional data processor component.
- An operational, reachable HTTP endpoint with all the necessary raw data available.
Configure the HTTP endpoint source
To configure the HTTP endpoint source:
- Provide details of the HTTP endpoint. This configuration includes the method, URL, and request payload to use.
- Specify the authentication method. Currently, only username/password-based and header-based authentication are supported.
The following table describes the HTTP endpoint source configuration parameters:
Field | Type | Description | Required | Default | Example
---|---|---|---|---|---
Name | String | A customer-visible name for the source stage. | Required | NA | `erp-endpoint`
Description | String | A customer-visible description of the source stage. | Optional | NA | `Enterprise application data`
Method | Enum | The HTTP method to use for the requests. One of `GET` or `POST`. | Optional | `GET` | `GET`
URL | String | The URL to use for the requests. Both `http` and `https` are supported. | Required | NA | `https://contoso.com/some/url/path`
Authentication | Authentication type | The authentication method for the HTTP request. One of: `None`, `Username/Password`, or `Header`. | Optional | NA | `Username/Password`
Username/Password > Username | String | The username for the username/password authentication. | Yes | NA | `myuser`
Username/Password > Secret | String | Reference to the password stored in Azure Key Vault. | Yes | NA | `AKV_USERNAME_PASSWORD`
Header > Key | String | The name of the key for header-based authentication. | Yes | NA | `Authorization`
Header > Value | String | The credential name in Azure Key Vault for header-based authentication. | Yes | NA | `AKV_PASSWORD`
Data format | Format | Data format of the incoming data. | Required | NA | `{"type": "json"}`
API request > Request Body | String | The static request body to send with the HTTP request. | Optional | NA | `{"foo": "bar"}`
API request > Headers | Key/Value pairs | The static request headers to send with the HTTP request. | Optional | NA | `[{"key": {"type": "static", "value": "asset"}, "value": {"type": "static", "value": "asset-id-0"}}]`
Request interval | Duration | String representation of the time to wait before the next API call. | Required | `10s` | `24h`
Partitioning | Partitioning | Partitioning configuration for the source stage. | Required | NA | See partitioning
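The following sketch shows how these parameters might fit together in a single stage configuration. It's illustrative only: the values come from the example column above, but the exact property names depend on the authoring surface you use, so treat them as assumptions rather than a literal schema.

```json
{
  "displayName": "erp-endpoint",
  "description": "Enterprise application data",
  "method": "GET",
  "url": "https://contoso.com/some/url/path",
  "authentication": {
    "type": "usernamePassword",
    "username": "myuser",
    "secret": "AKV_USERNAME_PASSWORD"
  },
  "format": { "type": "json" },
  "request": {
    "headers": [
      { "key": { "type": "static", "value": "asset" }, "value": { "type": "static", "value": "asset-id-0" } }
    ]
  },
  "interval": "10s",
  "partitionCount": 1,
  "partitionStrategy": { "type": "id", "expression": "0" }
}
```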
To learn more about secrets, see Manage secrets for your Azure IoT Operations Preview deployment.
Select data format
In a data processor pipeline, the format field in the source stage specifies how to deserialize the incoming data. By default, the data processor pipeline uses the raw format, which means it doesn't convert the incoming data. To use many data processor features, such as Filter or Enrich stages in a pipeline, you must deserialize your data in the input stage. You can choose to deserialize your incoming data from JSON, jsonStream, MessagePack, CBOR, CSV, or Protobuf formats into a data processor readable message in order to use the full data processor functionality.
The following tables describe the different deserialization configuration options:
Field | Description | Required | Default | Value
---|---|---|---|---
Data Format | The type of the data format. | Yes | `Raw` | `Raw`, `JSON`, `jsonStream`, `MessagePack`, `CBOR`, `CSV`, `Protobuf`
The Data Format field is mandatory and its value determines the other required fields.
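For example, a minimal sketch of a format configuration that deserializes incoming JSON, using the same shape as the `{"type": "json"}` example in the configuration table earlier (swap the type value for any of the other supported formats):

```json
{
  "format": {
    "type": "json"
  }
}
```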
To deserialize CSV messages, you also need to specify the following fields:
Field | Description | Required | Value | Example
---|---|---|---|---
Header | Whether the CSV data includes a header line. | Yes | `Yes`, `No` | `No`
Name | Name of the column in the CSV. | Yes | - | `temp`, `asset`
Path | The jq path in the message where the column information is added. | No | - | The default jq path is the column name.
Data Type | The data type of the data in the column and how it's represented inside the data processor pipeline. | No | `String`, `Float`, `Integer`, `Boolean`, `Bytes` | Default: `String`
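As a hedged sketch, a CSV format configuration for a two-column input might look like the following. The `columns` property name is an assumption based on the fields in the preceding table:

```json
{
  "format": {
    "type": "csv",
    "header": true,
    "columns": [
      { "name": "temp", "path": ".temp", "dataType": "Float" },
      { "name": "asset", "dataType": "String" }
    ]
  }
}
```

With this configuration, a row such as `23.5,asset-id-0` deserializes into a message where `.temp` holds the float value `23.5` and `.asset` (the default jq path for the `asset` column) holds the string `asset-id-0`.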
To deserialize Protobuf messages, you also need to specify the following fields:
Field | Description | Required | Value | Example |
---|---|---|---|---|
Descriptor | The base64-encoded descriptor for the protobuf definition. | Yes | - | Zhf... |
Message | The name of the message type that's used to format the data. | Yes | - | pipeline |
Package | The name of the package in the descriptor where the type is defined. | Yes | - | schedulerv1 |
Note
The data processor supports only one message type in each .proto file.
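As an illustrative sketch that reuses the example values from the table above (property names assumed, not confirmed by the source), a Protobuf format configuration supplies the descriptor, message, and package together:

```json
{
  "format": {
    "type": "protobuf",
    "descriptor": "Zhf...",
    "message": "pipeline",
    "package": "schedulerv1"
  }
}
```

The `Zhf...` value here is a truncated placeholder; supply the complete base64-encoded descriptor for your .proto definition in a real configuration.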
Configure partitioning
Partitioning in a pipeline divides the incoming data into separate partitions. Partitioning enables data parallelism in the pipeline, which can improve throughput and reduce latency. Partitioning strategies affect how the data is processed in the other stages of the pipeline. For example, the last known value stage and aggregate stage operate on each logical partition.
To partition your data, specify a partitioning strategy and the number of partitions to use:
Field | Description | Required | Default | Example
---|---|---|---|---
Partition type | The type of partitioning to be used: partition `ID` or partition `Key`. | Required | `ID` | `ID`
Partition expression | The jq expression to use on the incoming message to compute the partition `ID` or partition `Key`. | Required | `0` | `.payload.header`
Number of partitions | The number of partitions in a data processor pipeline. | Required | `1` | `1`
The source stage applies the partitioning expression to the incoming message to compute the partition ID or Key.

The data processor adds metadata to the incoming message. See Data processor message structure overview to understand how to correctly specify the partitioning expression that runs on the incoming message. By default, the partitioning expression is set to 0 with the partition type as ID to send all the incoming data to a single partition.
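For example, a hedged sketch of a key-based partitioning configuration that routes messages by a header field (property names are illustrative; the expression comes from the example in the table above):

```json
{
  "partitionCount": 3,
  "partitionStrategy": {
    "type": "key",
    "expression": ".payload.header"
  }
}
```

Messages whose `.payload.header` expression evaluates to the same value land in the same logical partition, so stages such as last known value and aggregate operate per header value.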
For recommendations and to learn more, see What is partitioning?.