Azure Event Hubs data connection

Azure Event Hubs is a big data streaming platform and event ingestion service. Azure Data Explorer offers continuous ingestion from customer-managed Event Hubs.

The Event Hubs ingestion pipeline transfers events to Azure Data Explorer in several steps. First, you create an event hub in the Azure portal. You then create a target table in Azure Data Explorer into which data in a particular format is ingested using the provided ingestion properties. The Event Hubs connection needs to know how to route events to the target table. Selected event system properties can be embedded into the ingested data. Finally, you create a data connection to the event hub and send events. The process can be managed through the Azure portal, programmatically with C# or Python, or with an Azure Resource Manager template.

For general information about data ingestion in Azure Data Explorer, see Azure Data Explorer data ingestion overview.

Azure Data Explorer data connection authentication options

  • Managed identity-based data connection (recommended): Using a managed identity-based data connection is the most secure way to connect to data sources. It provides full control over the ability to fetch data from a data source. Setting up a data connection using a managed identity requires the following steps:

    1. Add a managed identity to your cluster.
    2. Grant permissions to the managed identity on the data source. To fetch data from Azure Event Hubs, the managed identity must have Azure Event Hubs Data Receiver permissions.
    3. Set a managed identity policy on the target databases (see the policy sketch after this list).
    4. Create a data connection using the managed identity authentication to fetch data.

    Caution

    If the managed identity permissions are removed from the data source, the data connection will no longer work and will be unable to fetch data from the data source.

  • Key-based data connection: If managed identity authentication isn't specified for the data connection, the connection automatically defaults to key-based authentication. Key-based connections fetch data using a resource connection string, such as the Azure Event Hubs connection string. Azure Data Explorer gets the resource connection string for the specified resource and securely saves it. The connection string is then used to fetch data from the data source.

    Caution

    If the key is rotated, the data connection will no longer work and will be unable to fetch data from the data source. To fix the issue, update or recreate the data connection.
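
For step 3 of the managed identity setup, the policy is set with a management command. Here's a minimal sketch, assuming a database named MyDatabase and a placeholder object ID for the managed identity; for Event Hubs data connections, the allowed usage is assumed to be DataConnection.

    // MyDatabase is a hypothetical name; replace the object ID with your managed identity's object ID.
    .alter-merge database MyDatabase policy managed_identity "[ { 'ObjectId' : '<managed identity object ID>', 'AllowedUsages' : 'DataConnection' } ]"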

Data format

Note

  • Ingestion from Event Hubs doesn't support RAW format.
  • Azure Event Hubs Schema Registry and schema-less Avro are not supported.
  • Data can be compressed using the gzip compression algorithm. You can specify Compression dynamically using ingestion properties, or in the static Data Connection settings (see the C# compression sketch after this list).
  • Data compression isn't supported for binary formats (Avro, ApacheAvro, Parquet, ORC, and W3CLOGFILE).
  • Custom encoding and embedded system properties aren't supported with binary formats and compressed data.
  • When using binary formats (Avro, ApacheAvro, Parquet, ORC, and W3CLOGFILE) and ingestion mappings, the order of the fields in the ingestion mapping definition must match the order of the corresponding columns in the table (see the Avro mapping sketch after this list).
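
To illustrate dynamic compression, here's a minimal C# sketch that gzip-compresses a JSON payload and flags it with the Compression ingestion property. It assumes the same Azure.Messaging.EventHubs producer setup as the routing example later in this article; the payload is illustrative.

using System.IO;
using System.IO.Compression;
using System.Text;
using Azure.Messaging.EventHubs;

// Gzip-compress the serialized payload before wrapping it in an event.
byte[] json = Encoding.UTF8.GetBytes("{\"TimeStamp\":\"2024-01-01T00:00:00Z\",\"MetricName\":\"Temperature\",\"Value\":32}");
using var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionMode.Compress))
{
    gzip.Write(json, 0, json.Length);
}
var compressedEvent = new EventData(compressed.ToArray());
// Tell the data connection that this event's payload is gzip-compressed.
compressedEvent.Properties.Add("Compression", "gzip");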
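
Similarly, because field order matters for binary formats, here's a hedged Kusto sketch of an Avro ingestion mapping whose fields are listed in the same order as the corresponding table columns. The table and mapping names are illustrative and reuse the TestTable schema from the examples later in this article.

    .create table TestTable ingestion avro mapping "AvroMapping1"
    '['
    '   { "column" : "TimeStamp", "Properties":{"Field":"TimeStamp"}},'
    '   { "column" : "MetricName", "Properties":{"Field":"MetricName"}},'
    '   { "column" : "Value", "Properties":{"Field":"Value"}}'
    ']'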

Event Hubs properties

Azure Data Explorer supports the Event Hubs properties described in the following sections:

Note

Ingesting Event Hubs custom properties, used to associate metadata with events, isn't supported. If you need to ingest custom properties, send them in the body of the event data. For more information, see Ingest custom properties.

Ingestion properties

Ingestion properties instruct the ingestion process where to route the data and how to process it. You can specify ingestion properties for event ingestion by using EventData.Properties. You can set the following properties:

Note

Property names are case sensitive.

| Property | Description |
|--|--|
| Database | The case-sensitive name of the target database. By default, data is ingested into the target database associated with the data connection. Use this property to override the default database and send data to a different database. To do so, you must first set up the connection as a multi-database connection. |
| Table | The case-sensitive name of the existing target table. Overrides the Table set on the Data Connection pane. |
| Format | The data format. Overrides the Data format set on the Data Connection pane. |
| IngestionMappingReference | The name of the existing ingestion mapping to use. Overrides the Column mapping set on the Data Connection pane. |
| Compression | The data compression: None (default) or gzip. |
| Encoding | The data encoding; the default is UTF8. Can be any of the .NET supported encodings. |
| Tags | A list of tags to associate with the ingested data, formatted as a JSON array string. There are performance implications when using tags. |
| RawHeaders | Indicates that the event source is Kafka and that Azure Data Explorer must use byte array deserialization to read other routing properties. The value is ignored. |

Note

Only events enqueued after you create the data connection are ingested, unless a custom retrieval start date is provided. In any case, the lookback period cannot exceed the actual Event Hub retention period.

Events routing

When you create a data connection to your cluster, you can specify the routing for where to send ingested data. The default routing is to the target table specified in the connection string that is associated with the target database. This default routing for your data is also referred to as static routing. You can specify alternative routing and processing options for your data by setting one or more of the event data properties described in the previous section.

Note

The Event Hubs data connection attempts to process all the events it reads from the event hub, and every event it can't process is reported as an ingestion failure. For more information, see how to monitor Azure Data Explorer ingestion.

Route event data to an alternate database

Routing data to an alternate database is off by default. To send the data to a different database, you must first set the connection as a multi-database connection. This feature can be enabled in the Azure portal, with the C# or Python management SDKs, or with an ARM template. The user, group, service principal, or managed identity used to allow database routing must have at least the contributor role and write permissions on the cluster.

To specify an alternate database, set the Database ingestion property.

Warning

Specifying an alternate database without setting the connection as a multi-database data connection will cause the ingestion to fail.

Route event data to an alternate table

To specify an alternate table for each event, set the Table, Format, Compression, and mapping ingestion properties. The connection dynamically routes the ingested data as specified in the EventData.Properties, overriding the static properties for this event.

The following example shows how to set the event hub details and send weather metric data to an alternate database (MetricsDB) and table (WeatherMetrics). The data is in JSON format, and mapping1 is predefined on the WeatherMetrics table.

// This sample uses Azure.Messaging.EventHubs, a .NET library, with Newtonsoft.Json for serialization.
using System;
using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
using Newtonsoft.Json;

await using var producerClient = new EventHubProducerClient("<eventHubConnectionString>");
// Create the event and add optional "dynamic routing" properties
var eventData = new EventData(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(
    new { TimeStamp = DateTime.UtcNow, MetricName = "Temperature", Value = 32 }
)));
eventData.Properties.Add("Database", "MetricsDB");
eventData.Properties.Add("Table", "WeatherMetrics");
eventData.Properties.Add("Format", "json");
eventData.Properties.Add("IngestionMappingReference", "mapping1");
eventData.Properties.Add("Tags", "['myDataTag']");
var events = new[] { eventData };
// Send the events
await producerClient.SendAsync(events);

Event Hubs system properties mapping

System properties are fields set by the Event Hubs service at the time the event is enqueued. The Azure Data Explorer Event Hubs data connection can embed a predefined set of system properties into the data ingested into a table, based on a given mapping.

Note

  • Embedding system properties is supported for JSON and tabular formats (that is, JSON, MultiJSON, CSV, TSV, PSV, SCsv, SOHsv, TSVE).
  • When using an unsupported format (for example, TXT, or compressed formats like Parquet and Avro), the data is still ingested, but the properties are ignored.
  • Embedding system properties isn't supported when compression of Event Hubs messages is set. In such scenarios, an appropriate error is emitted and the data isn't ingested.
  • For tabular data, system properties are supported only for single-record event messages.
  • For json data, system properties are also supported for multiple-record event messages. In such cases, the system properties are added only to the first record of the event message.
  • For CSV mapping, properties are added at the beginning of the record in the order listed when the data connection is created. Don't rely on the order of these properties, as it may change in the future.
  • For JSON mapping, properties are added according to property names in the System properties table.

The Event Hubs service exposes the following system properties:

| Property | Data type | Description |
|--|--|--|
| x-opt-enqueued-time | datetime | The UTC time when the event was enqueued |
| x-opt-sequence-number | long | The logical sequence number of the event within the partition stream of the event hub |
| x-opt-offset | string | The offset of the event from the event hub partition stream. The offset identifier is unique within a partition of the event hub stream |
| x-opt-publisher | string | The publisher name, if the message was sent to a publisher endpoint |
| x-opt-partition-key | string | The partition key of the corresponding partition that stored the event |

When you work with IoT Central event hubs, you can also embed IoT Hub system properties in the payload. For the complete list, see IoT Hub system properties.

If you selected Event system properties in the Data Source section of the table, you must include the properties in the table schema and mapping.

Schema mapping examples

Table schema mapping example

If your data includes three columns (TimeStamp, MetricName, and Value) and the properties you include are x-opt-enqueued-time and x-opt-offset, create or alter the table schema by using this command:

    .create-merge table TestTable (TimeStamp: datetime, MetricName: string, Value: int, EventHubEnqueuedTime: datetime, EventHubOffset: string)

CSV mapping example

Run the following command to add the properties at the beginning of the record. Note the ordinal values.

    .create table TestTable ingestion csv mapping "CsvMapping1"
    '['
    '   { "column" : "TimeStamp", "Properties":{"Ordinal":"2"}},'
    '   { "column" : "MetricName", "Properties":{"Ordinal":"3"}},'
    '   { "column" : "Value", "Properties":{"Ordinal":"4"}},'
    '   { "column" : "EventHubEnqueuedTime", "Properties":{"Ordinal":"0"}},'
    '   { "column" : "EventHubOffset", "Properties":{"Ordinal":"1"}}'
    ']'

JSON mapping example

Data is added by using the system properties mapping. Run this command:

    .create table TestTable ingestion json mapping "JsonMapping1"
    '['
    '    { "column" : "TimeStamp", "Properties":{"Path":"$.TimeStamp"}},'
    '    { "column" : "MetricName", "Properties":{"Path":"$.MetricName"}},'
    '    { "column" : "Value", "Properties":{"Path":"$.Value"}},'
    '    { "column" : "EventHubEnqueuedTime", "Properties":{"Path":"$.x-opt-enqueued-time"}},'
    '    { "column" : "EventHubOffset", "Properties":{"Path":"$.x-opt-offset"}}'
    ']'

Schema mapping for Event Hubs Capture Avro files

One way to consume Event Hubs data is to capture events through Azure Event Hubs into Azure Blob Storage or Azure Data Lake Storage. You can then ingest the capture files as they're written by using an Event Grid data connection in Azure Data Explorer.

The schema of the capture files is different from the schema of the original event sent to Event Hubs. You should design the destination table schema with this difference in mind. Specifically, the event payload is represented in the capture file as a byte array, and this array isn't automatically decoded by the Event Grid Azure Data Explorer data connection. For more information on the file schema for Event Hubs Avro capture data, see Exploring captured Avro files in Azure Event Hubs.

To correctly decode the event payload:

  1. Map the Body field of the captured event to a column of type dynamic in the destination table.
  2. Apply an update policy that converts the byte array into a readable string by using the unicode_codepoints_to_string() function (see the sketch after this list).
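
Here's a minimal Kusto sketch of these two steps, assuming a raw table named EventsRaw that receives the captured records and a target table named EventsDecoded; all names are illustrative. Run each command separately.

    .create table EventsRaw (Body: dynamic)

    .create table EventsDecoded (Payload: string)

    .create function DecodeEventBody() {
        EventsRaw
        | project Payload = unicode_codepoints_to_string(Body)
    }

    .alter table EventsDecoded policy update '[{"IsEnabled": true, "Source": "EventsRaw", "Query": "DecodeEventBody()", "IsTransactional": true}]'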

Ingest custom properties

When ingesting events from Event Hubs, data is taken from the body section of the event data object. However, Event Hubs custom properties are defined in the properties section of the object and aren't ingested. To ingest custom properties, you must embed them into the data in the body section of the object.

The following example compares the event data object with the custom property customProperty defined in the properties section, as Event Hubs defines it, against the same property embedded in the body section, as required for ingestion.

As defined by Event Hubs:

    {
        "body": {
            "value": 42
        },
        "properties": {
            "customProperty": "123456789"
        }
    }

Embedded for ingestion:

    {
        "body": {
            "value": 42,
            "customProperty": "123456789"
        }
    }

You can embed custom properties into the data in the body section of the event data object when you build the event payload, as shown in the following sketch:
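
Here's a minimal C# sketch that embeds the custom property when building the event payload, assuming the same Azure.Messaging.EventHubs and Newtonsoft.Json setup as the routing example above; the payload shape is illustrative.

// Embed the custom property in the body instead of adding it to EventData.Properties.
var payload = new { value = 42, customProperty = "123456789" };
var eventData = new EventData(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(payload)));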

Create Event Hubs

If you don't already have one, create an event hub. Connecting to the event hub can be managed through the Azure portal, programmatically with C# or Python, or with an Azure Resource Manager template.

Note

  • The ability to dynamically add partitions after creating an event hub is only available with the Event Hubs Premium and Dedicated tiers. Consider long-term scale when setting the partition count.
  • The consumer group must be unique per consumer. Create a consumer group dedicated to the Azure Data Explorer connection.

Cross-region Event Hubs data connection

For best performance, create the event hub in the same region as the cluster. If this isn't possible, consider using the Event Hubs Premium or Dedicated tiers. For a comparison of tiers, see Compare Azure Event Hubs tiers.

Send events

See the sample app that generates data and sends it to an event hub.

Note

To enable efficient processing of events from Event Hubs to Azure Data Explorer, avoid an unbalanced distribution of events across partitions. Uneven mapping can cause a high discovery latency. For more information, see Mapping of events to partitions.
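
As a hedged illustration using the same producer client as the routing example above: omitting the partition key lets the service distribute events across partitions, while a skewed partition key can concentrate load on a single partition.

// Without a partition key, Event Hubs distributes events across partitions.
await producerClient.SendAsync(new[] { eventData });
// With a partition key, related events land on one partition; skewed keys can unbalance partitions.
await producerClient.SendAsync(new[] { eventData }, new SendEventOptions { PartitionKey = "device-42" });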

Set up Geo-disaster recovery solution

Event Hubs offers a geo-disaster recovery solution. Azure Data Explorer doesn't support alias event hub namespaces. To implement geo-disaster recovery in your solution, create two event hub data connections: one for the primary namespace and one for the secondary namespace. Azure Data Explorer listens to both event hub connections.

Note

It's the user's responsibility to implement a failover from the primary namespace to the secondary namespace.