Event Hub data connection (Preview)

Azure Event Hubs is a big data streaming platform and event ingestion service. Azure Synapse Data Explorer offers continuous ingestion from customer-managed Event Hubs.

The Event Hub ingestion pipeline transfers events to Azure Synapse Data Explorer in several steps. First, create an Event Hub in the Azure portal. Then create a target table in Azure Synapse Data Explorer, into which data in a particular format is ingested using the given ingestion properties. The Event Hub connection needs to know how events are routed. Data is embedded with the selected properties according to the event system properties mapping. Finally, create a connection to the Event Hub and send events. You can manage this process through the Azure portal, programmatically with C# or Python, or with an Azure Resource Manager template.

For general information about data ingestion in Azure Synapse Data Explorer, see Azure Synapse Data Explorer data ingestion overview.

Data format

  • Data is read from the Event Hub in the form of EventData objects.

  • See supported formats.

    Note

    Event Hub doesn't support the .raw format.

  • Data can be compressed using the GZip compression algorithm. Specify Compression in ingestion properties.

    • Data compression isn't supported for compressed formats (Avro, Parquet, ORC).
    • Custom encoding and embedded system properties aren't supported on compressed data.
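As a minimal sketch of the compression option described above, the following Python snippet GZip-compresses a JSON payload and prepares the matching event properties. The helper name build_compressed_event and the sample record are assumptions made for illustration. Only the Format and Compression property names come from this article, and sending the event is left to whichever Event Hubs client you use.

```python
import gzip
import json


def build_compressed_event(record):
    """Return a GZip-compressed JSON body and the matching event properties."""
    raw = json.dumps(record).encode("utf-8")
    body = gzip.compress(raw)
    # "Format" and "Compression" are the ingestion property names listed in this article.
    properties = {"Format": "json", "Compression": "GZip"}
    return body, properties


# Hypothetical sample record, used only to exercise the helper.
body, properties = build_compressed_event({"metric": "Temperature", "value": 32})
```

Because system properties and custom encoding aren't supported on compressed data, this approach suits events that don't rely on either feature.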

Ingestion properties

Ingestion properties instruct the ingestion process where to route the data and how to process it. You can specify ingestion properties for each event by using EventData.Properties. You can set the following properties:

Property                   Description
Table                      Name (case sensitive) of the existing target table. Overrides the Table set on the Data Connection pane.
Format                     Data format. Overrides the Data format set on the Data Connection pane.
IngestionMappingReference  Name of the existing ingestion mapping to be used. Overrides the Column mapping set on the Data Connection pane.
Compression                Data compression: None (default) or GZip.
Encoding                   Data encoding. The default is UTF8. Can be any of the .NET supported encodings.
Tags                       A list of tags to associate with the ingested data, formatted as a JSON array string. There are performance implications when using tags.
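The properties above can be assembled as a plain dictionary before they're attached to an event. The following Python sketch is illustrative only: the function name build_ingestion_properties and its parameters are assumptions, and the keys are the property names from the table. Tags are serialized with json.dumps so that the value is a JSON array string.

```python
import json


def build_ingestion_properties(table, data_format, mapping=None, tags=None,
                               compression=None, encoding=None):
    """Build a dictionary of per-event ingestion properties (hypothetical helper)."""
    props = {"Table": table, "Format": data_format}
    if mapping:
        props["IngestionMappingReference"] = mapping
    if compression:
        props["Compression"] = compression
    if encoding:
        props["Encoding"] = encoding
    if tags:
        # Tags must be a JSON array string, not a native list.
        props["Tags"] = json.dumps(list(tags))
    return props


props = build_ingestion_properties(
    "WeatherMetrics", "json", mapping="mapping1", tags=["mydatatag"]
)
```

Optional properties are omitted when they aren't set, so the values configured on the data connection continue to apply.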

Note

Only events enqueued after you create the data connection are ingested.

Events routing

When you set up an Event Hub connection to an Azure Synapse Data Explorer cluster, you specify target table properties (table name, data format, compression, and mapping). This default routing for your data is also referred to as static routing. You can also specify target table properties for each event by using event properties. The connection then dynamically routes the data as specified in EventData.Properties, overriding the static properties for that event.

The following example sets the Event Hub details and sends weather metric data to the table WeatherMetrics. The data is in JSON format, and mapping1 is predefined on the table WeatherMetrics.

// Replace the placeholders with your own values
var eventHubNamespaceConnectionString = "<connection_string>";
var eventHubName = "<event_hub>";

// Create the data
var metric = new Metric { Timestamp = DateTime.UtcNow, MetricName = "Temperature", Value = 32 }; 
var data = JsonConvert.SerializeObject(metric);

// Create the event and add optional "dynamic routing" properties
var eventData = new EventData(Encoding.UTF8.GetBytes(data));
eventData.Properties.Add("Table", "WeatherMetrics");
eventData.Properties.Add("Format", "json");
eventData.Properties.Add("IngestionMappingReference", "mapping1");
eventData.Properties.Add("Tags", "['mydatatag']");

// Send events
var eventHubClient = EventHubClient.CreateFromConnectionString(eventHubNamespaceConnectionString, eventHubName);
eventHubClient.Send(eventData);
eventHubClient.Close();
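The relationship between static and dynamic routing can be modeled in a few lines. The following Python sketch is a conceptual model of the override rule described in this section, not the service's implementation. The function resolve_routing and the static table name Telemetry and mapping name csvMapping are hypothetical.

```python
# Target table properties that an event can override, as listed in this section.
ROUTING_KEYS = ("Table", "Format", "IngestionMappingReference", "Compression")


def resolve_routing(static_settings, event_properties):
    """Return the effective routing for one event (illustrative model only)."""
    effective = dict(static_settings)
    for key in ROUTING_KEYS:
        if key in event_properties:
            # A per-event property overrides the static value for this event only.
            effective[key] = event_properties[key]
    return effective


# Hypothetical static routing configured on the data connection.
static_settings = {
    "Table": "Telemetry",
    "Format": "csv",
    "IngestionMappingReference": "csvMapping",
    "Compression": "None",
}
# Dynamic routing properties, matching the event in the example above.
event_properties = {
    "Table": "WeatherMetrics",
    "Format": "json",
    "IngestionMappingReference": "mapping1",
}
effective = resolve_routing(static_settings, event_properties)
```

An event that carries no routing properties falls back to the static settings unchanged.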

Event system properties mapping

System properties store properties that are set by the Event Hubs service at the time the event is enqueued. The Azure Synapse Data Explorer Event Hub connection embeds the selected properties into the data landing in your table.

Note

  • System properties are supported for json and tabular formats (csv, tsv, etc.) and aren't supported on compressed data. When you use an unsupported format, the data is still ingested, but the properties are ignored.
  • For tabular data, system properties are supported only for single-record event messages.
  • For JSON data, system properties are also supported for multiple-record event messages. In such cases, the system properties are added only to the first record of the event message.
  • For csv mapping, properties are added at the beginning of the record in the order listed in the System properties table.
  • For json mapping, properties are added according to property names in the System properties table.

System properties

Event Hub exposes the following system properties:

Property               Data Type  Description
x-opt-enqueued-time    datetime   UTC time when the event was enqueued
x-opt-sequence-number  long       The logical sequence number of the event within the partition stream of the Event Hub
x-opt-offset           string     The offset of the event from the Event Hub partition stream. The offset identifier is unique within a partition of the Event Hub stream
x-opt-publisher        string     The publisher name, if the message was sent to a publisher endpoint
x-opt-partition-key    string     The partition key of the corresponding partition that stored the event

If you selected Event system properties in the Data Source section of the table, you must include the properties in the table schema and mapping.
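To make the embedding rules concrete, the following Python sketch models how selected system properties might appear in a record: prepended in table order for CSV, and added by name to the first record for JSON. It is a simplified model based only on the notes in this section, not the connection's actual code. The function names and sample values are assumptions, and the CSV join does no quoting or escaping.

```python
# Order of the system properties as listed in the System properties table.
SYSTEM_PROPERTY_ORDER = [
    "x-opt-enqueued-time",
    "x-opt-sequence-number",
    "x-opt-offset",
    "x-opt-publisher",
    "x-opt-partition-key",
]


def embed_in_csv(line, selected, system_values):
    """Prepend the selected properties to a single CSV record, in table order."""
    ordered = [name for name in SYSTEM_PROPERTY_ORDER if name in selected]
    prefix = [str(system_values[name]) for name in ordered]
    return ",".join(prefix + [line])


def embed_in_json(records, selected, system_values):
    """Add the selected properties by name to the first record only."""
    if not records:
        return []
    first = dict(records[0])
    for name in selected:
        first[name] = system_values[name]
    return [first] + [dict(record) for record in records[1:]]


# Hypothetical values that the Event Hubs service would set at enqueue time.
system_values = {
    "x-opt-enqueued-time": "2021-06-01T12:00:00Z",
    "x-opt-offset": "4294967296",
}
selected = ["x-opt-offset", "x-opt-enqueued-time"]

csv_record = embed_in_csv("2021-06-01T11:59:58Z,Temperature,32", selected, system_values)
json_records = embed_in_json(
    [{"metric": "Temperature"}, {"metric": "Humidity"}], selected, system_values
)
```

In this model the CSV record starts with the enqueued time followed by the offset, regardless of the order in which the properties were selected.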

Schema mapping examples

Table schema mapping example

If your data includes three columns (TimeStamp, Metric, and Value) and the properties you include are x-opt-enqueued-time and x-opt-offset, create or alter the table schema by using this command:

    .create-merge table TestTable (TimeStamp: datetime, Metric: string, Value: int, EventHubEnqueuedTime:datetime, EventHubOffset:string)

CSV mapping example

Run the following command to create a CSV mapping in which the system properties are added at the beginning of the record. Note the ordinal values.

    .create table TestTable ingestion csv mapping "CsvMapping1"
    '['
    '   { "column" : "TimeStamp", "Properties":{"Ordinal":"2"}},'
    '   { "column" : "Metric", "Properties":{"Ordinal":"3"}},'
    '   { "column" : "Value", "Properties":{"Ordinal":"4"}},'
    '   { "column" : "EventHubEnqueuedTime", "Properties":{"Ordinal":"0"}},'
    '   { "column" : "EventHubOffset", "Properties":{"Ordinal":"1"}}'
    ']'
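The ordinals in a CSV mapping like the one above follow a simple rule: system property columns come first, and data columns follow. The following Python sketch generates such a mapping and a single-line form of the command. The helper names csv_mapping and csv_mapping_command are assumptions for illustration, and the column names follow the table schema example.

```python
import json


def csv_mapping(system_columns, data_columns):
    """Assign ordinals so that system property columns precede data columns."""
    columns = list(system_columns) + list(data_columns)
    return [
        {"column": column, "Properties": {"Ordinal": str(ordinal)}}
        for ordinal, column in enumerate(columns)
    ]


def csv_mapping_command(table, mapping_name, entries):
    """Render the mapping as a single-line .create command."""
    return ".create table {} ingestion csv mapping \"{}\" '{}'".format(
        table, mapping_name, json.dumps(entries)
    )


entries = csv_mapping(
    ["EventHubEnqueuedTime", "EventHubOffset"],
    ["TimeStamp", "Metric", "Value"],
)
command = csv_mapping_command("TestTable", "CsvMapping1", entries)
```

The system columns must be passed in the same order as the corresponding properties appear in the System properties table, because that is the order in which they're prepended to the record.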

JSON mapping example

Data is added by using the system properties mapping, which refers to each property by name. Run this command:

    .create table TestTable ingestion json mapping "JsonMapping1"
    '['
    '    { "column" : "TimeStamp", "Properties":{"Path":"$.timestamp"}},'
    '    { "column" : "Metric", "Properties":{"Path":"$.metric"}},'
    '    { "column" : "Value", "Properties":{"Path":"$.value"}},'
    '    { "column" : "EventHubEnqueuedTime", "Properties":{"Path":"$.x-opt-enqueued-time"}},'
    '    { "column" : "EventHubOffset", "Properties":{"Path":"$.x-opt-offset"}}'
    ']'

Event Hub connection

Note

For best performance, create all resources in the same region as the Azure Synapse Data Explorer cluster.

Create an Event Hub

If you don't already have one, create an Event Hub. You can manage the connection to the Event Hub through the Azure portal, programmatically with C# or Python, or with an Azure Resource Manager template.

Note

  • The partition count can't be changed, so consider long-term scale when you set it.
  • The consumer group must be unique per consumer. Create a consumer group dedicated to the Azure Synapse Data Explorer connection.

Send events

See the sample app that generates data and sends it to an Event Hub.

For an example of how to generate sample data, see Ingest data from Event Hub into Azure Synapse Data Explorer.

Set up Geo-disaster recovery solution

Event Hub offers a Geo-disaster recovery solution. Azure Synapse Data Explorer doesn't support alias Event Hub namespaces. To implement Geo-disaster recovery in your solution, create two Event Hub data connections: one for the primary namespace and one for the secondary namespace. Azure Synapse Data Explorer listens to both Event Hub connections.

Note

It's the user's responsibility to implement a failover from the primary namespace to the secondary namespace.
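One possible shape for such a failover on the sending side is sketched below in Python. It is an assumption about how a producer might behave, not a prescribed design. send_primary and send_secondary are hypothetical callables that wrap your Event Hubs client for each namespace, and the sketch assumes that they raise ConnectionError when a namespace is unreachable. Because a data connection exists for each namespace, events sent to either one can be ingested.

```python
def send_with_failover(event, send_primary, send_secondary):
    """Send to the primary namespace, falling back to the secondary on failure."""
    try:
        send_primary(event)
        return "primary"
    except ConnectionError:
        # The primary namespace is unreachable, so use the secondary namespace.
        send_secondary(event)
        return "secondary"


# Stand-in senders used only to exercise the sketch.
received = {"primary": [], "secondary": []}


def failing_primary(event):
    raise ConnectionError("primary namespace unavailable")


def working_primary(event):
    received["primary"].append(event)


def working_secondary(event):
    received["secondary"].append(event)


used = send_with_failover(b'{"metric": "Temperature"}', failing_primary, working_secondary)
```

A production implementation would also need to decide when to return to the primary namespace and how to handle events that were accepted but not yet ingested at the time of the failure.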

Next steps