Create an Event Hubs data connection for Azure Data Explorer
Azure Data Explorer offers ingestion from Event Hubs, a big data streaming platform and event ingestion service. Event Hubs can process millions of events per second in near real time.
In this article, you connect to an event hub and ingest data into Azure Data Explorer. For an overview on ingesting from Event Hubs, see Azure Event Hubs data connection.
For code samples based on previous SDK versions, see the archived article.
Create an event hub data connection
In this section, you establish a connection between the event hub and your Azure Data Explorer table. As long as this connection is in place, data is transmitted from the event hub into your target table. If the event hub is moved to a different resource or subscription, you need to update or recreate the connection.
Right-click on the database where you want to ingest the data. Select Get data.
Source
In the Get data window, the Source tab is selected.
Select the data source from the available list. In this example, you're ingesting data from Event Hubs.
Configure
Select a target database and table. If you want to ingest data into a new table, select + New table and enter a table name.
Note
Table names can be up to 1,024 characters long and can include alphanumeric characters, spaces, hyphens, and underscores. Special characters aren't supported.
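The naming rule in the note can be checked before you type a name into the wizard. The following is a minimal sketch that reads the note literally and assumes ASCII letters and digits; the function name `is_valid_table_name` is hypothetical, and the service may apply further checks that aren't covered here.

```python
import re

# Assumption: a literal reading of the note above (letters, digits, spaces,
# hyphens, and underscores; between 1 and 1024 characters). The service may
# enforce additional rules that this sketch does not model.
_TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _-]{1,1024}")


def is_valid_table_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule stated in the note."""
    return _TABLE_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_table_name("Telemetry_Raw-2024")` passes, while a name containing `$` is rejected.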
Fill in the following fields:
| Setting | Field description |
| --- | --- |
| Subscription | The subscription ID where the event hub resource is located. |
| Event hub namespace | The name that identifies your namespace. |
| Event hub | The event hub you wish to use. |
| Consumer group | The consumer group defined in your event hub. |
| Data connection name | The name that identifies your data connection. |
| Advanced filters | |
| Compression | The compression type of the event hub messages payload. |
| Event system properties | The event hub system properties. If there are multiple records per event message, the system properties are added to the first one. When adding system properties, create or update table schema and mapping to include the selected properties. |
| Event retrieval start date | The data connection retrieves existing Event Hubs events created after the Event retrieval start date. Only events that are still within the Event Hubs retention period can be retrieved. If the Event retrieval start date isn't specified, the default time is the time at which the data connection is created. |
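The interaction between the Event retrieval start date, the default, and the retention period can be modeled in a few lines. This is only a sketch of the description above, not the service's logic; the function name `effective_retrieval_start` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_retrieval_start(
    created_at: datetime,
    retention: timedelta,
    now: datetime,
    requested_start: Optional[datetime] = None,
) -> datetime:
    """Estimate the earliest event time the data connection can pick up."""
    # No start date given: the default is the connection's creation time.
    start = requested_start if requested_start is not None else created_at
    # Events older than the retention window are no longer in the event hub.
    oldest_retained = now - retention
    return max(start, oldest_retained)
```

With a one-day retention period, a start date set nine days in the past is effectively clamped to one day ago, because older events are no longer retained.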
Select Next.
Inspect
The Inspect tab opens with a preview of the data.
To complete the ingestion process, select Finish.
Optionally:
If the data you see in the preview window isn't complete, you might need more data to create a table with all necessary data fields. Use the following commands to fetch new data from your event hub:
Discard and fetch new data: Discards the data presented and searches for new events.
Fetch more data: Searches for more events in addition to the events already found.
Note
To see a preview of your data, your event hub must be sending events.
Select Command viewer to view and copy the automatic commands generated from your inputs.
Use the Schema definition file dropdown to change the file that the schema is inferred from.
For tabular formats (CSV, TSV, PSV), you can't map a column twice. To map to an existing column, first delete the new column.
You can't change an existing column type. If you try to map to a column having a different format, you may end up with empty columns.
The changes you can make in a table depend on the following parameters:
Table type is new or existing
Mapping type is new or existing
| Table type | Mapping type | Available adjustments |
| --- | --- | --- |
| New table | New mapping | Rename column, change data type, change data source, mapping transformation, add column, delete column |
| Existing table | New mapping | Add column (on which you can then change data type, rename, and update) |
| Existing table | Existing mapping | None |
Mapping transformations
Some data format mappings (Parquet, JSON, and Avro) support simple ingest-time transformations. To apply mapping transformations, create or update a column in the Edit columns window.
Mapping transformations can be performed on a column of type string or datetime, with the source having data type int or long. Supported mapping transformations are:
DateTimeFromUnixSeconds
DateTimeFromUnixMilliseconds
DateTimeFromUnixMicroseconds
DateTimeFromUnixNanoseconds
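To make the four transformations concrete, the sketch below shows the conversion each name implies: an integer count of seconds, milliseconds, microseconds, or nanoseconds since the Unix epoch becomes a UTC datetime. It illustrates the semantics only and is not the service's implementation; the function name `apply_transformation` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# How many source units make up one second for each transformation name.
_UNITS_PER_SECOND = {
    "DateTimeFromUnixSeconds": 1,
    "DateTimeFromUnixMilliseconds": 1_000,
    "DateTimeFromUnixMicroseconds": 1_000_000,
    "DateTimeFromUnixNanoseconds": 1_000_000_000,
}


def apply_transformation(name: str, value: int) -> datetime:
    """Convert an int/long Unix timestamp to a UTC datetime."""
    units_per_second = _UNITS_PER_SECOND[name]
    # Integer arithmetic avoids float rounding. Python's datetime keeps
    # microsecond precision, so nanosecond input is truncated here.
    microseconds = value * 1_000_000 // units_per_second
    return _EPOCH + timedelta(microseconds=microseconds)
```

Picking the wrong transformation for the source unit shifts the result by a factor of 1,000 or more, so it helps to check one known timestamp in the preview.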
Advanced options based on data type
Tabular (CSV, TSV, PSV):
If you're ingesting tabular formats in an existing table, you can select Advanced > Keep current table schema. Tabular data doesn't necessarily include the column names that are used to map source data to the existing columns. When this option is checked, mapping is done by-order, and the table schema remains the same. If this option is unchecked, new columns are created for incoming data, regardless of data structure.
To use the first row as column names, select Advanced > First row is column header.
JSON:
To determine column division of JSON data, select Advanced > Nested levels, from 1 to 100.
If you select Advanced > Ignore data format errors, the data is ingested in JSON format. If you leave this check box unselected, the data is ingested in multijson format.
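The by-order mapping used by Keep current table schema for tabular data can be pictured as pairing each value with the existing column in the same position. The sketch below is an illustration under that reading; the function name `map_rows_by_order` and the sample column names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def map_rows_by_order(csv_text: str, table_columns: List[str]) -> List[Dict[str, str]]:
    """Pair each CSV value with the existing table column at the same position."""
    rows = csv.reader(io.StringIO(csv_text))
    # zip() stops at the shorter sequence, so extra values in a row are
    # dropped in this sketch; the table schema itself never changes.
    return [dict(zip(table_columns, row)) for row in rows]
```

Here `map_rows_by_order("1,alice\n2,bob\n", ["Id", "Name"])` places the first value of every row under `Id` and the second under `Name`, regardless of any names in the source.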
Summary
In the Data preparation window, all three steps are marked with green check marks when data ingestion finishes successfully. You can view the commands that were used for each step, or select a card to query, visualize, or drop the ingested data.
In the Azure portal, go to your cluster and select Databases. Then, select the database that contains your target table.
From the left menu, select Data ingestion. Then, in the top bar, select Add data connection.
Fill out the form with the following information, and then select Create.
| Setting | Suggested value | Field description |
| --- | --- | --- |
| Data connection name | test-hub-connection | The name of the connection you want to create in Azure Data Explorer. |
| Subscription | | The subscription ID where the event hub resource is located. |
| Event hub namespace | A unique namespace name | The name you chose earlier that identifies your namespace. |
| Event hub | test-hub | The event hub you created. |
| Consumer group | test-group | The consumer group defined in the event hub you created. |
| Event system properties | Select relevant properties | The event hub system properties. If there are multiple records per event message, the system properties are added to the first record. When adding system properties, create or update table schema and mapping to include the selected properties. |
| Compression | None | The compression type of the event hub messages payload. Supported compression types: None, gzip. |
| Managed Identity (recommended) | System-assigned | The managed identity used by the Data Explorer cluster for access to read from the event hub. We recommend using managed identities to control access to your event hub. Note: When the data connection is created, system-assigned identities are automatically created if they don't exist, and the managed identity is automatically assigned the Azure Event Hubs Data Receiver role and is added to your Data Explorer cluster. We recommend verifying that the role was assigned and that the identity was added to the cluster. |
Note
If you have an existing data connection that is not using managed identities, we recommend updating it to use managed identities.
The cluster and event hub should be associated with the same tenant. If not, use one of the SDK options, such as C# or Python.
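The Compression setting in the table above describes how the event payload was compressed by the sender. As a rough sketch of the producer side, the code below gzip-compresses a JSON event body and reads it back. Sending the bytes to the event hub is omitted, and the function names are hypothetical.

```python
import gzip
import json


def build_gzip_payload(event: dict) -> bytes:
    """Serialize an event to JSON and compress it with gzip."""
    raw = json.dumps(event).encode("utf-8")
    return gzip.compress(raw)


def read_gzip_payload(payload: bytes) -> dict:
    """Reverse of build_gzip_payload, useful for checking a payload locally."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

If the sender compresses payloads this way, the data connection's Compression setting would need to be gzip rather than None.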
The Ingest data side pane opens with the Destination tab selected. Select the Cluster and Database fields from the drop-downs. Make sure you select a cluster that is running. Otherwise, you won't be able to select Database and proceed with the ingestion process.
Under Table, select New table and enter a name for the new table. Alternatively, use an Existing table.
Select Next: Source.
Under Source type, the Event Hub type and details are autopopulated based on the Event Hubs Instance that you started from.
Under Data Connection, fill in the following fields and select Next: Schema.
| Setting | Suggested value | Field description |
| --- | --- | --- |
| Subscription | | The subscription ID where the event hub resource is located. |
| Event hub namespace | | The name that identifies your namespace. |
| Event hub | | The event hub you wish to use. |
| Data connection name | TestDataConnection | The name that identifies your data connection. |
| Consumer group | | The consumer group defined in your event hub. |
| Compression | | The compression type of the event hub messages payload. |
| Event system properties | Select relevant properties | The event hub system properties. If there are multiple records per event message, the system properties are added to the first one. When adding system properties, create or update table schema and mapping to include the selected properties. |
| Event retrieval start date | Coordinated Universal Time (UTC) | The data connection retrieves existing Event Hubs events created after the Event retrieval start date. Only events that are still within the Event Hubs retention period can be retrieved. If the Event retrieval start date isn't specified, the default time is the time at which the data connection is created. |
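The rule that system properties are added only to the first record of a multi-record event message can be sketched as follows. This models the description in the table and is not the service's code; the function name is hypothetical, and the property name in the usage example is only an illustration.

```python
from typing import Any, Dict, List


def attach_system_properties(
    records: List[Dict[str, Any]],
    system_properties: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Return copies of the records with system properties on the first one only."""
    enriched = [dict(record) for record in records]
    if enriched:
        # Only the first record of the message receives the properties;
        # the remaining records are left without them.
        enriched[0].update(system_properties)
    return enriched
```

For a message holding two records and the property `x-opt-sequence-number`, only the first resulting row carries the property, which is why the table schema and mapping need a column for it.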
If streaming is enabled for the cluster, you can select Streaming ingestion. If streaming isn't enabled for the cluster, set the Data batching latency. For Event Hubs, the recommended batching time is 30 seconds.
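Outside the wizard, a batching window is commonly expressed through a table's ingestion batching policy. The sketch below builds such a management command as a string, assuming that the wizard's Data batching latency corresponds to the policy's `MaximumBatchingTimeSpan` property. The function name and table name are hypothetical, so check the generated command against the policy documentation before running it.

```python
import json
from datetime import timedelta


def batching_policy_command(table: str, latency: timedelta) -> str:
    """Build an ingestion batching policy command for the given latency."""
    total_seconds = int(latency.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Assumption: the wizard's latency maps to MaximumBatchingTimeSpan.
    policy = {"MaximumBatchingTimeSpan": f"{hours:02d}:{minutes:02d}:{seconds:02d}"}
    return f".alter table ['{table}'] policy ingestionbatching '{json.dumps(policy)}'"
```

Calling `batching_policy_command("kustotable", timedelta(seconds=30))` yields a command that sets the time span to `00:00:30`, matching the 30-second recommendation above.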
Select the Data format. For CSV-formatted data, select Ignore the first record to skip the heading row of the file. For JSON-formatted data, select Ignore data format errors to ingest the data in JSON format, or leave it unselected to ingest the data in multijson format. Select the Nested levels to determine how the data is divided into table columns.
If the data you see in the preview window isn't complete, you might need more data to create a table with all necessary data fields. Use the following commands to fetch new data from your event hub:
Discard and fetch new data: Discards the data presented and searches for new events.
Fetch more data: Searches for more events in addition to the events already found.
Note
To see a preview of your data, your event hub must be sending events.
Select Next: Summary.
In the Continuous ingestion from Event Hub established window, all steps are marked with green check marks when establishment finishes successfully.
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "namespaces_eventhubns_name": {
      "type": "string",
      "defaultValue": "eventhubns",
      "metadata": {
        "description": "Specifies the Event Hubs Namespace name."
      }
    },
    "EventHubs_eventhubdemo_name": {
      "type": "string",
      "defaultValue": "eventhubdemo",
      "metadata": {
        "description": "Specifies the event hub name."
      }
    },
    "consumergroup_default_name": {
      "type": "string",
      "defaultValue": "$Default",
      "metadata": {
        "description": "Specifies the consumer group of the event hub."
      }
    },
    "Clusters_kustocluster_name": {
      "type": "string",
      "defaultValue": "kustocluster",
      "metadata": {
        "description": "Specifies the name of the cluster"
      }
    },
    "databases_kustodb_name": {
      "type": "string",
      "defaultValue": "kustodb",
      "metadata": {
        "description": "Specifies the name of the database"
      }
    },
    "tables_kustotable_name": {
      "type": "string",
      "defaultValue": "kustotable",
      "metadata": {
        "description": "Specifies the name of the table"
      }
    },
    "mapping_kustomapping_name": {
      "type": "string",
      "defaultValue": "kustomapping",
      "metadata": {
        "description": "Specifies the name of the mapping rule"
      }
    },
    "dataformat_type": {
      "type": "string",
      "defaultValue": "csv",
      "metadata": {
        "description": "Specifies the data format"
      }
    },
    "databaseRouting_type": {
      "type": "string",
      "defaultValue": "Single",
      "metadata": {
        "description": "The database routing for the connection. If you set the value to **Single**, the data connection will be routed to a single database in the cluster as specified in the *databaseName* setting. If you set the value to **Multi**, you can override the default target database using the *Database* EventData property."
      }
    },
    "dataconnections_kustodc_name": {
      "type": "string",
      "defaultValue": "kustodc",
      "metadata": {
        "description": "Name of the data connection to create"
      }
    },
    "subscriptionId": {
      "type": "string",
      "defaultValue": "[subscription().subscriptionId]",
      "metadata": {
        "description": "Specifies the subscriptionId of the event hub"
      }
    },
    "resourceGroup": {
      "type": "string",
      "defaultValue": "[resourceGroup().name]",
      "metadata": {
        "description": "Specifies the resourceGroup of the event hub"
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location for all resources."
      }
    }
  },
  "variables": {},
  "resources": [
    {
      "type": "Microsoft.Kusto/Clusters/Databases/DataConnections",
      "apiVersion": "2022-02-01",
      "name": "[concat(parameters('Clusters_kustocluster_name'), '/', parameters('databases_kustodb_name'), '/', parameters('dataconnections_kustodc_name'))]",
      "location": "[parameters('location')]",
      "kind": "EventHub",
      "properties": {
        "managedIdentityResourceId": "[resourceId('Microsoft.Kusto/clusters', parameters('Clusters_kustocluster_name'))]",
        "eventHubResourceId": "[resourceId(parameters('subscriptionId'), parameters('resourceGroup'), 'Microsoft.EventHub/namespaces/eventhubs', parameters('namespaces_eventhubns_name'), parameters('EventHubs_eventhubdemo_name'))]",
        "consumerGroup": "[parameters('consumergroup_default_name')]",
        "tableName": "[parameters('tables_kustotable_name')]",
        "mappingRuleName": "[parameters('mapping_kustomapping_name')]",
        "dataFormat": "[parameters('dataformat_type')]",
        "databaseRouting": "[parameters('databaseRouting_type')]"
      }
    }
  ]
}
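To override the template's default values, you would typically supply a deployment parameters file. The sketch below generates one for the parameter names defined in the template and shows the resource ID string that the template's `eventHubResourceId` expression resolves to. The helper names and all sample values are hypothetical.

```python
import json
from typing import Dict

PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"
)


def build_parameters_file(values: Dict[str, str]) -> str:
    """Render a deployment parameters document for the template's parameters."""
    document = {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        # Each parameter is wrapped in an object with a "value" key.
        "parameters": {name: {"value": value} for name, value in values.items()},
    }
    return json.dumps(document, indent=2)


def event_hub_resource_id(
    subscription_id: str, resource_group: str, namespace: str, event_hub: str
) -> str:
    """Build the event hub resource ID that the data connection points at."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.EventHub/namespaces/{namespace}/eventhubs/{event_hub}"
    )
```

Assuming the output is saved as `params.json` next to the template, a deployment with the Azure CLI could look like `az deployment group create --resource-group <resource-group> --template-file template.json --parameters @params.json`; both file names are placeholders.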