The Azure Data Explorer (Kusto) connector for Apache Spark is designed to efficiently transfer data between Kusto clusters and Spark. This connector is available in Python, Java, and .NET.
When using Azure Synapse Notebooks or Apache Spark job definitions, the authentication between systems is made seamless with the linked service. The Token Service connects with Microsoft Entra ID to obtain security tokens for use when accessing the Kusto cluster.
For Azure Synapse Pipelines, the authentication uses the service principal name. Currently, managed identities aren't supported with the Azure Data Explorer connector.
The following section provides a simple example of how to write data to and read data from a Kusto table. See the Azure Data Explorer (Kusto) connector project for detailed documentation.
kustoDf = spark.read \
.format("com.microsoft.kusto.spark.synapse.datasource") \
.option("spark.synapse.linkedService", "<link service name>") \
.option("kustoDatabase", "<Database name>") \
.option("kustoQuery", "<KQL Query>") \
.load()
display(kustoDf)
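The kustoQuery option takes an ordinary KQL string, so you can assemble it in Python before passing it to the reader. A minimal sketch; the StormEvents table and the seven-day window are illustrative values, not part of the connector API:

```python
# Assemble a KQL query string in Python. "StormEvents", "StartTime", and
# "State" are hypothetical names used only for illustration.
lookback_days = 7
kusto_query = (
    "StormEvents"
    f" | where StartTime > ago({lookback_days}d)"
    " | summarize EventCount = count() by State"
)
print(kusto_query)
```

The resulting string is then passed as-is via .option("kustoQuery", kusto_query).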
You can also batch read with forced distribution mode and other advanced options. For more information, see the Kusto source options reference.
# Build Kusto client request properties via the JVM gateway and disable the request timeout
crp = sc._jvm.com.microsoft.azure.kusto.data.ClientRequestProperties()
crp.setOption("norequesttimeout", True)
crp.toString()
kustoDf = spark.read \
.format("com.microsoft.kusto.spark.synapse.datasource") \
.option("spark.synapse.linkedService", "<link service name>") \
.option("kustoDatabase", "<Database name>") \
.option("kustoQuery", "<KQL Query>") \
.option("clientRequestPropertiesJson", crp.toString()) \
.option("readMode", "ForceDistributedMode") \
.load()
display(kustoDf)
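Because the clientRequestPropertiesJson option ultimately receives a JSON string, the JVM helper above can also be bypassed by building that JSON directly in Python. A sketch, assuming the {"Options": {...}} shape that ClientRequestProperties.toString() emits; verify the exact shape against your connector version:

```python
import json

# Hand-built client request properties JSON. The "Options" wrapper is an
# assumption about what ClientRequestProperties.toString() produces.
crp_json = json.dumps({"Options": {"norequesttimeout": True}})
print(crp_json)  # {"Options": {"norequesttimeout": true}}
```

The string can then be passed directly with .option("clientRequestPropertiesJson", crp_json).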
df.write \
.format("com.microsoft.kusto.spark.synapse.datasource") \
.option("spark.synapse.linkedService", "<link service name>") \
.option("kustoDatabase", "<Database name>") \
.option("kustoTable", "<Table name>") \
.mode("Append") \
.save()
You can also batch write data by providing additional ingestion properties. For more information on the supported ingestion properties, see the Kusto ingestion properties reference.
extentsCreationTime = sc._jvm.org.joda.time.DateTime.now().plusDays(1)
csvMap = "[{\"Name\":\"ColA\",\"Ordinal\":0},{\"Name\":\"ColB\",\"Ordinal\":1}]"
# Alternatively, use an existing CSV mapping configured on the table and pass it
# as the last parameter of SparkIngestionProperties, or use None
sp = sc._jvm.com.microsoft.kusto.spark.datasink.SparkIngestionProperties(
    False, ["dropByTags"], ["ingestByTags"], ["tags"], ["ingestIfNotExistsTags"], extentsCreationTime, csvMap, None)
df.write \
.format("com.microsoft.kusto.spark.synapse.datasource") \
.option("spark.synapse.linkedService", "<link service name>") \
.option("kustoDatabase", "<Database name>") \
.option("kustoTable", "<Table name>") \
.option("sparkIngestionPropertiesJson", sp.toString()) \
.option("tableCreateOptions","CreateIfNotExist") \
.mode("Append") \
.save()
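The csvMap value above is hand-escaped JSON, which is easy to get wrong as the mapping grows. A less error-prone sketch builds the same mapping as Python data and serializes it with json.dumps:

```python
import json

# Build the CSV column mapping as Python data, then serialize it,
# instead of hand-escaping the JSON string. Compact separators reproduce
# the original string exactly.
mapping = [
    {"Name": "ColA", "Ordinal": 0},
    {"Name": "ColB", "Ordinal": 1},
]
csv_map = json.dumps(mapping, separators=(",", ":"))
print(csv_map)  # [{"Name":"ColA","Ordinal":0},{"Name":"ColB","Ordinal":1}]
```

The resulting csv_map string can be passed to SparkIngestionProperties in place of the hand-written literal.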