Data connectors overview

Data ingestion is the process used to load data from one or more sources into a Real-Time Intelligence KQL database in Microsoft Fabric. Once ingested, the data becomes available for query. Real-Time Intelligence provides several connectors for data ingestion.

The following table summarizes the available data connectors, tools, and integrations.

| Name | Functionality | Supports streaming? | Type | Use cases |
|---|---|---|---|---|
| Apache Flink | Ingestion | ✔️ | Open source | Telemetry |
| Apache Kafka | Ingestion | ✔️ | Open source | Logs, Telemetry, Time series |
| Apache Log4J 2 | Ingestion | ✔️ | Open source | Logs |
| Apache Spark | Export, Ingestion | | Open source | Telemetry |
| Apache Spark for Azure Synapse Analytics | Export, Ingestion | | First party | Telemetry |
| Azure Data Factory | Export, Ingestion | | First party | Data orchestration |
| Azure Event Hubs | Ingestion | ✔️ | First party | Messaging |
| Azure Functions | Export, Ingestion | | First party | Workflow integrations |
| Azure Stream Analytics | Ingestion | ✔️ | First party | Event processing |
| Fluent Bit | Ingestion | ✔️ | Open source | Logs, Metrics, Traces |
| Logstash | Ingestion | | Open source | Logs |
| NLog | Ingestion | ✔️ | Open source | Telemetry, Logs, Metrics |
| Open Telemetry | Ingestion | ✔️ | Open source | Traces, Metrics, Logs |
| Power Automate | Export, Ingestion | | First party | Data orchestration |
| Serilog | Ingestion | ✔️ | Open source | Logs |
| Splunk | Ingestion | | Open source | Logs |
| Splunk Universal Forwarder | Ingestion | | Open source | Logs |
| Telegraf | Ingestion | ✔️ | Open source | Metrics, Logs |

The following sections describe each connector in more detail.

Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. The connector implements a data sink for moving data between Azure Data Explorer and Flink clusters. Using Azure Data Explorer and Apache Flink, you can build fast and scalable applications targeting data-driven scenarios such as machine learning (ML), extract-transform-load (ETL), and log analytics.

Apache Kafka

Apache Kafka is a distributed streaming platform for building real-time streaming data pipelines that reliably move data between systems or applications. Kafka Connect is a tool for scalable and reliable streaming of data between Apache Kafka and other data systems. The Kafka Sink serves as the connector from Kafka and doesn't require writing code. The connector is gold certified by Confluent, meaning it has gone through comprehensive review and testing for quality, feature completeness, compliance with standards, and performance.
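As an illustration, a Kafka Connect sink is configured with a small set of properties rather than code. The sketch below follows the property names of the open-source Kusto sink connector; the connector class is that project's, while the URLs, credentials, topic, database, and table names are all placeholders:

```properties
name=kusto-sink
connector.class=com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkConnector
topics=telemetry
# Ingestion and query endpoints of the target cluster (placeholders)
kusto.ingestion.url=https://ingest-<cluster>.kusto.windows.net
kusto.query.url=https://<cluster>.kusto.windows.net
# Service principal used to authenticate (placeholders)
aad.auth.appid=<application-id>
aad.auth.appkey=<application-key>
aad.auth.authority=<tenant-id>
# Map each topic to a target database, table, and payload format
kusto.tables.topics.mapping=[{"topic": "telemetry", "db": "MyDatabase", "table": "Telemetry", "format": "json"}]
```

The mapping property is what lets one connector instance fan several topics out to different tables.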

Apache Log4J 2

Log4J is a popular logging framework for Java applications maintained by the Apache Software Foundation. Log4J allows developers to control which log statements are output with arbitrary granularity based on the logger's name, logger level, and message pattern. The Apache Log4J 2 sink allows you to stream your log data to your database, where you can analyze and visualize your logs in real time.
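The per-logger granularity described above is driven by a standard `log4j2.xml` configuration. This sketch uses a console appender and a hypothetical logger name purely to show the shape; the sink-specific appender settings are documented with the connector itself:

```xml
<Configuration status="WARN">
  <Appenders>
    <!-- The ADX sink is declared as an appender in the same way -->
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} [%t] %-5level %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- A specific logger can log at a finer level than the rest of the app -->
    <Logger name="com.example.ingestion" level="debug"/>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```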

Apache Spark

Apache Spark is a unified analytics engine for large-scale data processing. The Spark connector is an open-source project that can run on any Spark cluster. It implements a data source and data sink for moving data to or from Spark clusters. Using the Apache Spark connector, you can build fast and scalable applications targeting data-driven scenarios such as machine learning (ML), extract-transform-load (ETL), and log analytics. With the connector, your database becomes a valid data store for standard Spark source and sink operations, such as read, write, and writeStream.
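A minimal PySpark write sketch, assuming a Spark cluster with the open-source Kusto Spark connector package attached and an existing DataFrame `df`. The option names follow that connector's documentation; the cluster, database, table, and credential values are placeholders, and the fragment isn't runnable on its own:

```python
# Sketch only: requires a Spark cluster with the Kusto Spark connector attached
# and a DataFrame `df` to write. All option values are placeholders.
df.write \
    .format("com.microsoft.kusto.spark.datasource") \
    .option("kustoCluster", "<cluster-name>") \
    .option("kustoDatabase", "MyDatabase") \
    .option("kustoTable", "Telemetry") \
    .option("kustoAadAppId", "<application-id>") \
    .option("kustoAadAppSecret", "<application-key>") \
    .option("kustoAadAuthorityID", "<tenant-id>") \
    .mode("Append") \
    .save()
```

Reading uses the same format string with `spark.read`, which is what makes the database a standard source and sink for Spark jobs.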

Apache Spark for Azure Synapse Analytics

Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big data analytic applications. Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of Apache Spark in the cloud. You can access a database from Synapse Studio with Apache Spark for Azure Synapse Analytics.

Azure Data Factory

Azure Data Factory (ADF) is a cloud-based data integration service that allows you to integrate different data stores and perform activities on the data.
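For example, an ADF copy activity can land data in a Kusto table. The sketch below shows only the general shape of such an activity definition; the dataset names are hypothetical, and the source and sink type names follow ADF's copy activity documentation:

```json
{
  "name": "CopyToKustoTable",
  "type": "Copy",
  "inputs": [ { "referenceName": "SourceBlobDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "KustoTableDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureDataExplorerSink" }
  }
}
```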

Azure Event Hubs

Azure Event Hubs is a big data streaming platform and event ingestion service. You can configure continuous ingestion from customer-managed Event Hubs.

Azure Functions

Azure Functions allow you to run serverless code in the cloud on a schedule or in response to an event. With input and output bindings for Azure Functions, you can integrate your database into your workflows to ingest data and run queries against your database.

Azure Stream Analytics

Azure Stream Analytics is a real-time analytics and complex event-processing engine that's designed to process high volumes of fast streaming data from multiple sources simultaneously.

Fluent Bit

Fluent Bit is an open-source agent that collects logs, metrics, and traces from various sources. It allows you to filter, modify, and aggregate event data before sending it to storage.
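As a sketch, a Fluent Bit pipeline might tail application log files and forward them through the `azure_kusto` output plugin. The parameter names below follow that plugin's documentation; every value is a placeholder:

```
[INPUT]
    Name                tail
    Path                /var/log/app/*.log
    Tag                 app.logs

[OUTPUT]
    Name                azure_kusto
    Match               app.*
    tenant_id           <tenant-id>
    client_id           <application-id>
    client_secret       <application-key>
    ingestion_endpoint  https://ingest-<cluster>.kusto.windows.net
    database_name       MyDatabase
    table_name          AppLogs
```

Filter sections can be added between the input and output to modify or aggregate events before they're sent.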

Logstash

The Logstash plugin enables you to process events from Logstash into an Azure Data Explorer database for later analysis.
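A minimal pipeline sketch, assuming the open-source Logstash output plugin for Kusto is installed. The parameter names follow that plugin's documentation; the paths, credentials, and table names are placeholders:

```
input {
  file { path => "/var/log/app/*.log" }
}
output {
  kusto {
    # Local staging path for batched events before upload (placeholder)
    path => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm}.txt"
    ingest_url => "https://ingest-<cluster>.kusto.windows.net"
    app_id => "<application-id>"
    app_key => "<application-key>"
    app_tenant => "<tenant-id>"
    database => "MyDatabase"
    table => "AppLogs"
    json_mapping => "logs_mapping"
  }
}
```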

NLog

NLog is a flexible and free logging platform for various .NET platforms, including .NET Standard. NLog allows you to write to several targets, such as a database, file, or console, and to change the logging configuration on the fly. The NLog sink is a target for NLog that sends your log messages to your database, providing an efficient way to sink your logs to your cluster.
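NLog's on-the-fly reconfiguration is driven by `NLog.config`, reloaded automatically when `autoReload` is set. The sketch below uses a plain file target and hypothetical logger names to show the shape; the ADX-specific target settings are documented with the sink:

```xml
<nlog autoReload="true">
  <targets>
    <!-- A plain file target; the ADX sink is declared as a target the same way -->
    <target name="logfile" type="File" fileName="app.log"
            layout="${longdate}|${level:uppercase=true}|${logger}|${message}" />
  </targets>
  <rules>
    <!-- Route by logger name and minimum level -->
    <logger name="MyApp.Ingestion.*" minlevel="Debug" writeTo="logfile" />
    <logger name="*" minlevel="Info" writeTo="logfile" />
  </rules>
</nlog>
```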

Open Telemetry

The OpenTelemetry connector supports ingestion of data from many receivers into your database. It works as a bridge to ingest data generated by OpenTelemetry into your database by customizing the format of the exported data according to your needs.
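In an OpenTelemetry Collector pipeline, the connector is configured as an exporter. This fragment follows the option names of the Azure Data Explorer exporter in the collector-contrib distribution, with placeholder values; a complete collector config would also declare receivers for the pipeline:

```yaml
exporters:
  azuredataexplorer:
    cluster_uri: "https://<cluster>.kusto.windows.net"
    application_id: "<application-id>"
    application_key: "<application-key>"
    tenant_id: "<tenant-id>"
    db_name: "MyDatabase"
    metrics_table_name: "OTelMetrics"
    logs_table_name: "OTelLogs"
    traces_table_name: "OTelTraces"
```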

Power Automate

Power Automate is an orchestration service used to automate business processes. The Power Automate (previously Microsoft Flow) connector enables you to orchestrate and schedule flows and send notifications and alerts as part of a scheduled or triggered task.

Serilog

Serilog is a popular logging framework for .NET applications. Serilog allows developers to control which log statements are output with arbitrary granularity based on the logger's name, logger level, and message pattern. The Serilog sink, also known as an appender, streams your log data to your database, where you can analyze and visualize your logs in real time.

Splunk

Splunk Enterprise is a software platform that allows you to ingest data from many sources simultaneously. The Azure Data Explorer add-on sends data from Splunk to a table in your cluster.

Splunk Universal Forwarder

Splunk Universal Forwarder is a lightweight agent that collects data and forwards it to Splunk or other destinations. You can configure it to send log data to a table in your database.

Telegraf

Telegraf is an open-source, lightweight agent with a minimal memory footprint for collecting, processing, and writing telemetry data, including logs, metrics, and IoT data. Telegraf supports hundreds of input and output plugins, and it's widely used and well supported by the open-source community. The output plugin serves as the connector from Telegraf and supports ingestion of data from many types of input plugins into your database.
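A sketch of a `telegraf.conf` pairing a standard input plugin with the Azure Data Explorer output plugin. The option names follow that plugin's documentation; the endpoint and database values are placeholders:

```toml
# Collect host CPU metrics with a built-in input plugin
[[inputs.cpu]]
  percpu = false
  totalcpu = true

# Write metrics to a Kusto database; authentication is resolved from the
# environment (for example, a managed identity or Azure CLI login)
[[outputs.azure_data_explorer]]
  endpoint_url = "https://ingest-<cluster>.kusto.windows.net"
  database = "MyDatabase"
  # One table per metric name, or "singletable" to land everything in one table
  metrics_grouping_type = "tablepermetric"
```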