Azure Event Hubs: A real-time data streaming platform with native Apache Kafka support
Article
Azure Event Hubs is a native data-streaming service in the cloud that can stream millions of events per second, with low latency, from any source to any destination. Event Hubs is compatible with Apache Kafka. It enables you to run existing Kafka workloads without any code changes.
Businesses can use Event Hubs to ingest and store streaming data. By using streaming data, businesses can gain valuable insights, drive real-time analytics, and respond to events as they happen. They can use this data to enhance their overall efficiency and customer experience.
Event Hubs is the preferred event ingestion layer of any event streaming solution that you build on top of Azure. It integrates with data and analytics services inside and outside Azure to build a complete data streaming pipeline to serve the following use cases:
Learn about the key capabilities of Azure Event Hubs in the following sections.
Apache Kafka on Azure Event Hubs
Event Hubs is a multi-protocol event streaming engine that natively supports Advanced Message Queuing Protocol (AMQP), Apache Kafka, and HTTPS protocols. Because it supports Apache Kafka, you can bring Kafka workloads to Event Hubs without making any code changes. You don't need to set up, configure, or manage your own Kafka clusters or use a Kafka-as-a-service offering that's not native to Azure.
Event Hubs is built as a cloud native broker engine. For this reason, you can run Kafka workloads with better performance, better cost efficiency, and no operational overhead.
Azure Schema Registry in Event Hubs provides a centralized repository for managing schemas of event streaming applications. Schema Registry comes free with every Event Hubs namespace. It integrates with your Kafka applications or Event Hubs SDK-based applications.
Schema Registry ensures data compatibility and consistency across event producers and consumers. It enables schema evolution, validation, and governance and promotes efficient data exchange and interoperability.
Schema Registry integrates with your existing Kafka applications and supports multiple schema formats, including Avro and JSON schemas.
Real-time processing of streaming events with Stream Analytics
Event Hubs integrates with Azure Stream Analytics to enable real-time stream processing. With the built-in no-code editor, you can develop a Stream Analytics job by using drag-and-drop functionality, without writing any code.
Alternatively, developers can use the SQL-based Stream Analytics query language to perform real-time stream processing and take advantage of a wide range of functions for analyzing streaming data.
Azure Data Explorer is a fully managed platform for big data analytics that delivers high performance and allows for the analysis of large volumes of data in near real time. By integrating Event Hubs with Azure Data Explorer, you can perform near real-time analytics and exploration of streaming data.
With Event Hubs, you can ingest, buffer, store, and process your stream in real time to get actionable insights. Event Hubs uses a partitioned consumer model. It enables multiple applications to process the stream concurrently and lets you control the speed of processing. Event Hubs also integrates with Azure Functions for serverless architectures.
A broad ecosystem is available for the industry-standard AMQP 1.0 protocol. SDKs are available in languages like .NET, Java, Python, and JavaScript, so you can start processing your streams from Event Hubs. All supported client languages provide low-level integration.
The ecosystem also allows you to integrate with Azure Functions, Azure Spring Apps, Kafka Connectors, and other data analytics platforms and technologies, such as Apache Spark and Apache Flink.
Supports Local development with Event Hubs emulator
Azure Event Hubs emulator offers a local development experience for the Event Hubs. You can use the emulator to develop and test code against the service in isolation, free from cloud interference. For more information, see Event Hubs emulator
Flexible and cost-efficient event streaming
You can experience flexible and cost-efficient event streaming through the Standard, Premium, or Dedicated tiers for Event Hubs. These options cater to data streaming needs that range from a few MB/sec to several GB/sec. You can choose the match that's appropriate for your requirements.
Scalable
With Event Hubs, you can start with data streams in megabytes and grow to gigabytes or terabytes. The auto-inflate feature is one of the options available to scale the number of throughput units or processing units to meet your usage needs.
Supports streaming large messages
In most streaming scenarios, data is characterized by being lightweight, typically less than 1 MB, and having a high throughput. There are also instances where messages can't be divided into smaller segments. Event Hubs can accommodate events up to 20 MB with self-serve scalable dedicated clusters at no extra charge. This capability allows Event Hubs to handle a wide range of message sizes to ensure uninterrupted business operations. For more information, see Send and receive large messages with Azure Event Hubs.
Capture streaming data for long-term retention and batch analytics
Capture your data in near real time in Azure Blob Storage or Azure Data Lake Storage for long-term retention or micro-batch processing. You can achieve this behavior on the same stream that you use for deriving real-time analytics. Setting up capture of event data is fast.
How it works
Event Hubs provides a unified event streaming platform with a time-retention buffer, decoupling event producers from event consumers. The producer and consumer applications can perform large-scale data ingestion through multiple protocols.
The following diagram shows the main components of Event Hubs architecture.
The key functional components of Event Hubs include:
Producer applications: These applications can ingest data to an event hub by using Event Hubs SDKs or any Kafka producer client.
Namespace: The management container for one or more event hubs or Kafka topics. The management tasks such as allocating streaming capacity, configuring network security, and enabling geo-disaster recovery are handled at the namespace level.
Event Hubs/Kafka topic: In Event Hubs, you can organize events into an event hub or a Kafka topic. It's an append-only distributed log, which can comprise one or more partitions.
Partitions: They're used to scale an event hub. They're like lanes in a freeway. If you need more streaming throughput, you can add more partitions.
Consumer applications: These applications can consume data by seeking through the event log and maintaining consumer offset. Consumers can be Kafka consumer clients or Event Hubs SDK clients.
Consumer group: This logical group of consumer instances reads data from an event hub or Kafka topic. It enables multiple consumers to read the same streaming data in an event hub independently at their own pace and with their own offsets.
Related content
To get started using Event Hubs, see the following quickstarts.
Stream data by using the Event Hubs SDK (AMQP)
You can use any of the following samples to stream data to Event Hubs by using SDKs.
Demonstrate understanding of common data engineering tasks to implement and manage data engineering workloads on Microsoft Azure, using a number of Azure services.