Streaming on Azure Databricks

Article
10/04/2024

You can use Azure Databricks for near real-time data ingestion, processing, machine learning, and AI for streaming data.

Azure Databricks offers numerous optimizations for streaming and incremental processing, including the following:

Delta Live Tables provides declarative syntax for incremental processing. See What is Delta Live Tables?.
Auto Loader simplifies incremental ingestion from cloud object storage. See What is Auto Loader?.
Unity Catalog adds data governance to streaming workloads. See Using Unity Catalog with Structured Streaming.

Delta Lake provides the storage layer for these integrations. See Delta table streaming reads and writes.
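As a rough illustration of how these pieces can fit together, the following sketch uses Auto Loader to incrementally ingest JSON files into a Delta table addressed by a Unity Catalog three-level name. It assumes `spark` is an active `SparkSession`, as in an Azure Databricks notebook. The helper names, the storage path, the checkpoint path, and the table name are hypothetical placeholders, not values from this article.

```python
def auto_loader_options(file_format, schema_location):
    """Build reader options for an Auto Loader (cloudFiles) source."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema between runs
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_to_delta(spark, source_path, target_table, checkpoint_path):
    """Incrementally load new files from cloud storage into a Delta table.

    `spark` is assumed to be an active SparkSession on Azure Databricks.
    """
    options = auto_loader_options("json", f"{checkpoint_path}/schema")
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        # The checkpoint records which files were already processed
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop
        .trigger(availableNow=True)
        .toTable(target_table)
    )


# Hypothetical usage in a notebook:
# ingest_to_delta(
#     spark,
#     source_path="abfss://landing@<storage-account>.dfs.core.windows.net/events/",
#     target_table="main.default.events_bronze",
#     checkpoint_path="/Volumes/main/default/checkpoints/events_bronze",
# )
```

The `availableNow` trigger is one choice among several; a continuously running query would omit it or use a processing-time trigger instead.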

For real-time model serving, see Model serving with Azure Databricks.

Tutorial

Learn the basics of near real-time and incremental processing with Structured Streaming on Azure Databricks.

Concepts

Learn core concepts for configuring incremental and near real-time workloads with Structured Streaming.

Stateful streaming

Managing the intermediate state information of stateful Structured Streaming queries can help prevent unexpected latency and production problems.

Production considerations

Get recommendations for configuring production incremental processing workloads with Structured Streaming on Azure Databricks to meet latency and cost requirements for real-time or batch applications.

Monitor streams

Learn how to monitor Structured Streaming applications on Azure Databricks.

Unity Catalog integration

Learn how to use Unity Catalog with Structured Streaming on Azure Databricks.

Streaming with Delta

Learn how to use Delta Lake tables as streaming sources and sinks.

Examples

See examples of using Spark Structured Streaming with Cassandra, Azure Synapse Analytics, Python notebooks, and Scala notebooks in Azure Databricks.
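To make the monitoring topic above more concrete, here is a minimal sketch that summarizes one progress report from a running query. It assumes the report is the dict-style object that PySpark exposes as `StreamingQuery.lastProgress`. The helper names and the sample values are hypothetical, and the "falling behind" check is only a rough heuristic, not an official health metric.

```python
def summarize_progress(progress):
    """Pick a few throughput fields out of one streaming progress report."""
    return {
        "batch_id": progress["batchId"],
        "input_rows": progress["numInputRows"],
        "input_rows_per_second": progress.get("inputRowsPerSecond"),
        "processed_rows_per_second": progress.get("processedRowsPerSecond"),
    }


def is_falling_behind(progress):
    """Heuristic: rows arrive faster than the query processes them."""
    arriving = progress.get("inputRowsPerSecond") or 0.0
    processed = progress.get("processedRowsPerSecond") or 0.0
    return arriving > processed


# Illustrative report; on a live query this would be `query.lastProgress`
sample = {
    "batchId": 12,
    "numInputRows": 500,
    "inputRowsPerSecond": 250.0,
    "processedRowsPerSecond": 100.0,
}

print(summarize_progress(sample))
print(is_falling_behind(sample))
```

A single report can be noisy, so in practice a check like this would usually look at several consecutive batches before raising an alert.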

Azure Databricks has specific features for working with semi-structured data fields contained in Avro, protocol buffers, and JSON data payloads.
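For the JSON case, one common approach is to parse the payload column with Apache Spark's `from_json` function. The sketch below assumes a streaming DataFrame whose payload sits in a column named `value`, as with many message-bus sources. The helper names, the column name, and the field names are hypothetical.

```python
def schema_ddl(fields):
    """Turn {"name": "TYPE"} pairs into a DDL schema string for from_json."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields.items())


def parse_json_payload(df, fields, payload_col="value"):
    """Expand a JSON payload column into typed top-level columns.

    `df` is assumed to be a (streaming) Spark DataFrame.
    """
    # Imported lazily so this module loads without a Spark runtime
    from pyspark.sql import functions as F

    parsed = F.from_json(F.col(payload_col).cast("string"), schema_ddl(fields))
    return df.withColumn("parsed", parsed).select("parsed.*")


# Hypothetical usage:
# readings = parse_json_payload(raw_stream, {"device": "STRING", "reading": "DOUBLE"})
```

Avro and protocol buffer payloads follow the same pattern with different parsing functions and schema inputs.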

Additional resources

Apache Spark provides a Structured Streaming Programming Guide that has more information about Structured Streaming.

For reference information about Structured Streaming, Databricks recommends the Apache Spark API references.
