Query streaming data

2025-06-11

You can use Azure Databricks to query streaming data sources using Structured Streaming. Azure Databricks provides extensive support for streaming workloads in Python and Scala, and supports most Structured Streaming functionality with SQL.

The following examples demonstrate using a memory sink for manual inspection of streaming data during interactive development in notebooks. Because of row output limits in the notebook UI, you might not observe all data read by streaming queries. In production workloads, you should only trigger streaming queries by writing them to a target table or external system.

Note

SQL support for interactive queries on streaming data is limited to notebooks running on all-purpose compute. You can also use SQL when you declare streaming tables in Databricks SQL or Lakeflow Declarative Pipelines. See Streaming tables and Lakeflow Declarative Pipelines.

Query data from streaming systems

Azure Databricks provides streaming data readers for the following streaming systems:

Kafka
Kinesis
PubSub
Pulsar

You must provide configuration details when you initialize queries against these systems, which vary depending on your configured environment and the system you choose to read from. See Standard connectors in Lakeflow Connect.

Common workloads that involve streaming systems include data ingestion to the lakehouse and stream processing to sink data to external systems. For more on streaming workloads, see Structured Streaming concepts.

The following examples demonstrate an interactive streaming read from Kafka:

Python

display(spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "<server:ip>")
  .option("subscribe", "<topic>")
  .option("startingOffsets", "latest")
  .load()
)

SQL

SELECT * FROM STREAM read_kafka(
  bootstrapServers => '<server:ip>',
  subscribe => '<topic>',
  startingOffsets => 'latest'
);

Query a table as a streaming read

Azure Databricks creates all tables using Delta Lake by default. When you perform a streaming query against a Delta table, the query automatically picks up new records when a version of the table is committed. By default, streaming queries expect source tables to contain only appended records. If you need to work with streaming data that contains updates and deletes, Databricks recommends using Lakeflow Declarative Pipelines and AUTO CDC ... INTO. See The AUTO CDC APIs: Simplify change data capture with Lakeflow Declarative Pipelines.

The following examples demonstrate performing an interactive streaming read from a table:

Python

display(spark.readStream.table("table_name"))

SQL

SELECT * FROM STREAM table_name

Query data in cloud object storage with Auto Loader

You can stream data from cloud object storage using Auto Loader, the Azure Databricks cloud data connector. You can use the connector with files stored in Unity Catalog volumes or other cloud object storage locations. Databricks recommends using volumes to manage access to data in cloud object storage. See Connect to data sources and external services.

Azure Databricks optimizes this connector for streaming ingestion of data in cloud object storage that is stored in popular structured, semi-structured, and unstructured formats. Databricks recommends storing ingested data in a nearly-raw format to maximize throughput and minimize potential data loss due to corrupt records or schema changes.

For more recommendations on ingesting data from cloud object storage, see Standard connectors in Lakeflow Connect.

The follow examples demonstrate an interactive streaming read from a directory of JSON files in a volume:

Python

display(spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load("/Volumes/catalog/schema/volumes/path/to/files"))

SQL

SELECT * FROM STREAM read_files('/Volumes/catalog/schema/volumes/path/to/files', format => 'json')

Share via

Query streaming data

Query data from streaming systems

Python

SQL

Query a table as a streaming read

Python

SQL

Query data in cloud object storage with Auto Loader

Python

SQL

Feedback

Additional resources