Important
This feature is in Public Preview.
Real-time mode enables ultra-low latency streaming with end-to-end latency as low as five milliseconds, making it ideal for operational workloads like fraud detection and real-time personalization. This tutorial guides you through setting up your first real-time streaming query using a simple example.
For conceptual information about real-time mode, when to use it, and supported features, see Real-time mode in Structured Streaming.
Requirements
- You have permission to create classic compute.
- Databricks Runtime 17.1 or above (required for using the display function with real-time mode).
Note
If you don't have classic compute creation privileges, contact your workspace administrator to create a real-time mode cluster for you using the configuration in Step 1.
Step 1: Create classic compute for real-time mode
Real-time mode requires a specific classic compute configuration to achieve ultra-low latency. These settings ensure that tasks run simultaneously across all stages and data is processed continuously as it arrives, rather than in batches.
To create a properly configured classic compute:
- In your Azure Databricks workspace, click Compute in the sidebar.
- Click Create compute.
- Enter a name.
- Select Databricks Runtime 17.1 or above.
- Clear Photon acceleration (real-time mode doesn't support Photon).
- Clear Enable autoscaling (real-time mode requires a fixed cluster size).
- Under Advanced performance, clear Use spot instances (spot instances can cause interruptions).
- Click Advanced options to expand additional settings.
- Under Access mode, select Dedicated (formerly: Single user).
- Under Spark config, add the following configuration:

  spark.databricks.streaming.realTimeMode.enabled true

- Click Create compute.
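If you prefer to create the cluster programmatically, the settings above can be expressed as a Clusters API payload. This is a sketch only: the node type and the exact runtime version label are illustrative placeholders, and field names follow the general Databricks Clusters API conventions rather than anything stated in this tutorial.

```python
import json

# Sketch of a cluster spec matching the tutorial's settings.
# The node type and runtime label are placeholders; verify exact
# values for your workspace before using this payload.
cluster_spec = {
    "cluster_name": "realtime-mode-cluster",
    "spark_version": "17.1.x-scala2.13",    # Databricks Runtime 17.1 or above
    "node_type_id": "Standard_DS3_v2",      # placeholder Azure node type
    "num_workers": 2,                       # fixed size: autoscaling disabled
    "runtime_engine": "STANDARD",           # Photon disabled
    "data_security_mode": "SINGLE_USER",    # Dedicated access mode
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},  # no spot instances
    "spark_conf": {
        "spark.databricks.streaming.realTimeMode.enabled": "true",
    },
}

print(json.dumps(cluster_spec, indent=2))
```

The key line for real-time mode is the `spark_conf` entry; the remaining fields simply mirror the UI choices described above.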
Step 2: Create a notebook
Notebooks provide an interactive environment for developing and testing streaming queries. You use this notebook to write your real-time query and see the results update continuously.
To create a notebook:
- Click New in the sidebar, then click Notebook.
- In the compute drop-down menu, select the compute you created in Step 1.
- Select Python or Scala as the default language.
Step 3: Run a real-time mode query
Copy and paste the following code into a notebook cell and run it. This example uses a rate source, which generates rows at a specified rate, and displays the results in real time.
Note
The display function with realTime trigger is available in Databricks Runtime 17.1 and above.
Python

```python
inputDF = (
    spark
    .readStream
    .format("rate")
    .option("numPartitions", 2)
    .option("rowsPerSecond", 1)
    .load()
)

display(inputDF, realTime="5 minutes", outputMode="update")
```
Scala

```scala
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.streaming.OutputMode

val inputDF = spark
  .readStream
  .format("rate")
  .option("numPartitions", 2)
  .option("rowsPerSecond", 1)
  .load()

display(inputDF, trigger = Trigger.RealTime(), outputMode = OutputMode.Update())
```
After running the code, you see a table that updates in real time as new rows are generated. The table displays a timestamp column and a value column that increments with each row.
Understanding the code
The code above demonstrates the essential components of a real-time streaming query. The following tables explain the key parameters and what they control:
Python

| Parameter | Description |
|---|---|
| format("rate") | Uses the rate source, a built-in source that generates rows at a configurable rate. This is useful for testing without external dependencies. |
| numPartitions | Sets the number of partitions for the generated data. |
| rowsPerSecond | Controls how many rows are generated per second. |
| realTime="5 minutes" | Enables real-time mode. The interval specifies how often the query checkpoints progress. Longer intervals mean less frequent checkpointing but potentially longer recovery times after failures. |
| outputMode="update" | Real-time mode requires update output mode. |
Scala

| Parameter | Description |
|---|---|
| format("rate") | Uses the rate source, a built-in source that generates rows at a configurable rate. This is useful for testing without external dependencies. |
| numPartitions | Sets the number of partitions for the generated data. |
| rowsPerSecond | Controls how many rows are generated per second. |
| Trigger.RealTime() | Enables real-time mode with the default checkpoint interval. You can also specify an interval, for example Trigger.RealTime("5 minutes"). |
| OutputMode.Update() | Real-time mode requires update output mode. |
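To make the checkpoint-interval tradeoff concrete, here is a back-of-the-envelope calculation (plain Python for illustration, not a Spark API). Under the assumption that recovery resumes from the last checkpoint, the interval bounds how much rate-source output may need to be reprocessed after a failure.

```python
# Illustration of the checkpoint tradeoff using this tutorial's settings.
# Assumption: after a failure, the query resumes from the last checkpoint,
# so up to one full interval's worth of rows may be reprocessed.
rows_per_second = 1                    # .option("rowsPerSecond", 1)
checkpoint_interval_seconds = 5 * 60   # realTime="5 minutes"

max_rows_reprocessed = rows_per_second * checkpoint_interval_seconds
print(max_rows_reprocessed)  # 300
```

A longer interval reduces checkpointing overhead during normal operation but raises this reprocessing bound, which is the recovery-time tradeoff the tables above describe.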
What you're seeing
When you run the query, the display function creates a table that updates in real time as the rate source generates new rows. Each row contains:
- timestamp: The time when the row was generated by the rate source
- value: A monotonically increasing counter that increments with each new row
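The shape of the rate source's output can be sketched in plain Python. This is a stand-in for illustration only (the real source runs inside Spark and emits rows continuously); the helper name and signature are invented for this example.

```python
from datetime import datetime, timedelta, timezone

def rate_source_rows(n, rows_per_second=1, start=None):
    """Simulate the schema of Spark's rate source: each row carries a
    timestamp and a monotonically increasing value counter.
    Pure-Python illustration; not part of any Spark or Databricks API."""
    start = start or datetime.now(timezone.utc)
    step = timedelta(seconds=1 / rows_per_second)
    return [{"timestamp": start + i * step, "value": i} for i in range(n)]

rows = rate_source_rows(3)
print([r["value"] for r in rows])  # [0, 1, 2]
```

With rowsPerSecond set to 1, as in the tutorial query, successive timestamps are one second apart and value advances by one per row.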
The table updates continuously with minimal latency, demonstrating how real-time mode processes data as soon as it becomes available. This is the core benefit of real-time mode - the ability to see and act on data immediately rather than waiting for batch processing.
What you've learned
You've successfully set up and run your first real-time streaming query. You now know how to:
- Configure classic compute with the required settings for real-time mode (dedicated cluster, Photon disabled, autoscaling disabled, Spark config)
- Enable real-time processing using the realTime trigger
- Use the display function for interactive development and testing
- Verify that your query is running in real-time mode by observing continuous updates
You're ready to build production real-time pipelines with Kafka, Kinesis, and other supported sources. To learn more about Structured Streaming, see Structured Streaming concepts.
Next steps
Now that you've run your first real-time query, explore these resources to build production streaming applications:
- Real-time mode examples - Working code examples for Kafka sources and sinks, stateful queries, aggregations, and custom sinks
- Real-time mode reference - Learn about cluster sizing, supported operators, monitoring, and feature limitations
- Stateful streaming applications - Add state management to your streaming queries for deduplication, aggregations, and windowing
- Advanced state management - Use transformWithState for custom stateful processing with time-to-live (TTL) and complex logic