Schedule a job

9 minutes

When you schedule a Lakeflow Job, you define exactly when your workflow runs without manual intervention. Scheduling transforms ad hoc data processing into reliable, automated pipelines that keep your data fresh and your business processes running smoothly.

Azure Databricks provides two scheduling approaches. Simple schedules handle recurring intervals like "every 12 hours," while advanced schedules give you precise control through cron expressions that can target specific days, times, and even months.

Create a simple schedule

Simple schedules work well when you need consistent intervals between job runs. You specify a number and a time unit—minutes, hours, days, or weeks—and Azure Databricks handles the rest.

To add a simple schedule to your job:

In your Azure Databricks workspace sidebar, select Jobs & Pipelines.
Select your job's name to open its details.
In the Job details panel, select Add trigger.
Set Trigger type to Scheduled.
Set Schedule type to Simple.
Specify your interval (for example, 12 hours).
Select Save.

With simple schedules, you can't control when the first run happens—the scheduler picks a time when you save the configuration. Each subsequent run then follows your specified interval.

Note

Azure Databricks enforces a minimum interval of 10 seconds between job runs, regardless of your configuration.

Configure an advanced schedule

Advanced schedules use cron expressions to define precisely when your job runs. This approach handles complex requirements like "at 8:00 AM every weekday" or "on the last Friday of each month."

To create an advanced schedule:

In the Job details panel, select Add trigger.
Set Trigger type to Scheduled.
Set Schedule type to Advanced.
Specify the period, starting time, and time zone.
Optionally, select Show Cron Syntax to view and edit the expression directly.
Select Save.

Understand cron expression format

A cron expression contains six or seven fields separated by spaces:

Field	Required	Allowed values	Special characters
Seconds	Yes	0-59	`, - * /`
Minutes	Yes	0-59	`, - * /`
Hours	Yes	0-23	`, - * /`
Day of month	Yes	1-31	`, - * ? / L W`
Month	Yes	1-12 or JAN-DEC	`, - * /`
Day of week	Yes	1-7 or SUN-SAT	`, - * ? / L #`
Year	No	empty, 1970-2099	`, - * /`

The special characters let you build flexible schedules:

* selects all values in a field ("every minute" or "every hour")
? means "no specific value"—use it in day-of-month or day-of-week when you specify the other
- defines ranges like 10-12 for hours 10, 11, and 12
, lists multiple values like MON,WED,FRI
/ specifies increments—0/15 in minutes means every 15 minutes starting at 0
L means "last"—in day-of-month, it's the last day; 6L in day-of-week means "last Friday"
# specifies "nth occurrence"—6#3 means "third Friday of the month"

Common cron expression examples

These expressions cover typical scheduling scenarios:

Expression	Schedule
`0 0 12 * * ?`	Every day at noon
`0 15 10 ? * MON-FRI`	Weekdays at 10:15 AM
`0 0 8 1 * ?`	First day of every month at 8:00 AM
`0 30 6 ? * 6L`	Last Friday of every month at 6:30 AM
`0 0 /2 * ?`	Every 2 hours

Choose the right time zone

Time zone selection affects when your scheduled jobs actually run. Azure Databricks lets you choose any time zone, but your choice has implications for consistency.

Time zones that observe daylight saving time (DST) create complications. When DST begins or ends, an hourly job might skip a run or appear delayed by an hour or two. If your business logic depends on consistent hourly execution—measured in absolute time—select UTC instead.

Consider these factors when choosing a time zone:

Business alignment: If stakeholders expect reports at 8:00 AM local time, schedule in their time zone and accept DST shifts.
Data consistency: If you process data from systems using UTC timestamps, schedule in UTC to maintain alignment.
Cross-region coordination: When jobs must run simultaneously across regions, UTC provides a single reference point.

Tip

For jobs that must run at exact intervals regardless of local time changes, always use UTC.

Control concurrent job runs

When scheduled jobs take longer than expected, a new run might start before the previous one finishes. This overlap can cause data corruption, duplicate processing, or resource contention. Azure Databricks provides concurrency settings to control this behavior.

Configure maximum concurrent runs

The Maximum concurrent runs setting limits how many instances of the same job can execute simultaneously. By default, jobs allow multiple concurrent runs. For jobs that must not overlap—such as those writing to the same tables—set this value to 1.

To configure maximum concurrent runs:

Open your job in the Jobs & Pipelines workspace UI.
In the Job details panel, locate the Maximum concurrent runs setting.
Set the value to control how many runs can execute at once.

When a new run is triggered but the maximum concurrent runs limit is reached, Azure Databricks must decide what to do with the incoming run.

Configure queue behavior for overlapping runs

When concurrent runs exceed your configured limit, you choose how the scheduler handles the new run:

Behavior	Description	Use case
Queue the run	The new run waits until a slot becomes available, then executes	Jobs that must eventually run—no triggers should be missed
Cancel the run	The new run is immediately canceled	Jobs where stale triggers are not valuable
Skip the run	Similar to cancel—the run doesn't execute	Jobs where missing occasional runs is acceptable

For most data pipelines, queue the run ensures that all scheduled executions eventually complete. This approach prevents data gaps when a job occasionally runs longer than its schedule interval.

Consider a job scheduled to run every hour. If a run takes 75 minutes to complete, the next scheduled trigger arrives while the job is still running. With concurrency set to 1 and queue enabled:

The first run continues processing.
The second run enters the queue.
When the first run completes, the queued run starts immediately.

This pattern ensures sequential, non-overlapping execution while preserving all scheduled runs.

Note

Queued runs consume no compute resources while waiting. They only start when a concurrent slot becomes available.

Scheduling considerations for production workloads

The Azure Databricks job scheduler handles most scenarios reliably, but it's not designed for low-latency requirements. Network conditions or cloud service issues can occasionally delay job starts by several minutes. When service recovers, scheduled jobs run immediately.

Keep these points in mind:

Schedule jobs with buffer time between dependent pipelines
Don't rely on sub-minute precision for business-critical timing
Monitor job run history to identify scheduling patterns or delays
Use the minimum 10-second interval rule when planning high-frequency jobs

Feedback

Was this page helpful?