Schedule machine learning pipeline jobs

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, you'll learn how to programmatically schedule a pipeline to run on Azure and how to use the schedule UI to do the same. You can create a schedule based on elapsed time. Time-based schedules can take care of routine tasks, such as retraining models or running batch predictions regularly to keep them up to date. After learning how to create schedules, you'll learn how to retrieve, update, and deactivate them via the CLI, SDK, and studio UI.

Tip

If you need to schedule jobs using an external orchestrator, like Azure Data Factory or Microsoft Fabric, consider deploying your pipeline jobs under a Batch Endpoint. Learn more about how to deploy jobs under a batch endpoint, and how to consume batch endpoints from Microsoft Fabric.

Prerequisites

Schedule a pipeline job

To run a pipeline job on a recurring basis, you need to create a schedule. A schedule associates a job with a trigger. The trigger can be either cron, which uses a cron expression to describe the wait between runs, or recurrence, which specifies the frequency at which to trigger the job. In each case, you need to define a pipeline job first. It can be an existing pipeline job or a pipeline job defined inline. For details, refer to Create a pipeline job in CLI and Create a pipeline job in SDK.

You can schedule a pipeline job defined in a local YAML file or an existing pipeline job in the workspace.

Create a schedule

Create a time-based schedule with recurrence pattern

APPLIES TO: Azure CLI ml extension v2 (current)

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_recurrence_job_schedule
display_name: Simple recurrence job schedule
description: a simple hourly recurrence job schedule

trigger:
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 #every day
  schedule:
    hours: [4,5,10,11,12]
    minutes: [0,30]
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

create_job: ./simple-pipeline-job.yml
# create_job: azureml:simple-pipeline-job

trigger contains the following properties:

  • (Required) type specifies the schedule type, which is recurrence here. It can also be cron; see details in the next section.

Note

The following properties apply to both the CLI and the SDK.

  • (Required) frequency specifies the unit of time that describes how often the schedule fires. Can be minute, hour, day, week, month.

  • (Required) interval specifies how often the schedule fires based on the frequency, which is the number of time units to wait until the schedule fires again.

  • (Optional) schedule defines the recurrence pattern, containing hours, minutes, and weekdays.

    • When frequency is day, the pattern can specify hours and minutes.
    • When frequency is week or month, the pattern can specify hours, minutes, and weekdays.
    • hours should be an integer or a list of integers from 0 to 23.
    • minutes should be an integer or a list of integers from 0 to 59.
    • weekdays can be a string or a list of strings from monday to sunday.
    • If schedule is omitted, jobs are triggered according to start_time, frequency, and interval.
  • (Optional) start_time describes the start date and time with timezone. If start_time is omitted, it defaults to the schedule creation time. If the start time is in the past, the first job runs at the next calculated run time.

  • (Optional) end_time describes the end date and time with timezone. If end_time is omitted, the schedule continues to trigger jobs until it is manually disabled.

  • (Optional) time_zone specifies the time zone of the recurrence. If omitted, the default is UTC. To learn more about timezone values, see appendix for timezone values.
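
The constraints in the list above can be checked before a schedule file is submitted. The following is a minimal sketch in plain Python, not part of the CLI or SDK: the function names are hypothetical, the trigger is represented as a dictionary that mirrors the YAML, and two points are assumptions rather than documented behavior (that interval must be a positive integer, and that the schedule fires at every combination of the listed hours and minutes).

```python
from __future__ import annotations

VALID_FREQUENCIES = {"minute", "hour", "day", "week", "month"}
VALID_WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
                  "friday", "saturday", "sunday"}


def as_list(value):
    """Accept a single value or a list, as the property list allows."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def validate_recurrence_trigger(trigger: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if trigger.get("type") != "recurrence":
        problems.append("type must be recurrence")
    if trigger.get("frequency") not in VALID_FREQUENCIES:
        problems.append("frequency must be minute, hour, day, week, or month")
    interval = trigger.get("interval")
    # Assumption: interval is a positive whole number of time units.
    if isinstance(interval, bool) or not isinstance(interval, int) or interval < 1:
        problems.append("interval must be a positive integer")
    pattern = trigger.get("schedule") or {}
    for key, low, high in (("hours", 0, 23), ("minutes", 0, 59)):
        for value in as_list(pattern.get(key, [])):
            if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
                problems.append(f"{key} value {value!r} is outside {low}-{high}")
    for day in as_list(pattern.get("weekdays", [])):
        if str(day).lower() not in VALID_WEEKDAYS:
            problems.append(f"weekdays value {day!r} is not a day name")
    return problems


def daily_fire_times(pattern: dict) -> list[str]:
    """List HH:MM times for one day.

    Assumption: the schedule fires at every hour/minute combination.
    """
    hours = sorted(as_list(pattern.get("hours", [])))
    minutes = sorted(as_list(pattern.get("minutes", [])))
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


# The trigger from the YAML sample above, written as a dictionary.
sample_trigger = {
    "type": "recurrence",
    "frequency": "day",
    "interval": 1,
    "schedule": {"hours": [4, 5, 10, 11, 12], "minutes": [0, 30]},
}
print(validate_recurrence_trigger(sample_trigger))
print(daily_fire_times(sample_trigger["schedule"]))
```

A check like this only catches malformed values early; the service remains the authority on what a trigger actually does.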

Create a time-based schedule with cron expression

APPLIES TO: Azure CLI ml extension v2 (current)

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_cron_job_schedule
display_name: Simple cron job schedule
description: a simple hourly cron job schedule

trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

# create_job: azureml:simple-pipeline-job
create_job: ./simple-pipeline-job.yml

The trigger section defines the schedule details and contains the following properties:

  • (Required) type specifies that the schedule type is cron.

  • (Required) expression uses a standard crontab expression to express a recurring schedule. A single expression is composed of five space-delimited fields:

    MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK

    • A single wildcard (*) covers all values for the field. So a * in days means all days of a month (which varies with month and year).

    • For example, the expression "15 16 * * 1" means 16:15 (4:15 PM) every Monday.

    • The table below lists the valid values for each field:

      Field | Range | Comment
      MINUTES | 0-59 | -
      HOURS | 0-23 | -
      DAYS | - | Not supported. The value is ignored and treated as *.
      MONTHS | - | Not supported. The value is ignored and treated as *.
      DAYS-OF-WEEK | 0-6 | Zero (0) means Sunday. Names of days are also accepted.
    • To learn more about how to use crontab expressions, see Crontab Expression wiki on GitHub.

    Important

    DAYS and MONTHS are not supported. If you pass a value, it is ignored and treated as *.

  • (Optional) start_time specifies the start date and time with timezone of the schedule. start_time: "2022-05-10T10:15:00-04:00" means the schedule starts from 10:15:00 AM on 2022-05-10 in the UTC-4 timezone. If start_time is omitted, it defaults to the schedule creation time. If the start time is in the past, the first job runs at the next calculated run time.

  • (Optional) end_time describes the end date and time with timezone. If end_time is omitted, the schedule continues to trigger jobs until it is manually disabled.

  • (Optional) time_zone specifies the time zone of the expression. If omitted, the default is UTC. See appendix for timezone values.
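
To make the five-field layout concrete, here is a small sketch that parses an expression and tests whether a given moment matches it. It is an illustration in plain Python, not the service's parser: the function names are hypothetical, and it only handles * and comma-separated integers (ranges, steps, and day names are left out). It mirrors the rule above that the DAYS and MONTHS fields are ignored.

```python
from __future__ import annotations

from datetime import datetime

# Numeric ranges from the table above. DAYS and MONTHS have no entry
# because the article states that those fields are ignored.
FIELD_RANGES = {"minutes": (0, 59), "hours": (0, 23), "days_of_week": (0, 6)}


def parse_field(text: str, low: int, high: int) -> set[int]:
    """Expand '*' or a comma-separated list of integers into a set."""
    if text == "*":
        return set(range(low, high + 1))
    values = {int(part) for part in text.split(",")}
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
    return values


def parse_cron(expression: str) -> dict[str, set[int]]:
    """Split MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK and keep the used fields."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-delimited fields")
    minutes, hours, _days, _months, days_of_week = fields  # DAYS, MONTHS ignored
    return {
        "minutes": parse_field(minutes, *FIELD_RANGES["minutes"]),
        "hours": parse_field(hours, *FIELD_RANGES["hours"]),
        "days_of_week": parse_field(days_of_week, *FIELD_RANGES["days_of_week"]),
    }


def matches(expression: str, moment: datetime) -> bool:
    """Return True if the moment falls on a minute the expression selects."""
    parsed = parse_cron(expression)
    # Python counts Monday as 0; the table above counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        moment.minute in parsed["minutes"]
        and moment.hour in parsed["hours"]
        and cron_weekday in parsed["days_of_week"]
    )


monday = datetime(2022, 7, 11, 16, 15)  # 2022-07-11 is a Monday
print(matches("15 16 * * 1", monday))
print(matches("15 16 31 2 1", monday))  # DAYS and MONTHS values are ignored
```

The weekday conversion is the easiest place to slip: cron-style numbering starts the week on Sunday, while Python's datetime.weekday() starts it on Monday.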

Limitations:

  • Currently, Azure Machine Learning v2 schedules don't support event-based triggers.
  • You can specify a complex recurrence pattern containing multiple trigger timestamps using the Azure Machine Learning SDK/CLI v2, while the UI only displays the complex pattern and doesn't support editing it.
  • If you set the recurrence as the 31st day of every month, the schedule won't trigger jobs in months with fewer than 31 days.
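
The last limitation is easy to quantify. The sketch below uses Python's standard calendar module to list which months of a year contain a given day of the month, and which do not; the helper names are hypothetical.

```python
from __future__ import annotations

import calendar


def months_with_day(year: int, day: int) -> list[int]:
    """Return the month numbers (1-12) in the year that contain the day."""
    return [
        month
        for month in range(1, 13)
        if calendar.monthrange(year, month)[1] >= day
    ]


def months_without_day(year: int, day: int) -> list[int]:
    """Return the month numbers in which a schedule on that day would not fire."""
    present = set(months_with_day(year, day))
    return [month for month in range(1, 13) if month not in present]


print(months_with_day(2023, 31))
print(months_without_day(2023, 31))
```

A schedule pinned to the 31st therefore skips five months of every year, which is worth keeping in mind for month-end jobs.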

Change runtime settings when defining schedule

When defining a schedule using an existing job, you can change the runtime settings of the job. Using this approach, you can define multiple schedules that use the same job with different inputs.

APPLIES TO: Azure CLI ml extension v2 (current)

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: cron_with_settings_job_schedule
display_name: Simple cron job schedule
description: a simple hourly cron job schedule

trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

create_job: 
  type: pipeline
  job: ./simple-pipeline-job.yml
  # job: azureml:simple-pipeline-job
  # runtime settings
  settings:
    #default_compute: azureml:cpu-cluster
    continue_on_step_failure: true
  inputs:
    hello_string_top_level_input: ${{name}} 
  tags: 
    schedule: cron_with_settings_schedule

The following properties can be changed when defining a schedule:

Property | Description
settings | A dictionary of settings to be used when running the pipeline job.
inputs | A dictionary of inputs to be used when running the pipeline job.
outputs | A dictionary of outputs to be used when running the pipeline job.
experiment_name | Experiment name of the triggered job.

Note

Studio UI users can only modify input, output, and runtime settings when creating a schedule. experiment_name can only be changed using the CLI or SDK.
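
As a rough illustration of defining several schedules over one job, the sketch below builds two schedule definitions as Python dictionaries that mirror the YAML structure shown above. The helper name, the schedule names, and the input values are hypothetical; only the key layout (name, trigger, create_job with type, job, inputs, and tags) comes from the sample.

```python
from __future__ import annotations

import json


def build_schedule(name: str, expression: str, job: str, inputs: dict) -> dict:
    """Build one cron schedule definition that reuses an existing job file."""
    return {
        "name": name,
        "trigger": {"type": "cron", "expression": expression},
        "create_job": {
            "type": "pipeline",
            "job": job,
            "inputs": inputs,          # runtime inputs differ per schedule
            "tags": {"schedule": name},
        },
    }


JOB_FILE = "./simple-pipeline-job.yml"

# Hypothetical schedule names and input values, for illustration only.
schedules = [
    build_schedule("hourly_greeting_schedule", "0 * * * *", JOB_FILE,
                   {"hello_string_top_level_input": "hourly run"}),
    build_schedule("monday_greeting_schedule", "15 16 * * 1", JOB_FILE,
                   {"hello_string_top_level_input": "weekly run"}),
]

print(json.dumps(schedules, indent=2))
```

Each dictionary would still need to be written out as its own schedule YAML file before being passed to the CLI.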

Expressions supported in schedule

When you define a schedule, the following expressions are supported. They are resolved to real values at job runtime.

Expression | Description | Supported properties
${{creation_context.trigger_time}} | The time when the schedule is triggered. | String type inputs of pipeline job
${{name}} | The name of the job. | outputs.path of pipeline job
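
The idea of resolving an expression at runtime can be pictured as placeholder substitution. The sketch below is only an analogy in plain Python, not how the service implements it: the resolve function, the context dictionary, and the sample values (including the format of the trigger time) are all hypothetical.

```python
from __future__ import annotations

import re

# Matches ${{ key }} where key is made of word characters and dots.
PLACEHOLDER = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def resolve(text: str, context: dict[str, str]) -> str:
    """Replace each ${{key}} with context[key]; unknown keys are left as-is."""
    return PLACEHOLDER.sub(
        lambda match: context.get(match.group(1), match.group(0)), text
    )


# Hypothetical runtime values, for illustration only.
runtime_context = {
    "name": "example_job_name",
    "creation_context.trigger_time": "2022-07-10T10:00:00Z",
}

print(resolve("${{name}}", runtime_context))
print(resolve("triggered at ${{creation_context.trigger_time}}", runtime_context))
```

Because the value is only known when a job is created, a definition that contains such an expression cannot be fully checked for its final value ahead of time.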

Manage schedule

Create schedule

APPLIES TO: Azure CLI ml extension v2 (current)

After you create the schedule YAML, you can use the following command to create a schedule via the CLI.

# This action will create related resources for a schedule. It will take dozens of seconds to complete.
az ml schedule create --file cron-schedule.yml --no-wait

List schedules in a workspace

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule list

Check schedule detail

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule show -n simple_cron_job_schedule

Update a schedule

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule update -n simple_cron_job_schedule --set description="new description" --no-wait

Note

If you want to update more than just tags or description, we recommend using az ml schedule create --file update_schedule.yml

Disable a schedule

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule disable -n simple_cron_job_schedule --no-wait

Enable a schedule

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule enable -n simple_cron_job_schedule --no-wait

Query triggered jobs from a schedule

The display name of every job triggered by a schedule has the form <schedule_name>-YYYYMMDDThhmmssZ. For example, if a schedule named named-schedule is created with a scheduled run every 12 hours starting at 6 AM on Jan 1, 2021, then the display names of the jobs created are as follows:

  • named-schedule-20210101T060000Z
  • named-schedule-20210101T180000Z
  • named-schedule-20210102T060000Z
  • named-schedule-20210102T180000Z, and so on
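
The naming pattern can be reproduced with a few lines of Python. This sketch assumes the start time is expressed in UTC, which is what the trailing Z in the names suggests; the function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def expected_display_names(
    schedule_name: str, first_run: datetime, interval: timedelta, count: int
) -> list[str]:
    """Build <schedule_name>-YYYYMMDDThhmmssZ for the first `count` runs."""
    names = []
    for index in range(count):
        run_time = (first_run + index * interval).astimezone(timezone.utc)
        names.append(f"{schedule_name}-{run_time.strftime('%Y%m%dT%H%M%SZ')}")
    return names


names = expected_display_names(
    "named-schedule",
    datetime(2021, 1, 1, 6, 0, tzinfo=timezone.utc),  # 6 AM on Jan 1, 2021 (UTC)
    timedelta(hours=12),
    4,
)
for name in names:
    print(name)
```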

Screenshot of the jobs tab in the Azure Machine Learning studio filtering by job display name.

You can also use an Azure CLI JMESPath query to find the jobs triggered by a schedule name.

# query the jobs triggered by a schedule; replace simple_cron_job_schedule with your schedule name
az ml job list --query "[?contains(display_name,'simple_cron_job_schedule')]"
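
A contains filter also matches any other job whose display name happens to include the schedule name. If that matters, the job list can be filtered more strictly after it is retrieved. The sketch below works on a list of dictionaries standing in for the parsed JSON output of az ml job list; the function name and the sample records are hypothetical, and only the display_name field and the naming pattern come from this article.

```python
from __future__ import annotations

import re


def jobs_for_schedule(jobs: list[dict], schedule_name: str) -> list[dict]:
    """Keep jobs whose display name is exactly <schedule_name>-YYYYMMDDThhmmssZ."""
    pattern = re.compile(rf"^{re.escape(schedule_name)}-\d{{8}}T\d{{6}}Z$")
    return [job for job in jobs if pattern.match(job.get("display_name") or "")]


# Hypothetical records, shaped like entries parsed from the CLI's JSON output.
sample_jobs = [
    {"display_name": "simple_cron_job_schedule-20220710T100000Z"},
    {"display_name": "simple_cron_job_schedule-20220710T110000Z"},
    {"display_name": "copy_of_simple_cron_job_schedule-20220710T100000Z"},
    {"display_name": "manual_run"},
    {"display_name": None},
]

for job in jobs_for_schedule(sample_jobs, "simple_cron_job_schedule"):
    print(job["display_name"])
```

Anchoring the pattern at both ends is the design choice here: it trades the convenience of a substring match for results that belong to one schedule only.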

Note

For a simpler way to find all jobs triggered by a schedule, see the Jobs history on the schedule detail page using the studio UI.


Delete a schedule

Important

A schedule must be disabled to be deleted. Delete is an unrecoverable action. After a schedule is deleted, you can never access or recover it.

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule delete -n simple_cron_job_schedule
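
Because a schedule must be disabled before it can be deleted, the two CLI calls have to run in order. The sketch below only assembles and prints the two commands shown in this article; the function name is hypothetical, and actually running the commands (for example with subprocess.run) would require the Azure CLI with the ml extension, a signed-in session, and workspace defaults, none of which is shown here.

```python
from __future__ import annotations

import shlex


def delete_schedule_commands(schedule_name: str) -> list[list[str]]:
    """Return the CLI commands, in order, needed to delete a schedule."""
    return [
        # Disable first. --no-wait is left out on purpose so that the
        # disable operation finishes before the delete command starts.
        ["az", "ml", "schedule", "disable", "-n", schedule_name],
        ["az", "ml", "schedule", "delete", "-n", schedule_name],
    ]


commands = delete_schedule_commands("simple_cron_job_schedule")
for command in commands:
    print(shlex.join(command))
```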

Role-based access control (RBAC) support

Because schedules are usually used in production, workspace admins may want to restrict who can create and manage schedules within a workspace, to reduce the impact of mistakes.

Currently, there are three action rules related to schedules, and you can configure them in the Azure portal. To learn more, see how to manage access to an Azure Machine Learning workspace.

Action | Description | Rule
Read | Get and list schedules in Machine Learning workspace | Microsoft.MachineLearningServices/workspaces/schedules/read
Write | Create, update, disable and enable schedules in Machine Learning workspace | Microsoft.MachineLearningServices/workspaces/schedules/write
Delete | Delete a schedule in Machine Learning workspace | Microsoft.MachineLearningServices/workspaces/schedules/delete
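
As an illustration of how these rules might be combined, the sketch below assembles a custom role definition as a Python dictionary and prints it as JSON. The key layout (Name, IsCustom, Description, Actions, NotActions, AssignableScopes) follows the general Azure custom role format; the role name, the helper function, and the scope placeholder are hypothetical, and whether such a role is sufficient on its own for working in a workspace is not covered here.

```python
from __future__ import annotations

import json

# The three action rules from the table above.
SCHEDULE_ACTIONS = {
    "read": "Microsoft.MachineLearningServices/workspaces/schedules/read",
    "write": "Microsoft.MachineLearningServices/workspaces/schedules/write",
    "delete": "Microsoft.MachineLearningServices/workspaces/schedules/delete",
}


def schedule_role(name: str, allowed: list[str], scope: str) -> dict:
    """Build a custom role definition limited to the chosen schedule actions."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": "Access to schedules: " + ", ".join(allowed),
        "Actions": [SCHEDULE_ACTIONS[action] for action in allowed],
        "NotActions": [],
        "AssignableScopes": [scope],
    }


# Hypothetical role name; the scope is a placeholder to fill in.
role = schedule_role(
    "Schedule Operator (example)",
    ["read", "write"],  # can view and manage schedules, but not delete them
    "/subscriptions/<subscription_id>/resourceGroups/<resource_group>",
)
print(json.dumps(role, indent=2))
```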

Frequently asked questions

  • Why aren't my schedules created by the SDK listed in the UI?

    The schedules UI is for v2 schedules. Hence, your v1 schedules won't be listed or accessible via the UI.

    However, v2 schedules also support v1 pipeline jobs. You don't have to publish the pipeline first, and you can directly set up schedules for a pipeline job.

  • Why don't my schedules trigger jobs at the time I set?

    • By default, schedules use the UTC timezone to calculate the trigger time. You can specify the timezone in the creation wizard, or update the timezone on the schedule detail page.
    • If you set the recurrence as the 31st day of every month, the schedule won't trigger jobs in months with fewer than 31 days.
    • If you're using cron expressions, MONTH isn't supported. If you pass a value, it will be ignored and treated as *. This is a known limitation.
  • Are event-based schedules supported?

    • No, v2 schedules don't support event-based triggers.

Next steps