Schedule machine learning pipeline jobs

Article
09/17/2024

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, you learn how to schedule machine learning pipelines to run on Azure. You can schedule routine tasks like retraining models or regularly updating batch predictions based on elapsed time.

This article shows you how to create, retrieve, update, and deactivate schedules by using the Azure Machine Learning CLI, Azure Machine Learning SDK v2 for Python, or Azure Machine Learning studio UI.

Tip

To schedule jobs by using an external orchestrator, like Azure Data Factory or Microsoft Fabric, consider deploying your pipeline jobs under a batch endpoint. For more information, see Deploy existing pipeline jobs to batch endpoints and Run Azure Machine Learning models from Fabric by using batch endpoints (preview).

Prerequisites

An Azure subscription. If you don't have one, create a free account before you begin.
An Azure Machine Learning workspace. To create a workspace, see Create workspace resources.
An understanding of Azure Machine Learning pipelines. For information, see What are machine learning pipelines.

The Azure CLI and ml extension installed by following the instructions in Install, set up, and use the CLI (v2).
Knowledge of how to create Azure Machine Learning YAML pipelines. For information, see Create and run machine learning pipelines using components with the Azure Machine Learning CLI.

Limitations

Azure Machine Learning v2 schedules don't support event-based triggers.
CLI and SDK v2 schedules support specifying complex recurrence patterns that contain multiple trigger timestamps. The studio UI displays the complex patterns but doesn't support editing them.
The studio UI supports only v2 schedules, and can't list or access v1 schedules that are based on published pipelines or pipeline endpoints. You can create a schedule for an unpublished pipeline.
If recurrence is set as the 31st or 30th day of every month, the schedule doesn't trigger jobs in months that have fewer days.
DAYS and MONTHS values aren't supported in cron schedule expressions. Values passed for these parameters are ignored and treated as *.
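
The day-of-month limitation above can be checked ahead of time with a short calendar sketch. The helper name `months_with_day` is hypothetical and the code uses only the Python standard library. It doesn't call Azure Machine Learning, and only lists the months in which a given day of the month exists.

```python
import calendar


def months_with_day(year, day):
    """Return the month numbers (1-12) of `year` that contain day-of-month `day`."""
    return [
        month
        for month in range(1, 13)
        # monthrange returns (weekday of first day, number of days in month)
        if calendar.monthrange(year, month)[1] >= day
    ]


# A schedule pinned to the 31st can only fire in these months of 2024.
print(months_with_day(2024, 31))  # [1, 3, 5, 7, 8, 10, 12]

# A schedule pinned to the 30th skips only February.
print(months_with_day(2023, 30))  # [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

If a job must run at the end of every month, it might be safer to pick a day that exists in all months, such as the 28th.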

Create a schedule

When you have a pipeline job with satisfactory performance and outputs, you can set up a schedule to automatically trigger the job on a regular basis. To do so, you create a schedule that associates the job with a trigger. The trigger can be either a recurrence pattern or a cron expression that specifies the interval and frequency to run the job.

In both cases, you need to define a pipeline job first, either inline or by specifying an existing pipeline job. You can define pipelines in YAML and run them from the CLI, author pipelines inline in Python, or compose pipelines in Azure Machine Learning studio. You can create pipeline jobs locally or from existing jobs in the workspace.

You can create v2 schedules for v2 or v1 pipeline jobs by using the studio UI, SDK v2, or CLI v2. You don't have to publish existing pipelines first to set up schedules for pipeline jobs.

The code examples in this article are from Working with Schedule in Azure Machine Learning CLI 2.0.

Define a time-based schedule with a recurrence pattern

The following YAML code defines a recurring schedule for a pipeline job. The required type parameter specifies that the trigger type is recurrence.

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_recurrence_job_schedule
display_name: Simple recurrence job schedule
description: a simple hourly recurrence job schedule

trigger:
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 #every day
  schedule:
    hours: [4,5,10,11,12]
    minutes: [0,30]
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

create_job: ./simple-pipeline-job.yml
# create_job: azureml:simple-pipeline-job

The following code uses RecurrenceTrigger to provide a better coding experience.

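from datetime import datetime

from azure.ai.ml.constants import TimeZone
from azure.ai.ml.entities import JobSchedule, RecurrencePattern, RecurrenceTrigger

schedule_name = "simple_sdk_create_schedule_recurrence"

# Trigger every day at 10:00 and 10:01 UTC, starting now.
schedule_start_time = datetime.utcnow()
recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=10, minutes=[0, 1]),
    start_time=schedule_start_time,
    time_zone=TimeZone.UTC,
)

# pipeline_job is assumed to be a pipeline job object that you defined earlier.
job_schedule = JobSchedule(
    name=schedule_name, trigger=recurrence_trigger, create_job=pipeline_job
)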

The schedule accepts the following parameters. Some are required and some are optional.


Parameters

frequency (required) is the time unit on which the schedule fires. Can be minute, hour, day, week, or month.
interval (required) is the number of time units between schedule recurrences.
schedule (optional) defines the recurrence pattern, which can contain hours, minutes, and weekdays. If omitted, jobs trigger according to the logic of start_time, frequency, and interval.
- When frequency is day, the pattern can specify hours and minutes.
- When frequency is week or month, the pattern can specify hours, minutes, and weekdays.
- hours is an integer or list from 0 to 23.
- minutes is an integer or list from 0 to 59.
- weekdays is a string or list from monday to sunday.
start_time (optional) is the start date and time with timezone. If omitted, the default is equal to schedule creation time. If the start time is in the past, the first job runs at the next calculated run time.
end_time (optional) is the end date and time with timezone. If omitted, the schedule remains active until manually disabled.
time_zone (optional) specifies the time zone of the recurrence schedule. If omitted, the default is Coordinated Universal Time (UTC). For more information about timezone values, see the appendix for timezone values.
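
To make the interaction between hours and minutes concrete, the following is a minimal sketch of how a day-frequency pattern might expand into trigger times. It assumes the service fires at every combination of the listed hours and minutes and that interval is 1. The function names `daily_trigger_times` and `next_trigger` are hypothetical and aren't part of the Azure Machine Learning SDK.

```python
from datetime import datetime, timedelta
from itertools import product


def daily_trigger_times(hours, minutes):
    """Return sorted (hour, minute) pairs for a day-frequency pattern.

    Each argument can be a single integer or a list of integers.
    """
    hours = [hours] if isinstance(hours, int) else list(hours)
    minutes = [minutes] if isinstance(minutes, int) else list(minutes)
    return sorted(product(hours, minutes))


def next_trigger(now, hours, minutes):
    """Return the first pattern time strictly after `now` (interval of 1 day assumed)."""
    slots = daily_trigger_times(hours, minutes)
    for day_offset in (0, 1):  # look at today, then tomorrow
        day = (now + timedelta(days=day_offset)).date()
        for hour, minute in slots:
            candidate = datetime(day.year, day.month, day.day, hour, minute)
            if candidate > now:
                return candidate
    raise ValueError("pattern has no trigger times")


# The YAML example earlier lists five hours and two minute values.
slots = daily_trigger_times([4, 5, 10, 11, 12], [0, 30])
print(len(slots))  # 10

# A start time in the past means the first run is the next calculated slot.
print(next_trigger(datetime(2022, 7, 10, 10, 15), [4, 5, 10, 11, 12], [0, 30]))
# 2022-07-10 10:30:00
```

The sketch ignores time zones and end_time. It only illustrates why a pattern with several hours and minutes produces more runs per day than the interval alone suggests.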

After you create the schedule YAML, use the following command to create the schedule via CLI:

# This action creates related resources for a schedule. It can take tens of seconds to complete.
# Pass the schedule YAML file, not the pipeline job YAML. The file name here is an example.
az ml schedule create --file recurrence-job-schedule.yml --no-wait

The following Python code creates the schedule you defined:

job_schedule = ml_client.schedules.begin_create_or_update(
    schedule=job_schedule
).result()
print(job_schedule)

Define a time-based schedule with a cron expression

A cron expression can specify a flexible and customized recurrence pattern for a schedule. A standard crontab expression is composed of the space-delimited fields MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK. A wildcard * means all values for a field.

In an Azure Machine Learning schedule cron expression:

MINUTES is an integer or list from 0 to 59.
HOURS is an integer or list from 0 to 23.
DAYS values aren't supported, and are always treated as *. The * value in DAYS means all days in a month, which varies with month and year.
MONTHS values aren't supported, and are always treated as *.
DAYS-OF-WEEK is an integer or list from 0 to 6, where 0 = Sunday. Names of days are also accepted.

For example, the expression 15 16 * * 1 means 4:15 PM UTC every Monday. For more information about crontab expressions, see the Crontab Expression wiki on GitHub.
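
The field rules above can be sketched as a small matcher. This is an illustrative approximation, not the service's implementation: the function names are hypothetical, only `*`, single integers, and comma-separated lists are handled, and day names, ranges, and step values are left out. DAYS and MONTHS are ignored to mirror the documented behavior of treating them as `*`.

```python
from datetime import datetime


def _parse_field(field, low, high):
    """Expand '*', a single integer, or a comma-separated list into a set of ints."""
    if field == "*":
        return set(range(low, high + 1))
    values = {int(part) for part in field.split(",")}
    if not all(low <= value <= high for value in values):
        raise ValueError(f"value out of range {low}-{high}: {field!r}")
    return values


def cron_matches(expression, when):
    """Return True if `when` satisfies MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-delimited fields")
    minutes, hours, _days, _months, days_of_week = fields  # DAYS and MONTHS are ignored
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        when.minute in _parse_field(minutes, 0, 59)
        and when.hour in _parse_field(hours, 0, 23)
        and cron_weekday in _parse_field(days_of_week, 0, 6)
    )


# September 16, 2024 is a Monday, so 4:15 PM matches "15 16 * * 1".
print(cron_matches("15 16 * * 1", datetime(2024, 9, 16, 16, 15)))  # True
# The next day is a Tuesday, so the same time doesn't match.
print(cron_matches("15 16 * * 1", datetime(2024, 9, 17, 16, 15)))  # False
```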

The following YAML code defines a recurring schedule for a pipeline job. The required type parameter specifies that the trigger type is cron.

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_cron_job_schedule
display_name: Simple cron job schedule
description: a simple hourly cron job schedule

trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

# create_job: azureml:simple-pipeline-job
create_job: ./simple-pipeline-job.yml

The following code uses CronTrigger to provide a better coding experience.

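from datetime import datetime

from azure.ai.ml.entities import CronTrigger, JobSchedule

schedule_name = "simple_sdk_create_schedule_cron"

# Trigger every day at 10:15 AM Eastern Standard Time, starting now.
schedule_start_time = datetime.utcnow()
cron_trigger = CronTrigger(
    expression="15 10 * * *",
    start_time=schedule_start_time,  # start time
    time_zone="Eastern Standard Time",  # time zone of expression
)

# pipeline_job is assumed to be a pipeline job object that you defined earlier.
job_schedule = JobSchedule(
    name=schedule_name, trigger=cron_trigger, create_job=pipeline_job
)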

The schedule accepts the following parameters. Some are required and some are optional.


Parameters

expression (required) is a standard crontab expression that expresses a recurring schedule.
start_time (optional) is the schedule start date and time with timezone. For example, start_time: "2022-05-10T10:15:00-04:00" means the schedule starts from 10:15:00 AM on May 10, 2022 in UTC-4 timezone. If omitted, the default is equal to schedule creation time. If the start time is in the past, the first job runs at the next calculated run time.
end_time (optional) is the end date and time with timezone. If omitted, the schedule remains active until manually disabled.
time_zone (optional) specifies the time zone of the cron expression. If omitted, the default is UTC.
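
As a quick check on the start_time example above, the following sketch parses the same timestamp with the Python standard library and converts it to UTC. It only illustrates how the UTC offset in the string is read and makes no claim about how the service stores the value.

```python
from datetime import datetime, timezone

# The offset "-04:00" marks the timestamp as four hours behind UTC.
start = datetime.fromisoformat("2022-05-10T10:15:00-04:00")
start_utc = start.astimezone(timezone.utc)

print(start.isoformat())      # 2022-05-10T10:15:00-04:00
print(start_utc.isoformat())  # 2022-05-10T14:15:00+00:00
```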

After you create the schedule YAML, use the following command to create the schedule via CLI:

# This action creates related resources for a schedule. It can take tens of seconds to complete.
# Pass the schedule YAML file, not the pipeline job YAML. The file name here is an example.
az ml schedule create --file cron-job-schedule.yml --no-wait

The following Python code creates the schedule you defined:

job_schedule = ml_client.schedules.begin_create_or_update(
    schedule=job_schedule
).result()
print(job_schedule)

Change job settings when you define schedules

Sometimes you might want the jobs triggered by schedules to have different configurations from the test jobs. When you define a schedule by using an existing job, you can change the job settings. This approach lets you define multiple schedules that use the same job with different inputs.

When you define a schedule, you can change the settings, inputs, or outputs to use when running the pipeline job. You can also change the experiment_name of the triggered job.

The following schedule definition changes the settings of an existing job.

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: cron_with_settings_job_schedule
display_name: Simple cron job schedule
description: a simple hourly cron job schedule

trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

create_job: 
  type: pipeline
  job: ./simple-pipeline-job.yml
  # job: azureml:simple-pipeline-job
  # runtime settings
  settings:
    #default_compute: azureml:cpu-cluster
    continue_on_step_failure: true
  inputs:
    hello_string_top_level_input: ${{name}} 
  tags: 
    schedule: cron_with_settings_schedule

The following Python code sets the inputs and runtime settings of the pipeline job that the schedule triggers.

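from azure.ai.ml import Input

# pipeline_with_components_from_yaml and parent_dir are assumed to be defined earlier,
# as in the sample that this code comes from.
# Set the runtime inputs for the pipeline job.
pipeline_job = pipeline_with_components_from_yaml(
    training_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    test_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
)

# Set the pipeline-level default compute.
pipeline_job.settings.default_compute = "cpu-cluster"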

In the studio UI, you can use Advanced settings in the schedule creation wizard to modify inputs, outputs, and runtime settings for a pipeline job. You can't change the experiment_name in the studio UI.

In Job inputs & outputs, you can modify inputs and outputs for future jobs triggered by the schedule. You can use macro expressions for the inputs and outputs paths.
In Job runtime settings, you can modify compute and other runtime settings for jobs triggered by the schedule.
Select Review + Create to review the schedule settings you configured, and then select Review + Create again to create the schedule.

Use supported expressions in schedules

When you define a schedule, you can use the following macro expressions to define dynamic parameter values that resolve to actual values during job runtime.

Expression	Description	Supported properties
`${{name}}`	Name of the job	`outputs` path of the pipeline job
`${{creation_context.trigger_time}}`	Trigger time of the job	String type `inputs` of the pipeline job

Manage schedule

You can list, view details, update, disable, enable, and delete schedules in a workspace.

List schedules

az ml schedule list

schedules = ml_client.schedules.list()
[s.name for s in schedules]

View schedule details

az ml schedule show -n simple_cron_job_schedule

created_schedule = ml_client.schedules.get(name=schedule_name)
[created_schedule.name]

Update a schedule

az ml schedule update -n simple_cron_job_schedule --set description="new description" --no-wait

Note

To update more than just tags and description, consider using az ml schedule create --file update_schedule.yml.

# Update trigger expression
job_schedule.trigger.expression = "10 10 * * 1"
job_schedule = ml_client.schedules.begin_create_or_update(
    schedule=job_schedule
).result()
print(job_schedule)

Disable a schedule

az ml schedule disable -n simple_cron_job_schedule --no-wait

The following code returns False:

job_schedule = ml_client.schedules.begin_disable(name=schedule_name).result()
job_schedule.is_enabled

Enable a schedule

az ml schedule enable -n simple_cron_job_schedule --no-wait

The following code returns True:

job_schedule = ml_client.schedules.begin_enable(name=schedule_name).result()
job_schedule.is_enabled

Delete a schedule

Important

You must first disable a schedule to delete it. Deletion is permanent and unrecoverable.

az ml schedule delete -n simple_cron_job_schedule

# Only disabled schedules can be deleted
ml_client.schedules.begin_disable(name=schedule_name).result()
ml_client.schedules.begin_delete(name=schedule_name).result()

Query triggered jobs from a schedule

Jobs triggered by a specific schedule all have the display name <schedule_name>-YYYYMMDDThhmmssZ. For example, if a schedule named named-schedule runs every 12 hours starting at 6 AM on January 1, 2021, the display names of the jobs created are as follows:

named-schedule-20210101T060000Z
named-schedule-20210101T180000Z
named-schedule-20210102T060000Z
named-schedule-20210102T180000Z, and so on
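
The naming convention can be reproduced with a short sketch, which might help when you want to predict or match the display names of triggered jobs. The helper name `expected_display_names` is hypothetical, and the timestamps are assumed to be in UTC, as the trailing Z suggests.

```python
from datetime import datetime, timedelta


def expected_display_names(schedule_name, first_run, interval, count):
    """Build <schedule_name>-YYYYMMDDThhmmssZ for the first `count` runs."""
    return [
        f"{schedule_name}-{(first_run + i * interval).strftime('%Y%m%dT%H%M%SZ')}"
        for i in range(count)
    ]


names = expected_display_names(
    "named-schedule",
    first_run=datetime(2021, 1, 1, 6, 0, 0),
    interval=timedelta(hours=12),
    count=4,
)
for name in names:
    print(name)
# named-schedule-20210101T060000Z
# named-schedule-20210101T180000Z
# named-schedule-20210102T060000Z
# named-schedule-20210102T180000Z
```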

You can also use an Azure CLI JMESPath query to list the jobs triggered by a schedule, filtering on the schedule name.

# Query the jobs triggered by a schedule. Replace simple_cron_job_schedule with your schedule name.
az ml job list --query "[?contains(display_name,'simple_cron_job_schedule')]"

Tip

The Jobs history tab on the schedule detail page in the studio provides a simple way to find all jobs triggered by a schedule.

Role-based access control (RBAC) support

Because schedules are used for production, it's important to reduce the possibility and impact of misoperation. Workspace admins can restrict access to schedule creation and management in a workspace.

Admins can configure the following action rules related to schedules in the Azure portal. For more information, see Manage access to Azure Machine Learning workspaces.

Action	Description	Rule
Read	Get and list schedules	Microsoft.MachineLearningServices/workspaces/schedules/read
Write	Create, update, disable, and enable schedules	Microsoft.MachineLearningServices/workspaces/schedules/write
Delete	Delete schedules	Microsoft.MachineLearningServices/workspaces/schedules/delete
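
As an illustration of how these rules might be combined, the following sketch builds a custom role definition that grants only selected schedule actions. The dictionary follows the general shape of an Azure custom role JSON document. The role name, the helper name `schedule_role_definition`, and the scope placeholder are assumptions for the example, so check the result against the access management documentation before you use it.

```python
import json

# Action rules copied from the table above.
SCHEDULE_ACTIONS = {
    "read": "Microsoft.MachineLearningServices/workspaces/schedules/read",
    "write": "Microsoft.MachineLearningServices/workspaces/schedules/write",
    "delete": "Microsoft.MachineLearningServices/workspaces/schedules/delete",
}


def schedule_role_definition(name, allowed, scope):
    """Return a custom role definition that allows only the listed schedule actions."""
    return {
        "Name": name,  # hypothetical role name
        "IsCustom": True,
        "Description": "Limits schedule operations in a workspace.",
        "Actions": [SCHEDULE_ACTIONS[action] for action in allowed],
        "NotActions": [],
        "AssignableScopes": [scope],
    }


role = schedule_role_definition(
    "Schedule Reader (example)",
    allowed=["read"],
    # Placeholder scope; substitute your own subscription, resource group, and workspace.
    scope="/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>",
)
print(json.dumps(role, indent=2))
```

A role like this lets a user get and list schedules without being able to create, change, or delete them.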

Cost considerations

Schedules are billed based on the number of schedules. Each schedule creates a logic app that Azure Machine Learning hosts on behalf of (HOBO) the user.

The logic app charges back to the user's Azure subscription. HOBO resource costs are billed using the same meter emitted by the original resource provider. Charges appear under the host resource, which is the Azure Machine Learning workspace.
