In this article, you'll learn how to schedule data imports, both programmatically and through the schedule UI. You can create a schedule based on elapsed time. Time-based schedules can handle routine tasks, for example, regular data imports that keep your data up to date. After you learn how to create schedules, you'll learn how to retrieve, update, and deactivate them with the CLI, the SDK, and the studio UI.
Prerequisites
You need an Azure subscription to use Azure Machine Learning. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning today.
To import data on a recurring basis, you must create a schedule. A schedule associates a data import action with a trigger. The trigger can be either cron, which uses a cron expression to describe when runs occur, or recurrence, which specifies the frequency at which a job is triggered. In each case, you must first build an import data definition. You can use either an existing data import or a data import that is defined inline. For more information, visit Create a data import in CLI, SDK and UI.
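The relationship between a schedule, its trigger, and its import definition can be pictured as a small data model. The sketch below is illustrative only: the class names `CronSpec`, `RecurrenceSpec`, and `ScheduleSpec` are hypothetical stand-ins, not the Azure Machine Learning SDK types, and the import definition is represented by a plain string.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CronSpec:
    """Hypothetical cron trigger: fires when the cron expression matches."""
    expression: str


@dataclass(frozen=True)
class RecurrenceSpec:
    """Hypothetical recurrence trigger: fires every `interval` units of `frequency`."""
    frequency: str
    interval: int


@dataclass(frozen=True)
class ScheduleSpec:
    """Hypothetical schedule: one trigger associated with one import definition."""
    name: str
    trigger: Union[CronSpec, RecurrenceSpec]
    import_data: str  # placeholder for an existing or inline data import definition


# Two schedules with different trigger types that reference the same import definition.
weekly = ScheduleSpec(
    name="weekly_import",
    trigger=CronSpec(expression="15 16 * * 1"),
    import_data="my-data-import",
)
daily = ScheduleSpec(
    name="daily_import",
    trigger=RecurrenceSpec(frequency="day", interval=1),
    import_data="my-data-import",
)
```

Each schedule holds exactly one trigger and one import definition, while the same import definition can appear in more than one schedule.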
Create a schedule
Create a time-based schedule with recurrence pattern
Under Assets in the left navigation, select Data. On the Data import tab, select the imported data asset to which you want to attach a schedule. The Import jobs history page should appear, as shown in this screenshot:
On the Import jobs history page, select the latest Import job name link to open the pipeline job details page, as shown in this screenshot:
At the pipeline job details page of any data import, select Schedule -> Create new schedule to open the schedule creation wizard, as shown in this screenshot:
The Basic settings of the schedule creation wizard have the properties shown in this screenshot:
Name: the unique identifier of the schedule within the workspace.
Description: the schedule description.
Trigger: the recurrence pattern of the schedule, which includes these properties:
Time zone: the trigger time calculation is based on this time zone; (UTC) Coordinated Universal Time by default.
Recurrence or Cron expression: select recurrence to specify the recurring pattern. Under Recurrence, you can specify the recurrence frequency - by minutes, hours, days, weeks, or months.
Start: the schedule first becomes active on this date. By default, the creation date of this schedule.
End: the schedule will become inactive after this date. By default, it's NONE, which means that the schedule remains active until you manually disable it.
Tags: the selected schedule tags.
After you configure the basic settings, you can select Review + Create, and the schedule will automatically submit the data import based on the recurrence pattern you specified. You can also select Next, and navigate through the wizard to select or update the data import parameters.
Note
These properties apply to CLI and SDK:
(Required) frequency specifies the unit of time that describes how often the schedule fires. Valid values:
minute
hour
day
week
month
(Required) interval specifies how often the schedule fires based on the frequency. It is the number of time units to wait until the schedule fires again.
(Optional) schedule defines the recurrence pattern, containing hours, minutes, and weekdays.
When frequency equals day, a pattern can specify hours and minutes.
When frequency equals week or month, a pattern can specify hours, minutes, and weekdays.
hours should be an integer or a list, ranging between 0 and 23.
minutes should be an integer or a list, ranging between 0 and 59.
weekdays should be a string or a list, ranging from monday to sunday.
If schedule is omitted, jobs are triggered according to the logic of start_time, frequency, and interval.
(Optional) start_time describes the start date and time, with a timezone. If start_time is omitted, start_time equals the job creation time. For a start time in the past, the first job runs at the next calculated run time.
(Optional) end_time describes the end date and time with a timezone. If end_time is omitted, the schedule continues to trigger jobs until the schedule is manually disabled.
(Optional) time_zone specifies the time zone of the recurrence. If omitted, the default timezone is UTC. For more information about timezone values, visit appendix for timezone values.
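The property rules above can be checked before a schedule is submitted. The following is a minimal sketch, not part of the SDK or CLI: `validate_recurrence` is a hypothetical helper, the pattern is passed as a plain dictionary, and the requirement that interval be a positive integer is an assumption.

```python
VALID_FREQUENCIES = {"minute", "hour", "day", "week", "month"}
VALID_WEEKDAYS = {
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
}


def _as_list(value):
    """Accept either a single value or a list, as the property rules allow."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def validate_recurrence(frequency, interval, schedule=None):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if frequency not in VALID_FREQUENCIES:
        problems.append(f"frequency must be one of {sorted(VALID_FREQUENCIES)}")
    # Assumption: interval must be a positive whole number of time units.
    if not isinstance(interval, int) or interval < 1:
        problems.append("interval must be a positive integer")

    schedule = schedule or {}
    for hour in _as_list(schedule.get("hours", [])):
        if not isinstance(hour, int) or not 0 <= hour <= 23:
            problems.append(f"hours value {hour!r} is outside 0-23")
    for minute in _as_list(schedule.get("minutes", [])):
        if not isinstance(minute, int) or not 0 <= minute <= 59:
            problems.append(f"minutes value {minute!r} is outside 0-59")

    weekdays = _as_list(schedule.get("weekdays", []))
    for day in weekdays:
        if str(day).lower() not in VALID_WEEKDAYS:
            problems.append(f"weekdays value {day!r} is not a day name")
    if weekdays and frequency not in {"week", "month"}:
        problems.append("weekdays applies only when frequency is week or month")
    return problems


print(validate_recurrence("day", 1, {"hours": 10, "minutes": [0, 30]}))  # []
print(validate_recurrence("day", 1, {"hours": 24}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.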
Create a time-based schedule with cron expression
Under Assets in the left navigation, select Data. On the Data import tab, select the imported data asset to which you want to attach a schedule. The Import jobs history page should appear, as shown in this screenshot:
On the Import jobs history page, select the latest Import job name link to open the pipeline job details page, as shown in this screenshot:
At the pipeline job details page of any data import, select Schedule -> Create new schedule to open the schedule creation wizard, as shown in this screenshot:
The Basic settings of the schedule creation wizard have the properties shown in this screenshot:
Name: the unique identifier of the schedule within the workspace.
Description: the schedule description.
Trigger: the recurrence pattern of the schedule, which includes these properties:
Time zone: the trigger time calculation is based on this time zone; (UTC) Coordinated Universal Time by default.
Recurrence or Cron expression: select cron expression to specify the recurring pattern. With a cron expression, you can specify a more flexible and customized recurrence pattern.
Start: the schedule first becomes active on this date. By default, the creation date of this schedule.
End: the schedule will become inactive after this date. By default, it's NONE, which means that the schedule remains active until you manually disable it.
Tags: the selected schedule tags.
After you configure the basic settings, you can select Review + Create, and the schedule will automatically submit the data import based on the recurrence pattern you specified. You can also select Next, and navigate through the wizard to select or update the data import parameters.
(Required) expression uses a standard crontab expression to express a recurring schedule. A single expression is composed of five space-delimited fields:
MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK
A single wildcard (*) covers all values for a field. A * in DAYS means all days of a month (which varies with month and year).
For example, the expression "15 16 * * 1" means 16:15 (4:15 PM) every Monday.
This table lists the valid values for each field:
| Field | Range | Comment |
|---|---|---|
| MINUTES | 0-59 | - |
| HOURS | 0-23 | - |
| DAYS | - | Not supported. The value is ignored and treated as *. |
| MONTHS | - | Not supported. The value is ignored and treated as *. |
| DAYS-OF-WEEK | 0-6 | Zero (0) means Sunday. Names of days also accepted. |
DAYS and MONTHS are not supported. If you pass a value for either field, it is ignored and treated as *.
(Optional) start_time specifies the start date and time of the schedule, with a timezone. For example, start_time: "2022-05-10T10:15:00-04:00" means the schedule starts at 10:15:00 AM on 2022-05-10 in the UTC-4 timezone. If start_time is omitted, start_time equals the schedule creation time. For a start time in the past, the first job runs at the next calculated run time.
(Optional) end_time describes the end date and time, with a timezone. If end_time is omitted, the schedule continues to trigger jobs until the schedule is manually disabled.
(Optional) time_zone specifies the time zone of the expression. If time_zone is omitted, the timezone is UTC by default. For more information about timezone values, visit appendix for timezone values.
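To make the cron rules above concrete, here is a rough sketch of how an expression could be interpreted. It is not the service's implementation: `parse_cron` and `fires_at` are hypothetical helpers, support for comma lists and ranges is an assumption based on standard crontab syntax, step values such as `*/5` are not handled, and the candidate time is assumed to be expressed in the schedule's time zone already.

```python
from datetime import datetime

DAY_NAMES = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]  # index 0 is Sunday


def _to_int(token, names):
    """Convert a numeric token, or a day name such as 'Monday' or 'mon', to an int."""
    if token.isdigit():
        return int(token)
    if names and token.lower()[:3] in names:
        return names.index(token.lower()[:3])
    raise ValueError(f"unrecognized value {token!r}")


def _expand(field, low, high, names=None):
    """Expand one field (*, single values, comma lists, ranges) into a set of ints."""
    if field == "*":
        return set(range(low, high + 1))
    values = set()
    for part in field.split(","):
        first, _, last = part.partition("-")
        start = _to_int(first, names)
        end = _to_int(last, names) if last else start
        if not low <= start <= end <= high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1))
    return values


def parse_cron(expression):
    """Split the five fields and keep only the ones that are honored."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-delimited fields")
    minutes, hours, _days, _months, weekdays = fields
    # DAYS and MONTHS are ignored and treated as *, so they are not stored.
    return {
        "minutes": _expand(minutes, 0, 59),
        "hours": _expand(hours, 0, 23),
        "weekdays": _expand(weekdays, 0, 6, DAY_NAMES),
    }


def fires_at(expression, moment, start_time=None):
    """Return True if `moment` matches the expression and is not before start_time."""
    if start_time is not None and moment < start_time:
        return False
    spec = parse_cron(expression)
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        moment.minute in spec["minutes"]
        and moment.hour in spec["hours"]
        and cron_weekday in spec["weekdays"]
    )


start = datetime.fromisoformat("2022-05-10T10:15:00-04:00")
monday = datetime.fromisoformat("2022-05-16T16:15:00-04:00")
print(fires_at("15 16 * * 1", monday, start))  # True
```

Because DAYS and MONTHS are dropped during parsing, `"15 16 5 7 Monday"` and `"15 16 * * 1"` produce the same result in this sketch.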
Limitations:
Currently, Azure Machine Learning v2 scheduling doesn't support event-based triggers.
Use the Azure Machine Learning SDK/CLI v2 to specify a complex recurrence pattern that contains multiple trigger timestamps. The UI only displays the complex pattern and doesn't support editing.
If you set the recurrence to the 31st day of every month, the schedule won't trigger jobs in months with fewer than 31 days.
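The last limitation is easy to check ahead of time. As a small illustration that uses only the Python standard library (`months_with_day` is a hypothetical helper, not an SDK function), the following lists the months in which a given day of the month exists, and therefore the months in which such a schedule could fire:

```python
import calendar


def months_with_day(year, day_of_month):
    """Return the month numbers in `year` that contain `day_of_month`."""
    return [
        month
        for month in range(1, 13)
        if calendar.monthrange(year, month)[1] >= day_of_month
    ]


print(months_with_day(2024, 31))  # [1, 3, 5, 7, 8, 10, 12]
```

A schedule set to day 29 or 30 is affected in the same way for February.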
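List schedules in a workspace
# List the names of all schedules in the workspace.
# Assumes ml_client is an authenticated azure.ai.ml.MLClient.
schedules = ml_client.schedules.list()
[s.name for s in schedules]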
In the studio portal, under the Jobs extension, select the All schedules tab. That tab shows all the job schedules that you created with the SDK, CLI, or UI in a single list. The schedule list gives you an overview of all schedules in this workspace, as shown in this screenshot:
Update the data import definition of an existing schedule
To change the import frequency, or to create a new association for the data import job, you can update the import definition of an existing schedule.
Note
When you update an existing schedule, the association of the schedule with the old import definition is removed. A schedule can have only one import job definition. However, multiple schedules can call one data import definition.
Under Assets in the left navigation, select Data. On the Data import tab, select the imported data asset to which you want to attach a schedule. The Import jobs history page should appear, as shown in this screenshot:
On the Import jobs history page, select the latest Import job name link to open the pipeline job details page, as shown in this screenshot:
On the pipeline job details page of any data import, select Schedule -> Update to existing schedule to open the Select schedule wizard, as shown in this screenshot:
Select an existing schedule from the list, as shown in this screenshot:
Important
Make sure you select the correct schedule to update. Once you finish the update, the schedule will trigger different data imports.
You can also modify the source and query, and change the destination path, for future data imports that the schedule triggers.
Select Review + Update to finish the update process. A notification appears when the update completes.
When the update is complete, you can view the new data import definition on the schedule details page.
Update in the schedule details page
On the schedule details page, you can select Update settings to update both the basic settings and the advanced settings of the schedule, including the job input/output and runtime settings, as shown in this screenshot:
At the schedule details page, you can enable the current schedule. You can also enable schedules at the All schedules tab.
Delete a schedule
Important
A schedule must be disabled before deletion. Deletion is a permanent, unrecoverable action. After a schedule is deleted, you can never access or recover it.
# Only disabled schedules can be deleted
ml_client.schedules.begin_disable(name=schedule_name).result()
ml_client.schedules.begin_delete(name=schedule_name).result()
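The ordering requirement can be wrapped in a small helper so that callers cannot skip the disable step. This is only a sketch: `disable_then_delete` and the `FakeSchedules` stand-in are hypothetical, and the helper relies solely on the `begin_disable` and `begin_delete` calls shown above. The stand-in lets you try the helper without touching a real workspace.

```python
def disable_then_delete(schedules, schedule_name):
    """Disable a schedule, wait for that to finish, then delete it."""
    schedules.begin_disable(name=schedule_name).result()
    schedules.begin_delete(name=schedule_name).result()


class _FakePoller:
    """Stand-in for the poller object returned by the begin_* calls."""

    def result(self):
        return None


class FakeSchedules:
    """Stand-in for `ml_client.schedules` that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def begin_disable(self, name):
        self.calls.append(("disable", name))
        return _FakePoller()

    def begin_delete(self, name):
        self.calls.append(("delete", name))
        return _FakePoller()


fake = FakeSchedules()
disable_then_delete(fake, "my_schedule")
print(fake.calls)  # [('disable', 'my_schedule'), ('delete', 'my_schedule')]
```

With a real client, you would pass `ml_client.schedules` as the first argument instead of the stand-in.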
You can delete a schedule from the schedule details page or from the All schedules tab.
Role-based access control (RBAC) support
Schedules are generally used for production. To prevent problems, workspace admins may want to restrict schedule creation and management permissions within a workspace.