Schedule data import jobs (preview)

08/28/2024

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, you'll learn how to schedule data imports programmatically and with the schedule UI. You can create a schedule based on elapsed time. Time-based schedules can handle routine tasks, such as importing data at regular intervals to keep it up to date. After you learn how to create schedules, you'll learn how to retrieve, update, and deactivate them with the CLI, the SDK, and the studio UI.

Prerequisites

You need an Azure subscription to use Azure Machine Learning. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning today.

Install the Azure CLI and the ml extension. Follow the installation steps in Install, set up, and use the CLI (v2).
Create an Azure Machine Learning workspace if you don't have one. For workspace creation, see Install, set up, and use the CLI (v2).

Schedule data import

To import data on a recurring basis, you must create a schedule. A schedule associates a data import action with a trigger. The trigger can be either cron, which uses a cron expression to describe when runs occur, or recurrence, which specifies the frequency at which a job is triggered. In either case, you must first build an import data definition. You can use an existing data import or a data import that is defined inline. For more information, visit Create a data import in CLI, SDK and UI.

Create a schedule

Create a time-based schedule with recurrence pattern

APPLIES TO: Azure CLI ml extension v2 (current)

YAML: Schedule for data import with recurrence pattern

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_recurrence_import_schedule
display_name: Simple recurrence import schedule
description: a simple hourly recurrence import schedule

trigger:
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 #every day
  schedule:
    hours: [4,5,10,11,12]
    minutes: [0,30]
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

import_data: ./my-snowflake-import-data.yaml

YAML: Schedule for data import definition inline with recurrence pattern on managed datastore

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: inline_recurrence_import_schedule
display_name: Inline recurrence import schedule
description: an inline hourly recurrence import schedule

trigger:
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 #every day
  schedule:
    hours: [4,5,10,11,12]
    minutes: [0,30]
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

import_data:
  type: mltable
  name: my_snowflake_ds
  path: azureml://datastores/workspacemanagedstore
  source:
    type: database
    query: select * from TPCH_SF1.REGION
    connection: azureml:my_snowflake_connection

A trigger contains these properties:

(Required) type specifies the schedule type, either recurrence or cron. The following section has more information.

Next, run this command in the CLI:

> az ml schedule create -f <file-name>.yml
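
If you prefer to drive the CLI from a script, the command above can be wrapped in a few lines of Python. This is a minimal sketch, not part of the ml extension: the helper names, the YAML file name, and the resource group and workspace names are hypothetical, and the optional --resource-group and --workspace-name arguments are only needed when no CLI defaults are configured.

```python
import shutil
import subprocess


def build_schedule_create_command(yaml_path, resource_group=None, workspace_name=None):
    """Return the argv list for `az ml schedule create` (nothing is executed here)."""
    command = ["az", "ml", "schedule", "create", "--file", yaml_path]
    # Optional scoping arguments; omit them if CLI defaults are already configured.
    if resource_group:
        command += ["--resource-group", resource_group]
    if workspace_name:
        command += ["--workspace-name", workspace_name]
    return command


def create_schedule(yaml_path, **scope):
    """Run the command, raising if the Azure CLI is missing or the call fails."""
    az_path = shutil.which("az")
    if az_path is None:
        raise RuntimeError("Azure CLI (az) was not found on PATH")
    command = build_schedule_create_command(yaml_path, **scope)
    command[0] = az_path  # use the resolved executable path
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Hypothetical file and workspace names, for illustration only.
print(" ".join(build_schedule_create_command(
    "recurrence-schedule.yml", resource_group="my-rg", workspace_name="my-ws")))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is submitted to the workspace.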

APPLIES TO: Python SDK azure-ai-ml v2 (current)

from azure.ai.ml.data_transfer import Database
from azure.ai.ml.constants import TimeZone
from azure.ai.ml.entities import (
    DataImport,
    ImportDataSchedule,
    RecurrenceTrigger,
    RecurrencePattern,
)
from datetime import datetime

source = Database(connection="azureml:my_sf_connection", query="select * from my_table")

path = "azureml://datastores/workspaceblobstore/paths/snowflake/schedule/${{name}}"


my_data = DataImport(
    type="mltable", source=source, path=path, name="my_schedule_sfds_test"
)

schedule_name = "my_simple_sdk_create_schedule_recurrence"

schedule_start_time = datetime.utcnow()

recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=1, minutes=[0, 1]),
    start_time=schedule_start_time,
    time_zone=TimeZone.UTC,
)

import_schedule = ImportDataSchedule(
    name=schedule_name, trigger=recurrence_trigger, import_data=my_data
)

# ml_client is assumed to be an authenticated azure.ai.ml.MLClient for your workspace
ml_client.schedules.begin_create_or_update(import_schedule).result()

RecurrenceTrigger contains the following properties:

(Required) For a better coding experience, use RecurrenceTrigger for the recurrence schedule.

Note

These properties apply to CLI and SDK:

(Required) frequency specifies the unit of time that describes how often the schedule fires. Valid values are:
- minute
- hour
- day
- week
- month
(Required) interval specifies how often the schedule fires based on the frequency, which is the number of time units to wait until the schedule fires again.
(Optional) schedule defines the recurrence pattern, containing hours, minutes, and weekdays.
- When frequency equals day, a pattern can specify hours and minutes.
- When frequency equals week or month, a pattern can specify hours, minutes, and weekdays.
- hours should be an integer or a list, ranging between 0 and 23.
- minutes should be an integer or a list, ranging between 0 and 59.
- weekdays should be a string or a list, ranging from monday to sunday.
- If schedule is omitted, jobs trigger according to the logic of start_time, frequency, and interval.
(Optional) start_time describes the start date and time, with a timezone. If start_time is omitted, start_time equals the schedule creation time. For a start time in the past, the first job runs at the next calculated run time.
(Optional) end_time describes the end date and time with a timezone. If end_time is omitted, the schedule continues to trigger jobs until the schedule is manually disabled.
(Optional) time_zone specifies the time zone of the recurrence. If omitted, the default timezone is UTC. For more information about timezone values, visit appendix for timezone values.
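
The pattern rules above can be checked locally before a schedule is submitted. The following is a minimal sketch in plain Python, not part of azure-ai-ml: the helper names are hypothetical, and it assumes that every listed hour is paired with every listed minute, which is an assumption about how the service combines the two lists.

```python
from itertools import product

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def as_list(value):
    """Accept a single value or a list, as the pattern fields do."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def validate_pattern(frequency, hours, minutes, weekdays=None):
    """Check a recurrence pattern against the ranges listed above."""
    hours, minutes = as_list(hours), as_list(minutes)
    if any(not 0 <= h <= 23 for h in hours):
        raise ValueError("hours must be between 0 and 23")
    if any(not 0 <= m <= 59 for m in minutes):
        raise ValueError("minutes must be between 0 and 59")
    if weekdays is not None:
        if frequency not in ("week", "month"):
            raise ValueError("weekdays applies only when frequency is week or month")
        weekdays = [d.lower() for d in as_list(weekdays)]
        if any(d not in WEEKDAYS for d in weekdays):
            raise ValueError("weekdays must range from monday to sunday")
    return hours, minutes, weekdays


def daily_fire_times(hours, minutes):
    """Expand a pattern into (hour, minute) pairs, assuming a cross product."""
    return sorted(product(as_list(hours), as_list(minutes)))


# The pattern from the YAML samples: five hours and two minutes per hour.
hours, minutes, _ = validate_pattern("day", hours=[4, 5, 10, 11, 12], minutes=[0, 30])
times = daily_fire_times(hours, minutes)
print(len(times), times[:3])  # 10 [(4, 0), (4, 30), (5, 0)]
```

Under that assumption, the YAML samples earlier in this article would fire ten times a day, from 04:00 through 12:30.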

Create a time-based schedule with cron expression

YAML: Schedule for a data import with cron expression

APPLIES TO: Azure CLI ml extension v2 (current)

YAML: Schedule for data import with cron expression (preview)

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_cron_import_schedule
display_name: Simple cron import schedule
description: a simple hourly cron import schedule

trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

import_data: ./my-snowflake-import-data.yaml

YAML: Schedule for data import definition inline with cron expression (preview)

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: inline_cron_import_schedule
display_name: Inline cron import schedule
description: an inline hourly cron import schedule

trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC

import_data:
  type: mltable
  name: my_snowflake_ds
  path: azureml://datastores/workspaceblobstore/paths/snowflake/${{name}}
  source:
    type: database
    query: select * from TPCH_SF1.REGION
    connection: azureml:my_snowflake_connection

The trigger section defines the schedule details and contains these properties:

(Required) type specifies the schedule type. For a cron schedule, set type to cron.

Next, run this command in the CLI:

> az ml schedule create -f <file-name>.yml

APPLIES TO: Python SDK azure-ai-ml v2 (current)

from datetime import datetime

from azure.ai.ml.data_transfer import Database
from azure.ai.ml.constants import TimeZone
from azure.ai.ml.entities import CronTrigger, DataImport, ImportDataSchedule

source = Database(connection="azureml:my_sf_connection", query="select * from my_table")

path = "azureml://datastores/workspaceblobstore/paths/snowflake/schedule/${{name}}"


my_data = DataImport(
    type="mltable", source=source, path=path, name="my_schedule_sfds_test"
)

schedule_name = "my_simple_sdk_create_schedule_cron"

cron_trigger = CronTrigger(
    expression="15 10 * * 1",
    start_time=datetime.utcnow(),
    end_time="2023-12-03T18:40:00",
)
import_schedule = ImportDataSchedule(
    name=schedule_name, trigger=cron_trigger, import_data=my_data
)
ml_client.schedules.begin_create_or_update(import_schedule).result()

The CronTrigger section defines the schedule details and contains these properties:

(Required) For a better coding experience, use CronTrigger for the cron schedule.

The list continues here:

(Required) expression uses a standard crontab expression to express a recurring schedule. A single expression is composed of five space-delimited fields:

MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK

A single wildcard (*) covers all values for the field. For example, a * in the DAYS field means all days of a month (which varies with month and year).
The expression "15 10 * * 1" in the sample above means 10:15 AM on every Monday.

This table lists the valid values for each field:

Field	Range	Comment
`MINUTES`	0-59	-
`HOURS`	0-23	-
`DAYS`	-	Not supported. The value is ignored and treated as `*`.
`MONTHS`	-	Not supported. The value is ignored and treated as `*`.
`DAYS-OF-WEEK`	0-6	Zero (0) means Sunday. Names of days also accepted.

For more information about crontab expressions, visit the Crontab Expression wiki resource on GitHub.

Important

DAYS and MONTHS are not supported. If you pass a value for either field, it's ignored and treated as *.
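
To make the field rules concrete, here is a minimal sketch of a checker that follows the table above: it reads five fields, ignores DAYS and MONTHS, and treats 0 in DAYS-OF-WEEK as Sunday. It is plain Python with hypothetical helper names, not the service's parser, and it handles only *, single values, comma-separated lists, and a-b ranges. Day names are not handled in this sketch.

```python
from datetime import datetime

FIELD_RANGES = {"minutes": (0, 59), "hours": (0, 23), "days_of_week": (0, 6)}


def parse_field(text, low, high):
    """Expand '*', single values, comma lists, and a-b ranges into a set of ints."""
    if text == "*":
        return set(range(low, high + 1))
    values = set()
    for part in text.split(","):
        start, _, end = part.partition("-")
        first, last = int(start), int(end or start)
        if not low <= first <= last <= high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(first, last + 1))
    return values


def parse_cron(expression):
    """Split the five fields; DAYS and MONTHS are read but ignored (treated as *)."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-delimited fields")
    minutes, hours, _days, _months, days_of_week = fields
    return {
        "minutes": parse_field(minutes, *FIELD_RANGES["minutes"]),
        "hours": parse_field(hours, *FIELD_RANGES["hours"]),
        "days_of_week": parse_field(days_of_week, *FIELD_RANGES["days_of_week"]),
    }


def matches(expression, moment):
    """Return True if the expression would fire at this minute."""
    cron = parse_cron(expression)
    # datetime.weekday() uses Monday=0; the cron field uses Sunday=0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (moment.minute in cron["minutes"]
            and moment.hour in cron["hours"]
            and cron_weekday in cron["days_of_week"])


print(matches("15 10 * * 1", datetime(2024, 1, 1, 10, 15)))  # a Monday: True
print(matches("15 10 * * 1", datetime(2024, 1, 2, 10, 15)))  # a Tuesday: False
```

Because DAYS and MONTHS are ignored, an expression such as "15 10 31 12 1" parses to the same result as "15 10 * * 1" in this sketch.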

(Optional) start_time specifies the start date and time with the timezone of the schedule. For example, start_time: "2022-05-10T10:15:00-04:00" means the schedule starts from 10:15:00AM on 2022-05-10 in the UTC-4 timezone. If start_time is omitted, the start_time equals the schedule creation time. For a start time in the past, the first job runs at the next calculated run time.
(Optional) end_time describes the end date and time, with a timezone. If end_time is omitted, the schedule continues to trigger jobs until the schedule is manually disabled.
(Optional) time_zone specifies the time zone of the expression. If time_zone is omitted, the timezone is UTC by default. For more information about timezone values, visit appendix for timezone values.
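
The start_time example above carries its own UTC offset, while the YAML samples use timestamps without one. The short sketch below uses only the Python standard library to show the difference; the helper name is hypothetical, and it doesn't model how the service itself interprets a timestamp that has no offset.

```python
from datetime import datetime, timezone


def parse_start_time(text):
    """Parse an ISO 8601 start_time and report whether it carries a UTC offset."""
    parsed = datetime.fromisoformat(text)
    return parsed, parsed.tzinfo is not None


# The example from the start_time description: 10:15 AM at UTC-4.
with_offset, has_offset = parse_start_time("2022-05-10T10:15:00-04:00")
print(has_offset, with_offset.astimezone(timezone.utc).isoformat())

# The timestamp used in the YAML samples has no offset in the string.
without_offset, has_offset = parse_start_time("2022-07-10T10:00:00")
print(has_offset, without_offset.isoformat())
```

The first timestamp corresponds to 14:15 UTC on the same day. The second is a bare local time with no offset of its own.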

Limitations:

Currently, Azure Machine Learning v2 scheduling doesn't support event-based triggers.
Use the Azure Machine Learning SDK/CLI v2 to specify a complex recurrence pattern that contains multiple trigger timestamps. The UI only displays the complex pattern and doesn't support editing.
If you set the recurrence as the 31st day of every month, the schedule won't trigger jobs in months with fewer than 31 days.
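
To see which months the last limitation affects, you can check month lengths with the standard library. This is only an illustrative calculation with a hypothetical helper name; it doesn't call the scheduling service.

```python
import calendar


def months_with_day(year, day):
    """Return the month numbers in `year` that contain the given day of the month."""
    return [month for month in range(1, 13)
            if calendar.monthrange(year, month)[1] >= day]


# A schedule pinned to the 31st can fire only in these months.
print(months_with_day(2024, 31))  # [1, 3, 5, 7, 8, 10, 12]
```

The same check shows that a schedule pinned to the 29th or 30th also skips February in most years.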

List schedules in a workspace

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule list

APPLIES TO: Python SDK azure-ai-ml v2 (current)

schedules = ml_client.schedules.list()
[s.name for s in schedules]

Check schedule detail

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule show -n simple_cron_data_import_schedule

APPLIES TO: Python SDK azure-ai-ml v2 (current)

created_schedule = ml_client.schedules.get(name=schedule_name)
[created_schedule.name]

Update a schedule

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule update -n simple_cron_data_import_schedule  --set description="new description" --no-wait

Note

To update more than just the tags or description, we recommend using az ml schedule create --file update_schedule.yml.

APPLIES TO: Python SDK azure-ai-ml v2 (current)

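# Fetch the existing schedule, change its trigger expression, and submit the update
job_schedule = ml_client.schedules.get(name=schedule_name)
job_schedule.trigger.expression = "10 10 * * 1"
job_schedule = ml_client.schedules.begin_create_or_update(
    schedule=job_schedule
).result()
print(job_schedule)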

Disable a schedule

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule disable -n simple_cron_data_import_schedule --no-wait

APPLIES TO: Python SDK azure-ai-ml v2 (current)

job_schedule = ml_client.schedules.begin_disable(name=schedule_name).result()
job_schedule.is_enabled

Enable a schedule

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule enable -n simple_cron_data_import_schedule --no-wait

APPLIES TO: Python SDK azure-ai-ml v2 (current)

# Re-enable a disabled schedule
job_schedule = ml_client.schedules.begin_enable(name=schedule_name).result()
job_schedule.is_enabled

Delete a schedule

Important

A schedule must be disabled before deletion. Deletion is a permanent, unrecoverable action. After a schedule is deleted, you can never access or recover it.

APPLIES TO: Azure CLI ml extension v2 (current)

az ml schedule delete -n simple_cron_data_import_schedule

APPLIES TO: Python SDK azure-ai-ml v2 (current)

# Only disabled schedules can be deleted
ml_client.schedules.begin_disable(name=schedule_name).result()
ml_client.schedules.begin_delete(name=schedule_name).result()

RBAC (Role-based-access-control) support

Schedules are generally used for production. To prevent problems, workspace admins may want to restrict schedule creation and management permissions within a workspace.

There are currently three action rules related to schedules, and you can configure them in the Azure portal. For more information, visit how to manage access to an Azure Machine Learning workspace.

Action	Description	Rule
Read	Get and list schedules in Machine Learning workspace	Microsoft.MachineLearningServices/workspaces/schedules/read
Write	Create, update, disable and enable schedules in Machine Learning workspace	Microsoft.MachineLearningServices/workspaces/schedules/write
Delete	Delete a schedule in Machine Learning workspace	Microsoft.MachineLearningServices/workspaces/schedules/delete
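
As an illustration of how these rules might be combined, the sketch below assembles a custom role definition that grants only a chosen subset of the schedule actions. It follows the general shape of an Azure custom role definition (Name, IsCustom, Description, Actions, NotActions, AssignableScopes); the role name, description, and scope placeholder are hypothetical. A real role would likely need other workspace permissions as well, so treat this as a starting point and check the access management article for details.

```python
import json

# Action rules from the table above.
SCHEDULE_ACTIONS = {
    "read": "Microsoft.MachineLearningServices/workspaces/schedules/read",
    "write": "Microsoft.MachineLearningServices/workspaces/schedules/write",
    "delete": "Microsoft.MachineLearningServices/workspaces/schedules/delete",
}


def schedule_role_definition(name, allowed, scope):
    """Build a custom role definition that grants the selected schedule actions."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": "Example role limited to selected schedule actions.",
        "Actions": [SCHEDULE_ACTIONS[action] for action in allowed],
        "NotActions": [],
        "AssignableScopes": [scope],
    }


# Hypothetical role name and placeholder scope, for illustration only.
role = schedule_role_definition(
    "Schedule Reader (example)",
    allowed=["read"],
    scope="/subscriptions/<subscription-id>/resourceGroups/<resource-group>",
)
print(json.dumps(role, indent=2))
```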

Next steps

Learn more about the CLI (v2) data import schedule YAML schema.
Learn how to manage imported data assets.
