Azure Machine Learning provides multiple ways to submit ML training jobs. In this article, you'll learn how to submit jobs using the following methods:
Azure CLI extension for machine learning: The ml extension, also referred to as CLI v2.
Python SDK v2 for Azure Machine Learning.
REST API: The API that the CLI and SDK are built on.
The curl utility. The curl program is available in the Windows Subsystem for Linux or any UNIX distribution.
Tip
In PowerShell, curl is an alias for Invoke-WebRequest and curl -d "key=val" -X POST uri becomes Invoke-WebRequest -Body "key=val" -Method POST -Uri uri.
While it is possible to call the REST API from PowerShell, the examples in this article assume you are using Bash.
The jq utility for processing JSON. This utility is used to extract values from the JSON documents that are returned from REST API calls.
Clone the examples repository
The code snippets in this article are based on examples in the Azure Machine Learning examples GitHub repo. To clone the repository to your development environment, use the following command:
Use --depth 1 to clone only the latest commit to the repository, which reduces time to complete the operation.
Example job
The examples in this article use the iris flower dataset to train an MLflow model.
Train in the cloud
When training in the cloud, you must connect to your Azure Machine Learning workspace and select a compute resource that will be used to run the training job.
1. Connect to the workspace
To connect to the workspace, you need identifier parameters: a subscription, resource group, and workspace name. You use these details with the MLClient class from the azure.ai.ml namespace to get a handle to the Azure Machine Learning workspace. To authenticate, this example uses the default Azure authentication (DefaultAzureCredential from azure.identity). For more details on how to configure credentials and connect to a workspace, see the configuration example in the Azure Machine Learning examples repo.
# Import required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
# Enter details of your Azure Machine Learning workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AZUREML_WORKSPACE_NAME>'
# Connect to the workspace
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)
When using the Azure CLI, you need identifier parameters: a subscription, resource group, and workspace name. While you can specify these parameters for each command, you can also set defaults that apply to all subsequent commands. Use the following commands to set default values. Replace <subscription ID>, <Azure Machine Learning workspace name>, and <resource group> with the values for your configuration:
az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>
The REST API examples in this article use the $SUBSCRIPTION_ID, $RESOURCE_GROUP, $LOCATION, $WORKSPACE, and $COMPUTE_NAME placeholders. Replace the placeholders with your own values as follows:
$SUBSCRIPTION_ID: Your Azure subscription ID.
$RESOURCE_GROUP: The Azure resource group that contains your workspace.
$LOCATION: The Azure region where your workspace is located.
$WORKSPACE: The name of your Azure Machine Learning workspace.
$COMPUTE_NAME: The name of your Azure Machine Learning compute cluster.
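Every REST call in this article targets an Azure Resource Manager (ARM) URL derived from these values. The following minimal sketch, assuming the values are exported as SUBSCRIPTION_ID, RESOURCE_GROUP, and WORKSPACE environment variables, assembles the workspace URL in Python. The helper name workspace_url is hypothetical, and the dummy fallback values exist only so the snippet runs on its own.

```python
import os

# Public Azure Resource Manager endpoint.
ARM_ENDPOINT = "https://management.azure.com"


def workspace_url(subscription_id: str, resource_group: str, workspace: str) -> str:
    """Return the ARM URL of a workspace (no api-version query string yet).

    Hypothetical helper for illustration.
    """
    return (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}"
    )


if __name__ == "__main__":
    # Read the placeholders from the environment; the fallbacks are dummies.
    base = workspace_url(
        os.environ.get("SUBSCRIPTION_ID", "00000000-0000-0000-0000-000000000000"),
        os.environ.get("RESOURCE_GROUP", "my-resource-group"),
        os.environ.get("WORKSPACE", "my-workspace"),
    )
    # Sub-resources such as a compute cluster hang off the workspace URL.
    print(base + "/computes/" + os.environ.get("COMPUTE_NAME", "cpu-compute"))
```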
Administrative REST requests require a service principal authentication token. You can retrieve a token with the following command, which stores it in the $TOKEN environment variable:
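ARM expects the token in an Authorization header that uses the Bearer scheme. As a minimal sketch, assuming the token has been exported as TOKEN, the headers that the REST examples send look like this; auth_headers is a hypothetical helper name.

```python
import os


def auth_headers(token: str) -> dict:
    """Build headers for an ARM REST call (hypothetical helper)."""
    if not token:
        raise ValueError("token is empty; retrieve a token before calling the REST API")
    return {
        # ARM uses the OAuth 2.0 Bearer scheme.
        "Authorization": f"Bearer {token}",
        # The request bodies in this article are JSON documents.
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # Assumes TOKEN was exported by the token retrieval command.
    headers = auth_headers(os.environ.get("TOKEN", "example-token"))
    print(sorted(headers))
```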
The service provider uses the api-version argument to ensure compatibility. The api-version argument varies from service to service. Set the API version as a variable to accommodate future versions:
API_VERSION="2022-05-01"
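Each REST URL in this article ends with an api-version query parameter. The following sketch, built around a hypothetical helper named with_api_version, appends the parameter correctly whether or not the URL already has a query string.

```python
from urllib.parse import urlencode, urlparse

# Same value as the API_VERSION shell variable.
API_VERSION = "2022-05-01"


def with_api_version(url: str, api_version: str = API_VERSION) -> str:
    """Append api-version, using '&' if the URL already has a query string."""
    separator = "&" if urlparse(url).query else "?"
    return f"{url}{separator}{urlencode({'api-version': api_version})}"


if __name__ == "__main__":
    print(with_api_version("https://management.azure.com/subscriptions/example"))
```

Keeping the version in one constant mirrors the advice above: when a newer API version is needed, only one value changes.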
When you train using the REST API, data and training scripts must be uploaded to a storage account that the workspace can access. The following example gets the storage information for your workspace and saves it into variables so we can use it later:
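We assume the workspace reports its storage account as a full ARM resource ID, in the form /subscriptions/<id>/resourceGroups/<group>/providers/Microsoft.Storage/storageAccounts/<name>. The upload command later in this article needs only the account name, stored in $AZURE_STORAGE_ACCOUNT. Here is a minimal sketch of that extraction, assuming you already have the ID as a string. storage_account_name is a hypothetical helper, and the sample ID is made up.

```python
def storage_account_name(resource_id: str) -> str:
    """Return the storage account name from an ARM resource ID (hypothetical helper)."""
    parts = [p for p in resource_id.strip().split("/") if p]
    # ARM IDs alternate type/name segments; the name follows 'storageAccounts'.
    for i, segment in enumerate(parts[:-1]):
        if segment.lower() == "storageaccounts":
            return parts[i + 1]
    raise ValueError(f"not a storage account resource ID: {resource_id}")


if __name__ == "__main__":
    sample = ("/subscriptions/0000/resourceGroups/my-rg/providers/"
              "Microsoft.Storage/storageAccounts/mystorageacct")
    print(storage_account_name(sample))
```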
An Azure Machine Learning compute cluster is a fully managed compute resource that can be used to run the training job. In the following examples, a compute cluster named cpu-compute is created.
While a response is returned after a few seconds, this only indicates that the creation request has been accepted. It can take several minutes for the cluster creation to finish.
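Because creation is asynchronous, a client typically checks the cluster's provisioning state until it reaches a terminal value. The sketch below illustrates both parts of this step under stated assumptions. aml_compute_body builds a request body that follows our understanding of the ARM AmlCompute schema; the VM size and node counts are placeholder choices. wait_for_provisioning polls a hypothetical get_state callable, which you would implement as a GET on the compute resource that returns properties.provisioningState. Verify the field names against the Compute - Create Or Update REST reference.

```python
import json
import time
from typing import Callable

# States after which polling stops (assumed terminal provisioning states).
TERMINAL_STATES = ("Succeeded", "Failed", "Canceled")


def aml_compute_body(location: str, vm_size: str = "Standard_DS3_v2",
                     min_nodes: int = 0, max_nodes: int = 4) -> dict:
    """Request body for creating an AmlCompute cluster (assumed schema)."""
    return {
        "location": location,
        "properties": {
            "computeType": "AmlCompute",
            "properties": {
                "vmSize": vm_size,
                # min 0 lets the cluster scale down to zero nodes when idle.
                "scaleSettings": {"minNodeCount": min_nodes, "maxNodeCount": max_nodes},
            },
        },
    }


def wait_for_provisioning(get_state: Callable[[], str], timeout_s: float = 900,
                          poll_s: float = 15, sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state until a terminal state is returned or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {state!r} after {timeout_s} seconds")
        sleep(poll_s)


if __name__ == "__main__":
    print(json.dumps(aml_compute_body("eastus"), indent=2))
```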
To run this script, you use a command that executes the main.py Python script located under ./sdk/python/jobs/single-step/lightgbm/iris/src/. The command runs as a job submitted to Azure Machine Learning.
# submit the command
returned_job = ml_client.jobs.create_or_update(command_job)
# get a URL for the status of the job
returned_job.studio_url
In the preceding example, you configured:
code - path where the code to run the command is located
command - command that needs to be run
environment - the environment needed to run the training script. In this example, we use a curated (ready-made) environment provided by Azure Machine Learning called AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu. The @latest directive selects the latest version of this environment. You can also use custom environments by specifying a base Docker image and a conda YAML file on top of it.
inputs - dictionary of inputs to the command, as name-value pairs. The key is the name of the input within the context of the job, and the value is the input value. Inputs are referenced in the command using the ${{inputs.<input_name>}} expression. To use files or folders as inputs, you can use the Input class. For more information, see SDK and CLI v2 expressions.
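To make the ${{inputs.<input_name>}} expression concrete, the following sketch imitates the substitution locally. It is an illustration only, not Azure Machine Learning's resolver: the service performs this substitution when the job runs and, for file or folder inputs, substitutes a path that is available on the compute. The input names iris_csv and learning_rate and the local path are hypothetical.

```python
import re

# Matches ${{inputs.<name>}} placeholders, tolerating spaces inside the braces.
_PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_command(command: str, inputs: dict) -> str:
    """Replace each ${{inputs.<name>}} with str(inputs[name])."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"command references undefined input {name!r}")
        return str(inputs[name])
    return _PLACEHOLDER.sub(substitute, command)


if __name__ == "__main__":
    cmd = "python main.py --iris-csv ${{inputs.iris_csv}} --learning-rate ${{inputs.learning_rate}}"
    print(render_command(cmd, {"iris_csv": "/mnt/data/iris.csv", "learning_rate": 0.9}))
```

Raising on an unknown name mirrors the practical rule that every expression in the command must match a key in the inputs dictionary.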
When you submit the job, the returned job object includes a URL (returned_job.studio_url) for the job status in Azure Machine Learning studio. Use the studio UI to view the job progress. You can also use returned_job.status to check the current status of the job.
The az ml job create command used in this example requires a YAML job definition file. The contents of the file used in this example are:
Note
To use serverless compute, delete the compute: azureml:cpu-cluster line from this file.
In the job definition, you configured:
code - path where the code to run the command is located
command - command that needs to be run
inputs - dictionary of inputs to the command, as name-value pairs. The key is the name of the input within the context of the job, and the value is the input value. Inputs are referenced in the command using the ${{inputs.<input_name>}} expression. For more information, see SDK and CLI v2 expressions.
environment - the environment needed to run the training script. In this example, we use a curated (ready-made) environment provided by Azure Machine Learning called AzureML-sklearn-0.24-ubuntu18.04-py37-cpu. The @latest directive selects the latest version of this environment. You can also use custom environments by specifying a base Docker image and a conda YAML file on top of it.
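These keys combine into a short YAML document. This article doesn't reproduce job.yml itself, so the following Python sketch emits a minimal, hypothetical job definition that uses them. The src code folder, the iris_csv input name, and the --iris-csv script argument are assumptions; compare the output with the job.yml in the examples repository before relying on it. Passing serverless=True leaves out the compute line, as the note about serverless compute describes.

```python
IRIS_URL = "https://azuremlexamples.blob.core.windows.net/datasets/iris.csv"
ENVIRONMENT = "azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest"


def job_yaml(serverless: bool = False) -> str:
    """Return the text of a minimal command job definition (hypothetical values)."""
    lines = [
        "code: src",
        # Folded block scalar: the two indented lines join into one command.
        "command: >-",
        "  python main.py",
        "  --iris-csv ${{inputs.iris_csv}}",
        "inputs:",
        "  iris_csv:",
        "    type: uri_file",
        f"    path: {IRIS_URL}",
        f"environment: {ENVIRONMENT}",
    ]
    if not serverless:
        # Omitted when serverless=True, matching the serverless compute note.
        lines.append("compute: azureml:cpu-cluster")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(job_yaml(), end="")
```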
To submit the job, use the following command. The run ID (name) of the training job is stored in the $run_id variable:
run_id=$(az ml job create -f jobs/single-step/scikit-learn/iris/job.yml --query name -o tsv)
You can use the stored run ID to return information about the job. The --web parameter opens the Azure Machine Learning studio web UI where you can drill into details on the job:
az ml job show -n $run_id --web
As part of job submission, the training scripts and data must be uploaded to a cloud storage location that your Azure Machine Learning workspace can access.
Use the following Azure CLI command to upload the training script. The command specifies the directory that contains the files needed for training, not an individual file. If you'd like to use REST to upload the files instead, see the Put Blob reference:
az storage blob upload-batch -d $AZUREML_DEFAULT_CONTAINER/testjob -s cli/jobs/single-step/scikit-learn/iris/src/ --account-name $AZURE_STORAGE_ACCOUNT
Create a versioned reference to the training data. In this example, the data is already in the cloud and located at https://azuremlexamples.blob.core.windows.net/datasets/iris.csv. For more information on referencing data, see Data in Azure Machine Learning:
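The following hedged sketch shows the request this step sends: a PUT to the workspace's data/<name>/versions/<version> path, with a body that records the data type and the URI of the CSV file. The asset name iris-data and version 1 are hypothetical. The dataType and dataUri field names reflect our reading of the 2022-05-01 REST schema, so check them against the Data Versions - Create Or Update reference.

```python
import json
from typing import Tuple

IRIS_URL = "https://azuremlexamples.blob.core.windows.net/datasets/iris.csv"


def data_version_request(workspace_url: str, name: str, version: str, data_uri: str,
                         api_version: str = "2022-05-01") -> Tuple[str, str]:
    """Return (url, json_body) for registering a uri_file data asset version."""
    url = f"{workspace_url}/data/{name}/versions/{version}?api-version={api_version}"
    body = {
        "properties": {
            "description": "Iris dataset",
            "dataType": "uri_file",  # a single file rather than a folder
            "dataUri": data_uri,     # the data stays where it already is in the cloud
        }
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    ws = ("https://management.azure.com/subscriptions/s/resourceGroups/r"
          "/providers/Microsoft.MachineLearningServices/workspaces/w")
    print(data_version_request(ws, "iris-data", "1", IRIS_URL))
```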
Register a versioned reference to the training script for use with a job. In this example, the script location is the default storage account and container you uploaded to in the upload step. The ID of the versioned training code is returned and stored in the $TRAIN_CODE variable:
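The code registration request points a code asset at the blob folder you uploaded to. Here is a hedged sketch, assuming the testjob folder from the upload command. The asset name and version are hypothetical, and the codeUri field name is our assumption about the 2022-05-01 schema. We also assume that the id field of the response is the value stored in $TRAIN_CODE.

```python
import json
from typing import Tuple


def code_version_request(workspace_url: str, name: str, version: str,
                         storage_account: str, container: str, prefix: str,
                         api_version: str = "2022-05-01") -> Tuple[str, str]:
    """Return (url, json_body) registering uploaded scripts as a code asset version."""
    # Blob folder that upload-batch wrote to: <container>/<prefix>
    code_uri = f"https://{storage_account}.blob.core.windows.net/{container}/{prefix}"
    url = f"{workspace_url}/codes/{name}/versions/{version}?api-version={api_version}"
    body = {"properties": {"description": "Training script", "codeUri": code_uri}}
    return url, json.dumps(body)


if __name__ == "__main__":
    print(code_version_request("https://management.azure.com/example-workspace",
                               "iris-train", "1", "mystorage", "mycontainer", "testjob"))
```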
Get the environment that the cluster will use to run the training script. In this example, we use a curated (ready-made) environment provided by Azure Machine Learning called AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu. The following command retrieves a list of the environment versions, with the newest first. jq retrieves the ID of the latest ([0]) version, which is then stored in the $ENVIRONMENT variable.
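For readers who prefer Python to jq, the following sketch performs the same selection on the list response: it parses the JSON and takes the id of the first element of value, which the text above describes as the newest version. The helper name latest_version_id and the sample IDs are made up.

```python
import json


def latest_version_id(list_response: str) -> str:
    """Equivalent of jq -r '.value[0].id' on a list response (hypothetical helper)."""
    payload = json.loads(list_response)
    versions = payload.get("value", [])
    if not versions:
        raise ValueError("the response contains no versions")
    # Newest version first, per the ordering described above.
    return versions[0]["id"]


if __name__ == "__main__":
    sample = '{"value": [{"id": "/environments/example/versions/2"}, {"id": "/environments/example/versions/1"}]}'
    print(latest_version_id(sample))
```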
Finally, submit the job. The following example submits the job and references the training code ID, the environment ID, the URL of the input data, and the ID of the compute cluster. The job output location is stored in the $JOB_OUTPUT variable:
Tip
The job name must be unique. In this example, uuidgen is used to generate a unique value for the name.
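The request for this step can be sketched as follows. This is a hedged illustration, not the article's exact payload. The field names jobType, codeId, environmentId, computeId, and jobInputType reflect our reading of the 2022-05-01 Jobs REST schema. The job-name prefix, the iris_csv input name, and the --iris-csv script argument are hypothetical. uuid.uuid4() plays the role of uuidgen.

```python
import json
import uuid
from typing import Tuple

ARM_ENDPOINT = "https://management.azure.com"


def command_job_request(workspace_id: str, code_id: str, environment_id: str,
                        compute_name: str, data_uri: str,
                        api_version: str = "2022-05-01") -> Tuple[str, str, str]:
    """Return (job_name, url, json_body) for a command job submission (assumed schema)."""
    # Like uuidgen: every submission gets a fresh, unique job name.
    job_name = f"iris-job-{uuid.uuid4()}"
    url = f"{ARM_ENDPOINT}{workspace_id}/jobs/{job_name}?api-version={api_version}"
    body = {
        "properties": {
            "jobType": "Command",
            "codeId": code_id,                # value stored in $TRAIN_CODE
            "command": "python main.py --iris-csv ${{inputs.iris_csv}}",
            "environmentId": environment_id,  # value stored in $ENVIRONMENT
            "inputs": {
                "iris_csv": {"jobInputType": "uri_file", "uri": data_uri},
            },
            # The compute cluster is referenced by its full ARM resource ID.
            "computeId": f"{workspace_id}/computes/{compute_name}",
        }
    }
    return job_name, url, json.dumps(body)
```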
The name property returned by the training job is used as part of the path to the model.
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes
# Point the model path at the "model" artifact folder that the job logged
run_model = Model(
    path=f"azureml://jobs/{returned_job.name}/outputs/artifacts/paths/model/",
    name="run-model-example",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL
)
# Register the model in the workspace
ml_client.models.create_or_update(run_model)
Tip
The name (stored in the $run_id variable) is used as part of the path to the model.
az ml model create -n sklearn-iris-example -v 1 -p runs:/$run_id/model --type mlflow_model
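Both registration examples embed the job name in the model path. The SDK uses an azureml://jobs/<name>/outputs/artifacts/paths/model/ URI, and the CLI uses an MLflow-style runs:/<name>/model URI. The two helpers below, whose names are hypothetical, show how each URI is assembled from the same name, assuming the model was logged under the model artifact path as in both examples.

```python
def mlflow_run_uri(run_id: str, artifact_path: str = "model") -> str:
    """MLflow-style URI, as passed to az ml model create -p."""
    return f"runs:/{run_id}/{artifact_path}"


def azureml_job_model_uri(job_name: str, artifact_path: str = "model") -> str:
    """azureml:// URI, as used with Model(path=...) in the SDK example."""
    return f"azureml://jobs/{job_name}/outputs/artifacts/paths/{artifact_path}/"


if __name__ == "__main__":
    name = "example-job-name"  # placeholder for returned_job.name or $run_id
    print(mlflow_run_uri(name))
    print(azureml_job_model_uri(name))
```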