Run an R job to train a model

APPLIES TO: Azure CLI ml extension v2 (current)

This article explains how to take the R script that you adapted to run in production and set it up to run as an R job using the Azure Machine Learning CLI v2.

Note

Although the title of this article refers to training a model, you can run any kind of R script, as long as it meets the requirements listed in the article on adapting your R script to run in production.

Prerequisites

Create a folder with this structure

Create this folder structure for your project:

📁 r-job-azureml
├─ src
│  ├─ azureml_utils.R
│  ├─ r-source.R
├─ job.yml
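If you work from a terminal, the following minimal sketch creates the same empty layout. It only creates placeholder files. You still need to fill in r-source.R and azureml_utils.R as described in this section. Running it a second time is harmless, because touch doesn't truncate files that already exist.

```shell
# Create the project layout shown above. Run this from the folder that
# should contain r-job-azureml.
mkdir -p r-job-azureml/src

# touch creates empty files but leaves existing files' contents untouched.
touch r-job-azureml/src/azureml_utils.R \
      r-job-azureml/src/r-source.R \
      r-job-azureml/job.yml
```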

Important

All source code goes in the src directory.

  • The r-source.R file is the R script that you adapted to run in production. Make sure you follow the steps to crate and log your model in this script.
  • The azureml_utils.R file is necessary. Use this source code for the contents of the file.

Prepare the job YAML

Azure Machine Learning CLI v2 uses different YAML schemas for different operations. This project submits its job from the job.yml file, which follows the job YAML schema.

You need to gather specific pieces of information to put into the YAML:

  • The name of the registered data asset you use as the data input (with version): azureml:<REGISTERED-DATA-ASSET>:<VERSION>
  • The name of the environment you created (with version): azureml:<R-ENVIRONMENT-NAME>:<VERSION>
  • The name of the compute cluster: azureml:<COMPUTE-CLUSTER-NAME>

Tip

For Azure Machine Learning artifacts that require versions (data assets, environments), you can use the shortcut URI azureml:<AZUREML-ASSET>@latest to get the latest version of that artifact if you don't need to set a specific version.
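The two reference forms can be illustrated with a minimal sketch. The hypothetical asset_ref shell function below prints a pinned reference when you pass a version and an @latest reference when you don't. The names my-data and r-env in the usage lines are placeholders, not assets that exist in your workspace.

```shell
# Hypothetical helper: print an asset reference for job.yml.
#   asset_ref NAME VERSION  -> azureml:NAME:VERSION  (pinned version)
#   asset_ref NAME          -> azureml:NAME@latest   (latest version)
asset_ref() {
  local name="$1"
  local version="${2:-}"
  if [ -n "$version" ]; then
    printf 'azureml:%s:%s\n' "$name" "$version"
  else
    printf 'azureml:%s@latest\n' "$name"
  fi
}

asset_ref my-data 3   # prints azureml:my-data:3
asset_ref r-env       # prints azureml:r-env@latest
```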

Sample YAML schema to submit a job

Edit your job.yml file to contain the following. Replace every value shown <IN-BRACKETS-AND-CAPS> with your own value, and remove the brackets.

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
# the Rscript command goes in the command key below. Here you also specify 
# which parameters are passed into the R script and can reference the input
# keys and values further below
# Modify any value shown below <IN-BRACKETS-AND-CAPS> (remove the brackets)
command: >
  Rscript <NAME-OF-R-SCRIPT>.R
  --data_file ${{inputs.datafile}}
  --other_input_parameter ${{inputs.other}}
code: src   # this is the code directory
inputs:
  datafile: # this is a registered data asset
    type: uri_file
    path: azureml:<REGISTERED-DATA-ASSET>@latest
  other: 1  # this is a sample parameter, which is the number 1 (as text)
environment: azureml:<R-ENVIRONMENT-NAME>@latest
compute: azureml:<COMPUTE-CLUSTER-OR-INSTANCE-NAME>
experiment_name: <NAME-OF-EXPERIMENT>
description: <DESCRIPTION>
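Before you submit, it can help to confirm that no placeholders are left in the file. The hypothetical check_placeholders function below is a sketch that assumes every placeholder follows the <IN-BRACKETS-AND-CAPS> pattern (capital letters, digits, and hyphens inside angle brackets). It prints any matching lines with their line numbers and returns a nonzero status if it finds one or if the file is missing.

```shell
# Hypothetical helper: list any <IN-BRACKETS-AND-CAPS> placeholders left in
# a job YAML file. Returns 0 when none remain, 1 when at least one does,
# and 2 when the file doesn't exist.
check_placeholders() {
  local file="${1:-job.yml}"
  if [ ! -f "$file" ]; then
    echo "File not found: $file" >&2
    return 2
  fi
  # -n prints line numbers; -E enables the extended regular expression.
  if grep -nE '<[A-Z0-9-]+>' "$file"; then
    echo "Replace the placeholders listed above before submitting." >&2
    return 1
  fi
  return 0
}
```

For example, run check_placeholders job.yml from the r-job-azureml folder.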

Submit the job

For the commands in this section, you may need to know:

  • The Azure Machine Learning workspace name
  • The name of the resource group that contains the workspace
  • The subscription that contains the workspace

Find these values from Azure Machine Learning studio:

  1. Sign in and open your workspace.
  2. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
  3. You can copy the values from the section that appears.

Screenshot: Find the values to use in your CLI command.

To submit the job, run the following commands in a terminal window:

  1. Change to the r-job-azureml directory.

    cd r-job-azureml
    
  2. Sign in to Azure. If you're doing this from an Azure Machine Learning compute instance, use:

    az login --identity
    

    If you're not on the compute instance, omit --identity and follow the prompt to open a browser window to authenticate.

  3. Make sure you have the most recent versions of the CLI and the ml extension:

    az upgrade
    
  4. If you have multiple Azure subscriptions, set the active subscription to the one you're using for your workspace. (You can skip this step if you only have access to a single subscription.) Replace <SUBSCRIPTION-NAME> with your subscription name, and remove the brackets.

    az account set --subscription "<SUBSCRIPTION-NAME>"
    
  5. Now use the CLI to submit the job. If you're on a compute instance in your workspace, you can use environment variables for the workspace name and resource group, as shown in the following code. If you aren't on a compute instance, replace these values with your workspace name and resource group.

    az ml job create -f job.yml --workspace-name $CI_WORKSPACE --resource-group $CI_RESOURCE_GROUP
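If you submit jobs often, you might wrap the az ml job create command in a small shell function. The sketch below is hypothetical. It prefers the compute instance variables CI_WORKSPACE and CI_RESOURCE_GROUP when they're set, and otherwise falls back to WORKSPACE and RESOURCE_GROUP, which are variable names chosen for this example. It uses the Azure CLI's global --query and --output options to print only the name of the new job.

```shell
# Hypothetical wrapper around the az ml job create command shown above.
# Uses CI_WORKSPACE / CI_RESOURCE_GROUP on a compute instance, otherwise
# WORKSPACE / RESOURCE_GROUP (names chosen for this example).
submit_r_job() {
  local ws="${CI_WORKSPACE:-${WORKSPACE:-}}"
  local rg="${CI_RESOURCE_GROUP:-${RESOURCE_GROUP:-}}"
  if [ -z "$ws" ] || [ -z "$rg" ]; then
    echo "Set WORKSPACE and RESOURCE_GROUP, or run on a compute instance." >&2
    return 1
  fi
  # --query and --output are global Azure CLI options; here they reduce the
  # response to the new job's name.
  az ml job create --file job.yml \
    --workspace-name "$ws" --resource-group "$rg" \
    --query name --output tsv
}
```

For example, JOB_NAME=$(submit_r_job) stores the job name so that you can use it in later commands.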
    

Once you've submitted the job, you can check the status and results in studio:

  1. Sign in to Azure Machine Learning studio.
  2. Select your workspace if it isn't already loaded.
  3. On the left navigation, select Jobs.
  4. Select the Experiment name that you used to train your model.
  5. Select the Display name of the job to view details and artifacts of the job, including metrics, images, child jobs, outputs, logs, and code used in the job.
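You can also follow a job from the terminal. The hypothetical wait_for_job function below is a sketch that calls az ml job show every POLL_SECONDS seconds (30 by default). It stops when the status becomes Completed, Failed, Canceled, or NotResponding. Treat that list of final statuses as an assumption. The job name is the name that az ml job create prints, which also appears on the job's details page in studio.

```shell
# Hypothetical helper: poll a job until it reaches a final status.
# Arguments: job name, workspace name, resource group.
# Returns 0 for Completed and 1 for Failed, Canceled, NotResponding, or a CLI error.
wait_for_job() {
  local name="$1" ws="$2" rg="$3" status
  while true; do
    status=$(az ml job show --name "$name" --workspace-name "$ws" \
      --resource-group "$rg" --query status --output tsv) || return 1
    echo "$name: $status"
    case "$status" in
      Completed) return 0 ;;
      Failed|Canceled|NotResponding) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
}
```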

Register model

Finally, once the training job is complete, register your model if you want to deploy it. Start in the studio from the page showing your job details.

  1. Once your job completes, select Outputs + logs to view outputs of the job.

  2. Open the models folder to verify that crate.bin and MLmodel are present. If not, check the logs to see if there was an error.

  3. On the toolbar at the top, select + Register model.

    Screenshot shows the Job section of studio with the Outputs section open.

  4. Don't use the MLflow model type, even though it's detected. Change Model type from the default MLflow to Unspecified type. Leaving it as MLflow will cause an error.

  5. For Job output, select models, the folder that contains the model.

  6. Select Next.

  7. Enter a name for your model. Optionally, add a Description, Version, and Tags.

  8. Select Next.

  9. Review the information.

  10. Select Register.
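If you prefer the CLI to the studio steps above, the following sketch registers the job's models folder with az ml model create. It assumes that studio's Unspecified type corresponds to the CLI's custom_model type. It also assumes that the folder is reachable at the job output path azureml://jobs/<JOB-NAME>/outputs/artifacts/paths/models/. Check both assumptions against your workspace before relying on them. The function name register_r_model is hypothetical.

```shell
# Hypothetical helper: register the "models" folder of a finished job as a
# model. custom_model is assumed to match studio's "Unspecified type"; as in
# the studio steps, don't register the folder as an MLflow model.
# Arguments: job name, model name, workspace name, resource group.
register_r_model() {
  local job="$1" model="$2" ws="$3" rg="$4"
  az ml model create --name "$model" --type custom_model \
    --path "azureml://jobs/${job}/outputs/artifacts/paths/models/" \
    --workspace-name "$ws" --resource-group "$rg"
}
```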

At the top of the page, you'll see a confirmation that the model is registered. The confirmation looks similar to this:

Screenshot shows example of successful registration.

To view the registered model's details, select the Click here to go to this model link in the confirmation.

Next steps

Now that you have a registered model, learn how to deploy an R model to an online (real time) endpoint.