Set up authentication between Azure Machine Learning and other services

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Azure Machine Learning is composed of multiple Azure services. There are multiple ways that authentication can happen between Azure Machine Learning and the services it relies on.

  • The Azure Machine Learning workspace uses a managed identity to communicate with other services. By default, this is a system-assigned managed identity. You can also use a user-assigned managed identity instead.
  • Azure Machine Learning uses Azure Container Registry (ACR) to store Docker images used to train and deploy models. If you allow Azure Machine Learning to automatically create ACR, it will enable the admin account.
  • The Azure Machine Learning compute cluster uses a managed identity to retrieve connection information for datastores from Azure Key Vault and to pull Docker images from ACR. You can also configure identity-based access to datastores, which will instead use the managed identity of the compute cluster.
  • Data access can happen along multiple paths depending on the data storage service and your configuration. For example, authentication to the datastore may use an account key, token, security principal, managed identity, or user identity.
  • Managed online endpoints can use a managed identity to access Azure resources when performing inference. For more information, see Access Azure resources from an online endpoint.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • An Azure Machine Learning workspace. If you don't have one, use the steps in the Quickstart: Create workspace resources article to create one.

  • The Azure CLI and the ml extension, or the Azure Machine Learning Python SDK v2.

  • To assign roles, the login for your Azure subscription must have the Managed Identity Operator role, or another role that grants the required actions (such as Owner).

  • You must be familiar with creating and working with Managed Identities.

Azure Container Registry and identity types

The following table lists the support matrix when authenticating to Azure Container Registry, depending on the authentication method and the public network access workspace flag.

Authentication method Public network access disabled Public network access enabled
Admin user
Workspace system-assigned managed identity
Workspace user-assigned managed identity with the ACRPull role assigned to the identity

User-assigned managed identity

Workspace

You can add a user-assigned managed identity when creating an Azure Machine Learning workspace from the Azure portal. Use the following steps while creating the workspace:

  1. From the Basics page, select the Azure Storage Account, Azure Container Registry, and Azure Key Vault you want to use with the workspace.
  2. From the Identity page, select User-assigned identity and then select the managed identity to use.

The following Azure RBAC role assignments are required on your user-assigned managed identity for your Azure Machine Learning workspace to access data on the workspace-associated resources.

Resource Permission
Azure Machine Learning workspace Contributor
Azure Storage Contributor (control plane) + Storage Blob Data Contributor (data plane, optional, to enable data preview in the Azure Machine Learning studio)
Azure Key Vault (when using RBAC permission model) Contributor (control plane) + Key Vault Administrator (data plane)
Azure Key Vault (when using access policies permission model) Contributor + any access policy permissions besides purge operations
Azure Container Registry Contributor
Azure Application Insights Contributor

For automated creation of role assignments on your user-assigned managed identity, you may use this ARM template.

Tip

For a workspace with customer-managed keys for encryption, you can pass in a user-assigned managed identity to authenticate from storage to Key Vault. Use the user-assigned-identity-for-cmk-encryption (CLI) or user_assigned_identity_for_cmk_encryption (SDK) parameter to pass in the managed identity. This managed identity can be the same as or different from the workspace's primary user-assigned managed identity.

To create a workspace with multiple user-assigned identities, use one of the following methods:

APPLIES TO: Azure CLI ml extension v2 (current)

az ml workspace create -f workspace_creation_with_multiple_UAIs.yml --subscription <subscription ID> --resource-group <resource group name> --name <workspace name>

Where the contents of workspace_creation_with_multiple_UAIs.yml are as follows:

location: <region name>
identity:
   type: user_assigned
   user_assigned_identities:
    '<UAI resource ID 1>': {}
    '<UAI resource ID 2>': {}
storage_account: <storage account resource ID>
key_vault: <key vault resource ID>
image_build_compute: <compute(virtual machine) resource ID>
primary_user_assigned_identity: <one of the UAI resource IDs in the above list>

To update the user-assigned identities for a workspace, including adding a new one or deleting existing ones, use one of the following methods:

APPLIES TO: Azure CLI ml extension v2 (current)

az ml workspace update -f workspace_update_with_multiple_UAIs.yml --subscription <subscription ID> --resource-group <resource group name> --name <workspace name>

Where the contents of workspace_update_with_multiple_UAIs.yml are as follows:

identity:
   type: user_assigned
   user_assigned_identities:
    '<UAI resource ID 1>': {}
    '<UAI resource ID 2>': {}
primary_user_assigned_identity: <one of the UAI resource IDs in the above list>

Tip

To add a new UAI, specify the new UAI ID under user_assigned_identities in addition to the existing UAIs. You must pass all the existing UAI IDs.
To delete one or more existing UAIs, list only the UAI IDs that you want to keep under user_assigned_identities. The remaining UAI IDs are deleted.
To enable both system-assigned and user-assigned identities (SAI and UAI) on the workspace, set type to "system_assigned, user_assigned".
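
Because an update replaces the whole set of identities, it can help to compute the full list before you edit the YAML file. The following Python sketch is illustrative only: build_identity_section is a hypothetical helper, not part of the Azure CLI or SDK. It only assembles the mapping that the YAML file above expects and calls no Azure API.

```python
def build_identity_section(existing_ids, add=(), remove=()):
    """Return the `identity` mapping for a workspace update YAML file.

    Hypothetical helper: it only builds a dictionary and calls no Azure API.
    """
    removed = set(remove)
    # Keep every existing ID that is not being removed, preserving order.
    kept = [uai for uai in existing_ids if uai not in removed]
    # Append new IDs, skipping duplicates.
    for uai in add:
        if uai not in kept:
            kept.append(uai)
    return {
        "type": "user_assigned",
        "user_assigned_identities": {uai: {} for uai in kept},
    }


# Placeholder IDs; replace them with real UAI resource IDs.
section = build_identity_section(
    existing_ids=["<UAI resource ID 1>", "<UAI resource ID 2>"],
    add=["<UAI resource ID 3>"],
    remove=["<UAI resource ID 1>"],
)
print(list(section["user_assigned_identities"]))
```

The resulting keys are the complete list to place under user_assigned_identities. Any ID that is left out of that list is removed from the workspace by the update.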

Compute cluster

Note

Azure Machine Learning compute clusters support only one system-assigned identity or multiple user-assigned identities, not both concurrently.

The default managed identity is the system-assigned managed identity or the first user-assigned managed identity.

During a run there are two applications of an identity:

  1. The system uses an identity to set up the user's storage mounts, container registry, and datastores.

    • In this case, the system uses the default managed identity.
  2. You apply an identity to access resources from within the code for a submitted job:

    • In this case, provide the client_id corresponding to the managed identity you want to use to retrieve a credential.
    • Alternatively, get the user-assigned identity's client ID through the DEFAULT_IDENTITY_CLIENT_ID environment variable.

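    For example, to retrieve a token for a datastore with the default managed identity:

    import os

    from azure.identity import ManagedIdentityCredential

    # Read the client ID of the default managed identity from the environment
    client_id = os.environ.get('DEFAULT_IDENTITY_CLIENT_ID')
    credential = ManagedIdentityCredential(client_id=client_id)
    token = credential.get_token('https://storage.azure.com/')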
    

To configure a compute cluster with managed identity, use one of the following methods:

APPLIES TO: Azure CLI ml extension v2 (current)

az ml compute create -f create-cluster.yml

Where the contents of create-cluster.yml are as follows:

$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json 
name: basic-example
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 2
idle_time_before_scale_down: 120
identity:
  type: user_assigned
  user_assigned_identities: 
    - resource_id: "identity_resource_id"

For comparison, the following example is from a YAML file that creates a cluster that uses a system-assigned managed identity:

$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json 
name: basic-example
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 2
idle_time_before_scale_down: 120
identity:
  type: system_assigned

If you have an existing compute cluster, you can change between a user-assigned and a system-assigned managed identity. The following examples demonstrate how to change the configuration:

User-assigned managed identity

export MSI_NAME=my-cluster-identity
export COMPUTE_NAME=mycluster-msi

does_compute_exist()
{
  if [ -z "$(az ml compute show -n "$COMPUTE_NAME" --query name)" ]; then
    echo false
  else
    echo true
  fi
}

echo "Creating MSI $MSI_NAME"
# Get the resource id of the identity
IDENTITY_ID=$(az identity show --name "$MSI_NAME" --query id -o tsv | tail -n1 | tr -d "[:cntrl:]" || true)
if [[ -z $IDENTITY_ID ]]; then
    IDENTITY_ID=$(az identity create -n "$MSI_NAME" --query id -o tsv | tail -n1 | tr -d "[:cntrl:]")
fi
echo "MSI created: $MSI_NAME"
sleep 15 # Let the previous command finish: https://github.com/Azure/azure-cli/issues/8530


echo "Checking if compute $COMPUTE_NAME already exists"
if [ "$(does_compute_exist)" == "true" ]; then
  echo "Skipping, compute: $COMPUTE_NAME exists"
else
  echo "Provisioning compute: $COMPUTE_NAME"
  az ml compute create --name "$COMPUTE_NAME" --type amlcompute --identity-type user_assigned --user-assigned-identities "$IDENTITY_ID"
fi
az ml compute update --name "$COMPUTE_NAME" --identity-type user_assigned --user-assigned-identities "$IDENTITY_ID"

System-assigned managed identity

export COMPUTE_NAME=mycluster-sa

does_compute_exist()
{
  if [ -z "$(az ml compute show -n "$COMPUTE_NAME" --query name)" ]; then
    echo false
  else
    echo true
  fi
}

echo "Checking if compute $COMPUTE_NAME already exists"
if [ "$(does_compute_exist)" == "true" ]; then
  echo "Skipping, compute: $COMPUTE_NAME exists"
else
  echo "Provisioning compute: $COMPUTE_NAME"
  az ml compute create --name "$COMPUTE_NAME" --type amlcompute
fi

az ml compute update --name "$COMPUTE_NAME" --identity-type system_assigned

Data storage

When you create a datastore that uses identity-based data access, your Azure account (Microsoft Entra token) is used to confirm you have permission to access the storage service. In the identity-based data access scenario, no authentication credentials are saved. Only the storage account information is stored in the datastore.

In contrast, datastores that use credential-based authentication cache connection information, like your storage account key or SAS token, in the key vault that's associated with the workspace. This approach has the limitation that other workspace users with sufficient permissions can retrieve those credentials, which may be a security concern for some organizations.

For more information on how data access is authenticated, see the Data administration article. For information on configuring identity-based access to data, see Create datastores.

There are two scenarios in which you can apply identity-based data access in Azure Machine Learning. These scenarios are a good fit for identity-based access when you're working with confidential data and need more granular data access management:

  • Accessing storage services
  • Training machine learning models

The identity-based access allows you to use role-based access controls (RBAC) to restrict which identities, such as users or compute resources, have access to the data.

Accessing storage services

You can connect to storage services via identity-based data access with Azure Machine Learning datastores.

When you use identity-based data access, Azure Machine Learning prompts you for your Microsoft Entra token for data access authentication instead of keeping your credentials in the datastore. That approach allows for data access management at the storage level and keeps credentials confidential.

The same behavior applies when you work with data interactively via a Jupyter Notebook on your local computer or compute instance.

Note

Credentials stored via credential-based authentication include subscription IDs, shared access signature (SAS) tokens, storage access keys, and service principal information, like client IDs and tenant IDs.

To help ensure that you securely connect to your storage service on Azure, Azure Machine Learning requires that you have permission to access the corresponding data storage.

Warning

Cross tenant access to storage accounts is not supported. If cross tenant access is needed for your scenario, please reach out to the Azure Machine Learning Data Support team alias at amldatasupport@microsoft.com for assistance with a custom code solution.

Identity-based data access supports connections to only the following storage services:

  • Azure Blob Storage
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2

To access these storage services, you must have at least Storage Blob Data Reader access to the storage account. Only storage account owners can change your access level via the Azure portal.

Access data for training jobs on compute using managed identity

Certain machine learning scenarios involve working with private data. In such cases, data scientists may not have direct access to the data as Microsoft Entra users. Instead, the managed identity of a compute can be used for data access authentication, so the data can only be accessed from a compute instance or a machine learning compute cluster executing a training job. With this approach, the admin grants the managed identity of the compute instance or compute cluster Storage Blob Data Reader permissions on the storage. The individual data scientists don't need to be granted access.

To enable authentication with compute managed identity:

  • Create compute with managed identity enabled. See the compute cluster section, or for compute instance, the Assign managed identity section.

    Important

    If the compute instance is also configured for idle shutdown, the compute instance won't shut down due to inactivity unless the managed identity has contributor access to the Azure Machine Learning workspace. For more information on assigning permissions, see Manage access to Azure Machine Learning workspaces.

  • Grant the compute managed identity at least the Storage Blob Data Reader role on the storage account.

  • Create any datastores with identity-based authentication enabled. See Create datastores.

Note

The name of the created system managed identity for compute instance or cluster will be in the format /workspace-name/computes/compute-name in your Microsoft Entra ID.

Once identity-based authentication is enabled, the compute managed identity is used by default when accessing data within your training jobs. Optionally, you can authenticate with user identity by using the steps described in the next section.

For information on configuring Azure RBAC for the storage, see role-based access controls.

Access data for training jobs on compute clusters using user identity

APPLIES TO: Azure CLI ml extension v2 (current)

When training on Azure Machine Learning compute clusters, you can authenticate to storage with your user Microsoft Entra token.

This authentication mode allows you to:

  • Set up fine-grained permissions, where different workspace users can have access to different storage accounts or folders within storage accounts.
  • Let data scientists re-use existing permissions on storage systems.
  • Audit storage access because the storage logs show which identities were used to access data.

Important

This functionality has the following limitations:

  • The feature is supported for experiments submitted via the Azure Machine Learning CLI and Python SDK v2, but not via Azure Machine Learning studio.
  • User identity and compute managed identity can't be used for authentication within the same job.
  • For pipeline jobs, we recommend setting user identity at the individual step level that will be executed on a compute, rather than at the root pipeline level. Identity setting is supported at both the root pipeline and step levels, and the step-level setting takes precedence if both are set. However, for pipelines containing pipeline components, identity must be set on the individual steps that will be executed; identity set at the root pipeline or pipeline component level doesn't function. Therefore, we suggest setting identity at the individual step level for simplicity.
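
The precedence rules for pipeline jobs can be easier to follow as a small decision function. The following Python sketch is a simplified model of the rules described above, not service code. The function name effective_step_identity and its parameters are hypothetical and exist only for illustration.

```python
from typing import Optional


def effective_step_identity(
    step_identity: Optional[str],
    root_identity: Optional[str],
    has_pipeline_components: bool,
) -> Optional[str]:
    """Return the identity setting that applies to one pipeline step.

    Simplified model of the documented rules; None means that no
    identity setting takes effect for the step.
    """
    # A step-level setting always takes precedence.
    if step_identity is not None:
        return step_identity
    # With pipeline components, a root-level setting doesn't take effect.
    if has_pipeline_components:
        return None
    # Otherwise the root pipeline setting applies.
    return root_identity


print(effective_step_identity("user_identity", None, True))
print(effective_step_identity(None, "user_identity", True))
```

In this model, only the step-level setting behaves the same way in every pipeline shape, which is why setting identity on each step is the simpler choice.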

The following steps outline how to set up data access with user identity for training jobs on compute clusters from CLI.

  1. Grant the user identity access to storage resources. For example, grant Storage Blob Data Reader access to the specific storage account you want to use, or grant ACL-based permission to specific folders or files in Azure Data Lake Storage Gen2.

  2. Create an Azure Machine Learning datastore without cached credentials for the storage account. If a datastore has cached credentials, such as storage account key, those credentials are used instead of user identity.

  3. Submit a training job with the identity property set to type: user_identity, as shown in the following job specification. During the training job, the authentication to storage happens via the identity of the user who submits the job.

    Note

    If the identity property is left unspecified and the datastore doesn't have cached credentials, the compute managed identity becomes the fallback option.

    command: |
      echo "--census-csv: ${{inputs.census_csv}}"
      python hello-census.py --census-csv ${{inputs.census_csv}}
    code: src
    inputs:
      census_csv:
        type: uri_file
        path: azureml://datastores/mydata/paths/census.csv
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    compute: azureml:cpu-cluster
    identity:
      type: user_identity
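
The steps above imply an order in which the authentication method for storage is chosen. The following Python sketch models that order under stated assumptions: resolve_storage_auth and the returned labels are hypothetical names, and the function covers only the cases this article describes. It isn't the service's implementation.

```python
from typing import Optional


def resolve_storage_auth(has_cached_credentials: bool, identity_type: Optional[str]) -> str:
    """Return a label for the authentication method a job would use.

    Illustrative model of the documented behavior only.
    """
    # Cached datastore credentials are used instead of user identity.
    if has_cached_credentials:
        return "cached_credentials"
    # The job explicitly requests the identity of the submitting user.
    if identity_type == "user_identity":
        return "user_identity"
    # With no identity property, compute managed identity is the fallback.
    if identity_type is None:
        return "compute_managed_identity"
    # Other identity types are outside the scope of this sketch.
    raise ValueError(f"identity type not covered by this sketch: {identity_type}")


print(resolve_storage_auth(False, "user_identity"))
print(resolve_storage_auth(False, None))
```

The first branch is the reason step 2 asks for a datastore without cached credentials: if credentials are cached, the user identity setting has no effect on storage access.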
    

The following steps outline how to set up data access with user identity for training jobs on compute clusters from Python SDK.

  1. Grant data access and create the datastore as described above for the CLI.

  2. Submit a training job with identity parameter set to azure.ai.ml.UserIdentityConfiguration. This parameter setting enables the job to access data on behalf of user submitting the job.

    from azure.ai.ml import Input, MLClient, UserIdentityConfiguration, command
    from azure.ai.ml.constants import AssetTypes
    from azure.identity import DefaultAzureCredential

    # Connect to the workspace; replace the placeholders with your own values
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription ID>",
        resource_group_name="<resource group name>",
        workspace_name="<workspace name>",
    )

    # Specify the data location
    my_job_inputs = {
        "input_data": Input(type=AssetTypes.URI_FILE, path="<path-to-my-data>")
    }

    # Define the job; the identity parameter makes the job access data
    # on behalf of the user who submits it
    job = command(
        code="<my-local-code-location>",
        command="python <my-script>.py --input_data ${{inputs.input_data}}",
        inputs=my_job_inputs,
        environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
        compute="<my-compute-cluster-name>",
        identity=UserIdentityConfiguration(),
    )

    # Submit the command job
    returned_job = ml_client.jobs.create_or_update(job)
    

Important

During job submission with authentication with user identity enabled, the code snapshots are protected against tampering by checksum validation. If you have existing pipeline components and intend to use them with authentication with user identity enabled, you may need to re-upload them. Otherwise the job may fail during checksum validation.

Work with virtual networks

By default, Azure Machine Learning can't communicate with a storage account that's behind a firewall or in a virtual network.

You can configure storage accounts to allow access only from within specific virtual networks. This configuration requires extra steps to ensure data isn't leaked outside of the network. This behavior is the same for credential-based data access. For more information, see How to prevent data exfiltration.

If your storage account has virtual network settings, those settings dictate which identity type and access permissions are needed. For example, for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.

  • In scenarios where only certain IPs and subnets are allowed to access the storage, Azure Machine Learning uses the workspace MSI to accomplish data previews and profiles.

  • If your storage is ADLS Gen2 or Blob and has virtual network settings, you can use either user identity or the workspace MSI, depending on the datastore settings defined during creation.

  • If the virtual network setting is "Allow Azure services on the trusted services list to access this storage account", the workspace MSI is used.

Scenario: Azure Container Registry without admin user

When you disable the admin user for ACR, Azure Machine Learning uses a managed identity to build and pull Docker images. There are two workflows when configuring Azure Machine Learning to use an ACR with the admin user disabled:

  • Allow Azure Machine Learning to create the ACR instance and then disable the admin user afterwards.
  • Bring an existing ACR with the admin user already disabled.

Azure Machine Learning with auto-created ACR instance

  1. Create a new Azure Machine Learning workspace.

  2. Perform an action that requires Azure Container Registry. For example, complete the Tutorial: Train your first model.

  3. Get the name of the ACR created by the cluster.

    APPLIES TO: Azure CLI ml extension v2 (current)

    az ml workspace show -w <my workspace> \
    -g <my resource group> \
    --query containerRegistry
    

    This command returns a value similar to the following text. You only want the last portion of the text, which is the ACR instance name:

    /subscriptions/<subscription id>/resourceGroups/<my resource group>/providers/Microsoft.ContainerRegistry/registries/<ACR instance name>
    
  4. Update the ACR to disable the admin user:

    az acr update --name <ACR instance name> --admin-enabled false
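
If you script these steps, the ACR instance name from step 3 can be extracted from the resource ID programmatically. The following Python sketch shows one way to do that. The helper name acr_name_from_resource_id and the sample resource ID are hypothetical, and the sketch assumes the ID ends in /registries/<ACR instance name> as shown above.

```python
def acr_name_from_resource_id(resource_id: str) -> str:
    """Return the last segment of an ACR resource ID (the registry name)."""
    # Drop surrounding whitespace, JSON quotes, and slashes before splitting.
    cleaned = resource_id.strip().strip('"').strip("/")
    parts = cleaned.split("/")
    # The registry name must directly follow the "registries" segment.
    if len(parts) < 2 or parts[-2].lower() != "registries":
        raise ValueError(f"not an ACR resource ID: {resource_id!r}")
    return parts[-1]


# Made-up resource ID, used only to exercise the helper.
sample_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg/providers/Microsoft.ContainerRegistry"
    "/registries/myworkspaceacr"
)
print(acr_name_from_resource_id(sample_id))
```

The returned value is what you pass to the --name parameter of az acr update.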
    

Bring your own ACR

If the ACR admin user is disallowed by subscription policy, first create the ACR without the admin user, and then associate it with the workspace. If you have an existing ACR with the admin user disabled, you can also attach it to the workspace.

Create the ACR from the Azure CLI without setting the --admin-enabled argument, or from the Azure portal without enabling the admin user. Then, when creating the Azure Machine Learning workspace, specify the Azure resource ID of the ACR. The following example demonstrates creating a new Azure Machine Learning workspace that uses an existing ACR:

Tip

To get the value for the --container-registry parameter, use the az acr show command to show information for your ACR. The id field contains the resource ID for your ACR.

APPLIES TO: Azure CLI ml extension v2 (current)

az ml workspace create -w <workspace name> \
-g <workspace resource group> \
-l <region> \
--container-registry /subscriptions/<subscription id>/resourceGroups/<acr resource group>/providers/Microsoft.ContainerRegistry/registries/<acr name>

Create compute with managed identity to access Docker images for training

To access the workspace ACR, create a machine learning compute cluster with system-assigned managed identity enabled. You can enable the identity from the Azure portal or studio when creating the compute, or from the Azure CLI by using the following command. For more information, see using managed identity with compute clusters.

APPLIES TO: Azure CLI ml extension v2 (current)

az ml compute create --name cpu-cluster --type amlcompute --identity-type systemassigned

A managed identity is automatically granted ACRPull role on workspace ACR to enable pulling Docker images for training.

Note

If you create compute first, before workspace ACR has been created, you have to assign the ACRPull role manually.

Use Docker images for inference

Once you've configured ACR without the admin user as described earlier, you can access Docker images for inference without admin keys from your Azure Kubernetes Service (AKS) cluster. When you create or attach AKS to the workspace, the cluster's service principal is automatically assigned ACRPull access to the workspace ACR.

Note

If you bring your own AKS cluster, the cluster must have a service principal enabled instead of a managed identity.

Scenario: Use a private Azure Container Registry

By default, Azure Machine Learning uses Docker base images that come from a public repository managed by Microsoft. It then builds your training or inference environment on those images. For more information, see What are ML environments?.

To use a custom base image internal to your enterprise, you can use managed identities to access your private ACR. There are two use cases:

  • Use base image for training as is.
  • Build Azure Machine Learning managed image with custom image as a base.

Pull Docker base image to machine learning compute cluster for training as is

Create a machine learning compute cluster with system-assigned managed identity enabled as described earlier. Then, determine the principal ID of the managed identity.

APPLIES TO: Azure CLI ml extension v2 (current)

az ml compute show --name <cluster name> -w <workspace> -g <resource group>

Optionally, you can update the compute cluster to assign a user-assigned managed identity:

APPLIES TO: Azure CLI ml extension v2 (current)

az ml compute update --name <cluster name> --user-assigned-identities <my-identity-id>

To allow the compute cluster to pull the base images, grant the managed service identity the ACRPull role on the private ACR:

APPLIES TO: Azure CLI ml extension v2 (current)

az role assignment create --assignee <principal ID> \
--role acrpull \
--scope "/subscriptions/<subscription ID>/resourceGroups/<private ACR resource group>/providers/Microsoft.ContainerRegistry/registries/<private ACR name>"
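
When you automate the role assignment, the --scope value is the resource ID of the private ACR, assembled from the subscription, resource group, and registry name. The following Python sketch builds that string. The helper name acr_scope and the sample values are hypothetical; only the resource ID format comes from the command above.

```python
def acr_scope(subscription_id: str, resource_group: str, registry_name: str) -> str:
    """Build the resource ID of an ACR for use as a role assignment scope."""
    for label, value in (
        ("subscription_id", subscription_id),
        ("resource_group", resource_group),
        ("registry_name", registry_name),
    ):
        # Reject empty parts and parts that would break the path structure.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerRegistry/registries/{registry_name}"
    )


# Made-up values, used only to show the resulting format.
scope = acr_scope("00000000-0000-0000-0000-000000000000", "acr-rg", "privateacr")
print(scope)
```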

Finally, create an environment and specify the base image location in the environment YAML file.

APPLIES TO: Azure CLI ml extension v2 (current)

$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-example
image: pytorch/pytorch:latest
description: Environment created from a Docker image.

az ml environment create --file <yaml file>

You can now use the environment in a training job.

Build Azure Machine Learning managed environment into base image from private ACR for training or inference

APPLIES TO: Azure CLI ml extension v2 (current)

In this scenario, Azure Machine Learning service builds the training or inference environment on top of a base image you supply from a private ACR. Because the image build task happens on the workspace ACR using ACR Tasks, you must perform more steps to allow access.

  1. Create a user-assigned managed identity and grant the identity ACRPull access to the private ACR.

  2. Grant the workspace managed identity a Managed Identity Operator role on the user-assigned managed identity from the previous step. This role allows the workspace to assign the user-assigned managed identity to ACR Task for building the managed environment.

    1. Obtain the principal ID of workspace system-assigned managed identity:

      APPLIES TO: Azure CLI ml extension v2 (current)

      az ml workspace show -w <workspace name> -g <resource group> --query identityPrincipalId
      
    2. Grant the Managed Identity Operator role:

      az role assignment create --assignee <principal ID> --role managedidentityoperator --scope <user-assigned managed identity resource ID>
      

      The user-assigned managed identity resource ID is the Azure resource ID of the user-assigned identity, in the format /subscriptions/<subscription ID>/resourceGroups/<resource group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<user-assigned managed identity name>.

  3. Specify the external ACR and client ID of the user-assigned managed identity in workspace connections by using the az ml connection command. This command accepts a YAML file that provides information on the connection. The following example demonstrates the format for specifying a managed identity. Replace the client_id and resource_id values with the ones for your managed identity:

    APPLIES TO: Azure CLI ml extension v2 (current)

    name: test_ws_conn_cr_managed
    type: container_registry
    target: https://test-feed.com
    credentials:
      type: managed_identity
      client_id: client_id
      resource_id: resource_id
    

    The following command demonstrates how to use the YAML file to create a connection with your workspace. Replace <yaml file>, <workspace name>, and <resource group> with the values for your configuration:

    az ml connection create --file <yaml file> --resource-group <resource group> --workspace-name <workspace name>
    
  4. Once the configuration is complete, you can use the base images from private ACR when building environments for training or inference. The following code snippet demonstrates how to specify the base image ACR and image name in an environment definition:

    APPLIES TO: Azure CLI ml extension v2 (current)

    $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
    name: private-acr-example
    image: <acr url>/pytorch/pytorch:latest
    description: Environment created from private ACR.
    

Next steps