Enterprise security and governance for Azure Machine Learning
In this article, you learn about the security and governance features available for Azure Machine Learning. These features are useful for administrators, DevOps engineers, and MLOps engineers who want to create a secure configuration that complies with their company's policies. With Azure Machine Learning and the Azure platform, you can:
- Restrict access to resources and operations by user account or groups
- Restrict incoming and outgoing network communications
- Encrypt data in transit and at rest
- Scan for vulnerabilities
- Apply and audit configuration policies
Restrict access to resources and operations
Azure Active Directory (Azure AD) is the identity service provider for Azure Machine Learning. It allows you to create and manage the security objects (user, group, service principal, and managed identity) that are used to authenticate to Azure resources. Multi-factor authentication is supported if Azure AD is configured to use it.
Here's the authentication process for Azure Machine Learning using multi-factor authentication in Azure AD:
- The client signs in to Azure AD and gets an Azure Resource Manager token.
- The client presents the token to Azure Resource Manager and to Azure Machine Learning.
- Azure Machine Learning provides a Machine Learning service token to the user compute target (for example, Azure Machine Learning compute cluster or serverless compute). This token is used by the user compute target to call back into the Machine Learning service after the job is complete. The scope is limited to the workspace.
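The second step of this flow, presenting the token to Azure Resource Manager, can be sketched with the Python standard library. This is a minimal illustration, not the service's own client code: it assumes a token has already been obtained from Azure AD (for example with an identity library, not shown), and the helper name `build_workspace_request` and the `api-version` value are assumptions made for the example.

```python
import urllib.request

ARM_ENDPOINT = "https://management.azure.com"
# Assumed API version, used for illustration only.
API_VERSION = "2023-04-01"


def build_workspace_request(token, subscription_id, resource_group, workspace_name):
    """Build (but do not send) a GET request for a workspace through Azure Resource Manager."""
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}?api-version={API_VERSION}"
    )
    request = urllib.request.Request(url, method="GET")
    # The token issued by Azure AD is presented as a bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would send the request. The sketch stops before that so it can be read and run without credentials.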
Each workspace has an associated system-assigned managed identity that has the same name as the workspace. This managed identity is used to securely access resources used by the workspace. It has the following Azure RBAC permissions on associated resources:
|Resource|Permissions|
|---|---|
|Storage account|Storage Blob Data Contributor|
|Key vault|Access to all keys, secrets, certificates|
|Azure Container Registry|Contributor|
|Resource group that contains the workspace|Contributor|
The system-assigned managed identity is used for internal service-to-service authentication between Azure Machine Learning and other Azure resources. The identity token isn't accessible to users and they can't use it to gain access to these resources. Users can only access the resources through Azure Machine Learning control and data plane APIs, if they have sufficient RBAC permissions.
We don't recommend that admins revoke the access of the managed identity to the resources mentioned in the preceding table. You can restore access by using the resync keys operation.
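To notice when one of these permissions has been revoked, the table above can be kept as data and compared against what is actually assigned. The following is a minimal sketch under stated assumptions: `find_missing_permissions` is a hypothetical helper, and the `actual` mapping is assumed to be assembled by you (for example, by hand from the access control view of each resource). It does not call any Azure API.

```python
# Expected permissions of the workspace managed identity, copied from the table above.
EXPECTED_PERMISSIONS = {
    "Storage account": "Storage Blob Data Contributor",
    "Key vault": "Access to all keys, secrets, certificates",
    "Azure Container Registry": "Contributor",
    "Resource group that contains the workspace": "Contributor",
}


def find_missing_permissions(actual):
    """Return the resources whose recorded permission differs from the expected one.

    `actual` is a hypothetical mapping of resource name -> permission.
    Resources that are absent from `actual` are reported as well.
    """
    return sorted(
        resource
        for resource, permission in EXPECTED_PERMISSIONS.items()
        if actual.get(resource) != permission
    )
```

An empty result means that the recorded permissions match the table. Any resource that is listed is a candidate for the resync keys operation mentioned above.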
If your Azure Machine Learning workspace has compute targets (compute cluster, compute instance, Azure Kubernetes Service, and so on) that were created before May 14, 2021, you might also have an additional Azure Active Directory account. The account name starts with Microsoft-AzureML-Support-App- and has contributor-level access to your subscription for every workspace region.
If your workspace does not have an Azure Kubernetes Service (AKS) attached, you can safely delete this Azure AD account.
If your workspace has attached AKS clusters, and they were created before May 14th, 2021, do not delete this Azure AD account. In this scenario, you must first delete and recreate the AKS cluster before you can delete the Azure AD account.
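The two rules above can be written as a small decision helper. This is a sketch, not an official check: the function name is hypothetical, and treating clusters created on or after the cutoff date as safe is my reading of the text, which only addresses clusters created before that date.

```python
from datetime import date

# Cutoff date stated in the text above.
CUTOFF = date(2021, 5, 14)


def can_delete_support_app_account(attached_aks_created_on):
    """Return True if the Microsoft-AzureML-Support-App- account can be deleted.

    `attached_aks_created_on` holds the creation date of every AKS cluster
    attached to the workspace (an empty list if none is attached).
    """
    # No attached AKS cluster: the account can be deleted.
    # Any attached cluster created before the cutoff: delete and recreate
    # that cluster first, so the answer is False.
    # Assumption: clusters created on or after the cutoff do not block deletion.
    return all(created >= CUTOFF for created in attached_aks_created_on)
```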
You can provision the workspace to use a user-assigned managed identity and grant that identity additional roles, for example to access your own Azure Container Registry for base Docker images. You can also configure managed identities for use with an Azure Machine Learning compute cluster. This managed identity is independent of the workspace managed identity. With a compute cluster, the managed identity is used to access resources, such as secured datastores, that the user running the training job might not have access to. For more information, see Use managed identities for access control.
There are some exceptions to the use of Azure AD and Azure RBAC within Azure Machine Learning:
- You can optionally enable SSH access to compute resources such as Azure Machine Learning compute instance and compute cluster. SSH access is based on public/private key pairs, not Azure AD. SSH access is not governed by Azure RBAC.
- You can authenticate to models deployed as online endpoints using key or token-based authentication. Keys are static strings, while tokens are retrieved using an Azure AD security object. For more information, see How to authenticate online endpoints.
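For the online endpoint case, a client sends its key or token with each scoring request. The sketch below builds such a request with the standard library only. It assumes that both credential types are sent as a bearer value in the `Authorization` header. The function name, the scoring URI, and the payload shape are placeholders for illustration.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, credential, payload):
    """Build (but do not send) a POST request to an online endpoint.

    `credential` is either an endpoint key (a static string) or a token.
    In this sketch both are sent the same way, as a bearer value.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(scoring_uri, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", f"Bearer {credential}")
    return request
```

Because a key is static, anyone who holds it can call the endpoint until the key is regenerated. A token is retrieved through an Azure AD security object, as described above.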
For more information, see the following articles:
- Authentication for Azure Machine Learning workspace
- Manage access to Azure Machine Learning
- Connect to storage services
- Use Azure Key Vault for secrets when training
- Use Azure AD managed identity with Azure Machine Learning
Network security and isolation
To restrict network access to Azure Machine Learning resources, you can use an Azure Machine Learning managed virtual network or Azure Virtual Network (VNet). Using a virtual network reduces the attack surface for your solution, and the chances of data exfiltration.
You don't have to pick one or the other. For example, you can use a managed virtual network to secure managed compute resources and an Azure Virtual Network for your unmanaged resources or to secure client access to the workspace.
Azure Machine Learning managed virtual network provides a fully managed solution that enables network isolation for your workspace and managed compute resources. You can use private endpoints to secure communication with other Azure services, and can restrict outbound communications. The following managed compute resources are secured with a managed network:
- Serverless compute (including Spark serverless)
- Compute cluster
- Compute instance
- Managed online endpoints
- Batch online endpoints
For more information, see Azure Machine Learning managed virtual network.
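As a rough illustration of what enabling a managed virtual network looks like in configuration, the following sketch builds a workspace configuration as a Python dictionary. The field names (`managed_network`, `isolation_mode`) and the three mode values reflect the workspace configuration schema as I understand it and should be treated as assumptions. The helper `workspace_config` is hypothetical and does not create any Azure resource.

```python
# Isolation modes assumed for this sketch.
ISOLATION_MODES = {
    "disabled",
    "allow_internet_outbound",
    "allow_only_approved_outbound",
}


def workspace_config(name, isolation_mode):
    """Return a workspace configuration dict with a managed network section."""
    if isolation_mode not in ISOLATION_MODES:
        raise ValueError(f"unknown isolation mode: {isolation_mode!r}")
    return {
        "name": name,
        "managed_network": {"isolation_mode": isolation_mode},
    }
```

Under these assumptions, `allow_only_approved_outbound` is the mode that restricts outbound communication. `allow_internet_outbound` isolates the managed compute resources but leaves outbound traffic open.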
Azure Virtual Network provides a more customizable virtual network offering. However, you're responsible for its configuration and management. You might need to use network security groups, user-defined routing, or a firewall to restrict outbound communication.
Data encryption
Azure Machine Learning uses various compute resources and data stores on the Azure platform. To learn more about how each of these resources supports data encryption at rest and in transit, see Data encryption with Azure Machine Learning.
Data exfiltration prevention
Azure Machine Learning has several inbound and outbound network dependencies. Some of these dependencies can expose a data exfiltration risk by malicious agents within your organization. These risks are associated with the outbound requirements to Azure Storage, Azure Front Door, and Azure Monitor. For recommendations on mitigating this risk, see the Azure Machine Learning data exfiltration prevention article.
Vulnerability scanning
Microsoft Defender for Cloud provides unified security management and advanced threat protection across hybrid cloud workloads. For Azure Machine Learning, you should enable scanning of your Azure Container Registry resource and Azure Kubernetes Service resources. For more information, see Azure Container Registry image scanning by Defender for Cloud and Azure Kubernetes Service integration with Defender for Cloud.
Audit and manage compliance
Azure Policy is a governance tool that allows you to ensure that Azure resources are compliant with your policies. You can set policies to allow or enforce specific configurations, such as whether your Azure Machine Learning workspace uses a private endpoint. For more information on Azure Policy, see the Azure Policy documentation. For more information on the policies specific to Azure Machine Learning, see Audit and manage compliance with Azure Policy.
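To make the private endpoint example concrete, the sketch below evaluates the same kind of rule locally against workspace resource descriptions. It is not an Azure Policy definition and does not call Azure Policy. The resource shape (`properties.privateEndpointConnections[].properties.privateLinkServiceConnectionState.status`) is an assumption about the workspace resource JSON, and both function names are hypothetical.

```python
def uses_approved_private_endpoint(workspace):
    """Return True if the workspace has at least one approved private endpoint connection."""
    connections = workspace.get("properties", {}).get("privateEndpointConnections") or []
    for connection in connections:
        state = connection.get("properties", {}).get("privateLinkServiceConnectionState", {})
        if state.get("status") == "Approved":
            return True
    return False


def audit(workspaces):
    """Return the names of workspaces that this rule would flag as non-compliant."""
    return [ws["name"] for ws in workspaces if not uses_approved_private_endpoint(ws)]
```

An audit-style policy reports the flagged workspaces without blocking them. An enforcing policy would reject the configuration instead.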