Audit and manage Azure Machine Learning

When teams collaborate on Azure Machine Learning, they might face varying requirements to configure and organize resources. Machine learning teams might look for flexibility in how to organize workspaces for collaboration, or how to size compute clusters for the requirements of their use cases. In these scenarios, productivity could benefit if application teams can manage their own infrastructure.

As a platform administrator, you can use policies to lay out guardrails for teams to manage their own resources. Azure Policy helps audit and govern resource state. This article explains how you can use audit controls and governance practices for Azure Machine Learning.

Policies for Azure Machine Learning

Azure Policy is a governance tool that allows you to ensure that Azure resources are compliant with your policies.

Azure Policy provides a set of policies that you can use for common scenarios with Azure Machine Learning. You can assign these policy definitions to your existing subscription or use them as the basis to create your own custom definitions.

The following table lists the built-in policies you can assign with Azure Machine Learning. For a list of all Azure built-in policies, see Built-in policies.

| Name (Azure portal) | Description | Effect(s) | Version (GitHub) |
| --- | --- | --- | --- |
| [Preview]: Azure Machine Learning Deployments should only use approved Registry Models | Restrict the deployment of Registry models to control externally created models used within your organization | Audit, Deny, Disabled | 1.0.0-preview |
| [Preview]: Azure Machine Learning Model Registry Deployments are restricted except for the allowed Registry | Only deploy Registry Models in the allowed Registry and that are not restricted. | Deny, Disabled | 1.0.0-preview |
| Azure Machine Learning Compute Instance should have idle shutdown. | Having an idle shutdown schedule reduces cost by shutting down computes that are idle after a pre-determined period of activity. | Audit, Deny, Disabled | 1.0.0 |
| Azure Machine Learning compute instances should be recreated to get the latest software updates | Ensure Azure Machine Learning compute instances run on the latest available operating system. Security is improved and vulnerabilities reduced by running with the latest security patches. For more information, visit https://aka.ms/azureml-ci-updates/. | [parameters('effects')] | 1.0.3 |
| Azure Machine Learning Computes should be in a virtual network | Azure Virtual Networks provide enhanced security and isolation for your Azure Machine Learning Compute Clusters and Instances, as well as subnets, access control policies, and other features to further restrict access. When a compute is configured with a virtual network, it is not publicly addressable and can only be accessed from virtual machines and applications within the virtual network. | Audit, Disabled | 1.0.1 |
| Azure Machine Learning Computes should have local authentication methods disabled | Disabling local authentication methods improves security by ensuring that Machine Learning Computes require Azure Active Directory identities exclusively for authentication. Learn more at: https://aka.ms/azure-ml-aad-policy. | Audit, Deny, Disabled | 2.1.0 |
| Azure Machine Learning workspaces should be encrypted with a customer-managed key | Manage encryption at rest of Azure Machine Learning workspace data with customer-managed keys. By default, customer data is encrypted with service-managed keys, but customer-managed keys are commonly required to meet regulatory compliance standards. Customer-managed keys enable the data to be encrypted with an Azure Key Vault key created and owned by you. You have full control and responsibility for the key lifecycle, including rotation and management. Learn more at https://aka.ms/azureml-workspaces-cmk. | Audit, Deny, Disabled | 1.1.0 |
| Azure Machine Learning Workspaces should disable public network access | Disabling public network access improves security by ensuring that the Machine Learning Workspaces aren't exposed on the public internet. You can control exposure of your workspaces by creating private endpoints instead. Learn more at: https://learn.microsoft.com/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2&tabs=azure-portal. | Audit, Deny, Disabled | 2.0.1 |
| Azure Machine Learning workspaces should enable V1LegacyMode to support network isolation backward compatibility | Azure ML is making a transition to a new V2 API platform on Azure Resource Manager and you can control API platform version using V1LegacyMode parameter. Enabling the V1LegacyMode parameter will enable you to keep your workspaces in the same network isolation as V1, though you won't have use of the new V2 features. We recommend turning on V1 Legacy Mode only when you want to keep the AzureML control plane data inside your private networks. Learn more at: https://aka.ms/V1LegacyMode. | Audit, Deny, Disabled | 1.0.0 |
| Azure Machine Learning workspaces should use private link | Azure Private Link lets you connect your virtual network to Azure services without a public IP address at the source or destination. The Private Link platform handles the connectivity between the consumer and services over the Azure backbone network. By mapping private endpoints to Azure Machine Learning workspaces, data leakage risks are reduced. Learn more about private links at: https://docs.microsoft.com/azure/machine-learning/how-to-configure-private-link. | Audit, Disabled | 1.0.0 |
| Azure Machine Learning workspaces should use user-assigned managed identity | Manage access to Azure ML workspace and associated resources, Azure Container Registry, KeyVault, Storage, and App Insights using user-assigned managed identity. By default, system-assigned managed identity is used by Azure ML workspace to access the associated resources. User-assigned managed identity allows you to create the identity as an Azure resource and maintain the life cycle of that identity. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-use-managed-identities?tabs=python. | Audit, Deny, Disabled | 1.0.0 |
| Configure Azure Machine Learning Computes to disable local authentication methods | Disable local authentication methods so that your Machine Learning Computes require Azure Active Directory identities exclusively for authentication. Learn more at: https://aka.ms/azure-ml-aad-policy. | Modify, Disabled | 2.1.0 |
| Configure Azure Machine Learning workspace to use private DNS zones | Use private DNS zones to override the DNS resolution for a private endpoint. A private DNS zone links to your virtual network to resolve to Azure Machine Learning workspaces. Learn more at: https://docs.microsoft.com/azure/machine-learning/how-to-network-security-overview. | DeployIfNotExists, Disabled | 1.1.0 |
| Configure Azure Machine Learning Workspaces to disable public network access | Disable public network access for Azure Machine Learning Workspaces so that your workspaces aren't accessible over the public internet. This helps protect the workspaces against data leakage risks. You can control exposure of your workspaces by creating private endpoints instead. Learn more at: https://learn.microsoft.com/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2&tabs=azure-portal. | Modify, Disabled | 1.0.3 |
| Configure Azure Machine Learning workspaces with private endpoints | Private endpoints connect your virtual network to Azure services without a public IP address at the source or destination. By mapping private endpoints to your Azure Machine Learning workspace, you can reduce data leakage risks. Learn more about private links at: https://docs.microsoft.com/azure/machine-learning/how-to-configure-private-link. | DeployIfNotExists, Disabled | 1.0.0 |
| Configure diagnostic settings for Azure Machine Learning Workspaces to Log Analytics workspace | Deploys the diagnostic settings for Azure Machine Learning Workspaces to stream resource logs to a Log Analytics Workspace when any Azure Machine Learning Workspace which is missing this diagnostic settings is created or updated. | DeployIfNotExists, Disabled | 1.0.1 |
| Resource logs in Azure Machine Learning Workspaces should be enabled | Resource logs enable recreating activity trails to use for investigation purposes when a security incident occurs or when your network is compromised. | AuditIfNotExists, Disabled | 1.0.1 |

Policies can be set at different scopes, such as at the subscription or resource group level. For more information, see the Azure Policy documentation.
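
As a rough illustration of the two scopes named above, the following Python sketch builds the Azure Resource Manager scope strings that a policy assignment can target. The subscription ID and resource group name are placeholder values, not values from this article.

```python
def subscription_scope(subscription_id: str) -> str:
    """Return the ARM scope string for a whole subscription."""
    return f"/subscriptions/{subscription_id}"


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Return the ARM scope string for a single resource group."""
    return f"{subscription_scope(subscription_id)}/resourceGroups/{resource_group}"


# Placeholder identifiers, for illustration only.
SUB_ID = "00000000-0000-0000-0000-000000000000"
print(subscription_scope(SUB_ID))
print(resource_group_scope(SUB_ID, "ml-team-rg"))
```

A policy assigned at the subscription scope applies to every resource group in that subscription, while the narrower resource group scope limits it to one team's resources.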

Assigning built-in policies

To view the built-in policy definitions related to Azure Machine Learning, use the following steps:

  1. Go to Azure Policy in the Azure portal.
  2. Select Definitions.
  3. For Type, select Built-in. For Category, select Machine Learning.

From here, you can select policy definitions to view them. While viewing a definition, you can use the Assign link to assign the policy to a specific scope, and configure the parameters for the policy. For more information, see Create a policy assignment to identify non-compliant resources using Azure portal.

You can also assign policies by using Azure PowerShell, Azure CLI, or templates.
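
The Azure CLI route can be sketched as follows. This Python helper only builds and prints the argument list for `az policy assignment create`; it doesn't run the command. The assignment name, policy reference, and scope are placeholders, and the parameter name `effect` is an assumption: each definition declares its own parameter names, so check the definition before assigning it.

```python
import json
import shlex


def build_assignment_command(name, policy, scope, effect):
    """Build the argv for `az policy assignment create`.

    `policy` is the name or ID of a policy definition. The parameter name
    "effect" is an assumption; check the definition for the name it declares.
    """
    params = {"effect": {"value": effect}}
    return [
        "az", "policy", "assignment", "create",
        "--name", name,
        "--policy", policy,
        "--scope", scope,
        "--params", json.dumps(params),
    ]


# All values below are placeholders for illustration.
cmd = build_assignment_command(
    name="audit-idle-shutdown",
    policy="<policy-definition-name-or-id>",
    scope="/subscriptions/00000000-0000-0000-0000-000000000000",
    effect="Audit",
)
# Print instead of running, so nothing is changed in Azure.
print(shlex.join(cmd))
```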

Conditional access policies

To control who can access your Azure Machine Learning workspace, use Microsoft Entra Conditional Access. To use Conditional Access for Azure Machine Learning workspaces, assign the Conditional Access policy to the app named Azure Machine Learning. The app ID is 0736f41a-0425-bdb5-1563eff02385.

Enable self-service using landing zones

Landing zones are an architectural pattern that accounts for scale, governance, security, and productivity when setting up Azure environments. A data landing zone is an administrator-configured environment that an application team uses to host a data and analytics workload.

The purpose of the landing zone is to ensure that all infrastructure configuration work is done when a team starts in the Azure environment. For instance, security controls are set up in compliance with organizational standards and network connectivity is set up.

When you use the landing zones pattern, machine learning teams can deploy and manage their own resources on a self-service basis. By using Azure Policy as an administrator, you can audit and manage Azure resources for compliance.

Azure Machine Learning integrates with data landing zones in the Cloud Adoption Framework data management and analytics scenario. This reference implementation provides an optimized environment to migrate machine learning workloads onto Azure Machine Learning and includes preconfigured policies.

Configure built-in policies

Compute instance should have idle shutdown

This policy controls whether an Azure Machine Learning compute instance should have idle shutdown enabled. Idle shutdown automatically stops the compute instance when it's idle for a specified period of time. This policy is useful for cost savings and to ensure that resources aren't being used unnecessarily.

To configure this policy, set the effect parameter to Audit, Deny, or Disabled. If set to Audit, you can create a compute instance without idle shutdown enabled and a warning event is created in the activity log.

Compute instances should be recreated to get software updates

Controls whether Azure Machine Learning compute instances should be audited to make sure they're running the latest available software updates. This policy is useful to ensure that compute instances are running the latest software updates to maintain security and performance. For more information, see Vulnerability management for Azure Machine Learning.

To configure this policy, set the effect parameter to Audit or Disabled. If set to Audit, a warning event is created in the activity log when a compute isn't running the latest software updates.

Compute cluster and instance should be in a virtual network

Controls auditing of compute cluster and instance resources behind a virtual network.

To configure this policy, set the effect parameter to Audit or Disabled. If set to Audit, you can create a compute that isn't configured behind a virtual network and a warning event is created in the activity log.

Computes should have local authentication methods disabled

Controls whether an Azure Machine Learning compute cluster or instance should disable local authentication (SSH).

To configure this policy, set the effect parameter to Audit, Deny, or Disabled. If set to Audit, you can create a compute with SSH enabled and a warning event is created in the activity log.

If the policy is set to Deny, then you can't create a compute unless SSH is disabled. Attempting to create a compute with SSH enabled results in an error. The error is also logged in the activity log. The policy identifier is returned as part of this error.
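
The difference between the Audit and Deny effects described above can be modeled with a small Python sketch. This is a toy model for illustration, not how Azure Policy is implemented; the function, exception, and policy identifier are hypothetical names.

```python
class PolicyViolation(Exception):
    """Raised when a Deny effect blocks a request (toy model)."""


def apply_effect(effect, compliant, activity_log, policy_id="example-policy-id"):
    """Return True if the create request may proceed.

    Mirrors the behavior described in the text: Audit logs a warning and
    allows the request, Deny logs an error and blocks it, Disabled does nothing.
    """
    if effect == "Disabled" or compliant:
        return True
    if effect == "Audit":
        activity_log.append(("Warning", policy_id))
        return True
    if effect == "Deny":
        activity_log.append(("Error", policy_id))
        raise PolicyViolation(f"Request disallowed by policy {policy_id}")
    raise ValueError(f"Unsupported effect: {effect}")


log = []
# A compute request with SSH enabled is non-compliant with this policy.
print(apply_effect("Audit", compliant=False, activity_log=log))
try:
    apply_effect("Deny", compliant=False, activity_log=log)
except PolicyViolation as err:
    print(err)
print(log)
```

In both cases an entry reaches the activity log; only Deny stops the resource from being created.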

Workspaces should be encrypted with customer-managed key

Controls whether a workspace should be encrypted with a customer-managed key, or with a Microsoft-managed key to encrypt metrics and metadata. For more information on using a customer-managed key, see the Azure Cosmos DB section of the data encryption article.

To configure this policy, set the effect parameter to Audit or Deny. If set to Audit, you can create a workspace without a customer-managed key and a warning event is created in the activity log.

If the policy is set to Deny, then you can't create a workspace unless it specifies a customer-managed key. Attempting to create a workspace without a customer-managed key results in an error similar to Resource 'clustername' was disallowed by policy and creates an error in the activity log. The policy identifier is also returned as part of this error.

Workspaces should disable public network access

Controls whether a workspace should disable network access from the public internet.

To configure this policy, set the effect parameter to Audit, Deny, or Disabled. If set to Audit, you can create a workspace with public access and a warning event is created in the activity log.

If the policy is set to Deny, then you can't create a workspace that allows network access from the public internet.

Workspaces should enable V1LegacyMode to support network isolation backward compatibility

Controls whether a workspace should enable V1LegacyMode to support network isolation backward compatibility. This policy is useful if you want to keep Azure Machine Learning control plane data inside your private networks. For more information, see Network isolation change with our new API platform.

To configure this policy, set the effect parameter to Audit, Deny, or Disabled. If set to Audit, you can create a workspace without enabling V1LegacyMode and a warning event is created in the activity log.

If the policy is set to Deny, then you can't create a workspace unless it enables V1LegacyMode.

Workspaces should use private link

Controls whether a workspace should use Azure Private Link to communicate with Azure Virtual Network. For more information on using private link, see Configure a private endpoint for an Azure Machine Learning workspace.

To configure this policy, set the effect parameter to Audit or Deny. If set to Audit, you can create a workspace without using private link and a warning event is created in the activity log.

If the policy is set to Deny, then you can't create a workspace unless it uses a private link. Attempting to create a workspace without a private link results in an error. The error is also logged in the activity log. The policy identifier is returned as part of this error.

Workspaces should use user-assigned managed identity

Controls whether a workspace is created using a system-assigned managed identity (default) or a user-assigned managed identity. The managed identity for the workspace is used to access associated resources such as Azure Storage, Azure Container Registry, Azure Key Vault, and Azure Application Insights. For more information, see Set up authentication between Azure Machine Learning and other services.

To configure this policy, set the effect parameter to Audit, Deny, or Disabled. If set to Audit, you can create a workspace without specifying a user-assigned managed identity. A system-assigned identity is used, and a warning event is created in the activity log.

If the policy is set to Deny, then you can't create a workspace unless you provide a user-assigned identity during the creation process. Attempting to create a workspace without providing a user-assigned identity results in an error. The error is also logged to the activity log. The policy identifier is returned as part of this error.

Configure computes to modify/disable local authentication

This policy modifies any Azure Machine Learning compute cluster or instance creation request to disable local authentication (SSH).

To configure this policy, set the effect parameter to Modify or Disabled. If set to Modify, any creation of a compute cluster or instance within the scope where the policy applies automatically has local authentication disabled.

Configure workspace to use private DNS zones

This policy configures a workspace to use a private DNS zone, overriding the default DNS resolution for a private endpoint.

To configure this policy, set the effect parameter to DeployIfNotExists. Set the privateDnsZoneId to the Azure Resource Manager ID of the private DNS zone to use.

Configure workspaces to disable public network access

Configures a workspace to disable network access from the public internet. Disabling public network access helps protect the workspaces against data leakage risks. You can instead access your workspace by creating private endpoints. For more information, see Configure a private endpoint for an Azure Machine Learning workspace.

To configure this policy, set the effect parameter to Modify or Disabled. If set to Modify, any creation of a workspace within the scope where the policy applies automatically has public network access disabled.
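
Unlike Audit or Deny, the Modify effect rewrites the incoming request. The Python sketch below is a toy model of that behavior. The request shape and the `publicNetworkAccess` property name are assumptions made for illustration, and the function name is hypothetical.

```python
import copy


def apply_modify(effect, workspace_request):
    """Return the request as it would be stored after a Modify-style policy.

    Toy model: with effect "Modify" the public network access setting is
    forced to "Disabled"; with "Disabled" the request passes through unchanged.
    """
    if effect not in ("Modify", "Disabled"):
        raise ValueError(f"Unsupported effect: {effect}")
    result = copy.deepcopy(workspace_request)  # Leave the caller's dict untouched.
    if effect == "Modify":
        result.setdefault("properties", {})["publicNetworkAccess"] = "Disabled"
    return result


# Hypothetical, heavily trimmed workspace creation request.
request = {"name": "team-ws", "properties": {"publicNetworkAccess": "Enabled"}}
print(apply_modify("Modify", request)["properties"])
print(apply_modify("Disabled", request)["properties"])
```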

Configure workspaces with private endpoints

Configures a workspace to create a private endpoint within the specified subnet of an Azure Virtual Network.

To configure this policy, set the effect parameter to DeployIfNotExists. Set the privateEndpointSubnetID to the Azure Resource Manager ID of the subnet.
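
Both this policy and the private DNS zone policy take an Azure Resource Manager ID as a parameter. As a minimal sketch, the following Python builds those IDs and wraps them in the parameter-value shape used by policy assignments. The subscription, resource group, virtual network, subnet, and zone names are placeholders; the parameter names `privateDnsZoneId` and `privateEndpointSubnetID` are taken from the text above.

```python
import json


def private_dns_zone_id(subscription_id, resource_group, zone_name):
    """ARM ID of a private DNS zone."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/privateDnsZones/{zone_name}"
    )


def subnet_id(subscription_id, resource_group, vnet_name, subnet_name):
    """ARM ID of a subnet inside a virtual network."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
        f"/subnets/{subnet_name}"
    )


# Placeholder values, for illustration only.
SUB = "00000000-0000-0000-0000-000000000000"
dns_params = {
    "privateDnsZoneId": {
        "value": private_dns_zone_id(SUB, "network-rg", "privatelink.api.azureml.ms")
    }
}
endpoint_params = {
    "privateEndpointSubnetID": {
        "value": subnet_id(SUB, "network-rg", "ml-vnet", "ml-subnet")
    }
}
print(json.dumps(dns_params, indent=2))
print(json.dumps(endpoint_params, indent=2))
```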

Configure diagnostic settings for workspaces to send logs to Log Analytics workspaces

Configures the diagnostic settings for an Azure Machine Learning workspace to send logs to a Log Analytics workspace.

To configure this policy, set the effect parameter to DeployIfNotExists or Disabled. If set to DeployIfNotExists, the policy creates a diagnostic setting to send logs to a Log Analytics workspace if it doesn't already exist.

Resource logs in workspaces should be enabled

Audits whether resource logs are enabled for an Azure Machine Learning workspace. Resource logs provide detailed information about operations performed on resources in the workspace.

To configure this policy, set the effect parameter to AuditIfNotExists or Disabled. If set to AuditIfNotExists, the policy audits if resource logs aren't enabled for the workspace.

Create custom definitions

When you need to create custom policies for your organization, you can use the Azure Policy definition structure to create your own definitions. You can use the Azure Policy Visual Studio Code extension to author and test your policies.

To discover the policy aliases you can use in your definition, use the following Azure CLI command to list the aliases for Azure Machine Learning:

az provider show --namespace Microsoft.MachineLearningServices --expand "resourceTypes/aliases" --query "resourceTypes[].aliases[].name"
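
The command returns a JSON array of alias names, which can be long. As a minimal sketch, the following Python narrows such a list to one resource type. The sample list is hand-written from the aliases used later in this article and stands in for the real CLI output; the helper name is hypothetical.

```python
import json


def aliases_for(alias_names, resource_type):
    """Return the property part of each alias under the given resource type."""
    prefix = resource_type.rstrip("/") + "/"
    return sorted(
        name[len(prefix):] for name in alias_names if name.startswith(prefix)
    )


# Hand-written sample standing in for the CLI's JSON output.
sample_output = json.dumps([
    "Microsoft.MachineLearningServices/workspaces/computes/computeType",
    "Microsoft.MachineLearningServices/workspaces/computes/enableNodePublicIP",
    "Microsoft.MachineLearningServices/workspaces/jobs/jobType",
])

names = json.loads(sample_output)
print(aliases_for(names, "Microsoft.MachineLearningServices/workspaces/computes"))
```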

To discover the allowed values for a specific alias, visit the Azure Machine Learning REST API reference.

For a tutorial (not Azure Machine Learning specific) on how to create custom policies, visit Create a custom policy definition.

Example: Block serverless Spark compute jobs

{
    "properties": {
        "displayName": "Deny serverless Spark compute jobs",
        "description": "Deny serverless Spark compute jobs",
        "mode": "All",
        "policyRule": {
            "if": {
                "allOf": [
                    {
                        "field": "Microsoft.MachineLearningServices/workspaces/jobs/jobType",
                        "in": [
                            "Spark"
                        ]
                    }
                ]
            },
            "then": {
                "effect": "Deny"
            }
        },
        "parameters": {}
    }
}
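
To register a definition like this one with Azure CLI, note that `az policy definition create` takes the rule and the parameter definitions as separate arguments (`--rules` and `--params`), while the JSON above nests both under `properties`. The following Python sketch splits the definition accordingly and prints the resulting command without running it. The definition name is a hypothetical placeholder.

```python
import json
import shlex

# The definition from the example above, as a Python dict.
definition = {
    "properties": {
        "displayName": "Deny serverless Spark compute jobs",
        "description": "Deny serverless Spark compute jobs",
        "mode": "All",
        "policyRule": {
            "if": {
                "allOf": [
                    {
                        "field": "Microsoft.MachineLearningServices/workspaces/jobs/jobType",
                        "in": ["Spark"],
                    }
                ]
            },
            "then": {"effect": "Deny"},
        },
        "parameters": {},
    }
}


def build_definition_command(name, definition):
    """Build the argv for `az policy definition create` from a definition dict."""
    props = definition["properties"]
    return [
        "az", "policy", "definition", "create",
        "--name", name,
        "--display-name", props["displayName"],
        "--description", props["description"],
        "--mode", props["mode"],
        "--rules", json.dumps(props["policyRule"]),
        "--params", json.dumps(props.get("parameters", {})),
    ]


# Print instead of running, so nothing is changed in Azure.
cmd = build_definition_command("deny-serverless-spark-jobs", definition)
print(shlex.join(cmd))
```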

Example: Configure no public IP for managed computes

{
    "properties": {
        "displayName": "Deny compute instance and compute cluster creation with public IP",
        "description": "Deny compute instance and compute cluster creation with public IP",
        "mode": "All",
        "parameters": {
            "effectType": {
                "type": "string",
                "defaultValue": "Deny",
                "allowedValues": [
                    "Deny",
                    "Disabled"
                ],
                "metadata": {
                    "displayName": "Effect",
                    "description": "Enable or disable the execution of the policy"
                }
            }
        },
        "policyRule": {
            "if": {
                "allOf": [
                  {
                    "field": "type",
                    "equals": "Microsoft.MachineLearningServices/workspaces/computes"
                  },
                  {
                    "allOf": [
                      {
                        "field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
                        "notEquals": "AKS"
                      },
                      {
                        "field": "Microsoft.MachineLearningServices/workspaces/computes/enableNodePublicIP",
                        "equals": true
                      }
                    ]
                  }
                ]
              },
            "then": {
                "effect": "[parameters('effectType')]"
            }
        }
    }
}
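
To reason about which computes a rule like this matches, it can help to evaluate the `if` block against sample resources. The following Python sketch implements a small subset of the condition language (`allOf`, `anyOf`, `equals`, `notEquals`, `in`) over a flat dictionary of alias values. It's a simplified model, not the Azure Policy engine: for example, it compares strings case-sensitively and treats a missing field as `None`. The sample resources and the `AmlCompute` value are illustrative.

```python
def evaluate(condition, resource):
    """Evaluate a small subset of Azure Policy conditions against a flat dict."""
    if "allOf" in condition:
        return all(evaluate(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(evaluate(c, resource) for c in condition["anyOf"])
    value = resource.get(condition["field"])
    if "equals" in condition:
        return value == condition["equals"]
    if "notEquals" in condition:
        return value != condition["notEquals"]
    if "in" in condition:
        return value in condition["in"]
    raise ValueError(f"Unsupported condition: {condition}")


COMPUTES = "Microsoft.MachineLearningServices/workspaces/computes"

# The "if" block from the definition above.
rule_if = {
    "allOf": [
        {"field": "type", "equals": COMPUTES},
        {
            "allOf": [
                {"field": f"{COMPUTES}/computeType", "notEquals": "AKS"},
                {"field": f"{COMPUTES}/enableNodePublicIP", "equals": True},
            ]
        },
    ]
}


def compute(compute_type, public_ip):
    """Build a sample compute resource keyed by alias (illustrative only)."""
    return {
        "type": COMPUTES,
        f"{COMPUTES}/computeType": compute_type,
        f"{COMPUTES}/enableNodePublicIP": public_ip,
    }


for sample in (compute("AmlCompute", True), compute("AmlCompute", False), compute("AKS", True)):
    matched = evaluate(rule_if, sample)
    print(sample[f"{COMPUTES}/computeType"], "denied" if matched else "allowed")
```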