Use Azure Machine Learning studio in an Azure virtual network

In this article, you learn how to use Azure Machine Learning studio in a virtual network. The studio includes features like AutoML, the designer, and data labeling.

Some of the studio's features are disabled by default in a virtual network. To re-enable these features, you must enable managed identity for storage accounts you intend to use in the studio.

The following operations are disabled by default in a virtual network:

  • Preview data in the studio.
  • Visualize data in the designer.
  • Deploy a model in the designer.
  • Submit an AutoML experiment.
  • Start a labeling project.

The studio supports reading data from the following datastore types in a virtual network:

  • Azure Storage Account (blob & file)
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • Azure SQL Database

In this article, you learn how to:

  • Give the studio access to data stored inside of a virtual network.
  • Access the studio from a resource inside of a virtual network.
  • Understand how the studio impacts storage security.

Tip

This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:

For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or Tutorial: Create a secure workspace using a template.

Prerequisites

Limitations

Azure Storage Account

  • When the storage account is in the VNet, there are extra validation requirements when using studio:

    • If the storage account uses a service endpoint, the workspace private endpoint and storage service endpoint must be in the same subnet of the VNet.
    • If the storage account uses a private endpoint, the workspace private endpoint and storage private endpoint must be in the same VNet. In this case, they can be in different subnets.

Designer sample pipeline

There's a known issue where user cannot run sample pipeline in Designer homepage. This is the sample dataset used in the sample pipeline is Azure Global dataset, and it cannot satisfy all virtual network environment.

To resolve this issue, you can use a public workspace to run sample pipeline to get to know how to use the designer and then replace the sample dataset with your own dataset in the workspace within virtual network.

Datastore: Azure Storage Account

Use the following steps to enable access to data stored in Azure Blob and File storage:

Tip

The first step is not required for the default storage account for the workspace. All other steps are required for any storage account behind the VNet and used by the workspace, including the default storage account.

  1. If the storage account is the default storage for your workspace, skip this step. If it is not the default, Grant the workspace managed identity the 'Storage Blob Data Reader' role for the Azure storage account so that it can read data from blob storage.

    For more information, see the Blob Data Reader built-in role.

  2. Grant the workspace managed identity the 'Reader' role for storage private endpoints. If your storage service uses a private endpoint, grant the workspace's managed identity Reader access to the private endpoint. The workspace's managed identity in Azure AD has the same name as your Azure Machine Learning workspace.

    Tip

    Your storage account may have multiple private endpoints. For example, one storage account may have separate private endpoint for blob, file, and dfs (Azure Data Lake Storage Gen2). Add the managed identity to all these endpoints.

    For more information, see the Reader built-in role.

  3. Enable managed identity authentication for default storage accounts. Each Azure Machine Learning workspace has two default storage accounts, a default blob storage account and a default file store account, which are defined when you create your workspace. You can also set new defaults in the Datastore management page.

    Screenshot showing where default datastores can be found

    The following table describes why managed identity authentication is used for your workspace default storage accounts.

    Storage account Notes
    Workspace default blob storage Stores model assets from the designer. Enable managed identity authentication on this storage account to deploy models in the designer. If managed identity authentication is disabled, the user's identity is used to access data stored in the blob.

    You can visualize and run a designer pipeline if it uses a non-default datastore that has been configured to use managed identity. However, if you try to deploy a trained model without managed identity enabled on the default datastore, deployment will fail regardless of any other datastores in use.
    Workspace default file store Stores AutoML experiment assets. Enable managed identity authentication on this storage account to submit AutoML experiments.
  4. Configure datastores to use managed identity authentication. After you add an Azure storage account to your virtual network with either a service endpoint or private endpoint, you must configure your datastore to use managed identity authentication. Doing so lets the studio access data in your storage account.

    Azure Machine Learning uses datastore to connect to storage accounts. When creating a new datastore, use the following steps to configure a datastore to use managed identity authentication:

    1. In the studio, select Datastores.

    2. To update an existing datastore, select the datastore and select Update credentials.

      To create a new datastore, select + New datastore.

    3. In the datastore settings, select Yes for Use workspace managed identity for data preview and profiling in Azure Machine Learning studio.

      Screenshot showing how to enable managed workspace identity

    4. In the Networking settings for the Azure Storage Account, add the Microsoft.MachineLearningService/workspaces Resource type, and set the Instance name to the workspace.

    These steps add the workspace's managed identity as a Reader to the new storage service using Azure RBAC. Reader access allows the workspace to view the resource, but not make changes.

Datastore: Azure Data Lake Storage Gen1

When using Azure Data Lake Storage Gen1 as a datastore, you can only use POSIX-style access control lists. You can assign the workspace's managed identity access to resources just like any other security principal. For more information, see Access control in Azure Data Lake Storage Gen1.

Datastore: Azure Data Lake Storage Gen2

When using Azure Data Lake Storage Gen2 as a datastore, you can use both Azure RBAC and POSIX-style access control lists (ACLs) to control data access inside of a virtual network.

To use Azure RBAC, follow the steps in the Datastore: Azure Storage Account section of this article. Data Lake Storage Gen2 is based on Azure Storage, so the same steps apply when using Azure RBAC.

To use ACLs, the workspace's managed identity can be assigned access just like any other security principal. For more information, see Access control lists on files and directories.

Datastore: Azure SQL Database

To access data stored in an Azure SQL Database with a managed identity, you must create a SQL contained user that maps to the managed identity. For more information on creating a user from an external provider, see Create contained users mapped to Azure AD identities.

After you create a SQL contained user, grant permissions to it by using the GRANT T-SQL command.

Intermediate component output

When using the Azure Machine Learning designer intermediate component output, you can specify the output location for any component in the designer. Use this to store intermediate datasets in separate location for security, logging, or auditing purposes. To specify output, use the following steps:

  1. Select the component whose output you'd like to specify.
  2. In the component settings pane that appears to the right, select Output settings.
  3. Specify the datastore you want to use for each component output.

Make sure that you have access to the intermediate storage accounts in your virtual network. Otherwise, the pipeline will fail.

Enable managed identity authentication for intermediate storage accounts to visualize output data.

Access the studio from a resource inside the VNet

If you are accessing the studio from a resource inside of a virtual network (for example, a compute instance or virtual machine), you must allow outbound traffic from the virtual network to the studio.

For example, if you are using network security groups (NSG) to restrict outbound traffic, add a rule to a service tag destination of AzureFrontDoor.Frontend.

Firewall settings

Some storage services, such as Azure Storage Account, have firewall settings that apply to the public endpoint for that specific service instance. Usually this setting allows you to allow/disallow access from specific IP addresses from the public internet. This is not supported when using Azure Machine Learning studio. It is supported when using the Azure Machine Learning SDK or CLI.

Tip

Azure Machine Learning studio is supported when using the Azure Firewall service. For more information, see Use your workspace behind a firewall.

Next steps

This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series: