Configure secure access with managed identities and virtual networks

This content applies to: v4.0 (preview), v3.1 (GA), v3.0 (GA), v2.1 (GA)

This how-to guide walks you through the process of enabling secure connections for your Document Intelligence resource. You can secure the following connections:

  • Communication between a client application within a virtual network (VNET) and your Document Intelligence resource.

  • Communication between Document Intelligence Studio and your Document Intelligence resource.

  • Communication between your Document Intelligence resource and a storage account (needed when training a custom model).

You're setting up your environment to secure the resources:

Screenshot of secure configuration with managed identity and virtual networks.

Prerequisites

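To get started, you need the resources that the rest of this guide configures:

  • A Document Intelligence resource.

  • A storage account to hold your training dataset.

  • A virtual network.

  • Optionally, a virtual machine in the virtual network to validate the deployment.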

Configure resources

Configure each of the resources to ensure that the resources can communicate with each other:

  • Configure the Document Intelligence Studio to use the newly created Document Intelligence resource by accessing the settings page and selecting the resource.

  • Validate that the configuration works by selecting the Read API and analyzing a sample document. If the resource was configured correctly, the request successfully completes.

  • Add a training dataset to a container in the Storage account you created.

  • Select the custom model tile to create a custom project. Ensure that you select the same Document Intelligence resource and the storage account you created in the previous step.

  • Select the container with the training dataset you uploaded in the previous step. Ensure that if the training dataset is within a folder, the folder path is set appropriately.

  • If you have the required permissions, the Studio sets the CORS setting required to access the storage account. If you don't have the permissions, you need to ensure that the CORS settings are configured on the Storage account before you can proceed.

  • Validate that the Studio is configured to access your training data. If you can see your documents in the labeling experience, all the required connections are established.

You now have a working implementation of all the components needed to build a Document Intelligence solution with the default security model:

Screenshot of default security configuration.

Next, complete the following steps:

  • Set up a managed identity on the Document Intelligence resource.

  • Secure the storage account to restrict traffic to only specific virtual networks and IP addresses.

  • Configure the Document Intelligence managed identity to communicate with the storage account.

  • Disable public access to the Document Intelligence resource and create a private endpoint to make it accessible from only specific virtual networks and IP addresses.

  • Add a private endpoint for the storage account in a selected virtual network.

  • Validate that you can train models and analyze documents from within the virtual network.

Set up managed identity for Document Intelligence

Navigate to the Document Intelligence resource in the Azure portal and select the Identity tab. Toggle the System assigned managed identity to On and save the changes:

Screenshot of configure managed identity.
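If you script your setup, the portal step above corresponds roughly to a single Azure CLI call. The following is a minimal sketch, assuming the Azure CLI is installed and signed in; the resource name my-docintel and the resource group my-rg are hypothetical placeholders. It only builds and prints the command so that you can review it before running it.

```python
import shlex


def identity_assign_command(resource_name: str, resource_group: str) -> list[str]:
    """Build the Azure CLI call that enables the system-assigned identity."""
    return [
        "az", "cognitiveservices", "identity", "assign",
        "--name", resource_name,
        "--resource-group", resource_group,
    ]


# Hypothetical names: replace them with your own resource and resource group.
command = identity_assign_command("my-docintel", "my-rg")
print(shlex.join(command))
# To execute the command: subprocess.run(command, check=True)
```

The command output includes the principal ID of the new identity, which is useful later when you assign a role on the storage account.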

Secure the Storage account to limit traffic

Start configuring secure communications by navigating to the Networking tab on your Storage account in the Azure portal.

  1. Under Firewalls and virtual networks, in the Public network access list, choose Enabled from selected virtual networks and IP addresses.

  2. In the Exceptions list, ensure that Allow Azure services on the trusted services list to access this storage account is selected.

  3. Save your changes.

Screenshot of configure storage firewall.

Note

Your storage account won't be accessible from the public internet.

Refreshing the custom model labeling page in the Studio will result in an error message.
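The same firewall settings can be approximated from the command line. This sketch assumes the Azure CLI is available and uses hypothetical names (mystorageacct, my-rg). Setting the default action to Deny matches the selected-networks option, and the AzureServices bypass matches the trusted services exception.

```python
import shlex


def storage_firewall_command(account_name: str, resource_group: str) -> list[str]:
    """Build the Azure CLI call that denies traffic by default
    while still allowing trusted Azure services."""
    return [
        "az", "storage", "account", "update",
        "--name", account_name,
        "--resource-group", resource_group,
        "--default-action", "Deny",      # block networks that aren't explicitly allowed
        "--bypass", "AzureServices",     # keep the trusted services exception
    ]


# Hypothetical names: replace them with your own storage account and resource group.
command = storage_firewall_command("mystorageacct", "my-rg")
print(shlex.join(command))
```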

Enable access to storage from Document Intelligence

To ensure that the Document Intelligence resource can access the training dataset, you need to add a role assignment for your managed identity.

  1. Staying on the storage account window in the Azure portal, navigate to the Access Control (IAM) tab in the left navigation bar.

  2. Select the Add role assignment button.

    Screenshot of add role assignment window.

  3. On the Role tab, search for and select the Storage Blob Data Reader role, and then select Next.

    Screenshot of choose a role tab.

  4. On the Members tab, select the Managed identity option and choose + Select members.

  5. On the Select managed identities dialog window, select the following options:

    • Subscription. Select your subscription.

    • Managed Identity. Select Form Recognizer.

    • Select. Choose the Document Intelligence resource you enabled with a managed identity.

    Screenshot of managed identities dialog window.

  6. Close the dialog window.

  7. Finally, select Review + assign to save your changes.

Great! You configured your Document Intelligence resource to use a managed identity to connect to a storage account.
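The role assignment above can also be expressed as an Azure CLI call. The sketch below is illustrative only: the subscription ID, resource group, storage account name, and principal ID are hypothetical placeholders. The principal ID is the object ID of the managed identity that you enabled on the Document Intelligence resource.

```python
import shlex

ROLE_NAME = "Storage Blob Data Reader"


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Return the Azure resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(principal_id: str, scope: str) -> list[str]:
    """Build the Azure CLI call that grants the role to a managed identity."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", ROLE_NAME,
        "--scope", scope,
    ]


# Hypothetical identifiers: replace them with your own values.
scope = storage_account_scope("00000000-0000-0000-0000-000000000000", "my-rg", "mystorageacct")
command = role_assignment_command("11111111-1111-1111-1111-111111111111", scope)
print(shlex.join(command))
```

Scoping the assignment to the storage account rather than the whole subscription keeps the identity's read access limited to the data it needs for training.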

Tip

When you try the Document Intelligence Studio, you'll see that the Read API and other prebuilt models don't require storage access to process documents. However, training a custom model requires additional configuration because the Studio can't directly communicate with a storage account. You can enable storage access by selecting Add your client IP address from the Networking tab of the storage account to configure your machine to access the storage account via IP allowlisting.

Configure private endpoints for access from VNETs

Note

  • The resources are only accessible from the virtual network.

  • Some Document Intelligence features in the Studio like auto label require the Document Intelligence Studio to have access to your storage account.

  • Add our Studio IP address, 20.3.165.95, to the firewall allowlist for both Document Intelligence and Storage Account resources. This is Document Intelligence Studio's dedicated IP address and can be safely allowed.
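As a rough illustration of the allowlisting described in the note, the following sketch builds one Azure CLI command per resource for the Studio IP address given above. The resource names and the resource group are hypothetical placeholders, and the sketch assumes the Azure CLI is installed and signed in.

```python
import ipaddress
import shlex

STUDIO_IP = "20.3.165.95"  # Studio IP address from the note above


def allowlist_commands(ip: str, resource_group: str,
                       docintel_name: str, storage_name: str) -> list[list[str]]:
    """Build the Azure CLI calls that add one IP address to both firewalls."""
    ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
    return [
        ["az", "cognitiveservices", "account", "network-rule", "add",
         "--name", docintel_name,
         "--resource-group", resource_group,
         "--ip-address", ip],
        ["az", "storage", "account", "network-rule", "add",
         "--account-name", storage_name,
         "--resource-group", resource_group,
         "--ip-address", ip],
    ]


# Hypothetical names: replace them with your own resources.
commands = allowlist_commands(STUDIO_IP, "my-rg", "my-docintel", "mystorageacct")
for command in commands:
    print(shlex.join(command))
```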

When you connect to resources from a virtual network, adding private endpoints ensures that both the storage account and the Document Intelligence resource are accessible from the virtual network.

Next, configure the virtual network to ensure that only resources within the virtual network, or traffic routed through the network, have access to the Document Intelligence resource and the storage account.

Enable your firewalls and virtual networks

  1. In the Azure portal, navigate to your Document Intelligence resource.

  2. Select the Networking tab from the left navigation bar.

  3. Enable the Selected Networks and Private Endpoints option from the Firewalls and virtual networks tab and select Save.

Note

If you try accessing any of the Document Intelligence Studio features, you'll see an access denied message. To restore access from the Studio on your machine, select the Add your client IP address checkbox, and then select Save.

Screenshot showing how to disable public access to Document Intelligence.

Configure your private endpoint

  1. Navigate to the Private endpoint connections tab and select + Private endpoint. The Create a private endpoint dialog page opens.

  2. On the Create a private endpoint dialog page, select the following options:

    • Subscription. Select your billing subscription.

    • Resource group. Select the appropriate resource group.

    • Name. Enter a name for your private endpoint.

    • Region. Select the same region as your virtual network.

    • Select Next: Resource.

    Screenshot showing how to set up a private endpoint.

Configure your virtual network

  1. On the Resource tab, accept the default values and select Next: Virtual Network.

  2. On the Virtual Network tab, make sure that you select the virtual network that you created.

  3. If you have multiple subnets, select the subnet where you want the private endpoint to connect. Accept the default value to Dynamically allocate IP address.

  4. Select Next: DNS.

  5. Accept the default value Yes to integrate with private DNS zone.

    Screenshot showing how to configure private endpoint

  6. Accept the remaining defaults and select Next: Tags.

  7. Select Next: Review + create.

Well done! Your Document Intelligence resource is now accessible only from the virtual network and any IP addresses in the IP allowlist.
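For comparison with the portal wizard, here is a minimal sketch of the equivalent Azure CLI call for the private endpoint. All names and IDs are hypothetical placeholders. The group ID account is the sub-resource that Cognitive Services resources expose for Private Link. The sketch covers endpoint creation only; the private DNS zone integration that the wizard performs isn't shown.

```python
import shlex


def docintel_resource_id(subscription_id: str, resource_group: str, name: str) -> str:
    """Return the Azure resource ID of a Document Intelligence (Cognitive Services) account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{name}"
    )


def private_endpoint_command(endpoint_name: str, resource_group: str, vnet: str,
                             subnet: str, target_resource_id: str,
                             group_id: str) -> list[str]:
    """Build the Azure CLI call that creates a private endpoint in a subnet."""
    return [
        "az", "network", "private-endpoint", "create",
        "--name", endpoint_name,
        "--resource-group", resource_group,
        "--vnet-name", vnet,
        "--subnet", subnet,
        "--private-connection-resource-id", target_resource_id,
        "--group-id", group_id,
        "--connection-name", f"{endpoint_name}-connection",
    ]


# Hypothetical identifiers: replace them with your own values.
target = docintel_resource_id("00000000-0000-0000-0000-000000000000", "my-rg", "my-docintel")
command = private_endpoint_command("docintel-pe", "my-rg", "my-vnet", "default", target, "account")
print(shlex.join(command))
```

The same helper can describe a storage endpoint by passing the storage account's resource ID and the group ID blob.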

Configure private endpoints for storage

Navigate to your storage account in the Azure portal.

  1. Select the Networking tab from the left navigation menu.

  2. Select the Private endpoint connections tab.

  3. Select + Private endpoint.

  4. Provide a name and choose the same region as the virtual network.

  5. Select Next: Resource.

    Screenshot showing how to create a private endpoint

  6. On the Resource tab, select blob from the Target sub-resource list.

  7. Select Next: Virtual Network.

    Screenshot showing how to configure a private endpoint for a blob.

  8. Select the Virtual network and Subnet. Make sure that Enable network policies for all private endpoints in this subnet is selected and that Dynamically allocate IP address is enabled.

  9. Select Next: DNS.

  10. Make sure that Yes is enabled for Integrate with private DNS zone.

  11. Select Next: Tags.

  12. Select Next: Review + create.

Great work! You now have all the connections between the Document Intelligence resource and storage configured to use managed identities.
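Both wizards ask you to integrate with a private DNS zone. As a small illustration of what that integration does, the sketch below derives the privatelink name that a public hostname is aliased to once a private endpoint exists. The hostnames are hypothetical placeholders; the two zone names are the standard Private Link zones for blob storage and Cognitive Services in the Azure public cloud.

```python
# Private DNS zones used by the two endpoints in this guide (Azure public cloud).
PRIVATE_DNS_ZONES = {
    "blob": "privatelink.blob.core.windows.net",
    "account": "privatelink.cognitiveservices.azure.com",
}


def privatelink_fqdn(public_fqdn: str) -> str:
    """Insert the 'privatelink' label after the account name of a public hostname."""
    account, _, suffix = public_fqdn.partition(".")
    if not account or not suffix:
        raise ValueError(f"not a fully qualified hostname: {public_fqdn!r}")
    return f"{account}.privatelink.{suffix}"


# Hypothetical hostnames: replace them with your own resource endpoints.
print(privatelink_fqdn("mystorageacct.blob.core.windows.net"))
print(privatelink_fqdn("my-docintel.cognitiveservices.azure.com"))
```

Inside the virtual network, the private DNS zone answers for the privatelink name with the endpoint's private IP address, so clients keep using the public hostname unchanged.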

Note

The resources are only accessible from the virtual network and allowed IPs.

Studio access and analyze requests to your Document Intelligence resource will fail unless the request originates from the virtual network or is routed via the virtual network.

Validate your deployment

To validate your deployment, you can deploy a virtual machine (VM) to the virtual network and connect to the resources.

  1. Configure a Data Science VM in the virtual network.

  2. Connect remotely to the VM from your desktop and launch a browser session to access Document Intelligence Studio.

  3. Analyze requests and training operations should now succeed.

That's it! You can now configure secure access for your Document Intelligence resource with managed identities and private endpoints.
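One extra check that you can run from the VM is whether the resource hostnames resolve to private addresses, which suggests that traffic is using the private endpoints. This is a minimal sketch that uses only the Python standard library; the hostname in the usage comment is a hypothetical placeholder.

```python
import ipaddress
import socket


def is_private_address(address: str) -> bool:
    """Return True if the IP address belongs to a private range."""
    return ipaddress.ip_address(address).is_private


def resolved_addresses(hostname: str, port: int = 443) -> set[str]:
    """Resolve a hostname with the system resolver and return its IP addresses."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def uses_private_endpoint(hostname: str) -> bool:
    """Return True if every address the hostname resolves to is private."""
    addresses = resolved_addresses(hostname)
    return bool(addresses) and all(is_private_address(a) for a in addresses)


print(is_private_address("10.0.0.4"))     # True: typical virtual network address
print(is_private_address("20.3.165.95"))  # False: public address

# From a VM inside the virtual network, with your own hostname:
# uses_private_endpoint("my-docintel.cognitiveservices.azure.com")
```

If the check returns False from inside the virtual network, the private DNS zone integration is a reasonable first place to look.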

Common error messages

  • Failed to access Blob container:

    Screenshot of error message when CORS config is required

    Resolution:

    1. Configure CORS.

    2. Make sure that the client computer can access both the Document Intelligence resource and the storage account: either they're in the same VNET, or the client IP address is allowed on the Networking > Firewalls and virtual networks settings page of both the Document Intelligence resource and the storage account.

  • AuthorizationFailure:

    Screenshot of authorization failure error.

    Resolution: Make sure that the client computer can access both the Document Intelligence resource and the storage account: either they're in the same VNET, or the client IP address is allowed on the Networking > Firewalls and virtual networks settings page of both the Document Intelligence resource and the storage account.

  • ContentSourceNotAccessible:

    Screenshot of content source not accessible error.

    Resolution: Make sure that you granted your Document Intelligence managed identity the Storage Blob Data Reader role and enabled Trusted services access or Resource instance rules on the Networking tab.

  • AccessDenied:

    Screenshot of an access denied error.

    Resolution: Check to make sure there's connectivity between the computer accessing the Document Intelligence Studio and the Document Intelligence service. For example, you might need to add the client IP address to the Document Intelligence service's networking tab.

Next steps