Connect to Azure Blob storage in Microsoft Purview

This article outlines the process to register and govern Azure Blob Storage accounts in Microsoft Purview, including instructions to authenticate and interact with the Azure Blob Storage source.

Supported capabilities

Metadata Extraction | Full Scan | Incremental Scan | Scoped Scan | Classification | Access Policy | Lineage | Data Sharing
Yes | Yes | Yes | Yes | Yes | Yes (preview) | Limited** | Yes

** Lineage is supported if the dataset is used as a source/sink in a Data Factory Copy activity.

For file types such as csv, tsv, psv, and ssv, the schema is extracted when the following conditions are met:

  • First row values are non-empty
  • First row values are unique
  • First row values aren't a date or a number
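The three conditions above can be checked locally before a scan. The following is a minimal sketch, not Microsoft Purview's actual implementation: the function name Test-HeaderRow, the case-insensitive uniqueness check, and the use of .NET parsing to detect numbers and dates are all assumptions made for illustration.

```powershell
# Hypothetical helper: checks the three documented conditions on one header line.
function Test-HeaderRow {
    param(
        [string] $FirstLine,
        [char]   $Delimiter = ','   # pass another delimiter for tsv, psv, or ssv files
    )

    # Split the line and trim whitespace around each value.
    $values = @($FirstLine.Split($Delimiter) | ForEach-Object { $_.Trim() })

    # Condition 1: every first-row value is non-empty.
    $empty = @($values | Where-Object { [string]::IsNullOrWhiteSpace($_) })
    if ($empty.Count -gt 0) { return $false }

    # Condition 2: first-row values are unique (case-insensitive; an assumption).
    $distinct = @($values | Sort-Object -Unique)
    if ($distinct.Count -ne $values.Count) { return $false }

    # Condition 3: no first-row value parses as a number or a date.
    foreach ($v in $values) {
        $number = 0.0
        $date   = [datetime]::MinValue
        if ([double]::TryParse($v, [ref] $number)) { return $false }
        if ([datetime]::TryParse($v, [ref] $date)) { return $false }
    }

    return $true
}

Test-HeaderRow -FirstLine 'id,name,created_on'     # header-like first row
Test-HeaderRow -FirstLine '1,2021-03-04,widget'    # first row starts with a number
```

A file whose first row fails this check would likely need a header row added before its schema can be extracted.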

Prerequisites

Register

This section describes how to register the Azure Blob storage account for scanning and data sharing in Microsoft Purview.

Prerequisites for register

  • You'll need to be a Data Source Admin and one of the other Purview roles (for example, Data Reader or Data Share Contributor) to register a source and manage it in the Microsoft Purview governance portal. See our Microsoft Purview Permissions page for details.

Steps to register

It is important to register the data source in Microsoft Purview prior to setting up a scan for the data source.

  1. Go to the Azure portal, navigate to the Microsoft Purview accounts page, and select your Microsoft Purview account

  2. Open the Microsoft Purview governance portal and navigate to Data Map > Sources

    Screenshot that shows the link to open Microsoft Purview governance portal

    Screenshot that navigates to the Sources link in the Data Map

  3. Create the Collection hierarchy using the Collections menu and assign permissions to individual subcollections, as required

    Screenshot that shows the collection menu to create collection hierarchy

  4. Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Azure Blob data source

    Screenshot that shows the collection used to register the data source

  5. Select the Azure Blob Storage data source and select Continue

    Screenshot that allows selection of the data source

  6. Provide a suitable Name for the data source, select the relevant Azure subscription, the existing Azure Blob Storage account name, and the collection, and then select Apply. Leave the Data Use Management toggle in the disabled position until you have had a chance to carefully go over this document.

    Screenshot that shows the details to be entered in order to register the data source

  7. The Azure Blob storage account will be shown under the selected Collection

    Screenshot that shows the data source mapped to the collection to initiate scanning

Scan

Authentication for a scan

Your Azure network may allow for communications between your Azure resources, but if you've set up firewalls, private endpoints, or virtual networks within Azure, you'll need to follow one of these configurations below.

Networking constraints | Integration runtime type | Available credential types
No private endpoints or firewalls | Azure IR | Managed identity (Recommended), service principal, or account key
Firewall enabled but no private endpoints | Azure IR | Managed identity
Private endpoints enabled | *Self-Hosted IR | Service principal, account key

*To use a self-hosted integration runtime, you'll first need to create one and confirm your network settings for Microsoft Purview.
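The table above can be restated as a small lookup, which is handy when scripting scan setup for many storage accounts. This is only a sketch: Get-ScanOption is a hypothetical helper, and treating private endpoints as taking precedence when a firewall is also enabled is an assumption, since the table doesn't cover that combination.

```powershell
# Hypothetical helper that restates the networking table as code.
function Get-ScanOption {
    param(
        [bool] $FirewallEnabled,
        [bool] $PrivateEndpointsEnabled
    )

    if ($PrivateEndpointsEnabled) {
        # Assumption: private endpoints take precedence over the firewall setting.
        return [pscustomobject]@{
            Runtime     = 'Self-hosted IR'
            Credentials = @('Service principal', 'Account key')
        }
    }

    if ($FirewallEnabled) {
        return [pscustomobject]@{
            Runtime     = 'Azure IR'
            Credentials = @('Managed identity')
        }
    }

    return [pscustomobject]@{
        Runtime     = 'Azure IR'
        Credentials = @('Managed identity', 'Service principal', 'Account key')
    }
}

# Example: storage account with a firewall but no private endpoints.
Get-ScanOption -FirewallEnabled $true -PrivateEndpointsEnabled $false
```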

Using a system or user assigned managed identity for scanning

There are two types of managed identity you can use:

  • System-assigned managed identity (Recommended) - As soon as the Microsoft Purview account is created, a system-assigned managed identity (SAMI) is created automatically in your Azure AD tenant. Depending on the type of resource, specific RBAC role assignments are required for the SAMI to perform the scans.

  • User-assigned managed identity (preview) - Similar to a system-assigned managed identity, a user-assigned managed identity (UAMI) is a credential resource that allows Microsoft Purview to authenticate against Azure Active Directory. For more information, see our User-assigned managed identity guide.

It's important to give your Microsoft Purview account permission to scan the Azure Blob data source. You can add access for the SAMI or UAMI at the Subscription, Resource Group, or Resource level, depending on the level at which scan permission is needed.

Note

If you have a firewall enabled for the storage account, you must use the managed identity authentication method when setting up a scan.

Note

You need to be an owner of the subscription to be able to add a managed identity on an Azure resource.

  1. From the Azure portal, find either the subscription, resource group, or resource (for example, an Azure Blob storage account) that you would like to allow the catalog to scan.

    Screenshot that shows the storage account

  2. Select Access Control (IAM) in the left navigation and then select + Add > Add role assignment

    Screenshot that shows the access control for the storage account

  3. Set the Role to Storage Blob Data Reader and enter your Microsoft Purview account name or user-assigned managed identity in the Select input box. Then, select Save to give this role assignment to your Microsoft Purview account.

    Screenshot that shows the details to assign permissions for the Microsoft Purview account

  4. Go into your Azure Blob storage account in Azure portal

  5. Navigate to Security + networking > Networking

  6. Choose Selected Networks under Allow access from

  7. In the Exceptions section, select Allow trusted Microsoft services to access this storage account and select Save

    Screenshot that shows the exceptions to allow trusted Microsoft services to access the storage account
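The portal steps above can also be scripted with the Az PowerShell module. The following is a sketch under stated assumptions: the placeholder values are hypothetical, and it assumes the system-assigned managed identity appears in Azure AD as a service principal whose display name matches the Microsoft Purview account name.

```powershell
# Placeholders (hypothetical values): replace before running.
$resourceGroup  = '<resource-group>'
$storageAccount = '<storage-account-name>'
$purviewAccount = '<purview-account-name>'

# Resolve the storage account so its resource ID can be used as the scope.
$sa = Get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccount

# Assumption: the managed identity's service principal shares the account's name.
# If more than one principal matches, pick the correct one before continuing.
$identity = Get-AzADServicePrincipal -DisplayName $purviewAccount

# Steps 1-3: grant Storage Blob Data Reader at the storage account level.
New-AzRoleAssignment -ObjectId $identity.Id `
    -RoleDefinitionName 'Storage Blob Data Reader' `
    -Scope $sa.Id

# Steps 4-7: allow selected networks only, and let trusted Microsoft services through.
Update-AzStorageAccountNetworkRuleSet -ResourceGroupName $resourceGroup `
    -Name $storageAccount `
    -DefaultAction Deny `
    -Bypass AzureServices
```

Setting the default action to Deny blocks clients that aren't covered by an existing network rule, so review the account's current rules before running the last command.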

Note

For more details, see the steps in Authorize access to blobs and queues using Azure Active Directory.

Using Account Key for scanning

When the selected authentication method is Account Key, you need to get your access key and store it in the key vault:

  1. Navigate to your Azure Blob storage account

  2. Select Security + networking > Access keys

    Screenshot that shows the access keys in the storage account

  3. Copy your key and save it separately for the next steps

    Screenshot that shows the access keys to be copied

  4. Navigate to your key vault

    Screenshot that shows the key vault

  5. Select Settings > Secrets and select + Generate/Import

    Screenshot that shows the key vault option to generate a secret

  6. Enter a Name, and enter the key from your storage account as the Value

    Screenshot that shows the key vault option to enter the secret values

  7. Select Create to complete

  8. If your key vault isn't connected to Microsoft Purview yet, you'll need to create a new key vault connection

  9. Finally, create a new credential using the key to set up your scan
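Steps 1 through 7 above can be sketched with the Az PowerShell module, which avoids copying the key by hand. The placeholder values and the secret name are hypothetical. Connecting the key vault to Microsoft Purview and creating the credential (steps 8 and 9) are not covered here.

```powershell
# Placeholders (hypothetical values): replace before running.
$resourceGroup  = '<resource-group>'
$storageAccount = '<storage-account-name>'
$vaultName      = '<key-vault-name>'
$secretName     = 'blob-account-key'   # any secret name you choose

# Steps 1-3: read the access keys and take the first one.
$keys = Get-AzStorageAccountKey -ResourceGroupName $resourceGroup -Name $storageAccount

# Key Vault expects the secret value as a SecureString.
$secretValue = ConvertTo-SecureString -String $keys[0].Value -AsPlainText -Force

# Steps 4-7: store the key as a secret in the key vault.
Set-AzKeyVaultSecret -VaultName $vaultName -Name $secretName -SecretValue $secretValue
```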

Using Service Principal for scanning

Creating a new service principal

If you need to create a new service principal, you must register an application in your Azure AD tenant and give the service principal access to your data sources. Your Azure AD Global Administrator, or a user in another role such as Application Administrator, can perform this operation.

Getting the Service Principal's Application ID

  1. Copy the Application (client) ID shown in the Overview of the Service Principal you created

    Screenshot that shows the Application (client) ID for the Service Principal

Granting the Service Principal access to your Azure Blob account

It's important to give your service principal the permission to scan the Azure Blob data source. You can add access for the service principal at the Subscription, Resource Group, or Resource level, depending on what level scan access is needed.

Note

You need to be an owner of the subscription to be able to add a service principal on an Azure resource.

  1. From the Azure portal, find either the subscription, resource group, or resource (for example, an Azure Blob Storage account) that you would like to allow the catalog to scan.

    Screenshot that shows the storage account

  2. Select Access Control (IAM) in the left navigation and then select + Add > Add role assignment

    Screenshot that shows the access control for the storage account

  3. Set the Role to Storage Blob Data Reader and enter your service principal in the Select input box. Then, select Save to give this role assignment to your service principal.

    Screenshot that shows the details to provide storage account permissions to the service principal
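As a scripted alternative to the portal steps above, the role can be assigned with the Az PowerShell module by using the Application (client) ID copied earlier. This is a sketch; the placeholder values are hypothetical.

```powershell
# Placeholders (hypothetical values): replace before running.
$resourceGroup  = '<resource-group>'
$storageAccount = '<storage-account-name>'
$appId          = '<application-client-id>'

# Resolve the storage account so its resource ID can be used as the scope.
$sa = Get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccount

# Grant the service principal Storage Blob Data Reader on the storage account.
New-AzRoleAssignment -ApplicationId $appId `
    -RoleDefinitionName 'Storage Blob Data Reader' `
    -Scope $sa.Id

# List the service principal's assignments at this scope to confirm the result.
Get-AzRoleAssignment -ServicePrincipalName $appId -Scope $sa.Id
```

To grant access at the resource group or subscription level instead, pass that resource group's or subscription's ID as the scope.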

Creating the scan

  1. Open your Microsoft Purview account and select Open Microsoft Purview governance portal

  2. Navigate to Data map > Sources to view the collection hierarchy

  3. Select the New Scan icon under the Azure Blob data source registered earlier

    Screenshot that shows the screen to create a new scan

If using a system or user assigned managed identity

Provide a Name for the scan, select the Microsoft Purview account's SAMI or UAMI under Credential, choose the appropriate collection for the scan, and select Test connection. On a successful connection, select Continue.

Screenshot that shows the managed identity option to run the scan

If using Account Key

Provide a Name for the scan, select the Azure IR or your Self-Hosted IR depending on your configuration, choose the appropriate collection for the scan, select Account Key as the Authentication method, and then select Create.

Screenshot that shows the Account Key option for scanning

If using Service Principal

  1. Provide a Name for the scan, select the Azure IR or your Self-Hosted IR depending on your configuration, choose the appropriate collection for the scan, and select + New under Credential

    Screenshot that shows the option for service principal to enable scanning

  2. Select the appropriate Key vault connection and the Secret name that was used while creating the Service Principal. The Service Principal ID is the Application (client) ID copied earlier

    Screenshot that shows the service principal option

  3. Select Test connection. On a successful connection, select Continue

Scoping and running the scan

  1. You can scope your scan to specific folders and subfolders by choosing the appropriate items in the list.

    Screenshot that shows how to scope your scan

  2. Then select a scan rule set. You can choose the system default, choose an existing custom rule set, or create a new rule set inline.

    Screenshot that shows the scan rule set options

  3. If creating a new scan rule set, select the file types to be included in the scan rule.

    Screenshot that shows the scan rule set file types

  4. You can select the classification rules to be included in the scan rule.

    Screenshot that shows the scan rule set classification rules

    Screenshot that shows the scan rule set selection

  5. Choose your scan trigger. You can set up a schedule or run the scan once.

    Screenshot that shows the scan trigger

  6. Review your scan and select Save and run.

    Screenshot that shows the scan review

Viewing Scan

  1. Navigate to the data source in the Collection and select View Details to check the status of the scan

    Screenshot that shows how to view the scan

  2. The scan details indicate the progress of the scan in the Last run status and the number of assets scanned and classified

    Screenshot that shows the scan details

  3. The Last run status will be updated to In progress and then Completed once the entire scan has run successfully

    Screenshot that shows the scan in progress

    Screenshot that shows the scan completed

Managing Scan

Scans can be managed or run again on completion.

  1. Select the Scan name to manage the scan

    Screenshot that shows how to manage the scan

  2. You can run the scan again, edit the scan, or delete the scan

    Screenshot that shows the scan management options

  3. You can run an incremental scan or a full scan again.

    Screenshot that shows the full or incremental scan options

Data sharing

Microsoft Purview Data Sharing (preview) enables sharing of data in-place from one Azure Blob storage account to another. This section provides details about the requirements specific to Azure Blob storage accounts for sharing and receiving data in-place. Refer to How to share data and How to receive share for step-by-step guides on how to use data sharing.

Storage accounts supported for in-place data sharing

The following storage accounts are supported for in-place data sharing:

  • Regions: Canada Central, Canada East, UK South, UK West, Australia East, Japan East, Korea South, and South Africa North
  • Redundancy options: LRS, GRS, RA-GRS
  • Tiers: Hot, Cool

Use only storage accounts without production workloads for the preview.

Note

Source and target storage accounts must be in the same region as each other. They don't need to be in the same region as the Microsoft Purview account.
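The region, redundancy, and tier requirements above can be checked before you set up a share. The following is a minimal sketch: Test-ShareEligibility is a hypothetical helper, and the lowercase region identifiers and SKU names such as Standard_LRS are assumed input formats. It checks only the three listed properties and doesn't look at the account's performance tier.

```powershell
# Hypothetical helper: checks one storage account against the documented lists.
function Test-ShareEligibility {
    param(
        [Parameter(Mandatory)] [string] $Location,    # 'Canada Central' or 'canadacentral'
        [Parameter(Mandatory)] [string] $SkuName,     # for example 'Standard_LRS'
        [Parameter(Mandatory)] [string] $AccessTier   # 'Hot' or 'Cool'
    )

    $regions    = @('canadacentral', 'canadaeast', 'uksouth', 'ukwest',
                    'australiaeast', 'japaneast', 'koreasouth', 'southafricanorth')
    $redundancy = @('LRS', 'GRS', 'RAGRS')
    $tiers      = @('Hot', 'Cool')

    # Normalize 'Canada Central' to 'canadacentral' and 'Standard_RAGRS' to 'RAGRS'.
    $loc = ($Location -replace '\s', '').ToLowerInvariant()
    $red = (($SkuName -replace '^.*_', '') -replace '-', '').ToUpperInvariant()

    $eligible = ($regions -contains $loc) -and
                ($redundancy -contains $red) -and
                ($tiers -contains $AccessTier)
    return $eligible
}

# Example values (hypothetical accounts).
$sourceLocation = 'Canada Central'
$targetLocation = 'canadacentral'

$sourceOk = Test-ShareEligibility -Location $sourceLocation -SkuName 'Standard_LRS' -AccessTier 'Hot'
$targetOk = Test-ShareEligibility -Location $targetLocation -SkuName 'Standard_RAGRS' -AccessTier 'Cool'

# Source and target must also be in the same region as each other.
$sameRegion = ($sourceLocation -replace '\s', '') -ieq ($targetLocation -replace '\s', '')

$sourceOk -and $targetOk -and $sameRegion
```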

Storage account permissions required to share data

To add or update a storage account asset to a share, you need ONE of the following permissions:

  • Microsoft.Authorization/roleAssignments/write - This permission is available in the Owner role.
  • Microsoft.Storage/storageAccounts/blobServices/containers/blobs/modifyPermissions/ - This permission is available in the Storage Blob Data Owner role.

Storage account permissions required to receive shared data

To map a storage account asset in a received share, you need ONE of the following permissions:

  • Microsoft.Storage/storageAccounts/write - This permission is available in the Contributor and Owner roles.
  • Microsoft.Storage/storageAccounts/blobServices/containers/write - This permission is available in the Contributor, Owner, Storage Blob Data Contributor, and Storage Blob Data Owner roles.

Update shared data in source storage account

Updates you make to shared files or data in the shared folder in the source storage account are made available to the recipient in the target storage account in near real time. When you delete subfolders or files within the shared folder, they disappear for the recipient. To delete the shared folder, file, or parent folders or containers, you first need to revoke access to all your shares from the source storage account.

Access shared data in target storage account

The target storage account enables the recipient to access the shared data read-only in near real time. You can connect analytics tools such as Synapse Workspace and Databricks to the shared data to perform analytics. The cost of accessing the shared data is charged to the target storage account.

Service limit

A source storage account can support up to 20 targets, and a target storage account can support up to 100 sources. If you require a higher limit, contact Support.

Access policy

Supported policies

The following types of policies are supported on this data resource from Microsoft Purview:

Access policy pre-requisites on Azure Storage accounts

Configure the subscription where the Azure Storage account resides for policies from Microsoft Purview

To enable Microsoft Purview to manage policies for one or more Azure Storage accounts, execute the following PowerShell commands in the subscription where you'll deploy your Azure Storage account. These PowerShell commands will enable Microsoft Purview to manage policies on all newly created Azure Storage accounts in that subscription.

If you’re executing these commands locally, be sure to run PowerShell as an administrator. Alternatively, you can use the Azure Cloud Shell in the Azure portal: https://shell.azure.com.

# Install the Az module
Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force
# Log in to the subscription
Connect-AzAccount -Subscription <SubscriptionID>
# Register the feature
Register-AzProviderFeature -FeatureName AllowPurviewPolicyEnforcement -ProviderNamespace Microsoft.Storage

If the output of the last command shows RegistrationState as Registered, then your subscription is enabled for access policies. If the output is Registering, wait at least 10 minutes, and then retry the command. Do not continue unless the RegistrationState shows as Registered.
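To check the state without re-running the registration, you can read it back with Get-AzProviderFeature. This is a sketch: the 10-minute interval follows the guidance above, while the limit of six attempts is an arbitrary assumption.

```powershell
# Poll the feature registration state until it reports Registered.
$maxAttempts = 6   # arbitrary upper bound (assumption)

for ($i = 1; $i -le $maxAttempts; $i++) {
    $feature = Get-AzProviderFeature -FeatureName AllowPurviewPolicyEnforcement `
        -ProviderNamespace Microsoft.Storage

    Write-Host "Attempt ${i}: RegistrationState = $($feature.RegistrationState)"

    if ($feature.RegistrationState -eq 'Registered') { break }

    # Wait 10 minutes before checking again, except after the final attempt.
    if ($i -lt $maxAttempts) { Start-Sleep -Seconds 600 }
}
```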

Region support

  • All Microsoft Purview regions are supported.
  • Microsoft Purview access policies can only be enforced in the following Azure Storage regions:
    • East US
    • East US2
    • South Central US
    • West US
    • West US2
    • Canada Central
    • North Europe
    • West Europe
    • France Central
    • UK South
    • East Asia
    • Southeast Asia
    • Japan East
    • Japan West
    • Australia East
  • The following regions support only new Azure Storage accounts, that is, Storage accounts created in the subscription after the feature AllowPurviewPolicyEnforcement is Registered:
    • West US
    • East Asia
    • Japan East
    • Japan West

If needed, you can also create a new Storage account by following this guide.

Configure the Microsoft Purview account for policies

Configure permissions to enable Data use management on the data source

Before a policy can be created in Microsoft Purview for a resource, you must configure permissions. To enable the Data use management toggle for a data source, resource group, or subscription, the same user must have both specific identity and access management (IAM) privileges on the resource and specific Microsoft Purview privileges:

  • The user must have either one of the following IAM role combinations on the resource's Azure Resource Manager path or any parent of it (that is, using IAM permission inheritance):

    • IAM Owner
    • Both IAM Contributor and IAM User Access Administrator

    To configure Azure role-based access control (RBAC) permissions, follow this guide. The following screenshot shows how to access the Access Control section in the Azure portal for the data resource to add a role assignment.

    Screenshot that shows the section in the Azure portal for adding a role assignment.

  • The same user needs to have the Microsoft Purview Data source admin role for the collection or a parent collection (if inheritance is enabled). For more information, see the guide on managing Microsoft Purview role assignments.

    The following screenshot shows how to assign the Data source admin role at the root collection level.

    Screenshot that shows selections for assigning the Data source admin role at the root collection level.
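The IAM half of these requirements can be checked with the Az PowerShell module. The following is a sketch under stated assumptions: the placeholder values are hypothetical, and the check covers only roles assigned directly to the user at the scope or a parent of it. It doesn't verify the Microsoft Purview Data source admin role.

```powershell
# Placeholders (hypothetical values): replace before running.
$scope = '<resource-id-of-the-data-source>'
$user  = '<user@contoso.com>'

# Collect the names of the roles assigned to the user at this scope or above.
$roles = @(Get-AzRoleAssignment -SignInName $user -Scope $scope |
    Select-Object -ExpandProperty RoleDefinitionName)

# Either Owner alone, or Contributor together with User Access Administrator.
$hasOwner = $roles -contains 'Owner'
$hasCombo = ($roles -contains 'Contributor') -and
            ($roles -contains 'User Access Administrator')

$hasOwner -or $hasCombo
```

Adding the -ExpandPrincipalGroups switch to Get-AzRoleAssignment also includes roles the user holds through group membership.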

Configure Microsoft Purview permissions to create, update, or delete access policies

The following permissions are needed in Microsoft Purview at the root collection level:

  • The Policy author role can create, update, and delete DevOps and Data Owner policies.
  • The Policy author role can delete self-service access policies.

For more information about managing Microsoft Purview role assignments, see Create and manage collections in the Microsoft Purview Data Map.

Note

Currently, Microsoft Purview roles related to creating, updating, and deleting policies must be configured at the root collection level.

In addition, to easily search Azure AD users or groups when creating or updating the subject of a policy, the Policy author may greatly benefit from having the Directory Readers permission in Azure AD. This is a common permission for users in an Azure tenant. Without the Directory Readers permission, the Policy author will have to type the complete username or email for all the principals included in the subject.

Configure Microsoft Purview permissions for publishing Data Owner policies

Data Owner policies allow for checks and balances if you assign the Microsoft Purview Policy author and Data source admin roles to different people in the organization. Before a data policy takes effect, a second person (Data source admin) must review it and explicitly approve it by publishing it. Publishing is automatic after DevOps or self-service access policies are created or updated, so it doesn't apply to these types of policies.

The following permissions are needed in Microsoft Purview at the root collection level:

  • The Data source admin role can publish a policy.

For more information about managing Microsoft Purview role assignments, see Create and manage collections in the Microsoft Purview Data Map.

Note

Currently, Microsoft Purview roles related to publishing Data Owner policies must be configured at the root collection level.

Delegate access provisioning responsibility to roles in Microsoft Purview

After a resource has been enabled for Data use management, any Microsoft Purview user with the Policy author role at the root collection level can provision access to that data source from Microsoft Purview.

The IAM Owner role for a data resource can be inherited from a parent resource group, a subscription, or a subscription management group. Check which Azure AD users, groups, and service principals hold or are inheriting the IAM Owner role for the resource.

Note

Any Microsoft Purview root Collection admin can assign new users to root Policy author roles. Any Collection admin can assign new users to a Data source admin role under the collection. Minimize and carefully vet the users who hold Microsoft Purview Collection admin, Data source admin, or Policy author roles.

If a Microsoft Purview account with published policies is deleted, such policies will stop being enforced within an amount of time that depends on the specific data source. This change can have implications on both security and data access availability. The Contributor and Owner roles in IAM can delete Microsoft Purview accounts.

You can check these permissions by going to the Access control (IAM) section for your Microsoft Purview account and selecting Role Assignments. You can also use a Resource Manager lock to prevent the Microsoft Purview account from being deleted.
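Both checks can be sketched with the Az PowerShell module. The placeholder values and the lock name are hypothetical. The lock level CanNotDelete blocks deletion while still allowing changes to the account.

```powershell
# Placeholders (hypothetical values): replace before running.
$resourceGroup  = '<resource-group>'
$purviewAccount = '<purview-account-name>'

# Look up the Microsoft Purview account as a generic Azure resource.
$purview = Get-AzResource -ResourceGroupName $resourceGroup `
    -Name $purviewAccount `
    -ResourceType 'Microsoft.Purview/accounts'

# List who holds roles that can delete the account (Owner or Contributor).
Get-AzRoleAssignment -Scope $purview.ResourceId |
    Where-Object { $_.RoleDefinitionName -in 'Owner', 'Contributor' } |
    Select-Object DisplayName, RoleDefinitionName, Scope

# Add a delete lock to the account; -Force skips the confirmation prompt.
New-AzResourceLock -LockName 'purview-no-delete' `
    -LockLevel CanNotDelete `
    -LockNotes 'Protects published access policies' `
    -ResourceGroupName $resourceGroup `
    -ResourceName $purviewAccount `
    -ResourceType 'Microsoft.Purview/accounts' `
    -Force
```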

Register the data source in Microsoft Purview for Data Use Management

The Azure Storage resource needs to be registered with Microsoft Purview before you can create access policies. To register your resource, follow the Prerequisites and Register sections of this guide.

After you've registered the data source, you'll need to enable Data Use Management. This is a pre-requisite before you can create policies on the data source. Data Use Management can impact the security of your data, as it delegates to certain Microsoft Purview roles managing access to the data sources. Go through the secure practices related to Data Use Management in this guide: How to enable Data Use Management

Once your data source has the Data Use Management option set to Enabled, it will look like this screenshot:

Screenshot that shows how to register a data source for policy with the option Data use management set to Enabled

Create a policy

To create an access policy for Azure Blob Storage, follow this guide: Data owner policy on a single storage account.

To create policies that cover all data sources inside a resource group or Azure subscription you can refer to this section.

Next steps

Follow the guides below to learn more about Microsoft Purview and your data.