Connect to and manage Azure Synapse Analytics workspaces in Microsoft Purview

This article outlines how to register Azure Synapse Analytics workspaces and how to authenticate and interact with Azure Synapse Analytics workspaces in Microsoft Purview. For more information about Microsoft Purview, read the introductory article.

Supported capabilities

Metadata Extraction Full Scan Incremental Scan Scoped Scan Classification Labeling Access Policy Lineage Data Sharing
Yes Yes Yes No Yes No No Yes- Synapse pipelines No

Note

Currently, Azure Synapse lake databases are not supported.

Prerequisites

Register

This section describes how to register Azure Synapse Analytics workspaces in Microsoft Purview using the Microsoft Purview governance portal.

Authentication for registration

Only a user with at least a Reader role on the Azure Synapse workspace and who is also data source administrators in Microsoft Purview can register an Azure Synapse workspace.

Steps to register

  1. Open the Microsoft Purview governance portal by:

  2. On the left pane, select Sources.

  3. Select Register.

  4. Under Register sources, select Azure Synapse Analytics (multiple).

  5. Select Continue.

    Screenshot of a selection of sources in Microsoft Purview, including Azure Synapse Analytics.

  6. On the Register sources (Azure Synapse Analytics) page, do the following:

    a. Enter a Name for the data source to be listed in the data catalog.
    b. Optionally, choose a subscription to filter down to.
    c. In the Workspace name dropdown list, select the workspace that you're working with.
    d. In the endpoints dropdown lists, the SQL endpoints are automatically filled in based on your workspace selection.
    e. In the Select a collection dropdown list, choose the collection you're working with or, optionally, create a new one.
    f. Select Register to finish registering the data source.

    Screenshot of the 'Register sources (Azure Synapse Analytics)' page for entering details about the Azure Synapse source.

Scan

Follow the steps below to scan Azure Synapse Analytics workspaces to automatically identify assets and classify your data. For more information about scanning in general, see our introduction to scans and ingestion.

  1. You'll first need to set up authentication for enumerating for either your dedicated or serverless resources. This will allow Microsoft Purview to enumerate your workspace assets and perform scans.
  2. Then, you'll need to apply permissions to scan the contents of the workspace.
  3. Lastly, confirm your network is set up to allow access for Microsoft Purview.

Tip

To troubleshoot any issues with scanning:

  1. Confirm you have followed all prerequisites.
  2. Confirm you have set up enumeration authentication for your resources.
  3. Confirm authentication is properly set up.
  4. Check network by confirming firewall settings.
  5. Review our scan troubleshooting documentation.

Enumeration authentication

Authentication for enumerating dedicated SQL database resources

  1. In the Azure portal, go to the Azure Synapse workspace resource.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role on the resource.

  3. Select the Add button.

  4. Set the Reader role and enter your Microsoft Purview account name, which represents its managed service identity (MSI).

  5. Select Save to finish assigning the role.

Note

If you're planning to register and scan multiple Azure Synapse workspaces in your Microsoft Purview account, you can also assign the role from a higher level, such as a resource group or a subscription.

Authentication for enumerating serverless SQL database resources

There are three places you'll need to set authentication to allow Microsoft Purview to enumerate your serverless SQL database resources:

The steps below will set permissions for all three.

Azure Synapse workspace
  1. In the Azure portal, go to the Azure Synapse workspace resource.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role on the resource.

  3. Select the Add button.

  4. Set the Reader role and enter your Microsoft Purview account name, which represents its managed service identity (MSI).

  5. Select Save to finish assigning the role.

Storage account
  1. In the Azure portal, go to the Resource group or Subscription that the storage account associated with the Azure Synapse workspace is in.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role in the Resource group or Subscription fields.

  3. Select the Add button.

  4. Set the Storage blob data reader role and enter your Microsoft Purview account name (which represents its MSI) in the Select box.

  5. Select Save to finish assigning the role.

Azure Synapse serverless database
  1. Go to your Azure Synapse workspace and open the Synapse Studio.
  2. Select the Data tab on the left menu.
  3. Select the ellipsis (...) next to one of your databases, and then start a new SQL script.
  4. Add the Microsoft Purview account MSI (represented by the account name) on the serverless SQL databases. You do so by running the following command in your SQL script:
    CREATE LOGIN [PurviewAccountName] FROM EXTERNAL PROVIDER;
    

Apply permissions to scan the contents of the workspace

You can set up authentication for an Azure Synapse source any of the following options. Select your scenario below for steps to apply permissions.

  • Use a managed identity
  • Use a service principal
  • Use SQL Authentication

Important

These steps for serverless databases do not apply to replicated databases. Currently in Synapse, serverless databases that are replicated from Spark databases are read-only. For more information, go here.

Note

You must set up authentication on each SQL database that you intended to register and scan from your Azure Synapse workspace.

Use a managed identity for dedicated SQL databases

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your dedicated SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

    Note

    To run the commands in the following procedure, you must be an Azure Synapse administrator on the workspace. For more information about Azure Synapse Analytics permissions, see: Set up access control for your Azure Synapse workspace.

  4. Add the Microsoft Purview account MSI (represented by the account name) as db_datareader on the dedicated SQL database. You do so by running the following command in your SQL script:

    CREATE USER [PurviewAccountName] FROM EXTERNAL PROVIDER
    GO
    
    EXEC sp_addrolemember 'db_datareader', [PurviewAccountName]
    GO
    
  5. Follow the same steps for each database you want to scan.

Use a managed identity for serverless SQL databases

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and select one of your SQL databases.

  3. Select the ellipsis (...) next to your database, and then start a new SQL script.

  4. Add the Microsoft Purview account MSI (represented by the account name) as db_datareader on the serverless SQL databases. You do so by running the following command in your SQL script:

    CREATE USER [PurviewAccountName] FOR LOGIN [PurviewAccountName];
    ALTER ROLE db_datareader ADD MEMBER [PurviewAccountName]; 
    
  5. Follow the same steps for each database you want to scan.

Grant permission to use credentials for external tables

If the Azure Synapse workspace has any external tables, the Microsoft Purview managed identity must be given References permission on the external table scoped credentials. With the References permission, Microsoft Purview can read data from external tables.

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[scoped_credential] TO [PurviewAccountName];

Set up Azure Synapse workspace firewall access

  1. In the Azure portal, go to the Azure Synapse workspace.

  2. On the left pane, select Networking.

  3. For Allow Azure services and resources to access this workspace control, select ON.

  4. Select Save.

Important

Currently, if you cannot enable Allow Azure services and resources to access this workspace on your Azure Synapse workspaces, when set up scan on Microsoft Purview governance portal, you will hit serverless DB enumeration failure. In this case, you can choose the "Enter manually" option to specify the database names that you want to scan, and proceed. Learn more from Create and run scan.

Create and run scan

To create and run a new scan, do the following:

  1. Select the Data Map tab on the left pane in the Microsoft Purview governance portal.

  2. Select the data source that you registered.

  3. Select View details, and then select New scan. Alternatively, you can select the Scan quick action icon on the source tile.

  4. On the Scan details pane, in the Name box, enter a name for the scan.

  5. In the Credential dropdown list, select the credential to connect to the resources within your data source.

  6. For Database selection method, choose From Synapse workspace or Enter manually. By default, Microsoft Purview tries to enumerate the databases under the workspace, and you can select the ones you want to scan. In case you hit error that Microsoft Purview fails to load the serverless databases, you can choose "Enter manually" to specify the type of database (dedicated or serverless) and the corresponding database name.

    Screenshot of the details pane for the Azure Synapse source scan.

    Option of "Enter manually":

    Screenshot of the section of manually enter database names when setting up scan.

  7. Select Test connection to validate the settings. In case of any error, in the report page, hover on the "Connection status" to see details.

  8. Select Continue to proceed.

  9. Select Scan rule sets of type Azure Synapse SQL. You can also create scan rule sets inline.

  10. Choose your scan trigger. You can schedule it to run weekly/monthly or once.

  11. Review your scan, and then select Save to complete the setup.

View your scans and scan runs

To view existing scans:

  1. Go to the Microsoft Purview governance portal. Select the Data map tab on the left pane.
  2. Select the desired data source. You can view a list of existing scans on that data source under Recent scans, or you can view all scans on the Scans tab.
  3. Select the scan that has results you want to view. The page shows you all of the previous scan runs, along with the status and metrics for each scan run.
  4. Click the run ID to check more about the scan run details.

Manage your scans - edit, delete, or cancel

To manage or delete a scan:

  1. Go to the Microsoft Purview governance portal. Select the Data Map tab on the left pane.

  2. Select the desired data source. You can view a list of existing scans on that data source under Recent scans, or you can view all scans on the Scans tab.

  3. Select the scan that you want to manage. You can then:

    • Edit the scan by selecting Edit scan.
    • Cancel an in-progress scan by selecting Cancel scan run.
    • Delete your scan by selecting Delete scan.

Note

  • Deleting your scan does not delete catalog assets created from previous scans.
  • The asset will no longer be updated with schema changes if your source table has changed and you re-scan the source table after editing the description on the Schema tab of Microsoft Purview.

Set up scan using API

Here's an example of creating scan for serverless DB using API. Replace the {place_holder} and enum_option_1 | enum_option_2 (note) value with your actual settings. Learn more from Microsoft Purview REST API - Scans - Create Or Update.

PUT https://{purview_account_name}.purview.azure.com/scan/datasources/<data_source_name>/scans/{scan_name}?api-version=2022-02-01-preview

Important

The collection_id is not the friendly name for the collection, a 5-character ID. For the root collection, the collection_id will be the name of the collection. For all sub-collections, it is instead the ID that can be found in one of two places:

  1. The URL in the Microsoft Purview governance portal. Select the collection, and check the URL to find where it says collection=. That will be your ID. So, for our example below, the Investments collection has the ID 50h55c. Screenshot of the collection ID in the URL.
  2. You can list child collection names of the root collection to list the collections, and you'll use the 'name' instead of the 'friendly name'.
{
    "properties":{
        "resourceTypes":{
            "AzureSynapseServerlessSql":{
                "scanRulesetName":"AzureSynapseSQL",
                "scanRulesetType":"System",
                "resourceNameFilter":{
                    "resources":[ "{serverless_database_name_1}", "{serverless_database_name_2}", ...]
                }
            }
        },
        "credential":{
            "referenceName":"{credential_name}",
            "credentialType":"SqlAuth | ServicePrincipal | ManagedIdentity (if UAMI authentication)"
        },
        "collection":{
            "referenceName":"{collection_id}",
            "type":"CollectionReference"
        },
        "connectedVia":{
            "referenceName":"{integration_runtime_name}",
            "integrationRuntimeType":"SelfHosted (if self-hosted IR) | Managed (if VNet IR)"
        }
    },
    "kind":"AzureSynapseWorkspaceCredential | AzureSynapseWorkspaceMsi (if system-assigned managed identity authentication)"
}

To schedule the scan, additionally create a trigger for it after scan creation, refer to Triggers - Create Trigger.

Next steps

Now that you've registered your source, follow the below guides to learn more about Microsoft Purview and your data.