Microsoft Purview is not able to scan serverless table's fields.

Prayag kumar 20 Reputation points
2023-05-27T18:20:36.7033333+00:00

While scanning Synapse Serverless SQL tables from Microsoft Purview, not able to reach till column-level details.

Trying to figure out what could be the reason for not scanning the table fields.

Any leads for this issue would be appreciable.

Please find below the screenshot of the scan and the result. Scan gives the Synapse Workspace > Serverless SL Database > Schema > Table

Purview Error

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,562 questions
Azure Storage
Azure Storage
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
3,540 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,378 questions
Microsoft Security | Microsoft Purview
0 comments No comments
{count} votes

Accepted answer
  1. Konstantinos Passadis 19,591 Reputation points MVP
    2023-05-27T18:34:28.2966667+00:00

    Hello

    Welcome to Microsoft QnA !

    I understand you are having trouble scanning Synapse Serverless with Purview

    There are specific prerequsites to do that !

    https://learn.microsoft.com/en-us/azure/purview/register-scan-synapse-workspace?tabs=MI

    For Scanning : (https://www.serverlesssql.com/integrating-azure-purview-with-synapse-analytics-serverless-sql-pools/)

    Scan Data Source to Discover & Classify Assets

    Once we have registered a data source, we can initiate a scan and instruct Purview what to scan. By clicking the New Scan icon on the Synapse data source, we can now enter a name for the scan and select the Serverless SQL Pools databases we would like to scan. However, we are instructed to add the Managed Identity of the Purview account to the Synapse Analytics workspace.

    To allow Purview to connect to and scan the Serverless SQL Pools databases, we must do the following. Please note that the instructions below are taken from Microsoft’s official documentation available here. The steps below are relevant to Serverless SQL Pools.

    Add Purview Account to the Synapse Workspace

    • Make a note of the Managed Identify, in the case of this particular account it is called dhpurviewone
    • In the Azure portal https://portal.azure.com, go to the relevant Azure Synapse workspace resource you added to Purview.
    • On the left pane, select Access Control (IAM).
    • Select the Add button.
    • Set the Reader role and enter your Azure Purview account name (the Managed Identity name)
    • Select Save

    Add Purview Account as a Storage Blob Data Reader

    Purview must have access to any storage accounts that contain data that Serverless SQL Pools references.

    • In the Azure portal, go to the Subscription that the Azure Synapse workspace is in. You can also go to Resource Groups to localise permissions, but in this blog post we are assigning permissions at the subscription level.
    • On the left pane, select Access Control (IAM).
    • Select the Add button.
    • Select the Storage blob data reader role and enter your Azure Purview account name (the Managed Identity name).
    • Select Save

    Create Login for Purview Account

    • Go to your Azure Synapse workspace and open the Synapse Studio.
    • Select the Data tab on the left menu.
    • Create a new SQL Script and run the command below on the master database, change the LOGIN name to the appropriate Purview account you created.
    CREATE LOGIN [dhpurviewone] FROM EXTERNAL PROVIDER;
    

    Add Purview Account as User to Databases

    We must now add the Purview Login as a User to each database we wish to be scanned. To do this:

    • Go to your Azure Synapse workspace and open the Synapse Studio.
    • Select the Data tab on the left menu.
    • Create a new SQL Script and run the command below on any appropriate database
    CREATE USER [dhpurviewone] FOR LOGIN [dhpurviewone];
    
    ALTER ROLE db_datareader ADD MEMBER [dhpurviewone]; 
    

    If there are a number of databases to scan then use the following script to enumerate over each database to assign permissions:

    CREATE TABLE #Databases (DatabaseRowNumber INT, DatabaseName VARCHAR(255))
    
    INSERT INTO #Databases
    SELECT row_number() OVER (ORDER BY database_id), [name] 
    FROM sys.databases 
    WHERE [name] NOT IN ('master','default')
    AND is_auto_create_stats_on = 1
    ORDER BY database_id
    
    DECLARE @mindatabase INT,
            @maxdatabase INT,
            @sqltext NVARCHAR(4000)
    
    SELECT @mindatabase = MIN(DatabaseRowNumber) FROM #Databases
    SELECT @maxdatabase = MAX(DatabaseRowNumber) FROM #Databases
    
    WHILE @mindatabase <= @maxdatabase
    BEGIN
        SET @sqltext = '; CREATE USER [dhpurviewone] FOR LOGIN [dhpurviewone]; ALTER ROLE db_datareader ADD MEMBER [dhpurviewone];'
    
        SELECT @sqltext = CONCAT('USE ',DatabaseName,@sqltext)
        FROM #Databases
        WHERE DatabaseRowNumber = @mindatabase
    
        PRINT @sqltext
    
        EXEC sp_executesql @tsl = @sqltext
    
        SET @mindatabase = @mindatabase +1
    END;
    
    DROP TABLE #Databases
    

    Once permissions have been set, we should now be able to select the databases we wish to scan. In this example we have a database called sqldatapurviewtest which contains several Views and an External Table that references an Azure Data Lake Gen2 storage account.

    We’ll then click Continue, accept the default scan rule set set, the scan trigger to Once, then finally click Save and Run. Once the scan is running we can click the View Details link on the Synapse Analytics data source in the Data Map view.

    Show Results of Scan

    Once the scan has completed we can see a results page showing us the success/failure status of scans, how many scans have been initiated, how many assets (eg External Tables and Views) have been scanned, and how many of those assets have been automatically classified.

    Serverless SQL Pools Data Processed

    As Purview is scanning Serverless SQL Pools databases for objects to register, it will incur data processed charges so please be aware of this. All queries have a minimum of 10MB data processed. In our example, we have a total of 94MB data processed.

    I hope this helps!

    Kindly mark the answer as Accepted and Upvote in case it helped!

    Regards

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.