Classification issue in Microsoft Purview

VISHAL SHARMA 5 Reputation points
2024-01-03T10:28:42.54+00:00

I have run multiple scans on a Snowflake source in Microsoft Purview to test whether classification works as expected. I put dummy SSN, email, and name values in my Snowflake table. For the scan I am using the default Snowflake scan rule set, but Purview does not classify any column in the schema.

Microsoft Purview

2 answers

  1. Debarchan Sarkar - MSFT 1,131 Reputation points Microsoft Employee
    2024-01-08T10:56:54.92+00:00

    Hello -

    If Purview is not classifying Social Security Numbers (SSNs) in your Snowflake data source, there could be several reasons for this. Here are some factors to consider and steps you can take to troubleshoot the issue:

    Scan rule set configuration: Ensure that the scan rule set applied during the scanning process includes the "Social Security Number" classification or a custom classification designed to match SSN patterns.

    Data pattern recognition: Verify that the actual data stored in the Snowflake columns contains valid SSN patterns. Purview uses pattern matching to classify data, so if the data deviates from expected patterns, it may not be classified correctly.
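    Before rescanning, it can help to confirm that the dummy values are well-formed. The sketch below is a hypothetical pre-check, not Purview's actual SSN rule: it accepts the NNN-NN-NNNN format and rejects number ranges the US Social Security Administration never issues (area 000, 666, or 900-999; group 00; serial 0000). Whether Purview's built-in classifier applies these exact checks is an assumption. The helper names `is_plausible_ssn` and `match_ratio` are illustrative only.

```python
import re

# Hypothetical approximation of an SSN format check; NOT Purview's built-in rule.
SSN_FORMAT = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def is_plausible_ssn(value: str) -> bool:
    """Return True if value has the NNN-NN-NNNN shape and an issuable number range."""
    m = SSN_FORMAT.fullmatch(value.strip())
    if not m:
        return False
    area, group, serial = (int(part) for part in m.groups())
    # The SSA never issues area 000, 666 or 900-999, group 00, or serial 0000.
    if area == 0 or area == 666 or area >= 900:
        return False
    return group != 0 and serial != 0


def match_ratio(values: list[str]) -> float:
    """Fraction of distinct, non-empty values that pass is_plausible_ssn."""
    distinct = {v.strip() for v in values if v and v.strip()}
    if not distinct:
        return 0.0
    return sum(is_plausible_ssn(v) for v in distinct) / len(distinct)


if __name__ == "__main__":
    sample = ["345-67-8901", "000-12-3456", "987-65-4321", "345678901", "234-56-7890"]
    print(f"{match_ratio(sample):.0%} of distinct values look like SSNs")
```

    With the sample list above, only two of the five distinct values pass. A column with that kind of mix may fall below whatever match threshold the classifier uses, so it is worth cleaning up the dummy data before drawing conclusions.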

    Column naming convention: Make sure that the column names containing SSNs are consistent with naming conventions typically associated with Social Security Numbers. Although Purview primarily relies on data patterns, the naming of columns can help improve classification accuracy.

    Rescan the data source: After adjusting any configurations or correcting any discrepancies, you might need to rescan the Snowflake data source to see if the classifications are now applied correctly.

    Adjust classification rules: If necessary, you can create custom classification rules tailored to your specific data patterns or requirements.
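    Custom classification rules are defined with regular expressions, so a candidate pattern can be tried locally against sample column names and values before it is entered in the portal. The sketch below is hypothetical: the two patterns, the `rule_matches` helper, and the 0.6 threshold are illustrative choices, and requiring both the column name and the values to match is a simplifying assumption. Python's `re` dialect may also differ in small ways from the regex engine Purview uses, so a local match is a sanity check, not a guarantee.

```python
import re

# Hypothetical patterns for illustration; replace them with the ones you plan to
# enter in your custom classification rule.
DATA_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
COLUMN_PATTERN = re.compile(r".*(e[-_]?mail).*", re.IGNORECASE)


def rule_matches(column_name: str, values: list[str], threshold: float = 0.6) -> bool:
    """Return True if the column name matches and enough distinct values match.

    The threshold is an assumed stand-in for a rule's minimum match threshold.
    """
    if not COLUMN_PATTERN.fullmatch(column_name):
        return False
    distinct = {v.strip() for v in values if v and v.strip()}
    if not distinct:
        return False
    hits = sum(1 for v in distinct if DATA_PATTERN.fullmatch(v))
    return hits / len(distinct) >= threshold


if __name__ == "__main__":
    emails = ["alice@example.com", "bob@example.org", "not-an-email", "carol@example.net"]
    print(rule_matches("CUSTOMER_EMAIL", emails))  # 3 of 4 distinct values match
    print(rule_matches("NOTES", emails))           # column name does not match
```

    If a pattern passes here but Purview still does not apply the classification, the problem is more likely in the scan configuration or the source connector than in the regex itself.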

    Keep in mind that automatic classification can be influenced by multiple factors, including data quality, scan rule set configuration, and data source characteristics. Double-checking these aspects will help ensure accurate classification within your Snowflake data source.

    Also, the scanner may sample only some of the rows in your table, so it is better to insert a fairly large number of dummy records before your test run.
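    To follow the advice about sampling, the test table can be filled with a few hundred rows rather than a handful. The sketch below generates a single multi-row `INSERT` statement in Python; the table name `dummy_pii`, its columns, the row count of 500, and the generated values are all hypothetical. Emails use the reserved `example.com` domain, and SSN-formatted values avoid the ranges the SSA never issues (area 000, 666, or 900-999; group 00; serial 0000). The resulting SQL can be run in a Snowflake worksheet before rescanning.

```python
import random

# Hypothetical table and column names; adjust to your own Snowflake schema.
TABLE = "dummy_pii"
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Erin", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Garcia", "Patel", "Nguyen", "Brown"]
# Area numbers 001-899 except 666 (000, 666 and 900-999 are never issued).
AREAS = [a for a in range(1, 900) if a != 666]


def fake_ssn(rng: random.Random) -> str:
    """Random SSN-formatted value that avoids never-issued number ranges."""
    area = rng.choice(AREAS)
    group = rng.randint(1, 99)      # group 00 is never issued
    serial = rng.randint(1, 9999)   # serial 0000 is never issued
    return f"{area:03d}-{group:02d}-{serial:04d}"


def build_insert(rows: int, seed: int = 42) -> str:
    """Build one multi-row INSERT statement with synthetic name/email/SSN rows."""
    rng = random.Random(seed)  # fixed seed so the test data is reproducible
    values = []
    for i in range(rows):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        email = f"{first}.{last}{i}@example.com".lower()
        values.append(f"('{first} {last}', '{email}', '{fake_ssn(rng)}')")
    header = f"INSERT INTO {TABLE} (full_name, email, ssn) VALUES\n"
    return header + ",\n".join(values) + ";"


if __name__ == "__main__":
    print(build_insert(500)[:200])  # preview the start of the statement
```

    A fixed seed keeps the data identical between test runs, so any change in classification results can be attributed to the scan configuration rather than to different dummy values.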

    If you still have issues with SSN classification after considering these factors, I recommend opening a support ticket with Microsoft, as this may need a deeper investigation.


  2. VISHAL SHARMA 5 Reputation points
    2024-01-09T11:23:42.5666667+00:00

    Hi Debarchan Sarkar

    I have tried all the different options: custom patterns, more dummy records, and simple email data so that the email pattern would match. Nothing seems to work.

    To make sure I had not missed any classification, I used the standard Snowflake scan rule set, which includes email among the Personal classifications.

    When I switched the data source to SQL Server, with the same data and my own classification pattern, Purview classified the data correctly, but the same data is not classified when it comes from Snowflake.

    Because I am on the free tier, I can't create a support ticket.

    Thanks

    Vishal

