SQL Server 2019 Data discovery and classification .

sakuraime 2,351 Reputation points
2021-07-08T06:04:19.717+00:00

SSMS has the ability to do the data classification MANUALLY .. may I know the discovery of the data also need to do in manually ?? I can't find any buttons allow me to do the data discovery , I can only see 'Data classification'

Actually apart from doing some labeling , classification , auditing ( data_sensitivity_information) , give some reports of how many data classified manually... what's the actual other benefit for this feature ?

and anyone knows how to parse the audit event column 'data_sensitive_information' from the audit file ? the result is xml and would like to further expand to column

112862-image.png

SQL Server | Other
{count} votes

2 answers

Sort by: Most helpful
  1. Martin Cairney 2,266 Reputation points
    2021-07-08T06:43:34.143+00:00

    It depends on what you are interpreting as the meaning of "Discovery"?

    The "Discovery" happens behind the scenes and looks for pre-defined patterns in the column names to identify those columns that may need a Sensitivity Label applied. To see what rules it is using and to customise these, you need to export the configuration as follows:
    112779-classification.png

    This exports as JSON - you can then edit this and use the same menu to Import the new rules and execute against that.

    The "Classification" then groups those into the useful nomenclature (which you CAN customise for your own organisation).

    The settings and storage of the data depends upon what version of SQL Server you have as well as the version of SSMS that you use. For SQL Server 2019, the metadata can also be added through TSQL using
    ADD SENSITIVITY CLASSIFICATION
    command - see here

    The benefits of using this arise from some of the aspects you already called out:

    AUDITING - you can see who has accessed sensitive data and whether they SHOULD have accessed this data. It can help you tighten up your security or help to identify a Data Breach - obviously a good thing.

    METADATA - I have also used the ability to extract columns that are sensitive to produce further TSQL scripts to add either Data Masking or Encryption to all columns that have specific Sensitivity Labels - saves a lot of manual work.

    To parse the column you need something like:

    WITH AuditWithXML as (  
        SELECT event_time, action_id, database_name, statement, CAST(data_sensitivity_information as xml) as d  
        FROM sys.fn_get_audit_file ('path_to_your_audit_file',default,default)  
    )  
    SELECT event_time, action_id, database_name, statement,  
    	   h.ep.value('@label','nvarchar(100)') as [Label],  
           h.ep.value('@information_type', 'nvarchar(100)') as [InformationType]  
    FROM AuditWithXML  
           OUTER APPLY d.nodes('/sensitivity_attributes/sensitivity_attribute') as h(ep);  
    GO  
    

  2. Seeya Xi-MSFT 16,676 Reputation points
    2021-07-08T08:18:32.933+00:00

    Hi @sakuraime ,

    Try to understand discovery and classification as a continuous process.
    You can apply SQL Data Discovery and Classification as written by Martin.
    You can also manually classify columns as an alternative, or in addition, to the recommendation-based classification:
    112880-1.png
    For more information, please refer to MS docs: SQL Data Discovery and Classification
    And this article also can help you unstand the benefit for this feature.

    Best regards,
    Seeya


    If the response is helpful, please click "Accept Answer" and upvote it, as this could help other community members looking for similar queries.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.