SQL Server 2019 Data Discovery and Classification

Question

sakuraime 2,351

SSMS lets you classify data manually. Does the discovery of the data also have to be done manually? I can't find any button that lets me run data discovery; I can only see 'Data classification'.

Apart from labeling, classification, auditing (data_sensitivity_information), and reports on how much data has been classified manually, what other actual benefit does this feature offer?

Also, does anyone know how to parse the audit event column 'data_sensitivity_information' from the audit file? The result is XML, and I would like to expand it into columns.

Seeya Xi-MSFT 16,676 Reputation points

2021-07-13T01:40:01.183+00:00

Hi @sakuraime ,

We have not received a response from you. Did the reply help you? If the response helped, please select "Accept Answer". If it didn't work, please let us know your progress. Doing so will benefit all community members who have a similar issue. Your contribution is highly appreciated.

Answer 1

It depends on what you mean by "Discovery".

The "Discovery" happens behind the scenes and looks for pre-defined patterns in the column names to identify those columns that may need a Sensitivity Label applied. To see what rules it is using and to customise these, you need to export the configuration as follows:

This exports as JSON - you can then edit this and use the same menu to Import the new rules and execute against that.

The "Classification" then groups those into the useful nomenclature (which you CAN customise for your own organisation).

The settings and storage of the data depend on which version of SQL Server you have as well as the version of SSMS that you use. For SQL Server 2019, the metadata can also be added through T-SQL using the ADD SENSITIVITY CLASSIFICATION command - see here
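
As a rough illustration of the T-SQL route, the sketch below labels one column, reads the stored classification back, and then removes it again. The table dbo.Customers, the column Email, and the label and information type values are hypothetical placeholders, not something from this thread; substitute your own.

```sql
-- Hypothetical table/column; the label and information type values are illustrative only.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.Email
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');
GO

-- Read back the classifications stored in the current database.
SELECT major_id, minor_id, label, information_type
FROM sys.sensitivity_classifications;
GO

-- Remove the classification again if this was only a test.
DROP SENSITIVITY CLASSIFICATION FROM dbo.Customers.Email;
GO
```

In the catalog view, major_id is the object ID of the table and minor_id is the column ID, so the rows can be joined to sys.objects and sys.columns to get readable names.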

The benefits of using this arise from some of the aspects you already called out:

AUDITING - you can see who has accessed sensitive data and whether they SHOULD have accessed this data. It can help you tighten up your security or help to identify a Data Breach - obviously a good thing.

METADATA - I have also used the ability to extract columns that are sensitive to produce further TSQL scripts to add either Data Masking or Encryption to all columns that have specific Sensitivity Labels - saves a lot of manual work.
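
A minimal sketch of the metadata idea, assuming you want to generate Dynamic Data Masking statements for every column that carries one particular label. The label value 'Confidential' and the choice of the default() masking function are assumptions made for illustration; the query only produces the statements as text, so they can be reviewed before anything is run.

```sql
-- List classified columns in the current database and build a masking
-- statement for each one. Nothing is altered by this query itself.
SELECT SCHEMA_NAME(o.schema_id) AS SchemaName,
       o.name                   AS TableName,
       c.name                   AS ColumnName,
       sc.label                 AS [Label],
       sc.information_type      AS [InformationType],
       -- Generated text only; default() is an assumed masking function.
       N'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(o.schema_id)) + N'.' + QUOTENAME(o.name)
       + N' ALTER COLUMN ' + QUOTENAME(c.name)
       + N' ADD MASKED WITH (FUNCTION = ''default()'');' AS MaskingStatement
FROM sys.sensitivity_classifications AS sc
JOIN sys.objects AS o
     ON o.object_id = sc.major_id
JOIN sys.columns AS c
     ON c.object_id = sc.major_id
    AND c.column_id = sc.minor_id
-- label is stored as sql_variant, so cast it before comparing.
WHERE CAST(sc.label AS nvarchar(400)) = N'Confidential';   -- hypothetical label value
GO
```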

To parse the column you need something like:

-- Read the audit file and cast the sensitivity column (stored as text) to xml
WITH AuditWithXML AS (
    SELECT event_time, action_id, database_name, statement,
           CAST(data_sensitivity_information AS xml) AS d
    FROM sys.fn_get_audit_file('path_to_your_audit_file', DEFAULT, DEFAULT)
)
-- Shred each <sensitivity_attribute> element into its own row;
-- OUTER APPLY keeps audit rows that carry no sensitivity information
SELECT event_time, action_id, database_name, statement,
       h.ep.value('@label', 'nvarchar(100)') AS [Label],
       h.ep.value('@information_type', 'nvarchar(100)') AS [InformationType]
FROM AuditWithXML
     OUTER APPLY d.nodes('/sensitivity_attributes/sensitivity_attribute') AS h(ep);
GO
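
The query above only has something to return if an audit was already capturing access to the classified columns. A minimal sketch of one possible setup follows; the audit name, specification name, folder C:\SqlAudit\ and database YourDatabase are all hypothetical, the folder is assumed to exist and be writable by the SQL Server service account, and auditing SELECT for public on the whole database is just the simplest choice for a test, not a recommendation.

```sql
USE master;
GO
-- Server-level audit writing to a file target (hypothetical name and folder).
CREATE SERVER AUDIT SensitiveDataAudit
TO FILE (FILEPATH = N'C:\SqlAudit\');
GO
ALTER SERVER AUDIT SensitiveDataAudit WITH (STATE = ON);
GO

USE YourDatabase;   -- hypothetical database name
GO
-- Capture SELECT statements issued by any user in this database.
CREATE DATABASE AUDIT SPECIFICATION SensitiveDataAuditSpec
FOR SERVER AUDIT SensitiveDataAudit
ADD (SELECT ON DATABASE::YourDatabase BY public)
WITH (STATE = ON);
GO
```

After running a SELECT against a classified column, pointing sys.fn_get_audit_file at the files in that folder should show the label and information type in data_sensitivity_information.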

sakuraime 2,351 Reputation points

2021-07-10T13:12:47.51+00:00

"The "Discovery" happens behind the scenes and looks for pre-defined patterns in the column names to identify those columns that may need a Sensitivity Label applied."

By "behind the scenes", how frequently do you mean? Daily or hourly?
And does it discover only by column name, not by the actual data?

Answer 2

Seeya Xi-MSFT 16,676

Hi @sakuraime ,

Try to understand discovery and classification as a continuous process.
You can apply SQL Data Discovery and Classification as written by Martin.
You can also manually classify columns as an alternative, or in addition, to the recommendation-based classification:

For more information, please refer to MS docs: SQL Data Discovery and Classification
This article can also help you understand the benefits of this feature.

Best regards,
Seeya

If the response is helpful, please click "Accept Answer" and upvote it, as this could help other community members looking for similar queries.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

sakuraime 2,351 Reputation points

2021-07-13T09:50:49.747+00:00

"As a continuous process"? Is it a thread inside SQL Server? Is it possible to stop it?

Imagine the database has many tables and schemas (more than a hundred thousand columns); I suspect this kind of 'continuous process' would slow the database down.

Thanks

Seeya Xi-MSFT 16,676 Reputation points

2021-07-13T10:06:39.35+00:00

Hi @sakuraime ,

What I mean is that these are all capabilities of Data Discovery & Classification rather than a continuous process. Sorry, I didn't express that clearly.
Discovering and classifying your most sensitive data (business, financial, healthcare, etc.) can play a pivotal role in your organizational information protection stature. It can serve as infrastructure for:
Helping meet data privacy standards.
Monitoring access to databases/columns containing highly sensitive data.

Best regards,
Seeya

sakuraime 2,351 Reputation points

2021-07-13T13:14:35.62+00:00

So may I confirm that the discovery starts once I trigger 'Classify Data'?

Seeya Xi-MSFT 16,676 Reputation points

2021-07-14T02:05:29.893+00:00

Hi @sakuraime ,

Yes. Classifying data requires knowing the location, volume, and context of data. Before you can perform data classification, you must perform accurate and comprehensive data discovery. Automated tools can help discover sensitive data at large scale.
There is information about data discovery in this link, which may be helpful.

Best regards,
Seeya
