Classify data using sensitive information types

14 minutes

Identifying and classifying sensitive items that are under your organization's control is the first step in the Information Protection discipline. Microsoft Purview provides three ways of identifying items so that they can be classified:

Manually by users
Automated pattern recognition, like sensitive information types
Machine learning

Sensitive information types (SIT) are pattern-based classifiers. They detect sensitive information like social security, credit card, or bank account numbers to identify sensitive items.

Microsoft provides a large number of preconfigured SITs or you can create your own.

Sensitive information types are used in

Categories of sensitive information types

Built-in sensitive information types

Microsoft provides a range of preconfigured SITs that are readily available within Microsoft Purview. These built-in SITs cover commonly recognized sensitive information types, such as social security numbers, credit card numbers, and email addresses. While these SITs can't be edited, they can serve as templates for creating custom sensitive information types.

Named entity sensitive information types

Named entity SITs automatically identify specific types of named entities, such as person names, physical addresses, or medical terms and conditions. These SITs are also predefined and can't be edited or copied.

Custom sensitive information types

If the preconfigured sensitive information types don't meet your needs, you can create your own custom SITs that you fully define. You can also copy one of the preconfigured SITs and modify it.

Exact data match sensitive information types

Exact data match (EDM)-based SITs are built from scratch. EDM-based classification enables you to create custom sensitive information types that refer to exact values in a database of sensitive information.

Fundamental parts of a sensitive information type

Every sensitive information type consists of:

Name: A descriptive name that identifies the sensitive information type.
Description: An explanation of what the sensitive information type aims to detect.
Pattern: A pattern defines the specific characteristics or criteria that indicate the presence of sensitive information. Patterns consist of these elements.
- Primary element – the main element that the sensitive information type is looking for. It can be a regular expression with or without a checksum validation, a keyword list, a keyword dictionary, or a function.
- Supporting element – an element that acts as supporting evidence that help in increasing the confidence of the match. For example, keyword "SSN" in proximity to a Social Security Number (SSN). It can be a regular expression with or without a checksum validation, keyword list, keyword dictionary.
- Confidence Level - confidence levels (high, medium, low) reflect how much supporting evidence is detected along with the primary element. The more supporting evidence an item contains, the higher the confidence that a matched item contains the sensitive info you're looking for.
- Proximity – the number of characters between the primary and supporting elements.

Creating custom sensitive information types

Microsoft Purview provides multiple options for creating custom sensitive information types to cater to your organization's unique needs:

Use the UI - You can set up a custom sensitive information type using the Microsoft Purview compliance portal UI. With this method, you can use regular expressions, keywords, and keyword dictionaries.
Use EDM - You can set up custom sensitive information types using Exact Data Match (EDM)-based classification. This method enables you to create a dynamic sensitive information type using a secure database that you can refresh periodically.
Use PowerShell - You can set up custom sensitive information types using PowerShell. Although this method is more complex than using the UI, you have more configuration options.

Provide match/not a match accuracy feedback in sensitive info types

Microsoft Purview allows you to view the number of matches a sensitive information type has and provide feedback on whether an item is correctly classified or not. This feedback mechanism helps fine-tune the accuracy of sensitive information types, ensuring more precise and reliable classification results.

Sensitive information types interactive guide

Use the Classify data using sensitive info types interactive guide to learn more about builtin and custom sensitive information types.