Get started with exact data match based sensitive information types

Tip

If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview compliance portal trials hub. Learn details about signing up and trial terms.

Applies to:

Creating an exact data match (EDM) based sensitive information type (SIT) and making it available is a multi-phase process. You can use the new experience, the existing classic experience, or PowerShell. This article helps you understand the differences between the new and classic experiences and pick the right one for your needs.

EDM SITs can be used in:

  • Microsoft Purview Data Loss Prevention
  • Auto-labeling (service and client side)
  • Microsoft Purview Insider Risk Management policies
  • Microsoft Purview eDiscovery
  • Microsoft Purview Insider Risk Management
  • Microsoft Defender for Cloud Apps

Before you begin

Familiarize yourself with the concepts and terminology in these articles:

Supported regions

The exact data match feature is available in these regions:

  • Asia Pacific
  • Australia
  • Brazil
  • Canada
  • Europe
  • France
  • Germany
  • India
  • Japan
  • Korea
  • Norway
  • South Africa
  • Switzerland
  • United Arab Emirates
  • United Kingdom
  • United States
  • US DoD
  • US GCC
  • US GCCH

You can find out where your tenant hosts data at rest by following the procedures in Where your Microsoft 365 customer data is stored and referring to the data center city locations in that article.

Required licenses and permissions

You must be a Global Admin, Compliance Admin, or Exchange Admin to perform the tasks described in this article. To learn more about DLP permissions, see Permissions in the Microsoft Purview compliance portal.

Important

Microsoft recommends that you use roles with the fewest permissions. This helps improve security for your organization. Global Administrator is a highly privileged role that should only be used in scenarios where a lesser privileged role can't be used.

See the data loss prevention service description for complete licensing information.

Portal | World-wide / GCC | GCC-High | DOD
Office SCC | compliance.microsoft.com | scc.office365.us | scc.protection.apps.mil
Microsoft Defender portal | security.microsoft.com | security.microsoft.us | security.apps.mil
Microsoft Purview Compliance Portal | compliance.microsoft.com | compliance.microsoft.us | compliance.apps.mil

Select the appropriate tab for the portal you're using. To learn more about the Microsoft Purview portal, see Microsoft Purview portal. To learn more about the Compliance portal, see Microsoft Purview compliance portal.

New EDM experience

The new EDM experience combines the functionality of the EDM schema and EDM sensitive information type tools into a single user experience. The new experience offers the following benefits:

  • Simplified workflow
  • Automated schema and SIT creation
  • Additional guardrails to ensure better performance

For more information on these advantages, read on.

Simplified workflow

With the new experience, the schema and SIT are created via one user experience. This means there are fewer clicks, better guidance on mapping primary elements to default SITs, and clearer descriptions of default confidence levels for the rules.

When you need to see the status of an EDM SIT in the creation process, the new experience reports one of the following statuses in the UI:

  • Data not yet uploaded
  • Data upload percent
  • Data upload complete
  • Indexing complete
  • Data upload failed
  • Data indexing failed

Automated schema and SIT creation

In the new experience, you can give the system a sample data file that has the same header values as your sensitive information table and enough rows (10-20) of representative data. The system validates the format and creates the schema based on the headers. You then identify the primary fields in the schema, and the system recommends the SITs that best match those primary fields. If you don't want to upload a file, you can enter the same values manually in the UI.

Important

Be sure to use sample data values that aren't sensitive; however, also make sure that the sample values are in the same format as your actual sensitive data. Using non-sensitive data is essential because the sample data file doesn't get encrypted and hashed when you upload it in the same way that your actual sensitive information table does. The data from the sample data file is not retained nor accessible once the EDM SIT is created.
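As a minimal sketch of the guidance above, the following Python script builds a sample file from made-up values that keep the shape of real data. The column headers, the value formats, and the choice of comma-separated output are assumptions for illustration only; use the exact headers and formats of your own sensitive information table.

```python
import csv
import io
import random

# Hypothetical headers: replace with the exact headers of your real table.
HEADERS = ["PatientNumber", "FirstName", "LastName", "DateOfBirth"]


def fake_row(rng):
    """Return one row of invented values that mimic the assumed real formats."""
    return {
        # Assumed format: "PN-" followed by six digits.
        "PatientNumber": f"PN-{rng.randint(0, 999999):06d}",
        "FirstName": rng.choice(["Alex", "Sam", "Jordan", "Taylor"]),
        "LastName": rng.choice(["Example", "Sample", "Testcase"]),
        # Assumed format: YYYY-MM-DD.
        "DateOfBirth": (
            f"{rng.randint(1950, 2005):04d}-"
            f"{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
        ),
    }


def build_sample_csv(row_count=15, seed=0):
    """Build CSV text with a header row plus row_count non-sensitive rows."""
    if not 10 <= row_count <= 20:
        raise ValueError("this sketch follows the suggested range of 10-20 rows")
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    for _ in range(row_count):
        writer.writerow(fake_row(rng))
    return buffer.getvalue()


sample_csv = build_sample_csv()
```

Because every value is generated rather than copied from production data, nothing sensitive ends up in the unencrypted sample file.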

The system generates the EDM SIT detection rules, one for each primary field. Based on detection of the primary fields, the system creates high and medium confidence rules using all the other fields as corroborative evidence. You can manually add low confidence rules if needed.
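The rule generation described above can be pictured with a small Python model. This is an illustrative sketch and not the service's actual implementation: the DetectionRule type and generate_rules function are hypothetical names, and the sketch doesn't model how many corroborating matches each confidence level requires, because this article doesn't state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionRule:
    """Hypothetical stand-in for one generated EDM detection pattern."""
    primary_field: str
    confidence: str            # "high" or "medium"
    supporting_fields: tuple   # fields used as corroborative evidence


def generate_rules(all_fields, primary_fields):
    """For each primary field, emit a high and a medium confidence entry
    that uses every other schema field as corroborative evidence."""
    rules = []
    for primary in primary_fields:
        if primary not in all_fields:
            raise ValueError(f"{primary!r} is not a field in the schema")
        supporting = tuple(f for f in all_fields if f != primary)
        for confidence in ("high", "medium"):
            rules.append(DetectionRule(primary, confidence, supporting))
    return rules


rules = generate_rules(
    ["PatientNumber", "FirstName", "LastName"],
    ["PatientNumber"],
)
```

Low confidence rules are left out of the sketch on purpose, since the article says you add those manually.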

Additional guardrails to ensure better performance

The system warns you if it finds a primary field mapped to a SIT that detects a broad range of values, called a loosely defined SIT. This can cause the system to perform lookups on large numbers of strings that aren't related to the kind of content that you're looking for. Mapping between these types of SITs and primary fields can result in false negatives and decrease performance.

Note

A loosely defined SIT, such as a custom SIT that looks for all personal identification numbers, has detection rules that allow for greater variability in the items detected. A strongly defined SIT, such as a U.S. Social Security Number, has detection rules that only allow a narrow, well-defined set of items to be detected.

The system also warns you if the values in the primary field you select occur multiple times in a large number of rows. This can cause large numbers of result sets to be returned and processed, which could cause a timeout. Timeouts can result in missed detections and poor performance.
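If you want to check a candidate primary field for repeated values before you upload anything, a rough local check could look like the following Python sketch. The function names and the 20 percent threshold are assumptions chosen for illustration; the article doesn't state the criteria the service uses.

```python
from collections import Counter


def repeated_value_share(values):
    """Return the fraction of rows whose value also appears in another row."""
    if not values:
        return 0.0
    counts = Counter(values)
    repeated_rows = sum(n for n in counts.values() if n > 1)
    return repeated_rows / len(values)


def should_warn(values, threshold=0.2):
    """Flag a column as a poor primary field candidate.

    The threshold is hypothetical, not a documented service limit.
    """
    return repeated_value_share(values) > threshold


share = repeated_value_share(["a", "b", "a", "c"])
```

A column made up mostly of unique values, such as an account number, stays below the threshold, while a column such as a city name would typically exceed it.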

Choosing the right EDM SIT creation experience for you

You can toggle back and forth between the new and classic experiences, but we recommend using the new experience unless your needs fall into one or more of the four use cases described below.

To choose the best method of creating EDM SITs for your needs:

  1. Read through this section.
  2. Choose the experience that you want to use.
  3. Select the link for the next step for the experience you want.

Mapping multiple EDM SITs to the same schema

In EDM, you can create a maximum of 10 schemas. Each time you create an EDM SIT using the new experience, a new schema is created. This results in a 1:1 mapping between EDM schema and EDM SIT. The new experience doesn't support mapping multiple SITs to the same schema.

Creating or managing more than 10 EDM SITs

Because the new experience doesn't support mapping multiple SITs to the same schema, you're limited to creating and managing 10 EDM SITs. In the classic experience, you can map multiple EDM SITs to the same schema and so have more than 10 EDM SITs. Using the new flow, you'll receive an error if you try to create an eleventh EDM schema, and you won't be able to view more than 10 EDM SITs.

Specifying the name of your EDM schema

If you need to specify a name for your EDM SIT schemas, you have to use the classic experience to create and manage them. Because the new experience automatically creates the schema, you don't get the opportunity to give your schema a custom name. The auto-generated name is a concatenation of the EDM SIT name and the word schema. For example, if the EDM SIT name is PatientNumber, the schema name would be PatientNumberschema.
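The naming rule described above amounts to a simple string concatenation. The following Python one-liner illustrates it; auto_schema_name is a hypothetical helper name, not part of any Microsoft tool.

```python
def auto_schema_name(sit_name: str) -> str:
    """Mirror the described rule: the EDM SIT name followed by the word 'schema'."""
    return f"{sit_name}schema"


schema_name = auto_schema_name("PatientNumber")
```

With the article's example SIT name, schema_name is PatientNumberschema.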

Editing EDM schemas created in the classic experience

All schemas that are created using the classic experience or uploaded as an XML file using PowerShell are not viewable or manageable in the new experience.
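The four use cases in this section can be summarized as a small decision helper. The following Python sketch is only a way to reason about the choice; the function and parameter names are hypothetical, and the limit of 10 comes from the schema maximum stated earlier in this section.

```python
MAX_SCHEMAS = 10  # maximum number of EDM schemas stated in this article


def recommend_experience(
    *,
    share_schema_across_sits=False,
    planned_sit_count=1,
    custom_schema_name=False,
    manages_classic_schemas=False,
):
    """Return ("new" or "classic", reasons) based on the four use cases."""
    reasons = []
    if share_schema_across_sits:
        reasons.append("multiple EDM SITs mapped to the same schema")
    if planned_sit_count > MAX_SCHEMAS:
        reasons.append("more than 10 EDM SITs")
    if custom_schema_name:
        reasons.append("custom schema name required")
    if manages_classic_schemas:
        reasons.append("schemas created in the classic experience or PowerShell")
    return ("classic" if reasons else "new", reasons)


choice, reasons = recommend_experience(planned_sit_count=11)
```

When none of the four conditions apply, the helper returns the new experience, which matches the recommendation in this section.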

Next steps

See also