Create the schema for exact data match based sensitive information types
If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview compliance portal trials hub. Learn details about signing up and trial terms.
If you aren't familiar with EDM-based SITs or their implementation, you should familiarize yourself with:
- Learn about sensitive information types
- Learn about exact data match based sensitive information types
- Get started with exact data match based sensitive information types
A single EDM schema can be used in multiple sensitive information types that use the same sensitive data table. You can create up to 10 different EDM schemas in a Microsoft 365 tenant.
Use the Exact Data Match Schema and Sensitive Information Type Tool
You can use this tool to help simplify the schema file creation process.
- Perform the steps in Export source data for exact data match based sensitive information type.
Use the exact data match schema and sensitive information type pattern tool
In the Microsoft Purview compliance portal for your tenant, go to Data classification > Exact data matches > EDM schemas.
Choose Create EDM schema to open the schema tool configuration flyout.
Fill in an appropriate Name and Description.
Choose Ignore delimiters and punctuation for all schema fields if you want that behavior for the entire schema. For more information about configuring EDM to ignore case or delimiters, see Using the caseInsensitive and ignoredDelimiters fields.
Fill in your desired values for your Schema field #1 and add more fields as needed. Each schema field must be identical to the column headers in your sensitive information source file.
If you want, set the per field values for:
- Field is searchable
- Field is case-insensitive
- Choose delimiters and punctuation to ignore for this field
- Enter custom delimiters and punctuation for this field
At least one, but no more than ten of your schema fields must be designated as searchable.
Choose Save. Your schema is now listed and available for use.
To remove a schema that is already associated with an EDM sensitive info type, first delete the EDM sensitive info type, then delete the schema. Deleting a schema that has a data store associated with it also deletes the data store within 24 hours.
Export of the EDM schema file in XML format
If you created the EDM schema in the EDM schema tool, you must export the EDM schema file in XML format. You'll need it in the Hash and upload the sensitive information source table for exact data match sensitive information types phase.
To export the EDM schema file, use this syntax:
$Schema = Get-DlpEdmSchema -Identity "[your EDM Schema name]"
Set-Content -Path ".\Schemafile.xml" -Value $Schema.EdmSchemaXML
Save this file for later use.
Create exact data match schema manually and upload
In the schema file, configure an entry for each column in the sensitive information source table, using the syntax:
<Field name="FieldName" searchable="true/false" caseInsensitive="true/false" ignoredDelimiters="delimiter characters" />
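As an illustration of the per-column rule above, the following Python sketch generates one Field entry for each column name. It is a minimal sketch, not part of the EDM tooling: the build_schema function and its parameters are hypothetical, and the namespace and version="1" values are copied from the sample schema in this article.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample schema in this article.
EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def build_schema(store_name, description, columns,
                 searchable=(), case_insensitive=(), ignored_delimiters=None):
    """Return EDM schema XML with one <Field> entry per source-table column.

    build_schema and its parameters are hypothetical helpers, not a Microsoft API.
    """
    ignored_delimiters = ignored_delimiters or {}
    root = ET.Element("EdmSchema", {"xmlns": EDM_NS})
    store = ET.SubElement(root, "DataStore", {
        "name": store_name, "description": description, "version": "1"})
    for column in columns:
        attrs = {"name": column}
        # Optional attributes are written only when they are switched on.
        if column in searchable:
            attrs["searchable"] = "true"
        if column in case_insensitive:
            attrs["caseInsensitive"] = "true"
        if column in ignored_delimiters:
            attrs["ignoredDelimiters"] = ignored_delimiters[column]
        ET.SubElement(store, "Field", attrs)
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_schema(
        "PatientRecords", "Schema for patient records",
        ["PatientID", "FirstName", "LastName"],
        searchable={"PatientID"},
        case_insensitive={"PatientID"},
        ignored_delimiters={"PatientID": "-,/,*,#,^"},
    ))
```

Passing the column names in the same spelling as the source table headers keeps the generated Field names aligned with the data you later hash and upload.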
Using the caseInsensitive and ignoredDelimiters fields
The following schema XML sample makes use of the caseInsensitive and the ignoredDelimiters fields.
When you include the caseInsensitive field set to the value of true in your schema definition, EDM won't exclude an item based on case differences. For example, EDM sees the values FOO-1234 and fOo-1234 as being identical for the PatientID field.
When you include the ignoredDelimiters field with supported characters, EDM ignores those characters. So EDM sees the values FOO-1234 and FOO#1234 as being identical for the PatientID field.
In this example, where both caseInsensitive and ignoredDelimiters are used, EDM would see FOO-1234 and fOo#1234 as identical and classify the item as a patient record sensitive information type.
Both these parameters are used on a per field basis.
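The equivalences described above can be modeled with a small Python sketch. This only illustrates the comparison rules as this article states them; it is not how EDM is implemented internally. The normalize function is hypothetical, and it assumes that the ignoredDelimiters attribute is a comma-separated list of characters, as in the sample schema.

```python
def normalize(value, case_insensitive=False, ignored_delimiters=""):
    """Reduce a value to the form that is compared under the field's settings.

    Hypothetical helper for illustration only; not part of EDM.
    """
    if case_insensitive:
        value = value.casefold()
    # Assumption: the attribute is a comma-separated list such as "-,/,*,#,^".
    for delimiter in ignored_delimiters.split(","):
        if delimiter:
            value = value.replace(delimiter, "")
    return value


if __name__ == "__main__":
    settings = {"case_insensitive": True, "ignored_delimiters": "-,/,*,#,^"}
    # With both settings on, the two spellings reduce to the same string.
    print(normalize("FOO-1234", **settings) == normalize("fOo#1234", **settings))
```

Because the settings are passed per call, the sketch mirrors the per-field nature of both attributes: a field without them is compared exactly as written.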
If you configure spaces to be ignored, the setting takes effect only for primary field columns for which a sensitive information type that can detect multi-word strings is defined. Otherwise, the comparison is made against each individual word in the content being analyzed.
The ignoredDelimiters flag supports any nonalphanumeric character, such as -, /, *, #, and ^.
The ignoredDelimiters flag doesn't support:
- characters 0-9
When defining your EDM sensitive information type, ignoredDelimiters will not affect how the Classification sensitive information type associated with the primary element in an EDM pattern identifies content in an item. So if you configure ignoredDelimiters for a searchable field, you need to make sure the sensitive information type used for a primary element based on that field will pick strings both with and without those characters present.
The number of columns in your sensitive information source table and the number of fields in your schema must match; order doesn't matter.
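Before uploading, you might compare the source table header with the schema locally. The following Python sketch is one way to do that under stated assumptions: the source table is a comma-delimited file with a header row, the schema uses the namespace shown in the sample schema, and compare_columns is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace copied from the sample schema in this article.
EDM_NS = {"edm": "http://schemas.microsoft.com/office/2018/edm"}


def compare_columns(schema_xml, csv_text):
    """Return (fields missing from the CSV, columns missing from the schema).

    Hypothetical helper; assumes a comma-delimited source table with a header row.
    """
    root = ET.fromstring(schema_xml)
    schema_fields = {
        field.get("name")
        for field in root.findall("edm:DataStore/edm:Field", EDM_NS)
    }
    header = next(csv.reader(io.StringIO(csv_text)))
    csv_columns = set(header)
    # Sets are used because column order doesn't matter.
    return sorted(schema_fields - csv_columns), sorted(csv_columns - schema_fields)
```

Two empty lists mean every schema field has a matching column and every column has a matching field. Duplicate column names are not detected by this sketch.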
The characters that are used as token separators behave differently than the other delimiters. Here are some examples:
- \ (space)
When you include a token separator, EDM breaks the token where the separator is. For example, EDM breaks the value Middle-Last Name into Middle-Last and Name for the LastName field. If ignoredDelimiters is included for the LastName field with the character '-', that action only happens after the value is broken. In the end, EDM would see the values MiddleLast and Name.
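The order of operations in this example (split on the token separator first, then strip ignored delimiters) can be sketched in Python as follows. This is an illustration of the described behavior only, not EDM's implementation. The tokens_for_field function is hypothetical, it treats only the space character as a token separator, and it assumes ignoredDelimiters is a comma-separated list.

```python
def tokens_for_field(value, ignored_delimiters=""):
    """Split a value on spaces, then remove ignored delimiters from each token.

    Hypothetical helper for illustration; only the space separator is modeled.
    """
    tokens = value.split(" ")  # step 1: break the value at token separators
    delimiters = [d for d in ignored_delimiters.split(",") if d]
    cleaned = []
    for token in tokens:
        for delimiter in delimiters:  # step 2: strip delimiters inside each token
            token = token.replace(delimiter, "")
        if token:
            cleaned.append(token)
    return cleaned


if __name__ == "__main__":
    print(tokens_for_field("Middle-Last Name", ignored_delimiters="-"))
```

Because the split happens first, ignoring the dash never rejoins Middle-Last and Name into a single token.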
To use these characters as ignoredDelimiters and not token separators, a SIT that matches the corresponding format needs to be associated with the field. For example, a SIT that detects a multi-word string with dashes in it needs to be associated with the LastName field.
It's possible to associate SITs to secondary elements using PowerShell.
Define the schema in XML format (similar to the following example). Name this schema file edm.xml, and configure it such that for each column in the sensitive information source table, there's a line that uses the syntax: <Field name="" searchable=""/>.
- Use column names for Field name values.
- Use searchable="true" for the fields that you want to be searchable and primary fields up to a maximum of five fields. At least one field must be searchable.
As an example, the following XML file defines the schema for a patient records database, with five fields specified as searchable: PatientID, MRN, SSN, Phone, and DOB.
(You can copy, modify, and use our example.)
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="PatientRecords" description="Schema for patient records" version="1">
    <Field name="PatientID" searchable="true" caseInsensitive="true" ignoredDelimiters="-,/,*,#,^" />
    <Field name="MRN" searchable="true" />
    <Field name="FirstName" />
    <Field name="LastName" />
    <Field name="SSN" searchable="true" />
    <Field name="Phone" searchable="true" />
    <Field name="DOB" searchable="true" />
    <Field name="Gender" />
    <Field name="Address" />
  </DataStore>
</EdmSchema>
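To confirm which fields a schema file marks as searchable before you upload it, a local check such as the following Python sketch may help. The function names are hypothetical, and the maximum is passed in as a parameter rather than hard-coded, so you can set it to the limit that applies to your schema.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample schema in this article.
EDM_NS = {"edm": "http://schemas.microsoft.com/office/2018/edm"}


def searchable_fields(schema_xml):
    """Return the names of fields with searchable="true", in document order."""
    root = ET.fromstring(schema_xml)
    return [
        field.get("name")
        for field in root.findall("edm:DataStore/edm:Field", EDM_NS)
        if field.get("searchable", "false").lower() == "true"
    ]


def check_searchable_limit(schema_xml, max_searchable):
    """Raise ValueError if no field, or too many fields, are searchable.

    Hypothetical helper; max_searchable is supplied by the caller.
    """
    names = searchable_fields(schema_xml)
    if not names:
        raise ValueError("At least one field must be searchable.")
    if len(names) > max_searchable:
        raise ValueError(
            f"{len(names)} searchable fields exceed the limit of {max_searchable}.")
    return names
```

For example, calling check_searchable_limit(open("edm.xml", encoding="utf-8").read(), 5) returns the searchable field names if the file passes both checks.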
Once you have created the EDM schema file in XML format, you have to upload it to the cloud service.
To upload the database schema, run the following command:
New-DlpEdmSchema -FileData ([System.IO.File]::ReadAllBytes('.\edm.xml')) -Confirm:$true
You'll be prompted to confirm, as follows:
Are you sure you want to perform this action?
New EDM Schema for the data store 'patientrecords' will be imported.
[Y] Yes [A] Yes to All [N] No [L] No to All [?] Help (default is "Y"):
If you want your changes to occur without confirmation, don't use -Confirm:$true in Step 3.
It can take 10 to 60 minutes to update the EDM schema with additions. The update must complete before you execute steps that use the additions.