Modify Exact Data Match schema to use configurable matching

Tip

If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview compliance portal trials hub. Learn details about signing up and trial terms.

Applies to

  • Exact data match (EDM) sensitive information type (SIT) creation using PowerShell.

Exact Data Match (EDM) based classification enables you to create custom sensitive information types that refer to exact values in a database of sensitive information. When you need to allow for variants of an exact string, you can use configurable matching to tell Microsoft Purview to ignore case and some delimiters.
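
As a rough illustration of what "ignore case and some delimiters" means, the following Python sketch compares two values after folding case and dropping a set of delimiter characters. It is only a conceptual model, not how Microsoft Purview implements matching; the function names `normalize` and `values_match` and the way delimiters are passed (a plain string of characters) are assumptions made for this example.

```python
def normalize(value: str, ignored_delimiters: str = "", case_insensitive: bool = False) -> str:
    """Return a comparison key for `value` under the given matching options.

    Hypothetical helper for illustration only; not part of any Purview API.
    """
    if case_insensitive:
        # Fold case so that "ab" and "AB" produce the same key.
        value = value.casefold()
    # Drop every character that appears in the ignored-delimiter set.
    return "".join(ch for ch in value if ch not in ignored_delimiters)


def values_match(a: str, b: str, ignored_delimiters: str = "", case_insensitive: bool = False) -> bool:
    """Compare two values after applying the same normalization to both."""
    return (
        normalize(a, ignored_delimiters, case_insensitive)
        == normalize(b, ignored_delimiters, case_insensitive)
    )


# With both options enabled, formatting variants of the same value compare equal.
relaxed = values_match("ab-1234/X", "AB1234x", ignored_delimiters="-/", case_insensitive=True)

# With the defaults (case-sensitive, no ignored delimiters), the same pair differs.
strict = values_match("ab-1234/X", "AB1234x")
```

Here `relaxed` is `True` and `strict` is `False`, which mirrors the difference between a column that uses configurable matching and one that keeps the default settings.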

Important

Use this procedure to modify an existing EDM schema and data file.

  1. Uninstall EdmUploadAgent.exe from the computer that you use to connect to Microsoft 365 to upload the EDM schema and data file.

  2. Download the appropriate EdmUploadAgent.exe file for your subscription using the following links:

    • Commercial + GCC - Most commercial customers should use this option
    • GCC-High - This option is specifically for high-security government cloud subscribers
    • DoD - This option is specifically for United States Department of Defense cloud customers
  3. To authorize the EDM Upload Agent, open a Command Prompt window (as an administrator) and run the following command:

    EdmUploadAgent.exe /Authorize
    
  4. If you don't have a current copy of the existing schema, download one by running this command:

    EdmUploadAgent.exe /SaveSchema /DataStoreName <dataStoreName> [/OutputDir [Output dir location]]
    
  5. Customize the schema so that each column uses caseInsensitive, ignoredDelimiters, or both, as needed. The default value for caseInsensitive is "false", and the default for ignoredDelimiters is an empty string.

    Note

    The underlying custom sensitive information type or built-in sensitive information type used to detect the general regex pattern must support detection of the input variations listed with ignoredDelimiters. For example, the built-in U.S. Social Security Number (SSN) sensitive information type can detect variations in the data that include dashes, spaces, or lack of spaces between the grouped numbers that make up the SSN. As a result, the only delimiters that are relevant to include in EDM’s ignoredDelimiters for SSN data are: dash and space.

    Here's a sample schema that simulates case-insensitive matching by creating the extra columns needed to recognize case variations in the sensitive data.

    <EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
      <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
               <Field name="PolicyNumber" searchable="true" />
               <Field name="PolicyNumberLowerCase" searchable="true" />
               <Field name="PolicyNumberUpperCase" searchable="true" />
               <Field name="PolicyNumberCapitalLetters" searchable="true" />
      </DataStore>
    </EdmSchema>
    

    In the above example, the variations of the original PolicyNumber column aren't necessary if both caseInsensitive and ignoredDelimiters are added.

    To update this schema so that EDM uses configurable matching, use the caseInsensitive and ignoredDelimiters flags. Here's how that looks:

    <EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
      <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
             <Field name="PolicyNumber" searchable="true" caseInsensitive="true" ignoredDelimiters="-,/,*,#,^" />
      </DataStore>
    </EdmSchema>
    

    For information on the characters supported by the ignoredDelimiters flag, see Using the caseInsensitive and ignoredDelimiters fields.

  6. Connect to Security & Compliance PowerShell.

    Note

    If your organization has set up Customer Key for Microsoft 365 at the tenant level, Exact Data Match will make use of its encryption functionality automatically. This is available only to E5 licensed tenants in the Commercial cloud. For more information, see Overview of Customer Key.

  7. Update your schema by running the following command:

    Set-DlpEdmSchema -FileData ([System.IO.File]::ReadAllBytes('.\edm.xml')) -Confirm:$true
    
  8. If necessary, update the data file to match the new schema version.

    Tip

    Optionally, you can run a validation against your CSV file before uploading it by running:

    EdmUploadAgent.exe /ValidateData /DataFile [data file] /Schema [schema file]

    For example: EdmUploadAgent.exe /ValidateData /DataFile C:\data\testdelimiters.csv /Schema C:\EDM\patientrecords.xml

    For more information on all of the parameters supported by EdmUploadAgent.exe, run:

    EdmUploadAgent.exe /?

  9. Open a Command Prompt window (as an administrator) and run the following command to hash and upload your sensitive data:

    EdmUploadAgent.exe /UploadData /DataStoreName [DS Name] /DataFile [data file] /HashLocation [hash file location] /Salt [custom salt] /Schema [Schema file]
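
The schema change in step 5 can also be scripted rather than edited by hand. The following Python sketch uses the standard library's `xml.etree.ElementTree` to add the caseInsensitive and ignoredDelimiters attributes to one field of a schema in the format shown above. The function name `enable_configurable_matching` and the inline sample schema are assumptions for illustration; the delimiter value is copied verbatim from the example in step 5, and the result would still need to be saved to a file and uploaded with Set-DlpEdmSchema as described in step 7.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample schema in this article.
EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def enable_configurable_matching(schema_xml: str, field_name: str, ignored_delimiters: str) -> str:
    """Return schema XML with configurable matching set on one field.

    Hypothetical helper for illustration; it only edits attributes locally
    and does not contact any Microsoft 365 service.
    """
    # Keep the EDM namespace as the default namespace when serializing.
    ET.register_namespace("", EDM_NS)
    root = ET.fromstring(schema_xml)
    for field in root.iter(f"{{{EDM_NS}}}Field"):
        if field.get("name") == field_name:
            field.set("caseInsensitive", "true")
            field.set("ignoredDelimiters", ignored_delimiters)
    return ET.tostring(root, encoding="unicode")


# Sample input modeled on the PatientRecords schema shown in step 5.
SAMPLE_SCHEMA = """<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
    <Field name="PolicyNumber" searchable="true" />
  </DataStore>
</EdmSchema>"""

updated_schema = enable_configurable_matching(SAMPLE_SCHEMA, "PolicyNumber", "-,/,*,#,^")
```

Running the optional /ValidateData check from step 8 against the updated schema file before uploading is a reasonable way to catch mistakes introduced by a scripted edit.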