Hash and upload the sensitive information source table for exact data match sensitive information types

This article shows you how to hash and upload your sensitive information source table.

Tip

If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview compliance portal trials hub. Learn details about signing up and trial terms.

Hash and upload the sensitive information source table

In this phase, you:

  1. Set up a custom security group and user account.
  2. Set up the EDM Upload Agent tool.
  3. Use the EDM Upload Agent tool to hash the sensitive information source table with a salt value, and then upload it.

You can hash and upload from one computer, or you can separate the hash step from the upload step for greater security.

If you want to hash and upload from one computer, you need to do it from a computer that can directly connect to your Microsoft 365 tenant. This requires that your clear-text sensitive information source table file is on that computer for hashing.

If you don't want to expose your clear-text sensitive information source table file on the direct access computer, you can hash it on a computer that's in a secure location. In this scenario, the same version of the EDM Upload Agent must be installed on both computers. You can then copy the hash file and the salt file from the secure machine to a computer that can connect directly to your Microsoft 365 tenant.

Important

If you used the Exact Data Match schema and sensitive information type tool to create your schema file, you must download the schema for this procedure if you haven't already done so. See Exporting the EDM schema file in XML format.

Note

If your organization has set up Customer Key for Microsoft 365 at the tenant level, an exact data match will use the encryption functionality automatically. This is available only to E5 licensed tenants in the Commercial cloud.

Best practices

Separate the processes of hashing and uploading the sensitive data so you can more easily isolate any issues in the process.

Once in production, keep the two steps separate in most cases. To ensure that your actual data is never available in clear-text form on a computer that might be compromised due to its connection to the internet, run the hashing process on an isolated computer. Then, transfer the hashed file to an internet-facing computer to upload it.
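
One way to confirm that the hashed file arrived unchanged after the transfer is to compare a checksum computed on both computers. The following Python sketch is an illustration only. It is not part of the EDM Upload Agent, and the function names sha256_of and matches_digest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True when the file's digest equals the digest recorded before the transfer."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Record the digest on the isolated computer, carry it over alongside the files, and call matches_digest on the internet-facing computer before uploading.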

Ensure your sensitive data table doesn't have formatting issues

Before you hash and upload your sensitive data, search it for special characters that might cause problems in parsing the content.

You can validate that the table is in a suitable format to use with EDM by using the EDM Upload Agent with the following syntax:

EdmUploadAgent.exe /ValidateData /DataFile [data file] /Schema [schema file]

If the tool indicates a mismatch in number of columns, it might be due to the presence of commas or quote characters within values in the table that are being confused with column delimiters. Unless they're surrounding a whole value, single and double quotes can cause the tool to misidentify where an individual column starts or ends.

If you find single or double quote characters surrounding full values, you can leave them as they are.

If you find single quote characters or commas inside a value, for example the person's name Tom O'Neil or the city 's-Gravenhage (which starts with an apostrophe character), modify the data export process used to generate the sensitive information table so that it surrounds such columns with double quotes.

If double quote characters are found inside values, it might be preferable to use the Tab-delimited format for the table, which is less susceptible to such issues.
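
The checks described above can be approximated with a short pre-scan before running /ValidateData. The following Python sketch is a rough illustration under stated assumptions: the first row holds the column names, double quotes are the only quoting that protects commas, and the function name find_suspect_rows is hypothetical. The EDM Upload Agent's own parsing rules may differ, so treat the result as a hint, not as a substitute for /ValidateData.

```python
import csv


def parse_line(line, delimiter):
    """Parse one line the way a quote-aware CSV reader would."""
    return next(csv.reader([line], delimiter=delimiter))


def find_suspect_rows(text, delimiter=","):
    """Return (line_number, reason) pairs for rows that may confuse column parsing."""
    lines = text.splitlines()
    if not lines:
        return []
    # Assumption: the first row is the header and defines the column count.
    expected = len(parse_line(lines[0], delimiter))
    suspects = []
    for number, line in enumerate(lines[1:], start=2):
        values = parse_line(line, delimiter)
        if len(values) != expected:
            suspects.append((number, "column count differs from header row"))
            continue
        for value in values:
            inner = value
            # Single quotes surrounding a whole value can be left as they are.
            if len(inner) >= 2 and inner[0] == inner[-1] == "'":
                inner = inner[1:-1]
            if '"' in inner:
                suspects.append((number, "double quote inside a value"))
                break
            # A single quote inside a value is fine only when the value is
            # wrapped in double quotes in the raw line.
            if "'" in inner and f'"{value}"' not in line:
                suspects.append((number, "single quote inside an unquoted value"))
                break
    return suspects
```

For a tab-separated or pipe-separated table, pass delimiter="\t" or delimiter="|".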

Prerequisites

  • a work or school account for Microsoft 365 to add to the EDM_DataUploaders security group
  • a Windows 10, Windows Server 2016 with .NET version 4.6.2, or a Windows Server 2019 machine for running the EDM Upload Agent
  • a directory on your upload machine for the following:
    • the EDM Upload Agent
    • your sensitive item file in .csv, .tsv or pipe (|) format, PatientRecords.csv in our examples
    • the output hash and salt files created when completing this procedure
    • the datastore name from the edm.xml file; in this example, it's PatientRecords

Important

  1. If you're using Windows Server 2016 or earlier, you must also install Visual C++ prior to installing the EDM Upload Agent.

Set up the security group and user account

  1. As a global administrator, go to the admin center using the appropriate link for your subscription and create a security group called EDM_DataUploaders.

  2. Add one or more users to the EDM_DataUploaders security group. (These users manage the database of sensitive information.)

Hash and upload from one computer

This computer must have direct access to your Microsoft 365 tenant.

Note

Before you begin this procedure, make sure that you are a member of the EDM_DataUploaders security group.

Tip

Optionally, you can run a validation against your sensitive information source table file to check it for errors before uploading by running:

EdmUploadAgent.exe /ValidateData /DataFile [data file] /Schema [schema file]

For more information on all the parameters that EdmUploadAgent.exe supports, run:

EdmUploadAgent.exe /?

EDM Upload Agent links by subscription type:

  • Commercial + GCC - Most commercial customers should use this option.
  • GCC-High - This option is specifically for high-security government cloud subscribers.
  • DoD - This option is specifically for United States Department of Defense cloud customers.

Note

The EDM Upload Agent at the above links has been updated to automatically add a salt value to the hashed data. Alternatively, you can provide your own salt value. Once you have used this version, you can't go back to the previous version of the EDM Upload Agent.

You can upload data with the EDM Upload Agent to any given data store up to five times per day.

  1. To authorize the EDM Upload Agent, open a Command Prompt window as an administrator, switch to the C:\EDM\Data directory, and then run the following command:

    EdmUploadAgent.exe /Authorize

    Important

    You must run the EDM Upload Agent application from the folder where it's installed, and indicate the full path to your data files.

  2. Sign in with your work or school account for Microsoft 365 that was added to the EDM_DataUploaders security group. Your tenant information is extracted from the user account to make the connection.

    IMPORTANT: If you used the Exact Data Match schema and sensitive information type tool to create your schema, you must download it for use in this procedure if you haven't already. Run this command in a Command Prompt window:

    EdmUploadAgent.exe /SaveSchema /DataStoreName <schema name> /OutputDir <path to output folder>
    
  3. To hash and upload the sensitive data, run the following command in a Command Prompt window:

    EdmUploadAgent.exe /UploadData /DataStoreName [DS Name] /DataFile [data file] /HashLocation [hash file location] /Schema [Schema file] /AllowedBadLinesPercentage [value]
    

    Note

    The default format for the sensitive data file is comma-separated values. You can specify a tab-separated file by indicating the "{Tab}" option with the /ColumnSeparator parameter, or you can specify a pipe-separated file by indicating the "|" option.

    Example: EdmUploadAgent.exe /UploadData /DataStoreName PatientRecords /DataFile C:\Edm\Hash\PatientRecords.csv /HashLocation C:\Edm\Hash /Schema edm.xml /AllowedBadLinesPercentage 5
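
If you script this step, assembling the argument list in code avoids quoting mistakes in paths and separators. The following Python sketch only builds the /UploadData command line from the parameters shown above; it assumes that /AllowedBadLinesPercentage and /ColumnSeparator can be omitted, and the function names and the install_dir argument are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath


def build_upload_command(install_dir, datastore, data_file, hash_location,
                         schema, allowed_bad_lines_percentage=None,
                         column_separator=None):
    """Build the /UploadData argument list using only flags named in this article."""
    agent = str(PureWindowsPath(install_dir) / "EdmUploadAgent.exe")
    command = [
        agent, "/UploadData",
        "/DataStoreName", datastore,
        "/DataFile", data_file,
        "/HashLocation", hash_location,
        "/Schema", schema,
    ]
    if allowed_bad_lines_percentage is not None:
        command += ["/AllowedBadLinesPercentage", str(allowed_bad_lines_percentage)]
    if column_separator is not None:
        # "{Tab}" for tab-separated files or "|" for pipe-separated files.
        command += ["/ColumnSeparator", column_separator]
    return command


def run_from_install_folder(command, install_dir):
    """Run the agent with its install folder as the working directory."""
    return subprocess.run(command, cwd=install_dir, check=True)
```

Authorization with /Authorize is interactive, so this sketch assumes that you have already completed steps 1 and 2 in the same environment.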

EDM and double-byte character set languages

Exact data match supports double-byte characters, such as those used in Chinese, Japanese, and Korean. However, it doesn't support string matches for corroborative evidence encoded as double-byte characters. It also doesn't match multi-token CJK text detected in the classified content unless globalization for EDM has been enabled, as described in the following steps. In all cases, a sensitive information type (SIT) must be mapped to any multi-token text, both for the primary field and for corroborative evidence fields.

Important

To invoke exact data matching for double-byte characters, you need to take the following steps:

  1. Create an EDM Sensitive Information Type (SIT) that’s intended to match on the double-byte character set language, such as Japanese kanji.

  2. Ensure that you have downloaded and installed version 17.01.0495.0 (or later) of the EDM Upload Agent.

  3. Update the EdmUploadAgent.exe.config file’s globalization parameter to true: <add key="IsGlobalizationEnabled" value="true" />

  4. Hash and upload a source table with the data to be matched.
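
The configuration change in step 3 can also be scripted. The following Python sketch assumes that EdmUploadAgent.exe.config follows the usual .NET layout, with the setting stored as an add element under appSettings inside configuration; that layout is an assumption, not something this article confirms, and the function name enable_globalization is hypothetical. Back up the file first, because this approach rewrites it and doesn't preserve comments.

```python
import xml.etree.ElementTree as ET


def enable_globalization(config_xml):
    """Return the config text with IsGlobalizationEnabled set to true."""
    root = ET.fromstring(config_xml)
    settings = root.find("appSettings")
    if settings is None:
        # Assumed layout: <configuration><appSettings>...</appSettings></configuration>
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key", "").strip() == "IsGlobalizationEnabled":
            entry.set("value", "true")
            break
    else:
        # The key was not present, so add it.
        ET.SubElement(settings, "add",
                      {"key": "IsGlobalizationEnabled", "value": "true"})
    return ET.tostring(root, encoding="unicode")
```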

Separate hash and upload

Perform the hash on a computer in a secure environment. You must have the same version of the EDM Upload Agent installed on both computers.

OPTIONAL: If you created your schema file using the Exact Data Match schema and SIT tool, run the following command in a Command Prompt window to download the file in XML format:

EdmUploadAgent.exe /SaveSchema /DataStoreName <schema name> /OutputDir <path to output folder>

  1. On the computer in the secure environment, run the following command in a Command Prompt window:

    EdmUploadAgent.exe /CreateHash /DataFile [data file] /HashLocation [hash file location] /Schema [Schema file] /AllowedBadLinesPercentage [value]
    

    For example:

    EdmUploadAgent.exe /CreateHash /DataFile C:\Edm\Data\PatientRecords.csv /HashLocation C:\Edm\Hash /Schema edm.xml /AllowedBadLinesPercentage 5
    

    Note

    The default format for the sensitive data file is comma-separated values. You can specify a tab-separated file by indicating the "{Tab}" option with the /ColumnSeparator parameter, or you can specify a pipe-separated file by indicating the "|" option.

    This outputs a hashed file and a salt file with these extensions if you didn't specify the /Salt <saltvalue> option:

    • .EdmHash
    • .EdmSalt
  2. Copy these files in a secure fashion to the computer you use to upload your sensitive information source table file (PatientRecords) to your tenant.

  3. To authorize the EDM Upload Agent, open a Command Prompt window as an administrator, switch to the C:\EDM\Data directory, and then run the following command:

    EdmUploadAgent.exe /Authorize
    

    Important

    You must run the EDM Upload Agent application from the folder where it's installed and indicate the full path to your data files.

  4. Sign in with your work or school account for Microsoft 365 that was added to the EDM_DataUploaders security group. Your tenant information is extracted from the user account to make the connection.

  5. To upload the hashed data, run the following command in a Command Prompt window:

    EdmUploadAgent.exe /UploadHash /DataStoreName <DataStoreName> /HashFile <HashedSourceFilePath> /ColumnSeparator ["{Tab}"|"|"]
    

    For example:

    EdmUploadAgent.exe /UploadHash /DataStoreName PatientRecords /HashFile C:\Edm\Hash\PatientRecords.EdmHash
    
  6. To verify that the upload of your sensitive data was successful, run the following command in a Command Prompt window:

    EdmUploadAgent.exe /GetDataStore
    

    The command displays a list of data stores and when they were last updated.

  7. To display all of the data uploads to a particular store, and when they were updated, run the following command in a Command Prompt window:

    EdmUploadAgent.exe /GetSession /DataStoreName <DataStoreName>
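
Before the upload in step 5, it can help to confirm that the files copied in step 2 are actually present on the upload computer. The following Python sketch checks a folder for the .EdmHash and .EdmSalt extensions named in step 1. It assumes that no .EdmSalt file is expected when you supplied your own value with /Salt; the function name missing_upload_files and its custom_salt_used flag are hypothetical.

```python
from pathlib import Path


def missing_upload_files(hash_location, custom_salt_used=False):
    """Return the expected file extensions that are absent from hash_location."""
    present = {
        item.suffix.lower()
        for item in Path(hash_location).iterdir()
        if item.is_file()
    }
    # Assumption: a salt file is only expected when /Salt was not specified.
    expected = [".EdmHash"] if custom_salt_used else [".EdmHash", ".EdmSalt"]
    return [extension for extension in expected if extension.lower() not in present]
```

An empty list means that every expected file type was found in the folder.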
    

Note

To automate the hash and upload process after you have created it the first time, see Refresh your exact data match sensitive information source table file.
