Data matching for Subject Rights Requests
With data matching, organizations can enable Microsoft Priva to identify data subjects based on exact data values that they supply. This can increase the accuracy of locating data subject content that corresponds with those data values, both for your internal personnel and for external users you interact with. It also reduces the need to supply fields manually when you create a subject rights request, and provides context within subject rights requests and for the Overview tile that showcases your items with the most data subject content. To learn more about that view, see Find and visualize personal data in Priva.
To use the data matching feature, you'll need to be a member of the Privacy Management role group. From within Priva in the Microsoft Purview compliance portal, select Settings in the top nav and then Data matching. From here, you'll need to define the personal data schema and provide a personal data upload as shown below. Note that you can add items, and you can delete items you add, but you can't modify an item.
Prepare for data import
Before defining the schema or uploading data, you need to identify the source of your data subject information and export it to a file. The required file format is .csv, which can be read by an application such as Microsoft Excel. Structure this export so that your column headers appear in the first row. These headers should include the names of the attributes for your personal data schema. Check the format of the data in each field. If any value contains commas, surround that value with double quotes so that it is not parsed into separate fields.
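As a minimal sketch of the quoting rule above, the following Python snippet writes a header row followed by data rows and lets the standard csv module add double quotes around any value that contains a comma. The rows and the to_csv helper are hypothetical and only for illustration; the column names are borrowed from the example schema later in this article.

```python
import csv
import io

# Hypothetical sample rows; column names mirror the example schema in this article.
rows = [
    {"PatientID": "P-1001", "FirstName": "Ana", "LastName": "Ruiz",
     "Address": "12 Elm St, Apt 4, Springfield"},
    {"PatientID": "P-1002", "FirstName": "Li", "LastName": "Chen",
     "Address": "9 Oak Ave"},
]


def to_csv(records):
    """Return CSV text with the headers in the first row.

    QUOTE_MINIMAL wraps a value in double quotes only when it needs them,
    for example when the value contains a comma.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Only the address that contains commas ends up quoted; the other values are written as-is.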
Define the personal data schema
The first step in setting up data matching is defining the personal data schema, which will describe the attributes for your data subjects. You'll upload this schema on the first tab in the data matching settings area. The required files include a personal data schema XML file and a rule package XML file.
Personal data schema XML
The personal data schema file is an XML file that will define what column names are expected.
- Name this schema file pdm.xml.
- Define each column name using the name attribute of a Field tag, as seen in the example below.
- Use searchable="true" for fields you want to be searchable, up to a maximum of five fields. At least one of your field names must be searchable. Sample syntax:
<Field name="" searchable=""/>
- The personal data schema has a DataStore tag section. Four mandatory fields must be mapped to your field names: primaryKeyField, upnField, firstNameField, lastNameField.
As an example, the following XML file defines a sample schema, with five fields specified as searchable: PatientID, MRN, SSN, Phone, and DOB. The primaryKeyField is mapped to PatientID, upnField is mapped to MRN, firstNameField is mapped to FirstName, and lastNameField is mapped to LastName.
You can copy, modify, and use our example.
<PdmSchema xmlns="http://schemas.microsoft.com/office/2020/pdm">
<DataStore name="PatientRecords" description="Schema for patient records" version="1" primaryKeyField="PatientID" upnField="MRN" firstNameField="FirstName" lastNameField="LastName">
<Field name="PatientID" searchable="true"/>
<Field name="MRN" searchable="true" />
<Field name="FirstName" />
<Field name="LastName" />
<Field name="SSN" searchable="true" />
<Field name="Phone" searchable="true" />
<Field name="DOB" searchable="true" />
<Field name="Gender" />
<Field name="Address" />
</DataStore>
</PdmSchema>
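Before uploading a schema, it can help to check it locally against the rules listed above. The following Python sketch is not part of Priva; check_schema is a hypothetical helper that only tests the constraints stated in this article (one to five searchable fields, and four mandatory mappings). It also assumes that each mapping should name a field defined in the same DataStore. The embedded sample is a shortened version of the example schema.

```python
import xml.etree.ElementTree as ET

NS = {"pdm": "http://schemas.microsoft.com/office/2020/pdm"}
MAPPED_ATTRS = ("primaryKeyField", "upnField", "firstNameField", "lastNameField")
MAX_SEARCHABLE = 5


def check_schema(xml_text):
    """Return a list of problems found in a personal data schema (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    store = root.find("pdm:DataStore", NS)
    if store is None:
        return ["missing DataStore element"]

    fields = store.findall("pdm:Field", NS)
    names = {f.get("name") for f in fields}
    searchable = [f.get("name") for f in fields if f.get("searchable") == "true"]

    # At least one and at most five fields may be searchable.
    if not 1 <= len(searchable) <= MAX_SEARCHABLE:
        problems.append(
            f"expected 1 to {MAX_SEARCHABLE} searchable fields, found {len(searchable)}"
        )

    # Assumption: each mandatory mapping must name a field defined in this DataStore.
    for attr in MAPPED_ATTRS:
        target = store.get(attr)
        if target is None:
            problems.append(f"{attr} is not set")
        elif target not in names:
            problems.append(f"{attr} points to undefined field {target!r}")
    return problems


# Shortened version of the example schema, for illustration only.
SAMPLE_SCHEMA = """<PdmSchema xmlns="http://schemas.microsoft.com/office/2020/pdm">
  <DataStore name="PatientRecords" version="1" primaryKeyField="PatientID"
             upnField="MRN" firstNameField="FirstName" lastNameField="LastName">
    <Field name="PatientID" searchable="true"/>
    <Field name="MRN" searchable="true"/>
    <Field name="FirstName"/>
    <Field name="LastName"/>
    <Field name="SSN" searchable="true"/>
  </DataStore>
</PdmSchema>"""

print(check_schema(SAMPLE_SCHEMA))
```

An empty list means the sample passed these local checks; it does not guarantee that the service will accept the file.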
Rule package XML
When you set up your rule package, make sure to correctly reference your personal data schema file created above: pdm.xml. In the following sample rule package XML, the following fields need to be customized to create your data match sensitive type:
- RulePack id & PrivacyMatch id: Use the PowerShell cmdlet New-Guid to generate a separate GUID for each.
- Datastore: This field specifies the personal data match lookup data store to be used. Provide the defined DataStore name of a configured personal data schema.
- idMatch: This field points to the primary element for the personal data match.
  - Matches: Specifies the field to be used in exact lookup. Provide a searchable field name from the personal data schema.
  - Classification: This field specifies the sensitive type match that triggers personal data match lookup. You can provide the Name or GUID of an existing built-in or custom sensitive information type. To avoid performance issues, if you use a custom sensitive information type as the Classification element, do not choose one that will match a large percentage of content (such as "any number" or "any five-letter word"). We recommend adding supporting keywords or including formatting in the definition of the custom classification sensitive information type.
- Match: This field points to additional evidence found in proximity of idMatch.
  - Matches: Provide any field name in the personal data schema for DataStore.
- Resource: This section specifies the name and description for the sensitive type in multiple locales.
  - idRef: Provide the GUID used for the PrivacyMatch id.
  - Name & descriptions: Customize as required.
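If PowerShell is not at hand, any standard GUID generator should work for the two id values. As a small sketch, the Python standard library can produce random (version 4) GUIDs; the variable names here are hypothetical.

```python
import uuid

# One GUID for the RulePack id and a separate one for the PrivacyMatch id.
rule_pack_id = str(uuid.uuid4())
privacy_match_id = str(uuid.uuid4())

print("RulePack id:    ", rule_pack_id)
print("PrivacyMatch id:", privacy_match_id)
```

Reuse the PrivacyMatch value as the Resource idRef so that the localized name and description attach to the same sensitive type.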
In our rule package XML example below, we are referencing the pdm.xml example file from the previous step that creates the personal data schema XML:
- Datastore: The dataStore name references the schema file we created earlier: dataStore = "PatientRecords".
- idMatch: The idMatch value references a searchable field that is listed in the pdm.xml file we created earlier: idMatch matches = "SSN".
- Classification: The classification value references an existing or custom sensitive information type: classification = "U.S. Social Security Number (SSN)". (In this case, we use the existing sensitive information type of U.S. Social Security Number.)
Create a rule package in XML format (with Unicode encoding), as in the following example. You can copy, modify, and use this example.
<RulePackage xmlns="http://schemas.microsoft.com/office/2020/pdm">
<RulePack id="fd098e03-1796-41a5-8ab6-198c93c62b21">
<Version build="0" major="2" minor="0" revision="0" />
<Publisher id="eb553734-8306-44b4-9ad5-c388ad970528" />
<Details defaultLangCode="en-us">
<LocalizedDetails langcode="en-us">
<PublisherName>IP DLP</PublisherName>
<Name>Health Care PDM Rulepack</Name>
<Description>This rule package contains the Personal Data Match sensitive type for health care sensitive types.</Description>
</LocalizedDetails>
</Details>
</RulePack>
<Rules>
<PrivacyMatch id = "E1CC861E-3FE9-4A58-82DF-4BD259EAB381" patternsProximity = "300" dataStore ="PatientRecords" recommendedConfidence = "65" >
<Pattern confidenceLevel="65">
<idMatch matches = "SSN" classification = "U.S. Social Security Number (SSN)" />
</Pattern>
<Pattern confidenceLevel="75">
<idMatch matches = "SSN" classification = "U.S. Social Security Number (SSN)" />
<Any minMatches ="3" maxMatches ="6">
<match matches="PatientID" />
<match matches="MRN"/>
<match matches="FirstName"/>
<match matches="LastName"/>
<match matches="Phone"/>
<match matches="DOB"/>
</Any>
</Pattern>
</PrivacyMatch>
<LocalizedStrings>
<Resource idRef="E1CC861E-3FE9-4A58-82DF-4BD259EAB381">
<Name default="true" langcode="en-us">Patient SSN Exact Match.</Name>
<Description default="true" langcode="en-us">PDM Sensitive type for detecting Patient SSN.</Description>
</Resource>
</LocalizedStrings>
</Rules>
</RulePackage>
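Because the rule package refers to the schema by name, a quick local cross-check can catch mismatches before upload. The Python sketch below is a hypothetical helper, not a Priva tool. It checks only the relationships described in this article: the dataStore name, the searchable idMatch field, the match field names, and the Resource idRef. It assumes names are compared exactly, including letter case. The two embedded documents are trimmed versions of the examples.

```python
import xml.etree.ElementTree as ET

PDM = "http://schemas.microsoft.com/office/2020/pdm"
NS = {"pdm": PDM}


def cross_check(schema_xml, rules_xml):
    """Return a list of mismatches between a rule package and a schema."""
    problems = []
    store = ET.fromstring(schema_xml).find("pdm:DataStore", NS)
    # Map each field name to True when it is marked searchable.
    fields = {
        f.get("name"): f.get("searchable") == "true"
        for f in store.findall("pdm:Field", NS)
    }
    rules = ET.fromstring(rules_xml).find("pdm:Rules", NS)
    resource_ids = {
        r.get("idRef")
        for r in rules.findall("pdm:LocalizedStrings/pdm:Resource", NS)
    }
    for pm in rules.findall("pdm:PrivacyMatch", NS):
        # Assumption: the dataStore name must equal the schema name exactly.
        if pm.get("dataStore") != store.get("name"):
            problems.append(f"dataStore {pm.get('dataStore')!r} does not match the schema")
        for id_match in pm.iter(f"{{{PDM}}}idMatch"):
            name = id_match.get("matches")
            if not fields.get(name, False):
                problems.append(f"idMatch field {name!r} is not a searchable schema field")
        for match in pm.iter(f"{{{PDM}}}match"):
            name = match.get("matches")
            if name not in fields:
                problems.append(f"match field {name!r} is not in the schema")
        if pm.get("id") not in resource_ids:
            problems.append(f"no Resource idRef for PrivacyMatch {pm.get('id')}")
    return problems


# Trimmed versions of the example files, for illustration only.
SCHEMA = """<PdmSchema xmlns="http://schemas.microsoft.com/office/2020/pdm">
  <DataStore name="PatientRecords" primaryKeyField="PatientID" upnField="MRN"
             firstNameField="FirstName" lastNameField="LastName">
    <Field name="PatientID" searchable="true"/>
    <Field name="MRN" searchable="true"/>
    <Field name="SSN" searchable="true"/>
    <Field name="FirstName"/>
    <Field name="LastName"/>
  </DataStore>
</PdmSchema>"""

RULES = """<RulePackage xmlns="http://schemas.microsoft.com/office/2020/pdm">
  <Rules>
    <PrivacyMatch id="E1CC861E-3FE9-4A58-82DF-4BD259EAB381" dataStore="PatientRecords">
      <Pattern confidenceLevel="75">
        <idMatch matches="SSN" classification="U.S. Social Security Number (SSN)"/>
        <Any minMatches="1" maxMatches="2">
          <match matches="FirstName"/>
          <match matches="LastName"/>
        </Any>
      </Pattern>
    </PrivacyMatch>
    <LocalizedStrings>
      <Resource idRef="E1CC861E-3FE9-4A58-82DF-4BD259EAB381"/>
    </LocalizedStrings>
  </Rules>
</RulePackage>"""

print(cross_check(SCHEMA, RULES))
```

Whether the service itself treats these names as case-sensitive is not stated here, so the exact comparison is a conservative choice: keeping the dataStore value identical to the DataStore name avoids the question.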
Sensitive info types
The second step in setting up data matching is to create unique sensitive info types for the personal data match (PDM). Sensitive info types (SITs) are pattern-based classifiers that detect sensitive information such as Social Security or credit card numbers. Setting up a PDM sensitive info type allows you to use exact data values rather than generic patterns to detect matches. To begin this step, select Create PDM sensitive info type to start the creation wizard.
Upload personal data
After defining the personal data schema and sensitive info types, the third step is to upload personal data. Go to the Personal data upload tab, select Add, and choose the personal data schema that you defined in the first step, then upload the file containing the personal data.
You can upload this personal data by choosing a local file, or by supplying a shared access signature (SAS) URL to an existing Microsoft Azure Storage location that contains your personal data file. If the .csv file you prepared before defining the schema conforms to that schema, you can use it for the upload.
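To confirm that a prepared file conforms to the schema before uploading it, you can compare the first row of the .csv file with the Field names in the schema. The following Python sketch is a hypothetical local check, not a Priva feature. It assumes that every schema field should appear as a column header, and the sample data is invented.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"pdm": "http://schemas.microsoft.com/office/2020/pdm"}


def missing_columns(schema_xml, csv_text):
    """Return schema field names that do not appear in the CSV header row."""
    store = ET.fromstring(schema_xml).find("pdm:DataStore", NS)
    expected = [f.get("name") for f in store.findall("pdm:Field", NS)]
    # The first row of the file holds the column headers.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in expected if name not in header]


# Shortened schema and invented data, for illustration only.
UPLOAD_SCHEMA = """<PdmSchema xmlns="http://schemas.microsoft.com/office/2020/pdm">
  <DataStore name="PatientRecords" primaryKeyField="PatientID" upnField="MRN"
             firstNameField="FirstName" lastNameField="LastName">
    <Field name="PatientID" searchable="true"/>
    <Field name="MRN" searchable="true"/>
    <Field name="FirstName"/>
    <Field name="LastName"/>
  </DataStore>
</PdmSchema>"""

UPLOAD_CSV = "PatientID,MRN,FirstName\nP-1001,M-77,Ana\n"

print(missing_columns(UPLOAD_SCHEMA, UPLOAD_CSV))
```

In this sample the LastName column is absent from the header row, so the check reports it.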