Create custom sensitive information types in the Microsoft Purview compliance portal

If the preconfigured sensitive information types (SITs) don't meet your needs, you can create and define custom SITs. You can also copy and then edit a built-in SIT.

The custom SITs are added to the Microsoft.SCCManaged.CustomRulePack rule package.

There are two methods for creating a new SIT:

  • Create a custom SIT from scratch
  • Copy and modify an existing SIT

Tip

If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview compliance portal trials hub. Learn details about signing up and trial terms.

Before you begin

Important

Microsoft Customer Service & Support can't assist with creating custom classifications or regular expression patterns. Support engineers can provide limited support for the feature, such as providing sample regular expression patterns for simulation purposes or helping to troubleshoot an existing regular expression pattern that's not triggering as expected. However, they can't provide assurances that any custom content-matching development will fulfill your requirements or obligations.

Create a custom SIT from scratch

Note

Microsoft Purview supports creating custom SITs that use double-byte character languages, such as Chinese, Japanese, and Korean. Because these languages do not use delimiters the way that single-byte languages do, Purview adds a space between each word in languages that use double-byte characters. It also removes special characters, such as punctuation.

Use the following procedure to fully define a brand new sensitive information type.

  1. In the Microsoft Purview compliance portal, navigate to Data classification > Classifiers > Sensitive info types and choose Create sensitive info type.

  2. Fill in values for Name and Description and choose Next.

  3. Choose Create pattern. You can create multiple patterns, each with different elements and confidence levels, as you define your new sensitive information type.

  4. Choose the default confidence level for the pattern. The values are Low confidence, Medium confidence, and High confidence.

  5. Choose and define the Primary element. The primary element can be a Regular expression with an optional validator, a Keyword list, a Keyword dictionary, or one of the pre-configured Functions. For more information on the SIT functions used for data loss prevention, see Sensitive information type functions. For more information on the date and the checksum validators, see Sensitive Information Type regular expression validators.

  6. Fill in a value for Character proximity.

  7. (Optional) Add supporting elements if you have any. Supporting elements can be a regular expression with an optional validator, a keyword list, a keyword dictionary or one of the predefined functions. Supporting elements can have their own Character proximity configuration.

  8. (Optional) Add any additional checks from the list of available checks.

  9. Choose Create.

  10. Choose Next.

  11. Choose the recommended confidence level for this sensitive information type.

  12. Check your settings and choose Save.

    Important

    Microsoft 365 uses the search crawler to identify and classify sensitive information in SharePoint and OneDrive sites. To identify your new custom sensitive information type in existing content, the content must be re-crawled. Content is crawled based on a schedule, but you can manually re-crawl content for a site collection, list, or library. For more information, see Manually request crawling and re-indexing of a site, a library or a list.

  13. The Sensitive info types tab of the Classifiers page lists all of the sensitive information types. Choose Refresh, and then use the search tool or browse the list to find your new SIT.
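As a rough mental model of how a pattern's primary element, supporting elements, and character proximity fit together, the following Python sketch may help. It is not how Purview is implemented: the `Pattern` class, the `evaluate` function, the `EMP-` regular expression, and the proximity value are all hypothetical, and the sketch assumes the proximity window extends the given number of characters on both sides of the primary match.

```python
import re
from dataclasses import dataclass


@dataclass
class Pattern:
    """Hypothetical stand-in for one SIT pattern (not a Purview type)."""
    confidence: str            # "Low", "Medium", or "High"
    primary_regex: str         # primary element, here a regular expression
    supporting_keywords: list  # supporting elements, here a keyword list
    proximity: int             # character proximity around the primary match


def evaluate(pattern: Pattern, text: str) -> list:
    """Return (match, confidence) pairs for primary matches that have
    at least one supporting keyword inside the proximity window."""
    hits = []
    for m in re.finditer(pattern.primary_regex, text):
        # Assumption: the window reaches `proximity` characters each way.
        start = max(0, m.start() - pattern.proximity)
        end = min(len(text), m.end() + pattern.proximity)
        window = text[start:end].lower()
        if any(k.lower() in window for k in pattern.supporting_keywords):
            hits.append((m.group(0), pattern.confidence))
    return hits


# Illustrative values only.
employee_id = Pattern(
    confidence="High",
    primary_regex=r"\bEMP-\d{6}\b",
    supporting_keywords=["employee id"],
    proximity=300,
)

print(evaluate(employee_id, "Employee ID: EMP-123456"))
print(evaluate(employee_id, "EMP-123456" + " " * 400 + "employee id"))
```

The first call reports a match because the keyword sits next to the ID. The second reports none because the keyword falls outside the window, which is the effect that a smaller Character proximity value has on a pattern.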

Copy and modify an existing SIT

The procedure that follows explains how to copy and modify an existing SIT using the compliance portal.

Alternatively, you can copy and modify custom SITs using PowerShell and leveraging Purview's Exact Data Match (EDM) capabilities. To learn more about those methods, see:

Note

These SITs can't be copied:

  • Canada driver's license number
  • EU driver's license number
  • EU national identification number
  • EU passport number
  • EU social security number or equivalent identification
  • EU tax identification number
  • International classification of diseases (ICD-10-CM)
  • International classification of diseases (ICD-9-CM)
  • U.S. driver's license number

Copy and modify an existing SIT using the compliance portal

  1. In the compliance portal, navigate to Data classification > Classifiers > Sensitive info types and select the sensitive information type that you want to copy.

  2. The overview page for the sensitive information type opens. Choose Copy. When the copy is ready, a message stating that the copy was created appears with an option to edit it. Choose Yes.

  3. Give your new sensitive information type a new Name and Description.

  4. You can choose to create a new pattern, or edit or remove some or all of the existing patterns.

    1. To create a new pattern, choose Create.
    2. To edit an existing pattern, choose the Edit (pencil) icon next to the pattern you want to change.
    3. To remove a pattern, choose the Delete icon next to the pattern you want to remove.
  5. When creating or editing a pattern, choose the default confidence level for the pattern. The values are Low confidence, Medium confidence, and High confidence.

  6. Choose and define the Primary element. The primary element can be a Regular expression, a Keyword list, a Keyword dictionary, or one of the preconfigured Functions. For more information, see Sensitive information type functions.

  7. Fill in a value for Character proximity.

  8. (Optional) If you have Supporting elements or any additional checks you want to run, add them. If needed, you can organize your Supporting elements into groups.

  9. If you're creating a new pattern, choose Create. If you are editing an existing pattern, choose Update.

  10. Choose Next.

  11. Confirm the confidence level selection for this sensitive information type and then choose Next.

  12. Review your settings and then choose Save.

  13. Your new sensitive information type is created. At the confirmation message, choose Done.

Simulate the effects of a sensitive information type

You can test the effects of any sensitive information type in the list. We suggest that you run a simulation for each sensitive information type that you create before using it in a policy.

  1. Prepare two files, for example, two Word documents. One should have content that matches the elements you specified in your sensitive information type. The other should have content that doesn't match.

  2. In the compliance portal, navigate to Data classification > Classifiers > Sensitive info types and choose the sensitive information type from the list to open the details pane. Choose Simulate.

  3. Upload a file and choose Simulate. (You can only upload and run a simulation for one file at a time.)

  4. On the Match results page, review the results and choose Finish.
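Before uploading anything, it can be useful to confirm locally that your matching and non-matching samples behave as intended against the regular expression you plan to use. The sketch below is one way to do that in Python. The regular expression, file names, and helper functions are hypothetical, plain-text files are used only for simplicity, and Python's regex engine may not behave identically to the one Purview uses, so treat the result as a sanity check rather than a substitute for the simulation.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical primary element for a custom SIT.
SAMPLE_REGEX = r"\bEMP-\d{6}\b"


def write_samples(folder: Path) -> tuple:
    """Create one file that should match and one that should not."""
    match_file = folder / "should_match.txt"
    no_match_file = folder / "should_not_match.txt"
    match_file.write_text("Employee ID: EMP-123456", encoding="utf-8")
    no_match_file.write_text("Quarterly report, no identifiers.", encoding="utf-8")
    return match_file, no_match_file


def file_matches(path: Path, regex: str) -> bool:
    """Return True if the file's text contains a match for the regex."""
    return re.search(regex, path.read_text(encoding="utf-8")) is not None


with tempfile.TemporaryDirectory() as tmp:
    positive, negative = write_samples(Path(tmp))
    print(positive.name, file_matches(positive, SAMPLE_REGEX))
    print(negative.name, file_matches(negative, SAMPLE_REGEX))
```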

Permissions and role groups

The account you use to test how a sensitive information type performs must be a member of one of the following role groups:

  • Primary role groups

    • Compliance Administrator
    • Compliance Data Administrator
    • Security Administrator
  • Secondary role groups

    • Communication Compliance Admins
    • Information Protection Admins
    • Information Protection Investigators
    • Organization Management

Enable access to simulation mode for primary role groups

Applies to:

  • Compliance Administrator
  • Compliance Data Administrator
  • Security Administrator

To enable access to the simulation feature for these role groups, you must assign each user to the appropriate group via Microsoft Entra ID (formerly Azure Active Directory).

You can assign these role groups either through the Microsoft Purview compliance portal, or through the Microsoft 365 Admin Center.

Assign role groups via the Microsoft Purview compliance portal

Prerequisite

Assign role groups for a new user.

Assign role groups

  1. Navigate to Roles & Scopes > Permissions > Microsoft Entra > Roles.
  2. Select the Admin role from the list of roles and then choose Manage members in Microsoft Entra in the flyout pane.
  3. Select the intended user from the list of users and then choose Assigned roles in the left navigation pane.
  4. In the central pane, choose Add assignments.
  5. Search for and select the appropriate role.
  6. Choose Add.

Assign role groups via the Microsoft 365 Admin Center

Assign role groups for a new user

  1. Navigate to Roles > Role assignments > Microsoft Entra.
  2. Select New user and then Create new user.
  3. Work through the New User wizard.
  4. On the Assignments page, choose + Add role.
  5. Search for and select the appropriate role.
  6. Choose Next: Review + create.
  7. Choose Create.

Assign role groups for an existing user

  1. Navigate to Roles > Role assignments > Microsoft Entra.
  2. Select the Admin role from the list of roles and then choose the Assigned tab in the flyout pane.
  3. Choose Add users.
  4. Select the intended user from the list of users and then choose Add.

Enable access to simulation mode for secondary role groups

Applies to:

  • Communication Compliance Admins
  • Information Protection Admins
  • Information Protection Investigators
  • Organization Management

To enable access to the simulation feature for these role groups, you must assign each user permission through both the compliance portal and the Microsoft 365 Admin Center.

Assign role groups via the compliance portal

  1. Navigate to Roles & Scopes > Permissions > Microsoft Purview Solutions > Roles.
  2. Select the admin role from the list of roles and then choose Edit in the flyout pane.
  3. Select Choose users.
  4. Select the intended user from the list of users and then choose Select in the flyout pane.
  5. Choose Next and then Save.

Assign role groups via the Microsoft 365 Admin Center

  1. Navigate to Roles > Role assignments > Exchange.
  2. Select the admin role from the list of roles and then choose the Assigned tab in the flyout pane.
  3. Choose Add.
  4. Select the intended user from the list of users and then choose Add.

Note

Microsoft Purview information protection supports double byte character set languages for:

  • Chinese (simplified)
  • Chinese (traditional)
  • Korean
  • Japanese

This support is available for sensitive information types. For more information, see Information protection support for double byte character sets release notes (preview).

Tip

To detect patterns containing Chinese/Japanese characters and single byte characters, or to detect patterns containing Chinese/Japanese and English, define two variants of the keyword or regex.

  • For example, to detect a keyword like "机密的document", use two variants of the keyword: one with a space between the Chinese and English text and another without a space. The keywords to add to the SIT are therefore "机密的 document" and "机密的document". Similarly, to detect the phrase "東京オリンピック2020", use two variants: "東京オリンピック 2020" and "東京オリンピック2020".
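If you maintain many mixed keywords, the spaced and unspaced variants can be generated rather than typed by hand. The following Python sketch is one possible approach, not a Purview feature: the `keyword_variants` helper is hypothetical, and the Unicode ranges it uses are an approximate, assumed set covering common Chinese, Japanese, and Korean characters.

```python
import re

# Approximate ranges (an assumption): hiragana/katakana, CJK ideographs,
# and Hangul syllables.
CJK = r"\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"

# Zero-width boundary between a CJK character and an ASCII letter or digit,
# in either order.
BOUNDARY = re.compile(
    rf"(?<=[{CJK}])(?=[A-Za-z0-9])|(?<=[A-Za-z0-9])(?=[{CJK}])"
)


def keyword_variants(keyword: str) -> list:
    """Return the spaced and unspaced variants of a mixed keyword.

    Keywords with no CJK/ASCII boundary are returned unchanged.
    """
    spaced = BOUNDARY.sub(" ", keyword)
    return [keyword] if spaced == keyword else [spaced, keyword]


print(keyword_variants("机密的document"))
print(keyword_variants("東京オリンピック2020"))
print(keyword_variants("Highly confidential"))
```

For the two mixed keywords above, the helper returns the same pairs of variants listed in the example. The English-only phrase comes back as a single entry.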

If the list of keywords or phrases contains words that are not Chinese or Japanese (for instance, English-only words) alongside keywords with Chinese/Japanese/double-byte characters, we recommend creating two dictionaries or keyword lists: one for keywords containing Chinese/Japanese/double-byte characters and another for English-only keywords.

  • For example, if you want to create a keyword dictionary/list with three phrases "Highly confidential", "機密性が高い", and "机密的document", you should create two keyword lists.
    1. Highly confidential
    2. 機密性が高い, 机密的document and 机密的 document
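The split described above can be automated when the source list is long. This is a minimal Python sketch under stated assumptions: `split_keyword_lists` is a hypothetical helper, and the character ranges are an approximate set for Chinese, Japanese, and Korean text, not a definition taken from Purview.

```python
import re

# Approximate ranges (an assumption): hiragana/katakana, CJK ideographs,
# and Hangul syllables.
DOUBLE_BYTE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def split_keyword_lists(keywords: list) -> tuple:
    """Split keywords into (single-byte only, containing double-byte)."""
    single_byte = [k for k in keywords if not DOUBLE_BYTE.search(k)]
    double_byte = [k for k in keywords if DOUBLE_BYTE.search(k)]
    return single_byte, double_byte


english_only, cjk = split_keyword_lists(
    ["Highly confidential", "機密性が高い", "机密的document", "机密的 document"]
)
print(english_only)
print(cjk)
```

Each returned list can then be entered as its own keyword list or dictionary in the SIT.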

When you create a regex that uses a double-byte hyphen or a double-byte period, make sure to escape both characters in the same way that you would escape a hyphen or period in a regex. Here is a sample regex for reference:

(?<!\d)([4][0-9]{3}[\-?\-\t]*[0-9]{4})
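To see how a pattern of this shape behaves with double-byte separators, you can experiment locally. The Python sketch below is a variant written for illustration, not the sample above verbatim: it writes the fullwidth hyphen (U+FF0D) and fullwidth period (U+FF0E) as `\uFF0D` and `\uFF0E` escapes, which is Python notation, and the `find_ids` helper and test strings are hypothetical. Python's regex engine may differ from the one Purview uses, so verify the final pattern with a simulation.

```python
import re

# Inside the character class:
#   \-      ASCII hyphen (escaped)
#   \uFF0D  fullwidth (double-byte) hyphen
#   \.      ASCII period (escaped)
#   \uFF0E  fullwidth (double-byte) period
#   \t      tab
PATTERN = re.compile(r"(?<!\d)(4[0-9]{3}[\-\uFF0D\.\uFF0E\t]*[0-9]{4})")


def find_ids(text: str) -> list:
    """Return every substring that matches the pattern."""
    return PATTERN.findall(text)


print(find_ids("4123-5678"))         # ASCII hyphen
print(find_ids("4123\uFF0D5678"))    # fullwidth hyphen
print(find_ids("4123\uFF0E5678"))    # fullwidth period
print(find_ids("94123-5678"))        # rejected: a digit precedes the 4
```

The negative lookbehind `(?<!\d)` is what rejects the last string, because the leading 4 there is part of a longer run of digits.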

Double-byte special characters should not be used in the keyword.

We recommend using a string match instead of a word match in a keyword list.