@Divya K Nair Yes, you can configure a custom sensitive information type (SIT) in Microsoft Purview to detect multiple specific keywords in close proximity within a document. Here's a brief explanation:
Primary Element: This is the main element the SIT looks for. It can be a regular expression (with or without checksum validation), a keyword list, a keyword dictionary, or a function.
Supporting Elements: These are additional elements that serve as supporting evidence to increase confidence in the match. They can be a regular expression, a keyword list, or a keyword dictionary.
Confidence Level and Proximity: Confidence levels (high, medium, low) reflect how much supporting evidence is detected along with the primary element. The more supporting evidence an item contains, the higher the confidence that the matched item contains the sensitive info you're looking for. Proximity defines the maximum number of characters allowed between the primary element and a supporting element.
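Here's a simplified, conceptual sketch of what this might look like (illustrative only; in practice you define the SIT in the Purview portal or in an XML rule package):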
{
  "name": "CustomSIT",
  "description": "Detects if all three keywords are present",
  "rulePackage": "RulePackageId",
  "patterns": [
    {
      "pattern": "keyword1",
      "supportingElements": [
        {"pattern": "keyword2", "proximity": 300, "confidenceLevel": "high"},
        {"pattern": "keyword3", "proximity": 300, "confidenceLevel": "high"}
      ]
    }
  ]
}
In this example, "keyword1" is the primary element, and "keyword2" and "keyword3" are the supporting elements. The proximity value of 300 means each supporting keyword must appear within 300 characters of the primary keyword to count toward a match.
Remember to replace RulePackageId with the actual ID of your rule package, and keyword1, keyword2, and keyword3 with your actual keywords.
Please note that you cannot perform any operations on the keyword list itself. You can only specify how close to one another the keywords must appear in order to trigger detection.
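If it helps to reason about the proximity rule before building the SIT, here is a rough Python sketch of the idea. It is not how Purview's engine is implemented and it does not call any Purview API. The function names, the case-insensitive substring matching, and the way the gap between matches is counted are all my own assumptions, purely for illustration:

```python
import re


def find_spans(text, keyword):
    """Return (start, end) offsets of every case-insensitive occurrence of keyword."""
    return [m.span() for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE)]


def gap(a, b):
    """Number of characters separating two spans (0 if they touch or overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def matches_with_proximity(text, primary, supporting, proximity=300):
    """Hypothetical check: True if some occurrence of the primary keyword has
    every supporting keyword within `proximity` characters of it."""
    supporting_spans = [find_spans(text, kw) for kw in supporting]
    for p in find_spans(text, primary):
        if all(any(gap(p, s) <= proximity for s in spans) for spans in supporting_spans):
            return True
    return False


sample = "keyword1 is mentioned, followed shortly by keyword2 and keyword3."
print(matches_with_proximity(sample, "keyword1", ["keyword2", "keyword3"]))  # True
```

Running a few sample documents through something like this can help you pick a sensible proximity value before you test the real SIT in Purview.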
Let me know if you have any more questions!