Hi @curious7
What am I doing wrong and how can I achieve this?
You're on the right track but need to refine your regex and logic to avoid false positives, especially in email reply scenarios. Here’s a refined approach to help you achieve this:
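Basic UPN Detection - To match standard UPNs in email format, we can use the following regex pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Note that the final character class is [A-Za-z], not [A-Z|a-z]: inside a character class, | is a literal pipe rather than alternation, so the latter would also accept | in the top-level domain.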
This pattern matches typical email addresses (UPNs), ensuring that they follow the general format of ******@domain.com.
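To try the basic pattern outside Purview, a minimal Python sketch follows. The function name find_upns and the sample addresses are made up for illustration, and Python's re module is not the engine Purview uses, so treat the result as a first sanity check only.

```python
import re

# Basic UPN pattern from this answer, with the TLD class written as [A-Za-z].
UPN_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def find_upns(text: str) -> list[str]:
    """Return every substring of text that looks like a UPN (hypothetical helper)."""
    return UPN_PATTERN.findall(text)


# Sample text with invented addresses.
sample = "Contact alice.smith@contoso.com or bob@fabrikam.org for access."
print(find_upns(sample))
```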
Avoid Matching Within Angle Brackets (< >) - To ensure that we don’t capture email addresses within angle brackets (such as when replying to emails), we can add negative lookbehind and lookahead assertions: (?<!<)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?!>)
Here:
- (?<!<) ensures that the email is not preceded by a < (i.e., not in reply format).
- (?!>) ensures that the email is not followed by a > (i.e., not in a quoted reply format).
This prevents matching email addresses written as <******@domain.com>.
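The effect of the two assertions is easiest to see by running the basic and the bracket-aware patterns side by side. This is a rough Python sketch with invented sample strings; the names BASIC and NOT_BRACKETED are only for illustration.

```python
import re

# Basic pattern: matches an address wherever it appears.
BASIC = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Same pattern wrapped in a negative lookbehind for "<" and a negative lookahead for ">".
NOT_BRACKETED = re.compile(
    r"(?<!<)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?!>)"
)

reply_header = "From: Alice <alice@contoso.com>"
body_text = "Her UPN is alice@contoso.com, please reset it."

print(BASIC.findall(reply_header))          # address inside < > is reported
print(NOT_BRACKETED.findall(reply_header))  # address inside < > is skipped
print(NOT_BRACKETED.findall(body_text))     # plain address is still reported
```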
Domain Whitelist Filtering - If you need to restrict matches to specific domains, you can extend the regex to only capture emails from your allowed domain list:
(?<!<)\b[A-Za-z0-9._%+-]+@(yourdomain1\.com|yourdomain2\.org|yourdomain3\.edu)\b(?!>)
This will match UPNs only from the specified domains (replace with your actual domains) and still avoid false positives from email replies.
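If the domain list changes often, the pattern can be generated from the list rather than edited by hand. Here is a small Python sketch under a few assumptions: ALLOWED_DOMAINS and build_upn_pattern are hypothetical names, the domains are the placeholders from above, and case-insensitive matching is my own addition that you can drop.

```python
import re

# Placeholder domains; replace with your actual domains.
ALLOWED_DOMAINS = ["yourdomain1.com", "yourdomain2.org", "yourdomain3.edu"]


def build_upn_pattern(domains: list[str]) -> re.Pattern:
    """Build the bracket-aware UPN pattern restricted to the given domains."""
    # re.escape turns "." into "\." so each dot only matches a literal dot.
    alternation = "|".join(re.escape(domain) for domain in domains)
    return re.compile(
        rf"(?<!<)\b[A-Za-z0-9._%+-]+@(?:{alternation})\b(?!>)",
        re.IGNORECASE,  # assumption: domain case should not matter
    )


pattern = build_upn_pattern(ALLOWED_DOMAINS)
print(pattern.pattern)  # the generated regex string
print(pattern.findall("cc alice@yourdomain1.com and bob@elsewhere.net"))
```

The printed regex string is what you would paste into the policy; the non-capturing group (?:...) is used only so that findall returns the whole address.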
Implementation in Microsoft Purview - In Microsoft Purview, you can use the above regex patterns in your sensitive information types (SITs) within DLP policies to identify UPNs being sent via email or shared documents. Additionally, if domain filtering is needed, you can either handle it within the regex (as shown above) or use a separate condition in the policy for more flexibility.
Testing and Validation - It's important to test the regex with various email formats to ensure it matches the desired UPNs and excludes those in reply format. This will help you fine-tune the solution based on your exact needs.
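One way to organise that testing is a small table of sample lines and the matches you expect from each. The Python sketch below uses invented samples and a hypothetical helper, failing_cases; Purview's regex engine may not behave identically to Python's re, so anything found here should be re-checked in Purview's own test tooling.

```python
import re

PATTERN = re.compile(
    r"(?<!<)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?!>)"
)

# (sample text, UPNs we want the pattern to report)
CASES = [
    ("Please add alice@contoso.com to the group", ["alice@contoso.com"]),
    ("From: Alice <alice@contoso.com>", []),
    ("No addresses in this line", []),
    ("Reply-To: <john.doe@mail.contoso.com>", []),
]


def failing_cases(pattern, cases):
    """Return (text, expected, actual) for every case the pattern gets wrong."""
    failures = []
    for text, expected in cases:
        actual = pattern.findall(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


for text, expected, actual in failing_cases(PATTERN, CASES):
    print(f"MISMATCH: {text!r} expected {expected} got {actual}")
```

Under Python's re, the last case is reported as a mismatch: when the bracketed address has a dot in the local part and more than one dot in the domain, the engine can begin after the first name segment and stop before the final domain label, so a fragment of the address is still matched. That is the kind of edge case worth adding to your own test set.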
For more details:
- Sensitive information type REGEX validators and additional check
- Working with the RegEx engine
- Learn about using regular expressions (regex) in data loss prevention policies
Hope this helps. Do let us know if you have any further queries.