What REGEX can I use to detect a UPN being sent in Email/Shared Document in Onedrive/Sharepoint

Question


curious7 271

In Microsoft Purview Information Protection, I need to create a regex for a sensitive information type (SIT) that detects when a UPN is sent in an email or shared with external users in a document.

I created a primary element with the following patterns for single-level domains (e.g., user@localhost) and two-level domains (e.g., ******@domain.com):
Single level: <?\w+?.?\w+@\w+>?

Two level: <?\w+?.?\w+@\w+.\w+>?

I added a secondary element that must match at least one domain from our domain list (a keyword list).

I then added another secondary element that must NOT match the following regex, because I don't want to match the bracketed form used when replying to an email ("<******@domain.com>"):
Single level: <\w+?.?\w+@\w+>

Two level: <\w+?.?\w+@\w+.\w+>

I also added the following additional checks, because I don't want to catch email addresses in the format "<******@domain.com>" when replying to an email:

"not start with" - "<"
"not end with" - ">"

However, when a user replies to an external user, the SIT still catches the UPN inside the angle brackets (the less-than and greater-than signs) in a string such as "<******@domain.com>". Because this bracketed address appears in every email reply to an external user, I don't want the SIT to catch it.

What am I doing wrong, and how can I achieve this? This SIT will be used in a DLP policy.

Ganesh Gurram 7,295 Reputation points Microsoft External Staff Moderator

2025-02-19T17:04:39.14+00:00

@curious7 - We haven't heard back from you since the last response and wanted to check whether you have found a resolution. If you have, please share it with the community, as it may help others. Otherwise, let us know and we will try to help with more details.

Accepted answer


Answer 1

Hi @curious7

What am I doing wrong and how can I achieve this?

You're on the right track but need to refine your regex and logic to avoid false positives, especially in email reply scenarios. Here’s a refined approach to help you achieve this:
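A quick way to see what goes wrong with the original patterns is to run the two-level pattern from the question through a regex tester. The sketch below uses Python's re module purely as a stand-in tester (Purview evaluates SIT patterns with its own engine, so confirm edge cases there); the addresses are hypothetical. It shows that the unescaped "." accepts any character where a dot belongs, and that the optional "<?" and ">?" simply absorb the angle brackets into the match instead of preventing it.

```python
import re

# Two-level pattern from the question, copied verbatim.
QUESTION_PATTERN = r"<?\w+?.?\w+@\w+.\w+>?"

# The unescaped "." matches ANY character, so "X" is accepted where a dot belongs.
wildcard_hit = re.fullmatch(QUESTION_PATTERN, "jdoe@contosoXcom")

# "<?" and ">?" are optional, so the brackets become part of the match
# rather than stopping it.
reply_hit = re.search(QUESTION_PATTERN, "Reply to <jdoe@contoso.com>")

print(wildcard_hit is not None)  # True
print(reply_hit.group(0))        # <jdoe@contoso.com>
```

In other words, the bracketed reply address is still a hit for the primary pattern; the brackets only change what the matched text looks like.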

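Basic UPN Detection - To match standard UPNs in email format, use the following regex pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b (inside a character class, | is a literal character, so writing [A-Z|a-z] would also accept a pipe in the top-level domain).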

This pattern matches typical email-style addresses (UPNs) of the general form ******@domain.com.
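As a minimal sketch of what the basic pattern picks up, here it is run with Python's re module as a stand-in tester, with the top-level-domain class written as [A-Za-z]. The sample text and addresses are hypothetical, and Purview's own engine should be used to confirm the final behavior.

```python
import re

# Basic UPN pattern from the answer, with [A-Za-z] for the top-level domain.
BASIC_UPN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

text = "Owner: jane.doe@contoso.com, backup: ops-team@mail.fabrikam.net."

# The pattern has no capturing groups, so findall returns the full matches.
matches = BASIC_UPN.findall(text)
print(matches)  # ['jane.doe@contoso.com', 'ops-team@mail.fabrikam.net']
```

Note that the trailing \b lets the match stop cleanly before punctuation such as the comma and the sentence-ending period.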

Avoid Matching Within Angle Brackets (< >) - To avoid capturing addresses inside angle brackets (as in email reply headers), add a negative lookbehind and a negative lookahead: (?<!<)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?!>)

Here:

(?<!<) ensures that the address is not immediately preceded by a < (as in the reply format).

(?!>) ensures that the address is not immediately followed by a > (as in a quoted reply).

This prevents matching addresses wrapped directly in angle brackets, such as <******@domain.com>.
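The lookaround version can be sanity-checked the same way. This is a sketch using Python's re module as a stand-in for Purview's engine (both support fixed-width lookbehind, but re-test in Purview); the message text and addresses are hypothetical. A bare address in the body is found, while the same address in a typical reply header is skipped.

```python
import re

# Lookaround pattern from the answer: reject a match that is directly
# preceded by "<" or directly followed by ">".
NO_BRACKETS = re.compile(
    r"(?<!<)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?!>)"
)

body = "Please share the file with jdoe@contoso.com today."
quoted = "On Tuesday, Jane Doe <jdoe@contoso.com> wrote:"

body_hits = NO_BRACKETS.findall(body)
quoted_hits = NO_BRACKETS.findall(quoted)

print(body_hits)    # ['jdoe@contoso.com']
print(quoted_hits)  # []
```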

Domain Whitelist Filtering - To restrict matches to specific domains, extend the regex so that it captures only addresses from your allowed domain list:

(?<!<)\b[A-Za-z0-9._%+-]+@(yourdomain1\.com|yourdomain2\.org|yourdomain3\.edu)\b(?!>)

This will match UPNs only from the specified domains (replace with your actual domains) and still avoid false positives from email replies.
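Below is a small sketch of the allow-list pattern in action, again with Python's re module standing in for Purview's engine. The yourdomainN names are the answer's own placeholders; the other addresses are hypothetical. Because the pattern contains a capturing group, re.findall would return only the domain, so the sketch collects group(0) from finditer instead.

```python
import re

# Allow-list pattern from the answer, copied verbatim (placeholder domains).
ALLOWED = re.compile(
    r"(?<!<)\b[A-Za-z0-9._%+-]+@(yourdomain1\.com|yourdomain2\.org|yourdomain3\.edu)\b(?!>)"
)

text = ("Internal: alex@yourdomain1.com; partner: sam@partner.example; "
        "quoted: <kim@yourdomain2.org>")

# findall would return only the captured domain group, so take group(0).
hits = [m.group(0) for m in ALLOWED.finditer(text)]
print(hits)  # ['alex@yourdomain1.com']

# \b also succeeds before a dot, so a lookalike domain still yields a match.
lookalike = ALLOWED.search("bob@yourdomain1.com.attacker.example")
print(lookalike.group(0))  # bob@yourdomain1.com
```

The last line is worth keeping in your test set: an address on a domain that merely starts with one of your domains still produces a hit on its first part.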

Implementation in Microsoft Purview - Use one of the patterns above as the primary element of a custom sensitive information type (SIT), then reference that SIT in your DLP policy to detect UPNs sent by email or in shared documents. If you need domain filtering, you can handle it within the regex (as shown above) or, for more flexibility, keep it as a separate condition such as your existing keyword list of domains.

Testing and Validation - It's important to test the regex with various email formats to ensure it matches the desired UPNs and excludes those in reply format. This will help you fine-tune the solution based on your exact needs.
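One lightweight way to do this testing is a table of sample strings with the result you expect, run against the pattern before you paste it into Purview. The sketch below uses Python's re module as a stand-in (Purview's engine may differ in edge cases); the samples and the first_match helper are hypothetical. The last case records an actual, undesired result: with a dotted local part and a multi-level domain inside brackets, the engine retries from the word boundary at the dot and finds a partial match, which is exactly the kind of case this testing should surface.

```python
import re

# Lookaround pattern from the answer (TLD class written as [A-Za-z]).
PATTERN = re.compile(
    r"(?<!<)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?!>)"
)

# (sample text, expected first match or None); all addresses are made up.
CASES = [
    ("Send it to jdoe@contoso.com please", "jdoe@contoso.com"),
    ("Sent from jdoe@contoso.com.", "jdoe@contoso.com"),
    ("Jane Doe <jdoe@contoso.com> wrote:", None),
    # Partial match: the search restarts at the "." boundary in jane.doe
    # and stops the domain early so that (?!>) is satisfied.
    ("Jane Doe <jane.doe@mail.contoso.com> wrote:", ".doe@mail.contoso"),
]


def first_match(text):
    """Return the first match of PATTERN in text, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


failures = []
for text, expected in CASES:
    got = first_match(text)
    if got != expected:
        failures.append(text)
    status = "ok" if got == expected else "MISMATCH"
    print(f"{status:8} {got!r:24} {text!r}")
```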


Hope this helps. Do let us know if you have any further queries.

Ganesh Gurram 7,295 Reputation points Microsoft External Staff Moderator

2025-02-20T21:01:05.55+00:00

@curious7 - Following up to see whether the above answer helped. If it answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful?". If you have any further questions, let us know.
curious7 271 Reputation points

2025-03-14T01:51:02.4633333+00:00

Hi Ganesh,
Thanks for your help. This worked great.
