Regular expressions in mail flow rules in Exchange Online
You can use regular expressions (RegEx) in conditions and exceptions in mail flow rules (also known as transport rules) to match text patterns in different parts of a message (for example, message headers, sender, recipients, subject, and the message body). Conditions and exceptions determine whether the action in the rule should be applied to an email message.
Note
Due to the variances in customer environments, Microsoft Customer Support Services (CSS) can't participate in the development or testing of custom regular expression scripts ("RegEx scripts"). For custom RegEx script development, testing, and debugging, Microsoft 365 customers need to rely on internal IT resources. Alternatively, Microsoft 365 customers can use an external consulting resource such as Microsoft Consulting Services (MCS). Regardless of the script development resource, CSS support engineers aren't available to assist customers with custom RegEx script inquiries.
A simple expression is a specific value that you want to match exactly in a message. Conditions and exceptions using simple expressions match specific words or text strings. For example, a mail flow rule condition can look for documents named Yearly Sales Forecast.docx.
A regular expression is a concise and flexible notation for finding patterns of text in a message. The notation consists of two basic character types:
Literal characters: Text that must exist in the target string. These characters are normal characters, as typed.
Metacharacters: One or more special characters that indicate how the text can vary in the target string.
You can use regular expressions to quickly parse email messages to find specific text patterns. Regular expressions enable you to detect messages with specific types of content, such as social security numbers (SSNs), patent numbers, and phone numbers.
You can't reasonably match variable data with simple expressions because you would need a separate simple expression for every possible variation of the value. Matching a large number of simple expressions in message content can be resource intensive. Using regular expressions is more efficient. Instead of specifying all possible variations, you can configure the mail flow rule condition to search for a text pattern.
Carefully test regular expressions. A misconfigured regular expression could yield unexpected matches and cause unwanted mail flow rule behavior, including:
Undesirable actions on messages and message content.
Potential data loss.
Complex regular expressions might also affect mail flow performance. Test your regular expressions in a test environment before you implement them in production.
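One way to run a first round of tests before touching any mail flow rule is to check a pattern locally against sample text with known expected results. The following is a minimal sketch, not an official tool: the function name Test-RulePattern and the sample strings are hypothetical, and it relies on the .NET regular expression engine behind PowerShell's -match operator, which may not evaluate every pattern the same way Exchange Online does.

```powershell
# Hypothetical helper: compares a pattern against samples with known expected results.
# Uses PowerShell's -match operator (.NET regex engine, case-insensitive by default).
# Assumption: this only approximates how Exchange Online evaluates the pattern.
function Test-RulePattern {
    param(
        [Parameter(Mandatory)] [string] $Pattern,
        [Parameter(Mandatory)] [hashtable] $Samples   # sample text -> expected $true/$false
    )
    foreach ($text in $Samples.Keys) {
        $actual = $text -match $Pattern
        [pscustomobject]@{
            Text     = $text
            Expected = $Samples[$text]
            Actual   = $actual
            Pass     = ($actual -eq $Samples[$text])
        }
    }
}

# Illustrative samples: one that should match and one that should not.
$samples = @{
    'Invoice 425-555-0100 attached' = $true
    'Order 4255550100'              = $false
}
$results = Test-RulePattern -Pattern '\d\d\d-\d\d\d-\d\d\d\d' -Samples $samples

# Any rows listed here are unexpected results that need a closer look.
$results | Where-Object { -not $_.Pass }
```

A local check like this doesn't replace testing the rule itself in a test environment; it only helps catch obvious pattern mistakes early.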
The following table lists the pattern strings that you can use to create a pattern-matching regular expression in Exchange Online:
Pattern String
Description
\S
The \S pattern string matches any single character that's not a space.
\s
The \s pattern string matches any single white-space character.
\D
The \D pattern string matches any single character that's not a numeric digit.
\d
The \d pattern string matches any single numeric digit.
\w
The \w pattern string matches any single Unicode character categorized as a letter or a decimal digit.
\W
The \W pattern string matches any single Unicode character not categorized as a letter or a decimal digit.
*
The asterisk ( * ) character matches zero or more instances of the previous character. For example, ab*c matches the following strings: ac, abc, and abbbbc.
|
The pipe ( | ) character acts as an OR operator. For example, contoso|fabrikam matches any instance of contoso or fabrikam.
( )
Parentheses act as grouping delimiters. For example, a(bc)* matches the following strings: a, abc, abcbc, abcbcbc, and so on.
\
A backslash is used as an escape character before a special character. Special characters are characters used in pattern strings:
Backslash \
Pipe |
Asterisk *
Opening parenthesis (
Closing parenthesis )
Caret ^
Dollar sign $
For example, if you want to match a string that contains (525), use \(525\).
^
The caret ( ^ ) character indicates that the pattern string that follows the caret must exist at the start of the text string being matched. For example, ^fred@contoso matches fred@contoso.com and fred@contoso.co.uk but not alfred@contoso.com.
$
The dollar-sign ( $ ) character indicates that the preceding pattern string must exist at the end of the text string being matched. For example, contoso.com$ matches adam@contoso.com and kim@research.contoso.com but doesn't match kim@contoso.com.au.
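The pattern strings can be tried out locally before they go into a rule. The following sketch runs patterns based on the table's examples through PowerShell's -match operator and prints whether each sample matches. The sample texts and the $cases variable are illustrative assumptions, and -match uses the .NET regular expression engine, so treat the results as an approximation of Exchange Online behavior. Note that in .NET an unescaped period matches any single character.

```powershell
# Each entry: a pattern, a sample text, and the result that the table describes.
$cases = @(
    @{ Pattern = 'ab*c';             Text = 'ac';                       Expected = $true  }
    @{ Pattern = 'ab*c';             Text = 'abbbbc';                   Expected = $true  }
    @{ Pattern = 'contoso|fabrikam'; Text = 'sales@fabrikam.com';       Expected = $true  }
    @{ Pattern = 'a(bc)*';           Text = 'abcbc';                    Expected = $true  }
    @{ Pattern = '\(525\)';          Text = 'Call (525) today';         Expected = $true  }
    @{ Pattern = '^fred@contoso';    Text = 'fred@contoso.co.uk';       Expected = $true  }
    @{ Pattern = '^fred@contoso';    Text = 'alfred@contoso.com';       Expected = $false }
    @{ Pattern = 'contoso.com$';     Text = 'kim@research.contoso.com'; Expected = $true  }
    @{ Pattern = 'contoso.com$';     Text = 'kim@contoso.com.au';       Expected = $false }
)

foreach ($case in $cases) {
    $actual = $case.Text -match $case.Pattern
    '{0,-18} {1,-26} match={2} expected={3}' -f $case.Pattern, $case.Text, $actual, $case.Expected
}
```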
Constructing regular expressions
By using the preceding table, you can construct a regular expression that matches the pattern of the data that you want to match:
Working from left to right, examine each character or group of characters in the data that you want to match.
Read the description of each pattern string to determine how it's applied to the data that you're matching.
Determine which pattern string in the table represents that character or group of characters, and add that pattern string to the regular expression.
Note
Regular expressions used in mail flow rules (transport rules) aren't case sensitive.
The following example matches North American telephone numbers with the area code 425 in the formats 425 555-0100 and 425.555.0100:
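If you test patterns locally in PowerShell, the -match operator is also case-insensitive by default, while -cmatch is case-sensitive. The following short sketch shows the difference; the sample text is illustrative only.

```powershell
'CONTOSO Quarterly Report' -match 'contoso'    # True: -match ignores case by default
'CONTOSO Quarterly Report' -cmatch 'contoso'   # False: -cmatch is case-sensitive
```

Using -match for local checks keeps the case handling in line with the behavior described in the note.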
425(\s|.)\d\d\d(-|.)\d\d\d\d
You can expand on this example by adding the telephone format (425) 555-0100, which uses parentheses around the area code.
The following example matches all three telephone number formats.
\d\d\d(\s|.|-|\)|\)\s)\d\d\d(\s|.|-)\d\d\d\d
You can analyze the previous example as follows:
\d\d\d: Requires that exactly three numeric digits appear first.
(\s|.|-|\)|\)\s): Requires that a space, a period, a hyphen, a closing parenthesis, or a closing parenthesis followed by a space exists after the three-digit number. Each alternative is contained in the grouping delimiters and is separated by the pipe character. This separation means that only one of the specified alternatives can exist in this location in the string being matched. The last two alternatives cover area codes that are enclosed in parentheses.
\d\d\d: Requires that exactly three numeric digits appear next.
(\s|.|-): Requires that a space, a period, or a hyphen exists after the three-digit number.
\d\d\d\d: Requires that exactly four numeric digits appear next.
The previous example matches the following values:
(425)555.0100
425 555 0100
(425) 555-0100
425-555-0100
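The listed values can be checked locally. The following sketch uses a form of the pattern with the grouping parentheses balanced, and it escapes the period as \. because the .NET engine behind PowerShell's -match operator treats an unescaped period as any single character. Both adjustments, the $pattern and $samples variable names, and the final sample without separators are assumptions made for this illustration.

```powershell
# Balanced form of the telephone pattern. The period is escaped (\.) so that
# the .NET regex engine matches only a literal period as a separator.
$pattern = '\d\d\d(\s|\.|-|\)|\)\s)\d\d\d(\s|\.|-)\d\d\d\d'

# The last sample has no separators and is included as a value that should not match.
$samples = '(425)555.0100', '425 555 0100', '(425) 555-0100', '425-555-0100', '4255550100'

foreach ($sample in $samples) {
    '{0,-16} {1}' -f $sample, ($sample -match $pattern)
}
```

The pattern isn't anchored, so the opening parenthesis in a value such as (425) 555-0100 is simply skipped: the match starts at the first digit.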
Creating a mail flow rule that uses a regular expression
The following example creates a mail flow rule in Exchange Online PowerShell that uses a regular expression to match SSNs in the subject or body of an email message and rejects matching messages:
New-TransportRule -Name "Social Security Number Block Rule" -SubjectOrBodyMatchesPatterns '\d\d\d-\d\d-\d\d\d\d' -RejectMessageEnhancedStatusCode "5.7.1" -RejectMessageReasonText "This message has been rejected because of content restrictions"
For detailed syntax and parameter information, see New-TransportRule.
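Before creating a rule like this, it can help to see what the SSN pattern matches in sample text. The following sketch runs the same pattern through PowerShell's -match operator; the sample subjects are made up for illustration, and the .NET engine used here is only an approximation of how Exchange Online evaluates the pattern. Because the pattern isn't anchored, it can also match inside a longer run of digits, which is the kind of unexpected match that testing is meant to reveal.

```powershell
$pattern = '\d\d\d-\d\d-\d\d\d\d'

# Illustrative subjects (hypothetical data, not real SSNs).
$subjects = @(
    'Employee record 000-12-3456'      # SSN-shaped value
    'Ticket 20240-17-88213 escalated'  # longer number that contains an SSN-shaped run
    'Meeting moved to 3 PM'            # no SSN-shaped value
)

foreach ($subject in $subjects) {
    if ($subject -match $pattern) {
        # $Matches[0] holds the text that the pattern matched.
        "MATCH    '{0}' on '{1}'" -f $subject, $Matches[0]
    } else {
        "NO MATCH '{0}'" -f $subject
    }
}
```

In this sketch the second subject matches on 240-17-8821 even though it isn't an SSN, so a rule that rejects messages based on this pattern could also reject messages that contain ticket or order numbers with the same shape.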
The following example displays the properties of the new mail flow rule:
Get-TransportRule "Social Security Number Block Rule" | Format-List
For detailed syntax and parameter information, see Get-TransportRule.