Dial Plan Regular Expression Engine - Language Summary
4/8/2010
This section summarizes the regular expression language that is used to create Dial Plan rule patterns and formatting templates. A rule pattern is a regular expression that specifies the set of strings against which a string provided by the user or by the network is compared to find a match. A formatting template specifies the format of the string to be sent to the SIP server or to be displayed to the user.
Operations
The parser recognizes the following operations in order of precedence:
Item | Description |
---|---|
Quantifiers |
A quantifier occurring immediately after an expression specifies the number of times the expression may occur. The examples in this section use the quantifiers *, +, ? and {n}. |
Concatenation |
Expressions placed directly next to each other are concatenated to form a new expression. For example, placing the expression abc directly after the expression xyz creates the new expression xyzabc. |
Alternation |
Two expressions separated by a vertical bar specify alternation. For example, aba | bbc (read "aba" or "bbc") matches either the string aba or the string bbc. |
Grouping |
Parentheses are used to refine the scope and precedence of other operations and to make the boundaries of expressions explicit. For example, (ab)* forces the star quantifier to apply to the whole expression ‘ab’ as opposed to just ‘b’. Parentheses may be nested within other groups. For example, (a(b|c)d)+ is equivalent to (abd | acd)+. |
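The precedence rules above can be checked with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that it treats these basic operations the same way; it is not the Dial Plan engine itself, and the helper name `matches` is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True when `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None

cases = [
    # A quantifier binds tighter than concatenation: the star applies to b only.
    ("ab*", "abbb"),
    ("ab*", "abab"),
    # Grouping widens the quantifier's scope to the whole of ab.
    ("(ab)*", "abab"),
    # Alternation binds loosest: aba|bbc means (aba)|(bbc), not ab(a|b)bc.
    ("aba|bbc", "bbc"),
    ("aba|bbc", "abbbc"),
    # Nested grouping: (a(b|c)d)+ accepts repetitions of abd or acd.
    ("(a(b|c)d)+", "abdacd"),
]

for pattern, text in cases:
    print(f"{pattern!r:14} {text!r:10} {matches(pattern, text)}")
```

Only the second and fifth cases are rejected, which is what the precedence order predicts.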
Meta-Characters
To determine a match, the parser does a character-by-character comparison of the input string and the rule pattern. However, the parser recognizes a set of meta-characters that cause it to alter this behavior. For example, you can embed a meta-character in the pattern expression that causes the parser to accept any numeric character at a specified position in the input string.
WildCard
The dot character ‘.’ matches any single character. For example, ab.de matches abcde, abdde, ab1de, ab3de, and so on.
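As a rough illustration, the wildcard example can be reproduced with Python's `re` module, which is assumed here to behave like the Dial Plan parser for this case (one difference: Python's dot does not match a newline by default). The candidate strings beyond those named above are made up for the sketch.

```python
import re

WILDCARD_PATTERN = "ab.de"

# The first four strings come from the text; the last two are extra
# candidates that have too few or too many characters in the middle.
candidates = ["abcde", "abdde", "ab1de", "ab3de", "abde", "abccde"]

# Keep only the candidates that the pattern matches in full.
accepted = [s for s in candidates if re.fullmatch(WILDCARD_PATTERN, s)]
print(accepted)
```

The dot stands for exactly one character, so "abde" and "abccde" are not accepted.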
Delimited Characters
Characters preceded by a backslash ‘\’ are known as delimited characters and have special meaning. The following list shows the delimited characters:
Item | Description |
---|---|
\d | Matches any digit 0-9 |
\D | Matches any non digit character |
\w | Matches any word character; same as [a-zA-Z_0-9] |
\W | Matches any non word character |
\s | Matches any whitespace character (space, tab, newline, etc.) |
\S | Matches any non whitespace character |
\t | Matches a tab character |
\\, \(, \), \[, \., \+, \*, \? | Matches the literal value of the character. For example: \+ matches the + character. |
Character Classes
A character class matches any single character contained inside the square brackets []. For example, [abc] matches either "a", "b" or "c".
Item | Description |
---|---|
Literals | All characters inside the square brackets are treated as literals, so [\d] matches either of the two characters ‘\’ and ‘d’. This implies there can be no grouping or quantification inside the character class. Example: ab[cd]e matches "abce" or "abde". |
Negation | The ^ character, specified at the beginning of the character class, negates the whole class. For example, [^abc] matches any single character except "a", "b" or "c". Example: a[^b]cd matches all strings of the form a#cd where # is any character other than "b" ("afcd", "accd", "azcd", and so on). |
Ranges | A range is indicated by two characters separated by a ‘-’ sign (for example, a-z). One or more ranges can be specified inside the character class. This notation matches all characters within the range (inclusive). For example, [a-z] matches any lowercase letter, and [a-zA-CE-Z] matches any lowercase or uppercase letter except D. |
Exceptions | To express the characters "[", "-" or "]" as literal values, you must place them at the beginning of the character class. For example: Correct: [-abc], []abc], [[abc]. Incorrect: [abc[], [a-bc]. |
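The character class examples above can be exercised with Python's `re` module, used here as an approximation. One known difference is worth stating: Python interprets `[\d]` as the digit class, whereas the Dial Plan engine is described as treating it as the two literal characters, so that case is left out of the sketch. The helper name and the extra sample inputs are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True when `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None

cases = [
    ("ab[cd]e", "abce"),        # literal members of the class
    ("ab[cd]e", "abee"),        # e is not in [cd]
    ("a[^b]cd", "afcd"),        # negation: anything but b
    ("a[^b]cd", "abcd"),        # b is excluded
    ("[a-zA-CE-Z]", "C"),       # inside the range A-C
    ("[a-zA-CE-Z]", "D"),       # D falls between the two uppercase ranges
    ("[-abc]", "-"),            # a leading '-' is a literal
    ("[]abc]", "]"),            # a leading ']' is a literal
]

for pattern, text in cases:
    print(f"{pattern:12} {text!r:8} {matches(pattern, text)}")
```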
Match Groups
Match Groups enable you to identify and extract portions of a matched string. A match group is an expression enclosed in a set of parentheses. For example, the expression ab(cd)ef(g*) contains two match groups "cd" and "g*".
Note
Nested Match Groups are not supported currently. While nested sets of parentheses are supported for precedence and scoping, these nested expressions inside a parent match group are all part of the same match group. For example: ab(cd(e|f)g)+ contains one match group: “cd(e|f)g”
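The numbering rule in this note, where only outermost parentheses form match groups, can be sketched with a small scanner. The function `top_level_groups` is a hypothetical helper written for this illustration and is not part of the Dial Plan engine. Note that many general-purpose regex libraries, Python's `re` included, do number nested groups separately, which is why the sketch scans the pattern text instead of calling a regex library.

```python
def top_level_groups(pattern):
    """Return the text of each outermost parenthesized expression, in order."""
    groups = []
    depth = 0          # current parenthesis nesting level
    start = 0          # index just after the opening '(' of the current group
    in_class = False   # True while scanning inside [...]
    class_first = 0    # index of the first member character of the class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if in_class:
            # A ']' closes the class unless it is the first character in it.
            if ch == "]" and i > class_first:
                in_class = False
        elif ch == "\\":
            i += 1  # skip the delimited character, e.g. \( or \)
        elif ch == "[":
            in_class = True
            class_first = i + 1
            if pattern[class_first:class_first + 1] == "^":
                class_first += 1
        elif ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(pattern[start:i])
        i += 1
    return groups

print(top_level_groups("ab(cd)ef(g*)"))
print(top_level_groups("ab(cd(e|f)g)+"))
```

The position of a group in the returned list, counted from 1, corresponds to the reference number described next.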
A Match Group is referenced by a number which indicates its order, relative to other Match Groups, within a larger expression. In a dial plan rule the reference number is prefixed by a "\".
In the following example, the rule pattern contains 3 Match Groups - "\d{3}", "\d{3}" and "\d{4}". They are referenced in the dial and display templates as "\1", "\2", "\3" respectively. The table in the example illustrates the result of applying the rule to an input string that is a member of the set defined by the pattern. Assume $host$ is replaced by "157.54.9.161".
<rule pattern='1\s*-?\s*(\d{3})\s*(\d{3})\s*-?\s*(\d{4})'
dial='sip:91\1\2\3@$host$'
display='1 (\1) \2-\3'
/>
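To make the substitution concrete, here is a minimal sketch that applies the rule above using Python's `re` module. Several things are assumptions made for the illustration: that the pattern must match the whole input, that `$host$` is substituted before the match group references are expanded, and the function name `apply_rule`. The sample phone numbers are fictional and are not taken from the original example.

```python
import re

# Pattern and templates copied from the rule above.
RULE_PATTERN = r"1\s*-?\s*(\d{3})\s*(\d{3})\s*-?\s*(\d{4})"
DIAL_TEMPLATE = r"sip:91\1\2\3@$host$"
DISPLAY_TEMPLATE = r"1 (\1) \2-\3"
HOST = "157.54.9.161"

def apply_rule(user_input):
    """Return (dial, display) for a matching input, or None if it does not match."""
    # re.ASCII keeps \d limited to 0-9, as in the delimited characters table.
    match = re.fullmatch(RULE_PATTERN, user_input, flags=re.ASCII)
    if match is None:
        return None
    # Fill in the host first, then expand \1, \2 and \3 from the match groups.
    dial = match.expand(DIAL_TEMPLATE.replace("$host$", HOST))
    display = match.expand(DISPLAY_TEMPLATE)
    return dial, display

for sample in ["1 425 555-0100", "14255550100", "425-555-0100"]:
    print(repr(sample), "->", apply_rule(sample))
```

The first two inputs differ only in spacing and punctuation, so they produce the same dial and display strings; the third lacks the leading 1 and is not matched by this rule.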