ECMAScript vs. Canonical Matching Behavior
The behavior of ECMAScript and canonical regular expressions differs in three areas:
- Character classes are specified differently in matching expressions. Canonical regular expressions support Unicode character categories by default. ECMAScript regular expressions do not support Unicode character categories.
- A regular expression capture class with a backreference to itself must be updated with each capture iteration.
- Ambiguities between octal escapes and backreferences are treated differently.
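The first difference can be illustrated by analogy. The sketch below uses Python's `re` module, which is a different engine from the one described here; its `re.ASCII` flag is only assumed to be a reasonable stand-in for how a non-Unicode mode narrows shorthand classes such as `\w` and `\d`. The sample strings are illustrative choices, not taken from the text above.

```python
import re

word = "na\u00efve"   # "naive" spelled with U+00EF, a non-ASCII letter
digit = "\u0663"      # ARABIC-INDIC DIGIT THREE, Unicode category Nd

# Unicode-aware matching (Python's default for str patterns):
# \w and \d follow Unicode character categories.
unicode_word = re.fullmatch(r"\w+", word) is not None
unicode_digit = re.fullmatch(r"\d", digit) is not None

# ASCII-only matching: \w is limited to [a-zA-Z0-9_] and \d to [0-9].
ascii_word = re.fullmatch(r"\w+", word, re.ASCII) is not None
ascii_digit = re.fullmatch(r"\d", digit, re.ASCII) is not None

print(unicode_word, unicode_digit)  # True True
print(ascii_word, ascii_digit)      # False False
```

A pattern that relies on `\w` or `\d` matching non-ASCII text can therefore stop matching when Unicode categories are unavailable, even though the pattern itself is unchanged.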
The following table summarizes the differences in octal versus backreference interpretation by canonical and ECMAScript regular expressions.
Canonical regular expression behavior | ECMAScript behavior |
---|---|
If \ is followed by 0 followed by 0 to 2 octal digits, interpret as an octal. | Same behavior. |
If \ is followed by a digit from 1 to 9, followed by no additional decimal digits, interpret as a backreference. | If a single decimal digit capture exists, backreference to that digit. Otherwise, interpret as a literal. |
If \ is followed by a digit from 1 to 9, followed by additional decimal digits, interpret the digits as a decimal value. If that capture exists, interpret the expression as a backreference. Otherwise, interpret the leading octal digits up to octal 377, that is, consider only the low 8 bits of the value; interpret the remaining digits as literals. | If \ is followed by a digit from 1 to 9, followed by any additional decimal digits, interpret as a backreference by converting as many digits as possible to a decimal value that can refer to a capture. If no digits can be converted, interpret as an octal using the leading octal digits up to octal 377; interpret the remaining digits as literals. |
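The rules in the table can be modeled as a small decision function. The following Python sketch is one reading of the table, not the regular expression engine's actual parser. The names `Escape`, `interpret`, and `group_count` are hypothetical. It assumes that capturing groups are numbered contiguously from 1 to `group_count`, that an octal escape consumes at most three digits, and that digits with no leading octal digit (such as `89`) fall back to literal text; the table does not cover that last case.

```python
from typing import NamedTuple

OCTAL = "01234567"
DECIMAL = "0123456789"


class Escape(NamedTuple):
    kind: str      # "octal", "backreference", or "literal"
    value: object  # character code, group number, or literal text
    rest: str      # trailing digits that are treated as literal characters


def _octal(digits: str) -> Escape:
    """Consume up to three leading octal digits, keeping only the low 8 bits."""
    n = 0
    while n < len(digits) and n < 3 and digits[n] in OCTAL:
        n += 1
    if n == 0:
        # Assumption: no leading octal digit means the digits are literal text.
        return Escape("literal", digits, "")
    return Escape("octal", int(digits[:n], 8) & 0xFF, digits[n:])


def interpret(digits: str, group_count: int, ecmascript: bool = False) -> Escape:
    """Classify the digits that follow a backslash, per the table's rules."""
    if not digits or any(c not in DECIMAL for c in digits):
        raise ValueError("expected one or more decimal digits")

    if digits[0] == "0":
        # Row 1: \0 plus up to two more octal digits is octal in both modes.
        return _octal(digits)

    if len(digits) == 1:
        # Row 2: canonical always yields a backreference; ECMAScript
        # yields a literal when the group does not exist.
        number = int(digits)
        if not ecmascript or number <= group_count:
            return Escape("backreference", number, "")
        return Escape("literal", digits, "")

    if ecmascript:
        # Row 3, ECMAScript: the longest prefix that names an existing group wins.
        for end in range(len(digits), 0, -1):
            number = int(digits[:end])
            if number <= group_count:
                return Escape("backreference", number, digits[end:])
        return _octal(digits)

    # Row 3, canonical: all digits form one decimal value, or fall back to octal.
    number = int(digits)
    if number <= group_count:
        return Escape("backreference", number, "")
    return _octal(digits)


# \12 in a pattern with a single capturing group:
print(interpret("12", 1))                   # canonical: octal 10 (octal 12)
print(interpret("12", 1, ecmascript=True))  # ECMAScript: backreference 1, then literal "2"
```

Under this model, `\12` with one capturing group is an octal escape in canonical mode but a backreference to group 1 followed by a literal `2` in ECMAScript mode, which is the kind of divergence the table describes.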