Regular Expressions as a Language
The regular expression language is designed and optimized to manipulate text. The language comprises two basic character types: literal (normal) text characters and metacharacters. The set of metacharacters gives regular expressions their processing power.
You are probably familiar with the ? and * metacharacters that the DOS file system uses to represent any single character or any group of characters, respectively. For example, the DOS command COPY *.DOC A: tells the file system to copy every file with a .DOC file name extension to the disk in drive A. The metacharacter * stands in for any file name in front of the file name extension .DOC. Regular expressions extend this basic idea many times over, providing a large set of metacharacters that make it possible to describe very complex text-matching expressions with relatively few characters.
For example, the regular expression \s2000, when applied to a body of text, matches every occurrence of the string "2000" that is immediately preceded by a white-space character, such as a space or a tab. The white-space character is part of each match.
Note: If you are using C++, C#, or JScript, special escape characters, such as \s, must be preceded by an additional backslash when they are written in a string literal.
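The \s2000 pattern and the extra backslash can be sketched as follows. This is a minimal illustration that assumes Java's java.util.regex package rather than the .NET classes, chosen because Java string literals also require the backslash to be doubled. The class name, method name, and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceYearDemo {
    // The pattern from the text is \s2000. Inside a Java string literal the
    // backslash must itself be escaped, so it is written as "\\s2000".
    static final Pattern YEAR = Pattern.compile("\\s2000");

    // Returns the start index of every match found in the input.
    // Each match begins at the white-space character, not at the digit 2.
    static List<Integer> matchStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = YEAR.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Made-up sample text: the leading "2000" has no white space in
        // front of it, and the "2000" inside "12000" follows the digit 1.
        String text = "2000 was a leap year; so was\t2000, and 12000 is not a year.";
        System.out.println(matchStarts(text));
    }
}
```

Only the "2000" that follows the tab is reported. The occurrence at the very start of the text is skipped because nothing precedes it, which shows that \s requires an actual white-space character to be present.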
Regular expressions can also perform more complex searches. For example, the regular expression (?<char>\w)\k<char>, which uses a named group and a backreference, searches for adjacent paired characters. When applied to the string "I'll have a small coffee", it finds matches in the words "I'll", "small", and "coffee". (For details on this regular expression, see Backreferences.)
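The paired-character search can be tried out with a short sketch. It assumes Java's java.util.regex package rather than the .NET classes, because that package accepts the same (?<char>\w)\k<char> syntax. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubledCharDemo {
    // (?<char>\w) captures one word character in a group named "char";
    // \k<char> then requires that same character to appear again at once.
    // The backslashes are doubled because this is a Java string literal.
    static final Pattern DOUBLED = Pattern.compile("(?<char>\\w)\\k<char>");

    // Collects the text of every adjacent pair of identical word characters.
    static List<String> doubledPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = DOUBLED.matcher(input);
        while (m.find()) {
            pairs.add(m.group());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The sample sentence is the one used in the text.
        System.out.println(doubledPairs("I'll have a small coffee"));
    }
}
```

For the sample sentence, the sketch reports four pairs: "ll" from "I'll", "ll" from "small", and both "ff" and "ee" from "coffee".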
The following sections detail the set of metacharacters that define the .NET Framework regular expression language and show how to use the regular expression classes to implement regular expressions in your applications.
See Also
Reference
System.Text.RegularExpressions
Other Resources
Details of Regular Expression Behavior
Regular Expression Examples
Regular Expression Language Elements