

C#: Discussing Regular Expression Special Characters

 

Introduction

We often use Regular Expressions but ignore the importance of special characters.

The following characters have a special meaning in Regular Expressions and are discussed throughout the article:

. $ ^ { [ (  ) * + ?
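One way to see the special meaning of these characters is to compare a pattern used as-is with the same text passed through Regex.Escape, which escapes the .NET metacharacters so that they match literally. The following is a minimal sketch written as C# top-level statements (C# 9 / .NET 5 or later assumed). The sample strings are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Used directly as a pattern, "1.5+2" treats "." as a wildcard and "+" as a quantifier.
string literal = "1.5+2";

// Regex.Escape prefixes the special characters with a backslash: 1\.5\+2
string escaped = Regex.Escape(literal);

// "105552": "1", then "." matches "0", "5+" matches "555", then "2".
bool rawMatch = Regex.IsMatch("105552", literal);

// The escaped pattern only matches the literal text "1.5+2".
bool escapedMatch = Regex.IsMatch("105552", escaped);
bool literalMatch = Regex.IsMatch("1.5+2", escaped);

Console.WriteLine($"{escaped} {rawMatch} {escapedMatch} {literalMatch}");
// 1\.5\+2 True False True
```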

Matching any character with a dot, the Period (.)

The full stop or period character (.) is a wildcard that matches any character except the newline character (\n).

For example, the following Regular Expression matches the character "g" followed by any two characters:


Text: gau shu gnt cow
 
Regex: g..
 
Matches:
 
gau
 
gnt

If the Singleline option (RegexOptions.Singleline) is enabled, a dot matches any character, including the newline character.
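A minimal C# sketch of the example above (top-level statements, so C# 9 / .NET 5 or later is assumed). It collects the values of all matches with Regex.Matches and then checks, on an invented string containing a newline, how RegexOptions.Singleline changes the behaviour of the dot.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// "g.." matches "g" followed by any two characters (other than "\n").
string[] dotMatches = Regex.Matches("gau shu gnt cow", "g..")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", dotMatches)); // gau, gnt

// By default "." does not match "\n"; RegexOptions.Singleline lifts that restriction.
bool withoutSingleline = Regex.IsMatch("g\nx", "g..");
bool withSingleline = Regex.IsMatch("g\nx", "g..", RegexOptions.Singleline);
Console.WriteLine($"{withoutSingleline} {withSingleline}"); // False True
```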

Matching word characters, the Word sign [\w]

A backslash followed by a lowercase "w" (\w) is a character class that matches any word character, such as a letter, a digit, or the underscore (_).

The following Regular Expression matches "a" followed by two word characters.

Text: abc anaconda ant cow apple  
 
Regex: a\w\w

Matches:
 
abc  
 
ana  
 
ant  
 
app 

A backslash followed by an uppercase "W" (\W) matches any non-word character.
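The word-character example can be reproduced with a short C# sketch (top-level statements, C# 9 / .NET 5 or later assumed). The patterns are written as verbatim strings (@"...") so that the backslash reaches the regex engine instead of being read as a C# escape sequence. The \W input string is invented for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// a\w\w : "a" followed by two word characters.
string[] wordMatches = Regex.Matches("abc anaconda ant cow apple", @"a\w\w")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", wordMatches)); // abc, ana, ant, app

// \W matches what \w does not: the hyphen and the space.
// The underscore counts as a word character, so it is not matched.
string[] nonWordMatches = Regex.Matches("a-b c_d", @"\W")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(nonWordMatches.Length); // 2
```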

Matching white space, the Space sign [\s]

White space can be matched using \s (a backslash followed by a lowercase "s").

The following Regular Expression matches the letter "a" followed by two word characters then a white space character.

Text: "abc anaconda ant" 
 
Regex: a\w\w\s
 
Matches:  
 
"abc "

Note that "ant" was not matched because it is not followed by a white-space character. White space includes the space character, newline (\n), form feed (\f), carriage return (\r), tab (\t) and vertical tab (\v); in .NET, \s also matches other Unicode white-space characters, such as the no-break space (\u00A0). Be careful with \s, because it also matches line breaks (\n and \r), which can lead to unexpected matches across lines.

Sometimes it is better to specify the characters to match explicitly instead of using \s; for example, to match only a tab or a space, use [\t\x20].
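A C# sketch of the white-space example (top-level statements, C# 9 / .NET 5 or later assumed). The second part uses an invented string containing a space, a tab and a newline to compare \s with the explicit set [\t\x20].

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// a\w\w\s : "a", two word characters, then one white-space character.
string[] spaceMatches = Regex.Matches("abc anaconda ant", @"a\w\w\s")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine("[" + string.Join("|", spaceMatches) + "]"); // [abc ]

// \s also matches the newline; [\t\x20] matches only the tab and the space.
string sample = "a b\tc\nd";
int whiteSpaceCount = Regex.Matches(sample, @"\s").Count;
int tabOrSpaceCount = Regex.Matches(sample, @"[\t\x20]").Count;
Console.WriteLine($"{whiteSpaceCount} {tabOrSpaceCount}"); // 3 2
```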

Matching digits, the Digit sign [\d]

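Digits can be matched using \d (a backslash followed by a lowercase "d"). In .NET, \d matches any Unicode decimal digit, which includes 0 to 9 but also digits from other scripts; use [0-9] or the RegexOptions.ECMAScript option to match only 0 to 9. The following Regular Expression matches any three digits in a row.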

Text: 123 12 843 8472  
 
Regex: \d\d\d

Matches:
 
123  
 
843  
 
847  
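The digit example as a C# sketch (top-level statements, C# 9 / .NET 5 or later assumed). The second part is an addition that illustrates how .NET's \d treats non-ASCII digits, using U+0663 (ARABIC-INDIC DIGIT THREE) as the sample character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \d\d\d : three digits in a row; the search resumes after each match.
string[] digitMatches = Regex.Matches("123 12 843 8472", @"\d\d\d")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", digitMatches)); // 123, 843, 847

// U+0663 is a Unicode decimal digit: \d matches it, [0-9] does not,
// and the ECMAScript option restricts \d to [0-9].
string arabicThree = "\u0663";
bool unicodeDigit = Regex.IsMatch(arabicThree, @"\d");
bool asciiDigit = Regex.IsMatch(arabicThree, "[0-9]");
bool ecmaDigit = Regex.IsMatch(arabicThree, @"\d", RegexOptions.ECMAScript);
Console.WriteLine($"{unicodeDigit} {asciiDigit} {ecmaDigit}"); // True False False
```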

Matching sets of single characters, the Square Brackets sign [ ]

The square brackets are used to specify a set of single characters to match. Any single character within the set will match. For example, the following Regular Expression matches any three characters where the first character is either "d" or "a".

Text: abc def ant cow  
 
Regex: [da]..  
 
Matches:
 
abc  
 
def  
 
ant  

The caret (^) can be added to the start of the set of characters to specify that none of the characters in the character set should be matched. The following Regular Expression matches any three characters where the first character is not "d" and not "a".

Text: abc def ant cow  
 
Regex: [^da]..  
 
Matches:  
 
 "bc " 
 
 "ef " 
 
"nt " 
 
"cow"  

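Both character-set examples can be checked with a small C# sketch (top-level statements, C# 9 / .NET 5 or later assumed). The local helper Values is a hypothetical convenience function, not part of the .NET API; it simply returns the text of every match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of every match, in order.
static string[] Values(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

const string text = "abc def ant cow";

// [da].. : the first character must be "d" or "a".
string[] inSet = Values(text, "[da]..");
Console.WriteLine(string.Join("|", inSet)); // abc|def|ant

// [^da].. : the first character may be anything except "d" or "a" (spaces included).
string[] notInSet = Values(text, "[^da]..");
Console.WriteLine(string.Join("|", notInSet)); // bc |ef |nt |cow
```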
Matching ranges of characters, the Hyphen sign [-]  

Ranges of characters can be matched using the hyphen (-). The following Regular Expression matches any three characters where the second character is either "a", "b", "c" or "d".

Text: abc pen nda uml  
 
Regex: .[a-d].  
 
Matches:
 
abc  
 
nda 

Ranges of characters can also be combined. The following Regular Expression matches one character that is either a lowercase letter from "a" to "z" or a digit from "0" to "9", followed by two word characters.

Text: abc no 0aa i8i  
 
Regex: [a-z0-9]\w\w

Matches:
 
abc  
 
0aa  
 
i8i 

The pattern could be written more concisely as [a-z\d]\w\w (in .NET the two forms differ only for non-ASCII digits, which \d also matches).
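The range examples as a C# sketch (top-level statements, C# 9 / .NET 5 or later assumed). As before, Values is a hypothetical local helper rather than a .NET API; it returns the text of each match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of every match, in order.
static string[] Values(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// .[a-d]. : the middle character must fall in the range a to d.
string[] middle = Values("abc pen nda uml", ".[a-d].");
Console.WriteLine(string.Join(", ", middle)); // abc, nda

// Two ranges combined in one set, followed by two word characters.
string[] combined = Values("abc no 0aa i8i", @"[a-z0-9]\w\w");
Console.WriteLine(string.Join(", ", combined)); // abc, 0aa, i8i

// The shorter form with \d gives the same result for this ASCII-only input.
string[] shorter = Values("abc no 0aa i8i", @"[a-z\d]\w\w");
Console.WriteLine(string.Join(", ", shorter)); // abc, 0aa, i8i
```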

Specifying how many times to match, the Plus Sign and the Asterisk quantifiers (+ and *)

Quantifiers let you specify the number of times that an expression must match. The most frequently used quantifiers are the Asterisk character (*) and the Plus Sign (+).

Matching zero or more times with an Asterisk (*) 

The asterisk tells the Regular Expression to match the character, group, or character class that immediately precedes it zero or more times. This means that the character, group, or character class is optional: it can be matched, but it does not need to be. The following Regular Expression matches the character "a" followed by zero or more word characters.

Text: Anna Jones and a friend owned an anaconda  
 
Regex: a\w*

Options: IgnoreCase

Matches:
 
Anna  
 
and  
 
a  
 
an  
 
anaconda
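A C# sketch of the asterisk example (top-level statements, C# 9 / .NET 5 or later assumed), passing RegexOptions.IgnoreCase to correspond to the IgnoreCase option listed above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Anna Jones and a friend owned an anaconda";

// a\w* : "a" (or "A", because of IgnoreCase) followed by zero or more word characters.
string[] starMatches = Regex.Matches(text, @"a\w*", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", starMatches)); // Anna, and, a, an, anaconda
```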

Matching one or more times with a Plus (+) 

The plus sign tells the Regular Expression to match the character, group, or character class that immediately precedes it one or more times. This means that the character, group, or character class must be found at least once; any repetitions that immediately follow are matched as well, because the quantifier is greedy. The following Regular Expression matches the character "a" followed by at least one word character.

Text: Anna Jones and a friend owned an anaconda  
 
Regex: a\w+

Options: IgnoreCase

Matches:
 
Anna  
 
and  
 
an  
 
anaconda  

Note that the single-letter word “a” was not matched, because it is not followed by any word characters.
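The same text with the plus quantifier, as a C# sketch (top-level statements, C# 9 / .NET 5 or later assumed). RegexOptions.IgnoreCase corresponds to the IgnoreCase option listed above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Anna Jones and a friend owned an anaconda";

// a\w+ : "a" followed by at least one word character, so the lone word "a" is skipped.
string[] plusMatches = Regex.Matches(text, @"a\w+", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", plusMatches)); // Anna, and, an, anaconda
```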

Matching zero or one times with a Question Mark (?) 

To specify an optional match, use the question mark (?), which matches zero or one times. The following Regular Expression matches the character "a", optionally followed by the character "n".

Text: Anna Jones and a friend owned an anaconda  
 
Regex: an?  
 
Options: IgnoreCase  
 
Matches:
 
An  
 
a  
 
an  
 
a  
 
an  
 
an
 
a  
 
a  
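A C# sketch of the question-mark example (top-level statements, C# 9 / .NET 5 or later assumed). Each "a" produces a match, and the following "n" is included only when it is present.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Anna Jones and a friend owned an anaconda";

// an? : "a" followed by zero or one "n" (case-insensitive).
string[] optionalMatches = Regex.Matches(text, "an?", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", optionalMatches)); // An, a, an, a, an, an, a, a
```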

Specifying the number of matches

The exact number of times a character, group, or character class must match can be specified with curly brackets ({n}). The following Regular Expression matches the character "a" followed by exactly two "n" characters; there must be two "n" characters for a match to occur. To require a minimum number instead, use {n,}, which matches n or more times.

Text: Anna Jones and Anne owned an anaconda  
 
Regex: an{2}  
 
Options: IgnoreCase  
 
Matches:
 
Ann  
 
Ann
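A C# sketch of the {n} example (top-level statements, C# 9 / .NET 5 or later assumed). The extra input "annnn" is invented to show that {2} consumes exactly two "n" characters, compared with {2,} (two or more), another standard .NET quantifier.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Anna Jones and Anne owned an anaconda";

// an{2} : "a" followed by exactly two "n" characters.
string[] exactMatches = Regex.Matches(text, "an{2}", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", exactMatches)); // Ann, Ann

// {2} stops after two "n" characters even when more follow; {2,} keeps going.
string exactOnLongRun = Regex.Match("annnn", "an{2}").Value;
string atLeastTwo = Regex.Match("annnn", "an{2,}").Value;
Console.WriteLine($"{exactOnLongRun} {atLeastTwo}"); // ann annnn
```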

A range of matches can be specified by curly brackets with two numbers inside ({n,m}). The first number (n) is the minimum number of matches required, the second (m) is the maximum number of matches permitted. This Regular Expression matches the character "a" followed by a minimum of two "n" characters and a maximum of three "n" characters. 

Text: Anna and Anne lunched with an anaconda annnnnex  
 
Regex: an{2,3}  
 
Options: IgnoreCase  
 
Matches:
 
Ann  
 
Ann  
 
annn  

Once the maximum has been reached, the quantifier stops consuming characters, so the remaining "n" characters of "annnnnex" are not part of the match.
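A C# sketch of the {n,m} example (top-level statements, C# 9 / .NET 5 or later assumed). The quantifier is greedy, so it takes three "n" characters where three are available and stops there.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Anna and Anne lunched with an anaconda annnnnex";

// an{2,3} : "a" followed by two or three "n" characters.
string[] rangedMatches = Regex.Matches(text, "an{2,3}", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", rangedMatches)); // Ann, Ann, annn
```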

Matching the start and end of a string 

To specify that a match must occur at the beginning of a string, use the caret character (^). For example, the following Regular Expression pattern matches the character "a" only at the beginning of the string.

Text: an anaconda ate Anna Jones  
 
Regex: ^a  
 
Matches:

"a" at the start of the string (Match.Index 0)

The pattern above matches only the "a" in "an". Note that the caret (^) has a different meaning inside square brackets, where it negates the character set. If the Multiline option (RegexOptions.Multiline) is enabled, the caret matches at the beginning of each line in a multi-line string rather than only at the start of the string.

To specify that a match must occur at the end of a string, use the dollar character ($). If the Multiline option is enabled, $ also matches at the end of each line. The following Regular Expression pattern matches the last word of each line in a multi-line string.

Text: "an anaconda  
 
ate Anna  
 
Jones"  
 
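Regex: \w+$

Options: Multiline, IgnoreCase

Matches:

anaconda

Anna

Jones

These results assume the lines are separated by a newline (\n). In .NET, $ matches before \n but not before the \r\n sequence used for Windows line endings, so with \r\n separators only "Jones" would match; to handle \r\n, allow an optional carriage return before the end of the line (for example with \r?$).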
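The anchor examples as a C# sketch (top-level statements, C# 9 / .NET 5 or later assumed). The multi-line text uses "\n" separators, and Values is a hypothetical local helper. The last part shows the documented \r\n behaviour of $ and uses a lookahead, (?=\r?$), which is not covered in this article, to keep the carriage return out of the matched text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of every match, in order.
static string[] Values(string input, string pattern, RegexOptions options) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

// ^a : "a" only at the very start of the string.
Match start = Regex.Match("an anaconda ate Anna Jones", "^a");
Console.WriteLine($"{start.Value} at index {start.Index}"); // a at index 0

// With Multiline, ^ and $ apply to every line.
string lines = "an anaconda\nate Anna\nJones";
int linesStartingWithA = Regex.Matches(lines, "^a", RegexOptions.Multiline).Count;
string[] lastWords = Values(lines, @"\w+$", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Console.WriteLine($"{linesStartingWithA}: {string.Join(", ", lastWords)}"); // 2: anaconda, Anna, Jones

// $ matches before "\n" but not before "\r\n"; the lookahead allows an optional "\r".
string windowsLines = lines.Replace("\n", "\r\n");
string[] plainDollar = Values(windowsLines, @"\w+$", RegexOptions.Multiline);
string[] crAware = Values(windowsLines, @"\w+(?=\r?$)", RegexOptions.Multiline);
Console.WriteLine(string.Join(", ", plainDollar)); // Jones
Console.WriteLine(string.Join(", ", crAware));     // anaconda, Anna, Jones
```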

Above, we discussed special characters in regular expressions and how to use them.
