Regular Expressions (Visual Studio)

Article
02/04/2013

Regular expressions are a concise and flexible notation for finding and replacing patterns of text. You can use a specific set of regular expressions in the Find what and Replace with fields of the Find and Replace window when you perform Quick Find, Find in Files, Quick Replace, or Replace in Files operations.

To enable regular expressions, expand Find options in the Find and Replace window, select Use, and then select Regular expressions. The triangular Expression Builder buttons next to the Find what and Replace with fields become available. Click the button to display a list of frequently used regular expressions. When you click a regular expression on the list, it is inserted at the cursor location in the Find what or Replace with fields. When you click Complete Character List at the bottom of the Expression Builder, a Help topic appears. The topic contains all the regular expressions that are recognized by Visual Studio Find and Replace. You can copy a regular expression in the topic and then paste it in the Find what or Replace with fields.

Note

There are many syntax differences between the regular expressions that can be used in Find what and Replace with and those that are valid in .NET Framework programming. For example, in the Find and Replace window, braces {} are used for tagging expressions to be replaced: to change every occurrence of doesn't to does not, you would use the find expression {does}n't and the replace expression \1 not.
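For comparison, here is a minimal sketch of the same replacement in Python's `re` module, which tags a group with parentheses rather than braces. It only illustrates the syntax difference and does not reproduce Visual Studio behavior; the sample sentence is made up for the example.

```python
import re

# Visual Studio Find and Replace:  find {does}n't   replace with \1 not
# Python `re` (assumed analogue):  find (does)n't   replace with \1 not
text = "It doesn't compile and she doesn't know why."

# Parentheses tag the group; \1 refers back to it in the replacement string.
result = re.sub(r"(does)n't", r"\1 not", text)
print(result)
```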

Regular Expressions for Find and Replace

Frequently used regular expressions that appear in the Expression Builder are as follows.

Expression	Syntax	Description	Example
Any character	.	Matches any single character except a line break.	a.o matches "aro" in "around" and "abo" in "about" but not "acro" in "across".
Zero or more	*	Matches zero or more occurrences of the preceding expression, and matches as many characters as possible.	a*b matches "b" in "bat" and "ab" in "about". e.*e matches the word "enterprise".
One or more	+	Matches at least one occurrence of the preceding expression.	ac+ matches words that contain the letter "a" and at least one instance of "c", such as "race" and "ace". a.+s matches the word "access".
Beginning of line	^	Anchors the match string to the beginning of a line.	^car matches the word "car" only when it appears as the first set of characters in a line of the editor.
End of line	$	Anchors the match string to the end of a line.	end$ matches the word "end" only when it appears as the last set of characters at the end of a line in the editor.
Beginning of word	<	Matches only when a word starts at this point in the text. In this context, a word is a sequence of characters from the set [a-zA-Z0-9_]; any other character is treated as a word separator.	<in matches words such as "inside" and "into" that begin with the letters "in".
End of word	>	Matches only when a word ends at this point in the text.	ss> matches words such as "across" and "loss" that end with the letters "ss".
Line break	\n	Matches an operating system-independent line break. In a Replace expression, inserts a line break.	End\nBegin matches the word "End" and "Begin" only when "End" is the last string in a line and "Begin" is the first string in the next line. In a Replace expression, Begin\nEnd replaces the word "End" with "Begin" on the first line, inserts a line break, and then replaces the word "Begin" with the word "End".
Any one character in the set	[]	Matches any one of the characters in the []. To specify a range of characters, list the starting and ending characters separated by a dash (-), as in [a-z].	be[n-t] matches "bet" in "between", "ben" in "beneath", and "bes" in "beside" but not "bel" in "below".
Any one character not in the set	[^...]	Matches any character that is not in the set of characters that follows the ^.	be[^n-t] matches "bef" in "before", "beh" in "behind", and "bel" in "below", but not "ben" in "beneath".
Or	|	Matches either the expression before or the one after the OR symbol (|). Mostly used in a group.	(sponge|mud) bath matches "sponge bath" and "mud bath."
Escape	\	Matches the character that follows the backslash (\) as a literal. This lets you find the characters that are used in regular expression notation, such as { and ^.	\^ searches for the ^ character.
Tagged expression (or backreference)	{}	Uses the text that is inside the braces to identify locations where text is to be replaced.	With the find expression {does}n't and the replace expression \1 not, every occurrence of "doesn't" is changed to "does not".
C/C++ Identifier	:i	Shorthand for the expression ([a-zA-Z_$][a-zA-Z0-9_$]*).	Matches any possible C/C++ identifier.
Quoted string	:q	Shorthand for the expression (("[^"]*")|('[^']*')), which matches all characters that are enclosed in double or single quotation marks, and also the quotation marks themselves.	:q matches "test quote" and 'test quote' but not the 't of can't.
Space or Tab	:b	Matches either space or tab characters.	Public:bInterface matches the phrase "Public Interface" in text.
Integer	:z	Shorthand for the expression ([0-9]+), which matches any combination of numeric characters.	Matches any integer, such as "1", "234", "56", and so on.
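
As a rough cross-check of the table above, the following sketch maps a few of the expressions to Python `re` patterns and runs them on sample text. The mapping is an assumption made for illustration (Python's dialect differs from the Find and Replace dialect), and `PYTHON_EQUIVALENTS` and `find_all` are hypothetical names, not part of Visual Studio.

```python
import re

# Assumed Python equivalents for a few Expression Builder entries.
# Keys are the Visual Studio expressions; values are Python `re` patterns.
PYTHON_EQUIVALENTS = {
    "a.o": r"a.o",                      # any character: same syntax
    "be[n-t]": r"be[n-t]",              # character set: same syntax
    "<in": r"\bin",                     # beginning of word
    "ss>": r"ss\b",                     # end of word
    ":i": r"[a-zA-Z_$][a-zA-Z0-9_$]*",  # C/C++ identifier
    ":q": '"[^"]*"' + "|'[^']*'",       # quoted string
    ":z": r"[0-9]+",                    # integer
}


def find_all(vs_expression, text):
    """Return every match of the Python equivalent of vs_expression in text."""
    # re.ASCII keeps \b tied to [a-zA-Z0-9_], the word set described above.
    pattern = re.compile(PYTHON_EQUIVALENTS[vs_expression], re.ASCII)
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all("be[n-t]", "between beneath beside below"))
print(find_all("<in", "inside into within"))
print(find_all(":q", "x = \"test quote\"; y = 'test quote'; can't"))
```

Note that the word-boundary patterns return only the matched letters ("in", "ss"), not the whole word, which mirrors how the table describes a match anchored at the start or end of a word.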

The list of all regular expressions that are valid in Find and Replace operations is longer than can be displayed in the Expression Builder. Although the following regular expressions do not appear in the Expression Builder, you can use them in the Find what or Replace with fields.

Expression	Syntax	Description	Example
Minimal, zero or more	@	Matches zero or more occurrences of the preceding expression, and matches as few characters as possible.	e.@e matches "ente" and "erprise" in "enterprise", but not the full word "enterprise".
Minimal, one or more	#	Matches one or more occurrences of the preceding expression, and matches as few characters as possible.	ac# matches words that contain the letter "a" and at least one instance of "c", such as "ace". a.#s matches "acces" in the word "access".
Repeat n times	^n	Matches n occurrences of the preceding expression.	[0-9]^4 matches any 4-digit sequence.
Grouping	()	Lets you group a set of expressions together, for example to apply a quantifier (* or +).	If you want to search for one or more occurrences of "az", use (az)+.
nth tagged text	\n	In a Find or Replace expression, indicates the text that is matched by the nth tagged expression, where n is a number from 1 to 9. In a Replace expression, \0 inserts the complete matched text.	If you search for a{[0-9]} and replace with \1, all occurrences of "a" followed by a digit are replaced by the digit that follows it. For example, "a1" is replaced by "1" and similarly "a2" is replaced by "2".
Right-justified field	\(w,n)	In a Replace expression, right-justifies the nth tagged expression in a field at least w characters wide.	If you search for a{[0-9]} and replace with \(10,1), each occurrence of "a" followed by a digit is replaced by that digit, right-justified in a field 10 characters wide.
Left-justified field	\(-w,n)	In a Replace expression, left-justifies the nth tagged expression in a field at least w characters wide.	If you search for a{[0-9]} and replace with \(-10,1), each occurrence of "a" followed by a digit is replaced by that digit, left-justified in a field 10 characters wide.
Prevent match	~(X)	Prevents a match when X appears at this point in the expression.	real~(ity) matches the "real" in "realty" and "really," but not the "real" in "reality." It also matches the second "real" (but not the first "real") in "realityreal".
Alphanumeric character	:a	Matches the expression ([a-zA-Z0-9]).	Matches any alphanumeric character, such as "a", "A", "w", "W", "5", and so on.
Alphabetic character	:c	Matches the expression ([a-zA-Z]).	Matches any alphabetical character, such as "a", "A", "w", "W", and so on.
Decimal digit	:d	Matches the expression ([0-9]).	Matches any digit, such as "4" and "6".
Hexadecimal digit	:h	Matches the expression ([0-9a-fA-F]+).	Matches any hexadecimal number, such as "1A", "ef", and "007".
Rational number	:n	Matches the expression (([0-9]+.[0-9]*)|([0-9]*.[0-9]+)|([0-9]+)).	Matches any rational number, such as "2007", "1.0", and ".9".
Alphabetic string	:w	Matches the expression ([a-zA-Z]+).	Matches any string that contains only alphabetical characters.
Escape	\e	Unicode U+001B.	Matches the "Escape" control character.
Bell	\g	Unicode U+0007.	Matches the "Bell" control character.
Backspace	\h	Unicode U+0008.	Matches the "Backspace" control character.
Tab	\t	Unicode U+0009.	Matches a tab character.
Unicode character	\x#### or \u####	Matches a character given by Unicode value where #### is hexadecimal digits. You can specify a character that is outside the Basic Multilingual Plane (that is, a surrogate) with the ISO 10646 code point or with two Unicode code points that give the values of the surrogate pair.	\u0065 matches the character "e".
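
Several entries in this table have close counterparts in other regular expression dialects. The sketch below uses Python's `re` module as an assumed analogue: lazy quantifiers (`*?`, `+?`) for `@` and `#`, `{n}` for `^n`, a negative lookahead `(?!X)` for `~(X)`, and `str.rjust`/`str.ljust` in a replacement function for the justified fields. It illustrates the concepts only and is not how Visual Studio implements them.

```python
import re

word = "enterprise"

# Greedy e.*e versus minimal e.@e (Python: e.*?e).
greedy = re.search(r"e.*e", word).group(0)
lazy = re.search(r"e.*?e", word).group(0)
# Starting the search at index 3 finds the second minimal match.
lazy_from_3 = re.compile(r"e.*?e").search(word, 3).group(0)

# Minimal, one or more: a.#s (Python: a.+?s).
lazy_plus = re.search(r"a.+?s", "access").group(0)

# Repeat n times: [0-9]^4 (Python: [0-9]{4}).
four_digits = re.search(r"[0-9]{4}", "Released in 2013").group(0)

# Prevent match: real~(ity) (Python: real(?!ity)).
sample = "realty really reality realityreal"
starts = [m.start() for m in re.finditer(r"real(?!ity)", sample)]

# Justified fields: \(10,1) and \(-10,1), emulated with a replacement function.
right = re.sub(r"a([0-9])", lambda m: m.group(1).rjust(10), "a1")
left = re.sub(r"a([0-9])", lambda m: m.group(1).ljust(10), "a1")

print(greedy, lazy, lazy_from_3, lazy_plus, four_digits, starts)
print(repr(right), repr(left))
```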

The following table lists the two-letter abbreviations that are used to specify the General categories that are listed in the Unicode character properties database. You can use these abbreviations as part of a regular expression character set. For example, the expression [:Nd:Nl:No] matches any kind of digit.

For more information about the Unicode character properties database, see Unicode Standard 5.0 Character Properties.

Expression	Syntax	Description
Uppercase letter	:Lu	Matches any one uppercase letter. For example: :Luhe matches "The" but not "the".
Lowercase letter	:Ll	Matches any one lowercase letter. For example: :Llhe matches "the" but not "The".
Title case letter	:Lt	Matches characters that combine an uppercase letter with a lowercase letter, for example, Nj and Dz.
Modifier letter	:Lm	Matches letters or punctuation, such as commas, cross accents, and double prime, that are used to indicate modifications to the preceding letter.
Other letter	:Lo	Matches other letters, such as gothic letter ahsa.
Decimal digit	:Nd	Matches decimal digits, such as 0-9 and their full-width equivalents.
Letter digit	:Nl	Matches letter digits, such as roman numerals and ideographic number zero.
Other digit	:No	Matches other digits, such as old italic number one.
Open punctuation	:Ps	Matches opening punctuation, such as open brackets and braces.
Close punctuation	:Pe	Matches closing punctuation, such as closing brackets and braces.
Initial quote punctuation	:Pi	Matches initial double quotation marks.
Final quote punctuation	:Pf	Matches single quotation marks and ending double quotation marks.
Dash punctuation	:Pd	Matches the dash mark.
Connector punctuation	:Pc	Matches the underscore or underline mark.
Other punctuation	:Po	Matches the comma (,), ?, ", !, @, #, %, &, *, \, the colon (:), the semicolon (;), ', and /.
Space separator	:Zs	Matches blanks.
Line separator	:Zl	Matches the Unicode character U+2028.
Paragraph separator	:Zp	Matches the Unicode character U+2029.
Non-spacing mark	:Mn	Matches non-spacing marks.
Combining mark	:Mc	Matches combining marks.
Enclosing mark	:Me	Matches enclosing marks.
Math symbol	:Sm	Matches +, =, ~, |, <, and >.
Currency symbol	:Sc	Matches $ and other currency symbols.
Modifier symbol	:Sk	Matches modifier symbols, such as circumflex accent, grave accent, and macron.
Other symbol	:So	Matches other symbols, such as the copyright sign, pilcrow sign, and the degree sign.
Other control	:Cc	Matches Unicode control characters such as TAB and NEWLINE.
Other format	:Cf	Matches formatting control characters, such as the bidirectional control characters.
Surrogate	:Cs	Matches half of a surrogate pair.
Other private-use	:Co	Matches any character from the private-use area.
Other not assigned	:Cn	Matches characters that do not map to a Unicode character.
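
The two-letter abbreviations above are the standard Unicode General Category codes, so they can be explored outside Visual Studio. The following sketch uses Python's `unicodedata` module to look up categories; `find_chars` and `category_then` are hypothetical helper names, and the approach only approximates a character set such as [:Nd:Nl:No] or an expression such as :Luhe.

```python
import unicodedata

# Categories combined by the character set [:Nd:Nl:No].
DIGIT_CATEGORIES = {"Nd", "Nl", "No"}


def find_chars(text, categories):
    """Return the characters of text whose General Category is in categories."""
    return [ch for ch in text if unicodedata.category(ch) in categories]


def category_then(word, category, rest):
    """Approximate an expression like :Luhe: one character of the given
    category followed by the literal text in rest, at the start of word."""
    return (
        len(word) > len(rest)
        and unicodedata.category(word[0]) == category
        and word[1:1 + len(rest)] == rest
    )


# "1" is Nd, U+2167 (roman numeral eight) is Nl, U+00B2 (superscript two) is No.
print(find_chars("A1 \u2167 \u00b2", DIGIT_CATEGORIES))
print(category_then("The", "Lu", "he"), category_then("the", "Lu", "he"))
```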

In addition to the standard Unicode character properties, the following properties may also be specified as part of a character set.

Expression	Syntax	Description
Alpha	:Al	Matches any one alphabetic character. For example, :Alhe matches words such as "The", "then", and "reached".
Numeric	:Nu	Matches any one number or digit.
Punctuation	:Pu	Matches any one punctuation mark, such as ?, @, ', and so on.
White space	:Wh	Matches all kinds of white space, such as publishing and ideographic spaces.
Bidi	:Bi	Matches characters from right-to-left scripts, such as Arabic and Hebrew.
Hangul	:Ha	Matches Korean Hangul and combining Jamos.
Hiragana	:Hi	Matches hiragana characters.
Katakana	:Ka	Matches katakana characters.
Ideographic/Han/Kanji	:Id	Matches ideographic characters, such as Han and kanji.
