System.Text.RegularExpressions.Regex class

This article provides supplementary remarks to the reference documentation for this API.

The Regex class represents .NET's regular expression engine. You can use this class to:

  • Quickly parse large amounts of text to find specific character patterns.
  • Extract, edit, replace, or delete text substrings.
  • Add the extracted strings to a collection to generate a report.

Note

If you want to validate a string by determining whether it conforms to a particular regular expression pattern, you can use the System.Configuration.RegexStringValidator class.

To use regular expressions, you define the pattern that you want to identify in a text stream by using the syntax documented in Regular expression language - quick reference. Next, you can optionally instantiate a Regex object. Finally, you call a method that performs some operation, such as replacing text that matches the regular expression pattern, or identifying a pattern match.

For more information about the regular expression language, see Regular expression language - quick reference or download and print one of these brochures:

Quick Reference in Word (.docx) format Quick Reference in PDF (.pdf) format

Regex vs. String methods

The System.String class includes several search and comparison methods that you can use to perform pattern matching with text. For example, the String.Contains, String.EndsWith, and String.StartsWith methods determine whether a string instance contains a specified substring; and the String.IndexOf, String.IndexOfAny, String.LastIndexOf, and String.LastIndexOfAny methods return the starting position of a specified substring in a string. Use the methods of the System.String class when you are searching for a specific string. Use the Regex class when you are searching for a specific pattern in a string. For more information and examples, see .NET Regular Expressions.

Static vs. instance methods

After you define a regular expression pattern, you can provide it to the regular expression engine in either of two ways:

  • By instantiating a Regex object that represents the regular expression. To do this, you pass the regular expression pattern to a Regex constructor. A Regex object is immutable; when you instantiate a Regex object with a regular expression, that object's regular expression cannot be changed.

  • By supplying both the regular expression and the text to search to a static (Shared in Visual Basic) Regex method. This enables you to use a regular expression without explicitly creating a Regex object.

All Regex pattern identification methods include both static and instance overloads.

The regular expression engine must compile a particular pattern before the pattern can be used. Because Regex objects are immutable, this is a one-time procedure that occurs when a Regex class constructor or a static method is called. To eliminate the need to repeatedly compile a single regular expression, the regular expression engine caches the compiled regular expressions used in static method calls. As a result, regular expression pattern-matching methods offer comparable performance for static and instance methods. However, caching can adversely affect performance in the following two cases:

  • When you use static method calls with a large number of regular expressions. By default, the regular expression engine caches the 15 most recently used static regular expressions. If your application uses more than 15 static regular expressions, some regular expressions must be recompiled. To prevent this recompilation, you can increase the Regex.CacheSize property.

  • When you instantiate new Regex objects with regular expressions that have previously been compiled. For example, the following code defines a regular expression to locate duplicated words in a text stream. Although the example uses a single regular expression, it instantiates a new Regex object to process each line of text. This results in the recompilation of the regular expression with each iteration of the loop.

    StreamReader sr = new StreamReader(filename);
    string input;
    string pattern = @"\b(\w+)\s\1\b";
    while (sr.Peek() >= 0)
    {
       input = sr.ReadLine();
       Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
       MatchCollection matches = rgx.Matches(input);
       if (matches.Count > 0)
       {
          Console.WriteLine("{0} ({1} matches):", input, matches.Count);
          foreach (Match match in matches)
             Console.WriteLine("   " + match.Value);
       }
    }
    sr.Close();
    
    Dim sr As New StreamReader(filename)
    Dim input As String
    Dim pattern As String = "\b(\w+)\s\1\b"
    Do While sr.Peek() >= 0
       input = sr.ReadLine()
       Dim rgx As New Regex(pattern, RegexOptions.IgnoreCase)
       Dim matches As MatchCollection = rgx.Matches(input)
       If matches.Count > 0 Then
          Console.WriteLine("{0} ({1} matches):", input, matches.Count)
          For Each match As Match In matches
             Console.WriteLine("   " + match.Value)
          Next   
       End If
    Loop
    sr.Close()
    

    To prevent recompilation, you should instantiate a single Regex object that is accessible to all code that requires it, as shown in the following rewritten example.

    StreamReader sr = new StreamReader(filename);
    string input;
    string pattern = @"\b(\w+)\s\1\b";
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    
    while (sr.Peek() >= 0)
    {
       input = sr.ReadLine();
       MatchCollection matches = rgx.Matches(input);
       if (matches.Count > 0)
       {
          Console.WriteLine("{0} ({1} matches):", input, matches.Count);
          foreach (Match match in matches)
             Console.WriteLine("   " + match.Value);
       }
    }
    sr.Close();
    
    Dim sr As New StreamReader(filename)
    Dim input As String
    Dim pattern As String = "\b(\w+)\s\1\b"
    Dim rgx As New Regex(pattern, RegexOptions.IgnoreCase)
    Do While sr.Peek() >= 0
       input = sr.ReadLine()
       Dim matches As MatchCollection = rgx.Matches(input)
       If matches.Count > 0 Then
          Console.WriteLine("{0} ({1} matches):", input, matches.Count)
          For Each match As Match In matches
             Console.WriteLine("   " + match.Value)
          Next   
       End If
    Loop
    sr.Close()
    

Perform regular expression operations

Whether you decide to instantiate a Regex object and call its methods or call static methods, the Regex class offers the following pattern-matching functionality:

  • Validation of a match. You call the IsMatch method to determine whether a match is present.

  • Retrieval of a single match. You call the Match method to retrieve a Match object that represents the first match in a string or in part of a string. Subsequent matches can be retrieved by calling the Match.NextMatch method.

  • Retrieval of all matches. You call the Matches method to retrieve a System.Text.RegularExpressions.MatchCollection object that represents all the matches found in a string or in part of a string.

  • Replacement of matched text. You call the Replace method to replace matched text. The replacement text can also be defined by a regular expression. In addition, some of the Replace methods include a MatchEvaluator parameter that enables you to programmatically define the replacement text.

  • Creation of a string array that is formed from parts of an input string. You call the Split method to split an input string at positions that are defined by the regular expression.

In addition to its pattern-matching methods, the Regex class includes several special-purpose methods:

  • The Escape method escapes any characters that may be interpreted as regular expression operators in a regular expression or input string.
  • The Unescape method removes these escape characters.
  • The CompileToAssembly method creates an assembly that contains predefined regular expressions. .NET contains examples of these special-purpose assemblies in the System.Web.RegularExpressions namespace.

Define a time-out value

.NET supports a full-featured regular expression language that provides substantial power and flexibility in pattern matching. However, the power and flexibility come at a cost: the risk of poor performance. Regular expressions that perform poorly are surprisingly easy to create. In some cases, regular expression operations that rely on excessive backtracking can appear to stop responding when they process text that nearly matches the regular expression pattern. For more information about the .NET Regular Expression engine, see Details of regular expression behavior. For more information about excessive backtracking, see Backtracking.

Starting with .NET Framework 4.5, you can define a time-out interval for regular expression matches to limit excessive backtracking. Depending on the regular expression pattern and the input text, the execution time may exceed the specified time-out interval, but it will not spend more time backtracking than the specified time-out interval. If the regular expression engine times out, it throws a RegexMatchTimeoutException exception. In most cases, this prevents the regular expression engine from wasting processing power by trying to match text that nearly matches the regular expression pattern. It also could indicate, however, that the time-out interval has been set too low, or that the current machine load has caused an overall degradation in performance.

How you handle the exception depends on the cause of the exception. If the exception occurs because the time-out interval is set too low or because of excessive machine load, you can increase the time-out interval and retry the matching operation. If the exception occurs because the regular expression relies on excessive backtracking, you can assume that a match does not exist, and, optionally, you can log information that will help you modify the regular expression pattern.

You can set a time-out interval by calling the Regex(String, RegexOptions, TimeSpan) constructor when you instantiate a regular expression object. For static methods, you can set a time-out interval by calling an overload of a matching method that has a matchTimeout parameter. If you do not set a time-out value explicitly, the default time-out value is determined as follows:

  • By using the application-wide time-out value, if one exists. Set the application-wide time-out value by calling the AppDomain.SetData method to assign the string representation of a TimeSpan value to the REGEX_DEFAULT_MATCH_TIMEOUT property.
  • By using the value InfiniteMatchTimeout, if no application-wide time-out value has been set.

Important

We recommend that you set a time-out value in all regular expression pattern-matching operations. For more information, see Best practices for regular expressions.

Examples

The following example uses a regular expression to check for repeated occurrences of words in a string. The regular expression \b(?<word>\w+)\s+(\k<word>)\b can be interpreted as shown in the following table.

Pattern Description
\b Start the match at a word boundary.
(?<word>\w+) Match one or more word characters up to a word boundary. Name this captured group word.
\s+ Match one or more white-space characters.
(\k<word>) Match the captured group that's named word.
\b Match a word boundary.
using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main ()
    {
        // Define a regular expression for repeated words.
        Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
          RegexOptions.Compiled | RegexOptions.IgnoreCase);

        // Define a test string.
        string text = "The the quick brown fox  fox jumps over the lazy dog dog.";

        // Find matches.
        MatchCollection matches = rx.Matches(text);

        // Report the number of matches found.
        Console.WriteLine("{0} matches found in:\n   {1}",
                          matches.Count,
                          text);

        // Report on each match.
        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;
            Console.WriteLine("'{0}' repeated at positions {1} and {2}",
                              groups["word"].Value,
                              groups[0].Index,
                              groups[1].Index);
        }
    }
}

// The example produces the following output to the console:
//       3 matches found in:
//          The the quick brown fox  fox jumps over the lazy dog dog.
//       'The' repeated at positions 0 and 4
//       'fox' repeated at positions 20 and 25
//       'dog' repeated at positions 49 and 53
Imports System.Text.RegularExpressions

Public Module Test

    Public Sub Main()
        ' Define a regular expression for repeated words.
        Dim rx As New Regex("\b(?<word>\w+)\s+(\k<word>)\b", _
               RegexOptions.Compiled Or RegexOptions.IgnoreCase)

        ' Define a test string.        
        Dim text As String = "The the quick brown fox  fox jumps over the lazy dog dog."
        
        ' Find matches.
        Dim matches As MatchCollection = rx.Matches(text)

        ' Report the number of matches found.
        Console.WriteLine("{0} matches found in:", matches.Count)
        Console.WriteLine("   {0}", text)

        ' Report on each match.
        For Each match As Match In matches
            Dim groups As GroupCollection = match.Groups
            Console.WriteLine("'{0}' repeated at positions {1} and {2}", _ 
                              groups.Item("word").Value, _
                              groups.Item(0).Index, _
                              groups.Item(1).Index)
        Next
    End Sub
End Module
' The example produces the following output to the console:
'       3 matches found in:
'          The the quick brown fox  fox jumps over the lazy dog dog.
'       'The' repeated at positions 0 and 4
'       'fox' repeated at positions 20 and 25
'       'dog' repeated at positions 49 and 53

The next example illustrates the use of a regular expression to check whether a string either represents a currency value or has the correct format to represent a currency value. In this case, the regular expression is built dynamically from the NumberFormatInfo.CurrencyDecimalSeparator, CurrencyDecimalDigits, NumberFormatInfo.CurrencySymbol, NumberFormatInfo.NegativeSign, and NumberFormatInfo.PositiveSign properties for the en-US culture. The resulting regular expression is ^\s*[\+-]?\s?\$?\s?(\d*\.?\d{2}?){1}$. This regular expression can be interpreted as shown in the following table.

Pattern Description
^ Start at the beginning of the string.
\s* Match zero or more white-space characters.
[\+-]? Match zero or one occurrence of either the positive sign or the negative sign.
\s? Match zero or one white-space character.
\$? Match zero or one occurrence of the dollar sign.
\s? Match zero or one white-space character.
\d* Match zero or more decimal digits.
\.? Match zero or one decimal point symbol.
(\d{2})? Capturing group 1: Match two decimal digits zero or one time.
(\d*\.?(\d{2})?){1} Match the pattern of integral and fractional digits separated by a decimal point symbol at least one time.
$ Match the end of the string.

In this case, the regular expression assumes that a valid currency string does not contain group separator symbols, and that it has either no fractional digits or the number of fractional digits defined by the specified culture's CurrencyDecimalDigits property.

using System;
using System.Globalization;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Get the en-US NumberFormatInfo object to build the regular 
        // expression pattern dynamically.
        NumberFormatInfo nfi = CultureInfo.GetCultureInfo("en-US").NumberFormat;

        // Define the regular expression pattern.
        string pattern;
        pattern = @"^\s*[";
        // Get the positive and negative sign symbols.
        pattern += Regex.Escape(nfi.PositiveSign + nfi.NegativeSign) + @"]?\s?";
        // Get the currency symbol.
        pattern += Regex.Escape(nfi.CurrencySymbol) + @"?\s?";
        // Add integral digits to the pattern.
        pattern += @"(\d*";
        // Add the decimal separator.
        pattern += Regex.Escape(nfi.CurrencyDecimalSeparator) + "?";
        // Add the fractional digits.
        pattern += @"(\d{";
        // Determine the number of fractional digits in currency values.
        pattern += nfi.CurrencyDecimalDigits.ToString() + "})?){1}$";

        Console.WriteLine($"Pattern is {pattern}\n");

        Regex rgx = new Regex(pattern);

        // Define some test strings.
        string[] tests = { "-42", "19.99", "0.001", "100 USD",
                         ".34", "0.34", "1,052.21", "$10.62",
                         "+1.43", "-$0.23" };

        // Check each test string against the regular expression.
        foreach (string test in tests)
        {
            if (rgx.IsMatch(test))
                Console.WriteLine($"{test} is a currency value.");
            else
                Console.WriteLine($"{test} is not a currency value.");
        }
    }
}
// The example displays the following output:
//       Pattern is ^\s*[\+-]?\s?\$?\s?(\d*\.?(\d{2})?){1}$
//
//       -42 is a currency value.
//       19.99 is a currency value.
//       0.001 is not a currency value.
//       100 USD is not a currency value.
//       .34 is a currency value.
//       0.34 is a currency value.
//       1,052.21 is not a currency value.
//       $10.62 is a currency value.
//       +1.43 is a currency value.
//       -$0.23 is a currency value.
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module Example
   Public Sub Main()
      ' Get the current NumberFormatInfo object to build the regular 
      ' expression pattern dynamically.
      Dim nfi As NumberFormatInfo = CultureInfo.GetCultureInfo("en-US").NumberFormat

      ' Define the regular expression pattern.
      Dim pattern As String 
      pattern = "^\s*["
      ' Get the positive and negative sign symbols.
      pattern += Regex.Escape(nfi.PositiveSign + nfi.NegativeSign) + "]?\s?"
      ' Get the currency symbol.
      pattern += Regex.Escape(nfi.CurrencySymbol) + "?\s?"
      ' Add integral digits to the pattern.
      pattern += "(\d*"
      ' Add the decimal separator.
      pattern += Regex.Escape(nfi.CurrencyDecimalSeparator) + "?"
      ' Add the fractional digits.
      pattern += "(\d{"
      ' Determine the number of fractional digits in currency values.
      pattern += nfi.CurrencyDecimalDigits.ToString() + "})?){1}$"
      
      Console.WriteLine("Pattern is {0}", pattern)
      Console.WriteLine()
      
      Dim rgx As New Regex(pattern)

      ' Define some test strings.
      Dim tests() As String = {"-42", "19.99", "0.001", "100 USD", _
                               ".34", "0.34", "1,052.21", "$10.62", _
                               "+1.43", "-$0.23" }

      ' Check each test string against the regular expression.
      For Each test As String In tests
         If rgx.IsMatch(test) Then
            Console.WriteLine("{0} is a currency value.", test)
         Else
            Console.WriteLine("{0} is not a currency value.", test)
         End If
      Next
   End Sub
End Module
' The example displays the following output:
'       Pattern is ^\s*[\+-]?\s?\$?\s?(\d*\.?(\d{2})?){1}$
'
'       -42 is a currency value.
'       19.99 is a currency value.
'       0.001 is not a currency value.
'       100 USD is not a currency value.
'       .34 is a currency value.
'       0.34 is a currency value.
'       1,052.21 is not a currency value.
'       $10.62 is a currency value.
'       +1.43 is a currency value.
'       -$0.23 is a currency value.

Because the regular expression in this example is built dynamically, you don't know at design time whether the currency symbol, decimal sign, or positive and negative signs of the specified culture (en-US in this example) might be misinterpreted by the regular expression engine as regular expression language operators. To prevent any misinterpretation, the example passes each dynamically generated string to the Escape method.