Backreference Constructs in Regular Expressions

2023-05-09

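Backreferences provide a convenient way to identify a repeated character or substring within a string. For example, if the input string contains multiple occurrences of an arbitrary substring, you can match the first occurrence with a capturing group, and then use a backreference to match subsequent occurrences of the substring.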
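The following minimal sketch applies that idea to whole words rather than single characters. It is not part of the original article: the class name `DuplicateWords` and the sample sentence are illustrative assumptions. The group `(\w+)` captures a word, and `\1` then requires the same text to appear again after whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public class DuplicateWords
{
    // Returns every word that is immediately repeated after whitespace.
    public static string[] Find(string input)
    {
        // (\w+) captures a word; \s+\1 requires the same text to follow it.
        // The trailing \b keeps "the theory" from counting as a repeat of "the".
        MatchCollection matches = Regex.Matches(input, @"\b(\w+)\s+\1\b");
        string[] words = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            words[i] = matches[i].Groups[1].Value;
        return words;
    }

    public static void Main()
    {
        foreach (string word in Find("This is is a test of the the theory"))
            Console.WriteLine($"Repeated word: {word}");
    }
}
// Expected output:
//       Repeated word: is
//       Repeated word: the
```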

Note

A separate syntax is used to refer to named and numbered capturing groups in replacement strings. For more information, see Substitutions.

.NET defines separate language elements to refer to numbered and named capturing groups. For more information about capturing groups, see Grouping Constructs.
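To illustrate the difference between the two syntaxes, the following sketch (not from the original article; the class and method names are hypothetical) collapses a doubled word. Inside the pattern, the group is referenced with a backreference (`\1` or `\k<word>`); in the replacement string passed to `Regex.Replace`, the same group is referenced with the substitutions `$1` or `${word}`.

```csharp
using System;
using System.Text.RegularExpressions;

public class SubstitutionVsBackreference
{
    // Pattern side: \1 is a backreference. Replacement side: $1 is a substitution.
    public static string CollapseNumbered(string input) =>
        Regex.Replace(input, @"\b(\w+)\s+\1\b", "$1");

    // Same operation with a named group: \k<word> in the pattern, ${word} in the replacement.
    public static string CollapseNamed(string input) =>
        Regex.Replace(input, @"\b(?<word>\w+)\s+\k<word>\b", "${word}");

    public static void Main()
    {
        Console.WriteLine(CollapseNumbered("a test of the the theory"));  // a test of the theory
        Console.WriteLine(CollapseNamed("a test of the the theory"));     // a test of the theory
    }
}
```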

Numbered backreferences

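A numbered backreference uses the following syntax:

\number

where number is the ordinal position of the capturing group in the regular expression. For example, \4 matches the contents of the fourth capturing group. If number is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException. For example, the regular expression \b(\w+)\s\1 is valid, because (\w+) is the first and only capturing group in the expression. On the other hand, \b(\w+)\s\2 is invalid and throws an argument exception, because there is no capturing group numbered \2. In addition, if number identifies a capturing group in a particular ordinal position, but that capturing group has been assigned a numeric name different from its ordinal position, the regular expression parser also throws an ArgumentException.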
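These rules can be checked by trying to construct each pattern. In the following sketch, the helper name `IsValidPattern` is hypothetical; it simply reports whether the `Regex` constructor throws an `ArgumentException`. The third pattern assumes the numeric-name case described above: its only group is explicitly named "2", so no group occupies position 1 for `\1` to refer to.

```csharp
using System;
using System.Text.RegularExpressions;

public class UndefinedBackreference
{
    // Returns true if the pattern parses, false if the parser rejects it.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // A backreference to an undefined group is reported as a parsing error.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\1"));  // True: group 1 exists
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\2"));  // False: no group 2
        Console.WriteLine(IsValidPattern(@"(?<2>\w)\1"));   // False: the only group is named "2"
    }
}
```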

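Note the ambiguity between octal escape codes (such as \16) and \number backreferences, which use the same notation. This ambiguity is resolved as follows:

The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.
If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.
Expressions from \10 and greater are considered backreferences if a capturing group with that number exists; otherwise, they are interpreted as octal codes.
If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.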
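The following sketch (class name hypothetical) illustrates the third rule. With ten capturing groups, `\10` refers back to group 10; with only one group, the same `\10` is read as the octal escape for U+0008 (backspace) rather than as `\1` followed by a literal "0". The expected results in the comments assume .NET's default (non-ECMAScript) parsing.

```csharp
using System;
using System.Text.RegularExpressions;

public class OctalOrBackreference
{
    // Ten groups exist, so \10 is a backreference to group 10 ("j").
    public const string TenGroups = @"^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10$";

    // Only one group exists, so \10 is the octal escape for U+0008.
    public const string OneGroup = @"^(a)\10$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("abcdefghijj", TenGroups));  // True
        Console.WriteLine(Regex.IsMatch("a\u0008", OneGroup));       // True
        Console.WriteLine(Regex.IsMatch("aa0", OneGroup));           // False
    }
}
```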

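If ambiguity is a problem, you can use the \k<name> notation, which is unambiguous and cannot be confused with octal character codes. Similarly, hexadecimal codes such as \xdd are unambiguous and cannot be confused with backreferences.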
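As a small sketch of these alternative notations (class name hypothetical), `\k<1>0` always means "backreference to group 1, followed by a literal 0", and `\x41` is the hexadecimal escape for "A".

```csharp
using System;
using System.Text.RegularExpressions;

public class UnambiguousBackreference
{
    // \k<1> ends at the closing bracket, so the following 0 is a literal character.
    public const string Pattern = @"^(a)\k<1>0$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("aa0", Pattern));   // True
        // \x41 is a hexadecimal character escape and is never read as a backreference.
        Console.WriteLine(Regex.IsMatch("A", @"^\x41$"));   // True
    }
}
```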

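The following example finds doubled word characters in a string. It defines a regular expression, (\w)\1, which consists of the following elements.

Element	Description
`(\w)`	Match a word character and assign it to the first capturing group.
`\1`	Match the next character that is the same as the value of the first capturing group.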

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(\w)\1"
        Dim input As String = "trellis llama webbing dresser swagger"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found '{0}' at position {1}.", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

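Named backreferences

A named backreference is defined by using the following syntax:

\k<name>

or:

\k'name'

where name is the name of a capturing group defined in the regular expression pattern. If name is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException.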
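The two syntaxes are interchangeable. This sketch (class name hypothetical) repeats a doubled-character search with the single-quote form, using `(?'ch'\w)` to define the group and `\k'ch'` to refer back to it.

```csharp
using System;
using System.Text.RegularExpressions;

public class QuoteSyntax
{
    // (?'ch'\w) names the group with single quotes; \k'ch' refers back to it.
    public const string Pattern = @"(?'ch'\w)\k'ch'";

    public static void Main()
    {
        foreach (Match match in Regex.Matches("trellis llama", Pattern))
            Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
    }
}
// Expected output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
```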

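The following example finds doubled word characters in a string. It defines a regular expression, (?<char>\w)\k<char>, which consists of the following elements.

Element	Description
`(?<char>\w)`	Match a word character and assign it to a capturing group named `char`.
`\k<char>`	Match the next character that is the same as the value of the `char` capturing group.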

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<char>\w)\k<char>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?<char>\w)\k<char>"
        Dim input As String = "trellis llama webbing dresser swagger"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found '{0}' at position {1}.", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

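Named numeric backreferences

In a named backreference with \k, name can also be the string representation of a number. For example, the following example uses the regular expression (?<2>\w)\k<2> to find doubled word characters in a string. In this case, the example defines a capturing group that is explicitly named "2", and the backreference is correspondingly named "2".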

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<2>\w)\k<2>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?<2>\w)\k<2>"
        Dim input As String = "trellis llama webbing dresser swagger"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found '{0}' at position {1}.", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

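If name is the string representation of a number, and no capturing group has that name, \k<name> is the same as the backreference \number, where number is the ordinal position of the capture. In the following example, there is a single capturing group named char. The backreference construct refers to it as \k<1>. As the output from the example shows, the call to Regex.IsMatch succeeds because char is the first capturing group.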

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<char>\w)\k<1>"));
      // Displays "True".
   }
}


Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Console.WriteLine(Regex.IsMatch("aa", "(?<char>\w)\k<1>"))
        ' Displays "True".
    End Sub
End Module
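One point worth keeping in mind when a backreference relies on ordinal position: .NET numbers unnamed capturing groups first, from left to right, and only then numbers named groups. In the following sketch (class name hypothetical), `char` is the first group in the pattern but receives the number 2, so `\k<1>` refers to the unnamed group `(\d)`.

```csharp
using System;
using System.Text.RegularExpressions;

public class NamedGroupNumbering
{
    // (\d) is group 1 because unnamed groups are numbered before named ones;
    // (?<char>\w) therefore becomes group 2 even though it appears first.
    public const string Pattern = @"^(?<char>\w)(\d)\k<1>$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("a11", Pattern));                   // True: \k<1> repeats the digit
        Console.WriteLine(Regex.IsMatch("a1a", Pattern));                   // False: "a" is in group 2
        Console.WriteLine(new Regex(Pattern).GroupNumberFromName("char"));  // 2
    }
}
```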

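However, if name is the string representation of a number and the capturing group in that position has been explicitly assigned a numeric name, the regular expression parser cannot identify the capturing group by its ordinal position. Instead, it throws an ArgumentException. The only capturing group in the following example is named "2". Because the \k construct is used to define a backreference named "1", the regular expression parser is unable to identify the first capturing group and throws an exception.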

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<2>\w)\k<1>"));
      // Throws an ArgumentException.
   }
}


Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Console.WriteLine(Regex.IsMatch("aa", "(?<2>\w)\k<1>"))
        ' Throws an ArgumentException.
    End Sub
End Module

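What backreferences match

A backreference refers to the most recent definition of a group (the definition most immediately to the left, when matching left to right). When a group makes multiple captures, a backreference refers to the most recent capture.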
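The following sketch (class name hypothetical) shows the "most recent capture" rule with a quantified group. The group `(\w)` captures once per repetition, and `\1` is compared only against the last of those captures, although the `Group.Captures` collection still records all of them.

```csharp
using System;
using System.Text.RegularExpressions;

public class MostRecentCapture
{
    // The repeated group captures once per iteration; \1 sees only the last capture.
    public const string Pattern = @"^(?:(\w),)+\1$";

    public static void Main()
    {
        Match match = Regex.Match("a,b,b", Pattern);
        Console.WriteLine(match.Success);                    // True: \1 is "b"
        foreach (Capture capture in match.Groups[1].Captures)
            Console.WriteLine($"Capture: {capture.Value}");  // "Capture: a", then "Capture: b"
        Console.WriteLine(Regex.IsMatch("a,b,a", Pattern));  // False: the earlier "a" is not used
    }
}
```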

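The following example includes a regular expression pattern, (?<1>a)(?<1>\1b)*, which redefines the group named 1 (the group that \1 refers to). The following table describes each pattern in the regular expression.

Pattern	Description
`(?<1>a)`	Match the character "a" and assign the result to the capturing group named `1`.
`(?<1>\1b)*`	Match zero or more occurrences of the current value of the group named `1` followed by a "b", and assign the result to the capturing group named `1`.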

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<1>a)(?<1>\1b)*";
      string input = "aababb";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("Match: " + match.Value);
         foreach (Group group in match.Groups)
            Console.WriteLine("   Group: " + group.Value);
      }
   }
}
// The example displays the following output:
//       Match: aababb
//          Group: aababb
//          Group: abb

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?<1>a)(?<1>\1b)*"
        Dim input As String = "aababb"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Match: " + match.Value)
            For Each group As Group In match.Groups
                Console.WriteLine("   Group: " + group.Value)
            Next
        Next
    End Sub
End Module
' The example displays the following output:
'       Match: aababb
'          Group: aababb
'          Group: abb

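When comparing the regular expression with the input string ("aababb"), the regular expression engine performs the following operations:

It starts at the beginning of the string and successfully matches "a" with the expression (?<1>a). The value of group 1 is now "a".
It advances to the second character and successfully matches the string "ab" with the expression \1b. It then assigns the result, "ab", to \1.
It advances to the fourth character. Because the expression (?<1>\1b)* can match zero or more times, it tries again, and since \1 is now "ab", the expression \1b successfully matches the string "abb". It assigns the result, "abb", back to \1.

In this example, * is a looping quantifier: it is evaluated repeatedly until the regular expression engine can no longer match the pattern it defines. Looping quantifiers do not clear group definitions.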
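Because both `(?<1>...)` groups in the pattern share the number 1, every value assigned to `\1` is kept in that group's `Captures` collection. This sketch (class name hypothetical) lists them; the last one, "abb", is the value that `Groups[1].Value` reports in the example output.

```csharp
using System;
using System.Text.RegularExpressions;

public class GroupCaptures
{
    // Returns every value captured by group 1, in the order the captures were made.
    public static string[] CaptureValues(string input)
    {
        Match match = Regex.Match(input, @"(?<1>a)(?<1>\1b)*");
        CaptureCollection captures = match.Groups[1].Captures;
        string[] values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
            values[i] = captures[i].Value;
        return values;
    }

    public static void Main()
    {
        foreach (string value in CaptureValues("aababb"))
            Console.WriteLine($"Capture: {value}");
    }
}
// Expected output:
//       Capture: a
//       Capture: ab
//       Capture: abb
```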

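If a group has not captured any substrings, a backreference to that group is undefined and never matches. As an example, consider the regular expression pattern \b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b, which is defined as follows:

Pattern	Description
`\b`	Begin the match at a word boundary.
`(\p{Lu}{2})`	Match two uppercase letters. This is the first capturing group.
`(\d{2})?`	Match zero or one occurrence of two decimal digits. This is the second capturing group.
`(\p{Lu}{2})`	Match two uppercase letters. This is the third capturing group.
`\b`	End the match at a word boundary.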

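An input string can match this regular expression even if the two decimal digits defined by the second capturing group are not present. The following example shows that even though the match succeeds, the second capturing group, which sits between two successful capturing groups, captures nothing (its Success property is false).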

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b";
      string[] inputs = { "AA22ZZ", "AABB" };
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
         {
            Console.WriteLine($"Match in {input}: {match.Value}");
            if (match.Groups.Count > 1)
            {
               for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++)
               {
                  if (match.Groups[ctr].Success)
                     Console.WriteLine($"Group {ctr}: {match.Groups[ctr].Value}");
                  else
                     Console.WriteLine($"Group {ctr}: <no match>");
               }
            }
         }
         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Match in AA22ZZ: AA22ZZ
//       Group 1: AA
//       Group 2: 22
//       Group 3: ZZ
//
//       Match in AABB: AABB
//       Group 1: AA
//       Group 2: <no match>
//       Group 3: BB

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b"
        Dim inputs() As String = {"AA22ZZ", "AABB"}
        For Each input As String In inputs
            Dim match As Match = Regex.Match(input, pattern)
            If match.Success Then
                Console.WriteLine("Match in {0}: {1}", input, match.Value)
                If match.Groups.Count > 1 Then
                    For ctr As Integer = 1 To match.Groups.Count - 1
                        If match.Groups(ctr).Success Then
                            Console.WriteLine("Group {0}: {1}", _
                                              ctr, match.Groups(ctr).Value)
                        Else
                            Console.WriteLine("Group {0}: <no match>", ctr)
                        End If
                    Next
                End If
            End If
            Console.WriteLine()
        Next
    End Sub
End Module
' The example displays the following output:
'       Match in AA22ZZ: AA22ZZ
'       Group 1: AA
'       Group 2: 22
'       Group 3: ZZ
'       
'       Match in AABB: AABB
'       Group 1: AA
'       Group 2: <no match>
'       Group 3: BB
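The example above does not actually apply a backreference to the optional group. The following sketch (class name hypothetical) does: when `(a)?` is skipped, group 1 captures nothing, so `\1` cannot match and the overall match fails.

```csharp
using System;
using System.Text.RegularExpressions;

public class UnmatchedGroupBackreference
{
    // (a)? may be skipped; if it is, group 1 captures nothing and \1 cannot match.
    public const string Pattern = @"^(a)?b\1$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("aba", Pattern));  // True: group 1 is "a"
        Console.WriteLine(Regex.IsMatch("b", Pattern));    // False: group 1 captured nothing
    }
}
```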
