Backreference Constructs

08/09/2011

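Backreferences provide a convenient way to identify a repeated character or substring within a string. For example, if the input string contains multiple occurrences of an arbitrary substring, you can match the first occurrence with a capturing group and then use a backreference to match the subsequent occurrences.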

Note
A different syntax is used to refer to named and numbered capturing groups in replacement strings. For more information, see "Substitutions".

The .NET Framework defines separate language elements to refer to numbered and named capturing groups. For more information about capturing groups, see "Grouping Constructs".
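As a quick illustration of the two language elements, the following sketch finds immediately repeated words once with a numbered backreference and once with a named one. The class name RepeatedWordSketch, the helper FindAll, the group name word, and the sample sentence are assumptions made for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedWordSketch
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
            results.Add(match.Value);
        return results;
    }

    public static void Main()
    {
        string input = "This is is a test test case";

        // Numbered form: \1 must repeat whatever group 1 captured.
        string numbered = @"\b(\w+)\s\1\b";
        // Named form: \k<word> must repeat whatever the group named word captured.
        string named = @"\b(?<word>\w+)\s\k<word>\b";

        Console.WriteLine(string.Join(", ", FindAll(input, numbered)));
        Console.WriteLine(string.Join(", ", FindAll(input, named)));
    }
}
// Both lines of output are expected to read:
//       is is, test test
```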

Numbered Backreferences

A numbered backreference uses the following syntax:

\number

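number is the ordinal position of the capturing group in the regular expression. For example, \4 matches the contents of the fourth capturing group. If number is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException. For example, the regular expression \b(\w+)\s\1 is valid, because (\w+) is the first and only capturing group in the expression. In contrast, \b(\w+)\s\2 is invalid and throws an argument exception, because there is no capturing group numbered 2.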
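The validity check described above can be observed directly. The following is a minimal sketch; the class name BackreferenceValidation and the helper IsValidPattern are hypothetical names introduced here, not part of the .NET Framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceValidation
{
    // Hypothetical helper: returns true if the pattern parses, and false if
    // the Regex constructor rejects it with an ArgumentException.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Group 1 exists, so \1 is a valid backreference.
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\1"));
        // There is no group 2, so parsing fails.
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\2"));
    }
}
// The sketch is expected to display:
//       True
//       False
```

Because the exception is raised when the pattern is parsed, the error surfaces when the Regex object is constructed, before any input is examined.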

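Octal escape codes (such as \16) and \number backreferences use the same notation and are therefore ambiguous. The ambiguity is resolved as follows:

The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.
If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.
Expressions from \10 and greater are considered backreferences if a capturing group with that number exists; otherwise, they are interpreted as octal codes.
If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.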

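If the ambiguity is a problem, you can use the \k<name> notation, which is unambiguous and cannot be confused with octal character codes. Similarly, hexadecimal codes such as \xdd are unambiguous and cannot be confused with backreferences.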
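The following sketch illustrates the resolution rules under default options. It assumes a pattern with no capturing group numbered 16, so that \16 falls back to the octal code 16 (the character U+000E). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OctalVersusBackreference
{
    // No group 16 is defined in this pattern, so \16 is read as an
    // octal escape for U+000E rather than as a backreference.
    public static bool OctalEscapeMatches()
    {
        return Regex.IsMatch("\u000E", @"\16");
    }

    // \x0E is a hexadecimal escape for the same character and is never
    // treated as a backreference.
    public static bool HexEscapeMatches()
    {
        return Regex.IsMatch("\u000E", @"\x0E");
    }

    // \k<1> always refers to the group named 1, never to an octal code.
    public static bool NamedFormMatches()
    {
        return Regex.IsMatch("aa", @"(?<1>a)\k<1>");
    }

    public static void Main()
    {
        Console.WriteLine(OctalEscapeMatches());
        Console.WriteLine(HexEscapeMatches());
        Console.WriteLine(NamedFormMatches());
    }
}
// The sketch is expected to display True three times.
```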

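The following example finds doubled word characters in a string. It defines a regular expression, (\w)\1, which consists of the following elements.

Element	Description
(\w)	Match a word character and assign it to the first capturing group.
\1	Match the next character if it is the same as the value of the first capturing group.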

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\w)\1"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Named Backreferences

A named backreference is defined by using the following syntax:

\k<name>

or

\k'name'

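name is the name of a capturing group defined in the regular expression pattern. If name is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException.

The following example finds doubled word characters in a string. It defines a regular expression, (?<char>\w)\k<char>, which consists of the following elements.

Element	Description
(?<char>\w)	Match a word character and assign it to a capturing group named char.
\k<char>	Match the next character if it is the same as the value of the char capturing group.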

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<char>\w)\k<char>"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<char>\w)\k<char>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.
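The single-quote form \k'name' is interchangeable with the angle-bracket form. The sketch below runs both spellings over the same input as the example above and is expected to yield the same matches; the class name QuotedNameSketch and the helper FindAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedNameSketch
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
            results.Add(match.Value);
        return results;
    }

    public static void Main()
    {
        string input = "trellis llama webbing dresser swagger";

        // Angle-bracket syntax for both the group and the backreference.
        string angle = @"(?<char>\w)\k<char>";
        // Single-quote syntax for both the group and the backreference.
        string quoted = @"(?'char'\w)\k'char'";

        Console.WriteLine(string.Join(", ", FindAll(input, angle)));
        Console.WriteLine(string.Join(", ", FindAll(input, quoted)));
    }
}
// Both lines of output are expected to read:
//       ll, ll, bb, ss, gg
```

The single-quote form can be convenient when the pattern is embedded in markup where angle brackets would need escaping.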

name can also be the string representation of a number. For example, the following example uses the regular expression (?<2>\w)\k<2> to find doubled word characters in a string.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<2>\w)\k<2>"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<2>\w)\k<2>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

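What Backreferences Match

A backreference refers to the most recent definition of a group (the definition most immediately to the left, when matching left to right). When a group makes multiple captures, a backreference refers to the most recent capture.

The following example uses a regular expression pattern, (?<1>a)(?<1>\1b)*, which redefines the \1 named group. The following table describes each pattern in the regular expression.

Pattern	Description
(?<1>a)	Match the character "a" and assign the result to the capturing group named 1.
(?<1>\1b)*	Match zero or more occurrences of the group named 1 followed by a "b", and assign the result to the capturing group named 1.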

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<1>a)(?<1>\1b)*"
      Dim input As String = "aababb"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Match: " + match.Value)
         For Each group As Group In match.Groups
            Console.WriteLine("   Group: " + group.Value)
         Next
      Next
   End Sub
End Module
' The example displays the following output:
'       Match: aababb
'          Group: aababb
'          Group: abb

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<1>a)(?<1>\1b)*";
      string input = "aababb";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("Match: " + match.Value);
         foreach (Group group in match.Groups)
            Console.WriteLine("   Group: " + group.Value);
      }
   }
}
// The example displays the following output:
//       Match: aababb
//          Group: aababb
//          Group: abb

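In comparing the regular expression with the input string ("aababb"), the regular expression engine performs the following operations:

1. It starts at the beginning of the string and successfully matches "a" with the expression (?<1>a). The value of the 1 group is now "a".
2. It advances to the second character and successfully matches the string "ab" with the expression \1b, that is, "ab". It then assigns the result, "ab", to \1.
3. It advances to the fourth character. Because the expression (?<1>\1b) can be matched zero or more times, it successfully matches the string "abb" with the expression \1b. It assigns the result, "abb", to \1.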

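In this example, * is a looping quantifier: it is evaluated repeatedly until the regular expression engine cannot match the pattern it defines. Looping quantifiers do not clear group definitions.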
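The successive redefinitions of group 1 can be inspected through the group's Captures collection. The following is a minimal sketch of that idea; the class name CaptureHistorySketch and the helper DescribeCaptures are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureHistorySketch
{
    // Hypothetical helper: lists every capture made by group 1, in order,
    // formatted as value@index.
    public static List<string> DescribeCaptures(string input, string pattern)
    {
        var history = new List<string>();
        Match match = Regex.Match(input, pattern);
        foreach (Capture capture in match.Groups[1].Captures)
            history.Add(capture.Value + "@" + capture.Index);
        return history;
    }

    public static void Main()
    {
        // Each pass through the loop captures again into group 1, and the
        // next \1 refers to the most recent of those captures.
        foreach (string entry in DescribeCaptures("aababb", @"(?<1>a)(?<1>\1b)*"))
            Console.WriteLine(entry);
    }
}
// The sketch is expected to display:
//       a@0
//       ab@1
//       abb@3
```

The final value of the group ("abb") is the last entry in the collection, which matches the Group output shown in the example above.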

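If a group has not captured any substrings, a backreference to that group is undefined and never matches. The regular expression pattern \b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b, which is defined as follows, shows how a group can participate in a successful match without capturing anything.

Pattern	Description
\b	Begin the match at a word boundary.
(\p{Lu}{2})	Match two uppercase letters. This is the first capturing group.
(\d{2})?	Match zero or one occurrence of two decimal digits. This is the second capturing group.
(\p{Lu}{2})	Match two uppercase letters. This is the third capturing group.
\b	End the match at a word boundary.

An input string can match this regular expression even if the two decimal digits defined by the second capturing group are not present. The following example shows that even though the match succeeds, an empty capturing group is found between two successful capturing groups.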

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b"
      Dim inputs() As String = { "AA22ZZ", "AABB" }
      For Each input As String In inputs
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then
            Console.WriteLine("Match in {0}: {1}", input, match.Value)
            If match.Groups.Count > 1 Then
               For ctr As Integer = 1 To match.Groups.Count - 1
                  If match.Groups(ctr).Success Then
                     Console.WriteLine("Group {0}: {1}", _
                                       ctr, match.Groups(ctr).Value)
                  Else
                     Console.WriteLine("Group {0}: <no match>", ctr)
                  End If      
               Next
            End If
         End If
         Console.WriteLine()
      Next      
   End Sub
End Module
' The example displays the following output:
'       Match in AA22ZZ: AA22ZZ
'       Group 1: AA
'       Group 2: 22
'       Group 3: ZZ
'       
'       Match in AABB: AABB
'       Group 1: AA
'       Group 2: <no match>
'       Group 3: BB

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b";
      string[] inputs = { "AA22ZZ", "AABB" };
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
         {
            Console.WriteLine("Match in {0}: {1}", input, match.Value);
            if (match.Groups.Count > 1)
            {
               for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++)
               {
                  if (match.Groups[ctr].Success)
                     Console.WriteLine("Group {0}: {1}", 
                                       ctr, match.Groups[ctr].Value);
                  else
                     Console.WriteLine("Group {0}: <no match>", ctr);
               }
            }
         }
         Console.WriteLine();
      }      
   }
}
// The example displays the following output:
//       Match in AA22ZZ: AA22ZZ
//       Group 1: AA
//       Group 2: 22
//       Group 3: ZZ
//       
//       Match in AABB: AABB
//       Group 1: AA
//       Group 2: <no match>
//       Group 3: BB
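The pattern in the preceding example contains no backreference, so it shows only the empty group. To see a backreference to an uncaptured group fail, consider the following minimal sketch, which assumes default options. The pattern ^(a)?\1b$ and the class name UnsetGroupSketch are hypothetical and chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnsetGroupSketch
{
    // Group 1 is optional. \1 can match only when group 1 has captured "a".
    public const string Pattern = @"^(a)?\1b$";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        // Group 1 captures "a", \1 matches the second "a", then "b" matches.
        Console.WriteLine(Matches("aab"));
        // Group 1 captures nothing, so \1 is undefined and the match fails
        // even though "b" on its own would satisfy the rest of the pattern.
        Console.WriteLine(Matches("b"));
    }
}
// The sketch is expected to display:
//       True
//       False
```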

See Also

Other Resources

Regular Expression Language Elements
