Extract substrings from a string

This article covers several techniques for extracting parts of a string.

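Use the Split method when the substrings you want are separated by one or more known delimiting characters.
Regular expressions are useful when the string conforms to a fixed pattern.
Use the IndexOf and Substring methods together when you want only some of the substrings in a string, not all of them.
Use ranges and indices in C# to extract or trim characters at known positions.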

String.Split method

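String.Split provides several overloads that split a string into a group of substrings based on one or more delimiting characters that you specify. You can limit the total number of substrings in the final result, trim white-space characters from the substrings, or exclude empty substrings.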

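The following examples show three different overloads of String.Split(). The first example calls the Split(Char[]) overload without passing any separator characters. When you don't specify any delimiting characters, String.Split() splits the string on the default delimiters, which are white-space characters.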

string s = "You win some. You lose some.";

string[] subs = s.Split();

foreach (string sub in subs)
{
    Console.WriteLine($"Substring: {sub}");
}

// This example produces the following output:
//
// Substring: You
// Substring: win
// Substring: some.
// Substring: You
// Substring: lose
// Substring: some.

Dim s As String = "You win some. You lose some."
Dim subs As String() = s.Split()

For Each substring As String In subs
    Console.WriteLine("Substring: {0}", substring)
Next

' This example produces the following output:
'
' Substring: You
' Substring: win
' Substring: some.
' Substring: You
' Substring: lose
' Substring: some.

As you can see, two of the substrings include the period character (.). To exclude the periods, add the period character as an additional delimiting character, as the next example shows.

string s = "You win some. You lose some.";

string[] subs = s.Split(' ', '.');

foreach (string sub in subs)
{
    Console.WriteLine($"Substring: {sub}");
}

// This example produces the following output:
//
// Substring: You
// Substring: win
// Substring: some
// Substring:
// Substring: You
// Substring: lose
// Substring: some
// Substring:

Dim s As String = "You win some. You lose some."
Dim subs As String() = s.Split(" "c, "."c)

For Each substring As String In subs
    Console.WriteLine("Substring: {0}", substring)
Next

' This example produces the following output:
'
' Substring: You
' Substring: win
' Substring: some
' Substring:
' Substring: You
' Substring: lose
' Substring: some
' Substring:

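The periods are gone from the substrings, but the result now includes two extra empty substrings. Each one is the zero-length substring that follows a period: the first lies between the first period and the space after it, and the second lies between the last period and the end of the string. To omit empty substrings from the resulting array, call the Split(Char[], StringSplitOptions) overload and specify StringSplitOptions.RemoveEmptyEntries for the options parameter.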

string s = "You win some. You lose some.";
char[] separators = new char[] { ' ', '.' };

string[] subs = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);

foreach (string sub in subs)
{
    Console.WriteLine($"Substring: {sub}");
}

// This example produces the following output:
//
// Substring: You
// Substring: win
// Substring: some
// Substring: You
// Substring: lose
// Substring: some

Dim s As String = "You win some. You lose some."
Dim separators As Char() = New Char() {" "c, "."c}
Dim subs As String() = s.Split(separators, StringSplitOptions.RemoveEmptyEntries)

For Each substring As String In subs
    Console.WriteLine("Substring: {0}", substring)
Next

' This example produces the following output:
'
' Substring: You
' Substring: win
' Substring: some
' Substring: You
' Substring: lose
' Substring: some
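The examples above don't show limiting the number of substrings or trimming white space. The following is a minimal sketch of both, assuming .NET 5 or later (where StringSplitOptions.TrimEntries is available). The input strings and variable names are made up for illustration.

```csharp
using System;

// Hypothetical input: a key followed by a value that itself contains '='.
string setting = "  timeout =  30 = seconds ";

// Limit the result to two substrings; the second one keeps the rest of the string.
// TrimEntries strips the white space that surrounds each entry.
string[] pair = setting.Split('=', 2, StringSplitOptions.TrimEntries);
Console.WriteLine($"Key: '{pair[0]}', Value: '{pair[1]}'");
// Output: Key: 'timeout', Value: '30 = seconds'

// Combine both options to trim each entry and drop the entries that end up empty.
string csv = "red, , green ,,blue";
string[] colors = csv.Split(',',
    StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
Console.WriteLine(string.Join("|", colors));
// Output: red|green|blue
```

Limiting the count is handy for "key=value" style input where only the first delimiter should split the string.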

Regular expressions

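If your string conforms to a fixed pattern, you can use a regular expression to extract and handle its elements. For example, if strings take the form "number operator number", a regular expression can extract each element for processing. Here's an example: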

String[] expressions = { "16 + 21", "31 * 3", "28 / 3",
                       "42 - 18", "12 * 7",
                       "2, 4, 6, 8" };
String pattern = @"(\d+)\s+([-+*/])\s+(\d+)";

foreach (string expression in expressions)
{
    foreach (System.Text.RegularExpressions.Match m in
    System.Text.RegularExpressions.Regex.Matches(expression, pattern))
    {
        int value1 = Int32.Parse(m.Groups[1].Value);
        int value2 = Int32.Parse(m.Groups[3].Value);
        switch (m.Groups[2].Value)
        {
            case "+":
                Console.WriteLine($"{m.Value} = {value1 + value2}");
                break;
            case "-":
                Console.WriteLine($"{m.Value} = {value1 - value2}");
                break;
            case "*":
                Console.WriteLine($"{m.Value} = {value1 * value2}");
                break;
            case "/":
                // Cast to double so that the division isn't truncated to an integer.
                Console.WriteLine($"{m.Value} = {(double)value1 / value2:N2}");
                break;
        }
    }
}

// The example displays the following output:
//       16 + 21 = 37
//       31 * 3 = 93
//       28 / 3 = 9.33
//       42 - 18 = 24
//       12 * 7 = 84

' Requires: Imports System.Text.RegularExpressions
Dim expressions() As String = {"16 + 21", "31 * 3", "28 / 3",
                              "42 - 18", "12 * 7",
                              "2, 4, 6, 8"}

Dim pattern As String = "(\d+)\s+([-+*/])\s+(\d+)"
For Each expression In expressions
    For Each m As Match In Regex.Matches(expression, pattern)
        Dim value1 As Integer = Int32.Parse(m.Groups(1).Value)
        Dim value2 As Integer = Int32.Parse(m.Groups(3).Value)
        Select Case m.Groups(2).Value
            Case "+"
                Console.WriteLine("{0} = {1}", m.Value, value1 + value2)
            Case "-"
                Console.WriteLine("{0} = {1}", m.Value, value1 - value2)
            Case "*"
                Console.WriteLine("{0} = {1}", m.Value, value1 * value2)
            Case "/"
                Console.WriteLine("{0} = {1:N2}", m.Value, value1 / value2)
        End Select
    Next
Next

' The example displays the following output:
'       16 + 21 = 37
'       31 * 3 = 93
'       28 / 3 = 9.33
'       42 - 18 = 24
'       12 * 7 = 84

The regular expression pattern (\d+)\s+([-+*/])\s+(\d+) is defined as follows:

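Pattern	Description
`(\d+)`	Matches one or more decimal digits. This is the first capturing group.
`\s+`	Matches one or more white-space characters.
`([-+*/])`	Matches an arithmetic operator sign (+, -, *, or /). This is the second capturing group.
`\s+`	Matches one or more white-space characters.
`(\d+)`	Matches one or more decimal digits. This is the third capturing group.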
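Numbered groups such as Groups[1] can be hard to read once a pattern grows. As a sketch of one alternative, the same pattern can use named groups. The group names left, op, and right are assumptions chosen for this illustration and don't appear in the original example.

```csharp
using System;
using System.Text.RegularExpressions;

// Same shape as the pattern above, but each group has a hypothetical name.
string namedPattern = @"(?<left>\d+)\s+(?<op>[-+*/])\s+(?<right>\d+)";

Match match = Regex.Match("28 / 3", namedPattern);
string summary = "no match";
if (match.Success)
{
    // Look up each captured element by name instead of by position.
    int left = int.Parse(match.Groups["left"].Value);
    string op = match.Groups["op"].Value;
    int right = int.Parse(match.Groups["right"].Value);
    summary = $"left={left}, op={op}, right={right}";
}
Console.WriteLine(summary);
// Output: left=28, op=/, right=3

// A string that doesn't follow the pattern produces no match.
bool listMatches = Regex.Match("2, 4, 6, 8", namedPattern).Success;
Console.WriteLine(listMatches);
// Output: False
```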

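You can also use a regular expression to extract substrings from a string based on a pattern rather than a fixed set of characters. This is a common scenario when either of the following conditions applies: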

One or more of the delimiting characters doesn't always serve as a delimiter in the String instance.
The sequence and number of delimiting characters is variable or unknown.

For example, the Split method can't be used to split the following string, because the number of \n (newline) characters is variable and they don't always serve as delimiters.

[This is captured\ntext.]\n\n[\n[This is more captured text.]\n]
\n[Some more captured text:\n   Option1\n   Option2][Terse text.]
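To see the problem concretely, here is a minimal sketch of what String.Split produces when it treats every newline as a delimiter. The variable names are illustrative only.

```csharp
using System;

string bracketed = "[This is captured\ntext.]\n\n[\n" +
                   "[This is more captured text.]\n]\n" +
                   "[Some more captured text:\n   Option1" +
                   "\n   Option2][Terse text.]";

// Splitting on '\n' cuts through bracketed text that spans several lines,
// and it can't separate two bracketed items that share a line.
string[] pieces = bracketed.Split('\n');
Console.WriteLine($"Pieces: {pieces.Length}");
Console.WriteLine($"First piece: {pieces[0]}");
Console.WriteLine($"Last piece: {pieces[^1]}");
// Output:
// Pieces: 9
// First piece: [This is captured
// Last piece:    Option2][Terse text.]
```

None of the nine pieces corresponds cleanly to the four bracketed items in the input.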

A regular expression can split this string easily, as the following example shows.

String input = "[This is captured\ntext.]\n\n[\n" +
               "[This is more captured text.]\n]\n" +
               "[Some more captured text:\n   Option1" +
               "\n   Option2][Terse text.]";
String pattern = @"\[([^\[\]]+)\]";
int ctr = 0;

foreach (System.Text.RegularExpressions.Match m in
   System.Text.RegularExpressions.Regex.Matches(input, pattern))
{
    Console.WriteLine($"{++ctr}: {m.Groups[1].Value}");
}

// The example displays the following output:
//       1: This is captured
//       text.
//       2: This is more captured text.
//       3: Some more captured text:
//          Option1
//          Option2
//       4: Terse text.

' Requires: Imports System.Text.RegularExpressions
Dim input As String = String.Format("[This is captured{0}text.]" +
                                  "{0}{0}[{0}[This is more " +
                                  "captured text.]{0}]{0}" +
                                  "[Some more captured text:" +
                                  "{0}   Option1" +
                                  "{0}   Option2][Terse text.]",
                                  vbCrLf)
Dim pattern As String = "\[([^\[\]]+)\]"
Dim ctr As Integer = 0
For Each m As Match In Regex.Matches(input, pattern)
    ctr += 1
    Console.WriteLine("{0}: {1}", ctr, m.Groups(1).Value)
Next

' The example displays the following output:
'       1: This is captured
'       text.
'       2: This is more captured text.
'       3: Some more captured text:
'          Option1
'          Option2
'       4: Terse text.

The regular expression pattern \[([^\[\]]+)\] is defined as follows:

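Pattern	Description
`\[`	Matches an opening bracket.
`([^\[\]]+)`	Matches any character other than an opening or closing bracket, one or more times. This is the first capturing group.
`\]`	Matches a closing bracket.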

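The Regex.Split method is almost identical to String.Split, except that it splits a string based on a regular expression pattern instead of a fixed character set. For example, the following example uses the Regex.Split method to split a string that contains substrings delimited by various combinations of hyphens and other characters.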

String input = "abacus -- alabaster - * - atrium -+- " +
               "any -*- actual - + - armoire - - alarm";
String pattern = @"\s-\s?[+*]?\s?-\s";
String[] elements = System.Text.RegularExpressions.Regex.Split(input, pattern);

foreach (string element in elements)
    Console.WriteLine(element);

// The example displays the following output:
//       abacus
//       alabaster
//       atrium
//       any
//       actual
//       armoire
//       alarm

' Requires: Imports System.Text.RegularExpressions
Dim input As String = "abacus -- alabaster - * - atrium -+- " +
                    "any -*- actual - + - armoire - - alarm"
Dim pattern As String = "\s-\s?[+*]?\s?-\s"
Dim elements() As String = Regex.Split(input, pattern)
For Each element In elements
    Console.WriteLine(element)
Next

' The example displays the following output:
'       abacus
'       alabaster
'       atrium
'       any
'       actual
'       armoire
'       alarm

The regular expression pattern \s-\s?[+*]?\s?-\s is defined as follows:

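Pattern	Description
`\s-`	Matches a white-space character followed by a hyphen.
`\s?`	Matches zero or one white-space character.
`[+*]?`	Matches zero or one occurrence of either the + or the * character.
`\s?`	Matches zero or one white-space character.
`-\s`	Matches a hyphen followed by a white-space character.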
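The same approach works for other irregular delimiters. The following sketch uses made-up input in which items are separated by a comma or a semicolon with inconsistent spacing. It also shows that Regex.Split includes captured text in the result when the pattern contains a capturing group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a comma or semicolon separates the items, with uneven spacing.
string list = "apple ,banana;  cherry , date";

// \s*[,;]\s* consumes the separator together with any white space around it.
string[] fruits = Regex.Split(list, @"\s*[,;]\s*");
Console.WriteLine(string.Join("|", fruits));
// Output: apple|banana|cherry|date

// With a capturing group in the pattern, the captured delimiter is kept in the result.
string[] withDelimiter = Regex.Split("plum-pear", "(-)");
Console.WriteLine(string.Join("|", withDelimiter));
// Output: plum|-|pear
```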

String.IndexOf and String.Substring methods

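If you aren't interested in all of the substrings in a string, you might prefer to work with one of the string comparison methods that return the index at which a match begins. You can then call the Substring method to extract the substring that you want. The string comparison methods include: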

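IndexOf, which returns the zero-based index of the first occurrence of a character or string in a string instance.
IndexOfAny, which returns the zero-based index in the current string instance of the first occurrence of any character in a character array.
LastIndexOf, which returns the zero-based index of the last occurrence of a character or string in a string instance.
LastIndexOfAny, which returns the zero-based index in the current string instance of the last occurrence of any character in a character array.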
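As a quick illustration of how the four methods differ, the following sketch runs each of them against one made-up string and then extracts the text between two of the positions found. The input and variable names are assumptions for this illustration.

```csharp
using System;

// Hypothetical input for illustration.
string record = "key=value;other=thing";

int firstEquals = record.IndexOf('=');                           // 3
int firstSemicolon = record.IndexOf(';');                        // 9
int lastEquals = record.LastIndexOf('=');                        // 15
int firstDelimiter = record.IndexOfAny(new[] { ';', '=' });      // 3
int lastDelimiter = record.LastIndexOfAny(new[] { ';', '=' });   // 15

// Each method returns -1 when nothing is found, so check before calling Substring.
string firstValue = string.Empty;
if (firstEquals >= 0 && firstSemicolon > firstEquals)
{
    // Take the characters between the first '=' and the first ';'.
    firstValue = record.Substring(firstEquals + 1, firstSemicolon - firstEquals - 1);
}
Console.WriteLine(firstValue);
// Output: value
```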

The following example uses the IndexOf method to find the periods in a string. It then uses the Substring method to return full sentences.

String s = "This is the first sentence in a string. " +
               "More sentences will follow. For example, " +
               "this is the third sentence. This is the " +
               "fourth. And this is the fifth and final " +
               "sentence.";
// Requires: using System.Collections.Generic;
var sentences = new List<String>();
int start = 0;
int position;

// Extract sentences from the string.
do
{
    position = s.IndexOf('.', start);
    if (position >= 0)
    {
        sentences.Add(s.Substring(start, position - start + 1).Trim());
        start = position + 1;
    }
} while (position >= 0);  // IndexOf returns -1 when no more periods are found.

// Display the sentences.
foreach (var sentence in sentences)
    Console.WriteLine(sentence);

// The example displays the following output:
//       This is the first sentence in a string.
//       More sentences will follow.
//       For example, this is the third sentence.
//       This is the fourth.
//       And this is the fifth and final sentence.

Dim input As String = "This is the first sentence in a string. " +
                      "More sentences will follow. For example, " +
                      "this is the third sentence. This is the " +
                      "fourth. And this is the fifth and final " +
                      "sentence."
Dim sentences As New List(Of String)
Dim start As Integer = 0
Dim position As Integer

' Extract sentences from the string.
Do
    position = input.IndexOf("."c, start)
    If position >= 0 Then
        sentences.Add(input.Substring(start, position - start + 1).Trim())
        start = position + 1
    End If
Loop While position >= 0  ' IndexOf returns -1 when no more periods are found.

' Display the sentences.
For Each sentence In sentences
    Console.WriteLine(sentence)
Next

' The example displays the following output:
'       This is the first sentence in a string.
'       More sentences will follow.
'       For example, this is the third sentence.
'       This is the fourth.
'       And this is the fifth and final sentence.

Ranges and indices

The C# range operator .. and the index-from-end operator ^ let you extract substrings with a concise syntax. You can apply these operators directly to a string without calling Substring.

The following example shows several ways to extract parts of a string by using ranges.

string str = "Hello, World!";

// Get the first 5 characters.
string hello = str[..5];
Console.WriteLine(hello);
// Output: Hello

// Get the last 6 characters.
string world = str[^6..];
Console.WriteLine(world);
// Output: World!

// Get characters from index 7 through 11 (exclusive of 12).
string substr = str[7..12];
Console.WriteLine(substr);
// Output: World

The following example uses the index-from-end operator to remove the file extension (the last 3 characters) from a path.

string filePath = "C:\\Users\\user1\\bin\\fileA.cs";

// Remove the last 3 characters (.cs extension).
string trimmedPath = filePath[..^3];
Console.WriteLine(trimmedPath);
// Output: C:\Users\user1\bin\fileA
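The following sketch relates ranges back to Substring and avoids hard-coding the extension length by locating the last period first. It's an illustration under simplifying assumptions: the variable names are made up, and the path is assumed to contain a period only in the file name.

```csharp
using System;

string greeting = "Hello, World!";

// A range [start..end] selects the same characters as Substring(start, end - start).
string viaRange = greeting[7..12];
string viaSubstring = greeting.Substring(7, 12 - 7);
Console.WriteLine(viaRange == viaSubstring);
// Output: True

// ^1 refers to the last character of the string.
char lastChar = greeting[^1];
Console.WriteLine(lastChar);
// Output: !

// Find the last period instead of assuming a three-character extension.
string path = "C:\\Users\\user1\\bin\\fileA.cs";
int dot = path.LastIndexOf('.');
string withoutExtension = dot >= 0 ? path[..dot] : path;
Console.WriteLine(withoutExtension);
// Output: C:\Users\user1\bin\fileA
```

For real file paths, the methods in System.IO.Path are likely a safer choice than manual index arithmetic, because a directory name can also contain a period.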

Note

The range and index-from-end operators are a C# feature. Visual Basic doesn't support this syntax. Use Substring instead.


Last updated on 2026-03-24