StringInfo Class
Definition
Provides functionality to split a string into text elements and to iterate through those text elements.
public ref class StringInfo
public class StringInfo
[System.Serializable]
public class StringInfo
[System.Serializable]
[System.Runtime.InteropServices.ComVisible(true)]
public class StringInfo
type StringInfo = class
[<System.Serializable>]
type StringInfo = class
[<System.Serializable>]
[<System.Runtime.InteropServices.ComVisible(true)>]
type StringInfo = class
Public Class StringInfo
- Inheritance
- Object → StringInfo
- Attributes
- SerializableAttribute, ComVisibleAttribute
Examples
This example uses the GetTextElementEnumerator and ParseCombiningCharacters methods of the StringInfo class to manipulate a string that contains surrogate and combining characters.
using System;
using System.Text;
using System.Globalization;
public sealed class App {
static void Main() {
// The string below contains combining characters.
String s = "a\u0304\u0308bc\u0327";
// Show each 'character' in the string.
EnumTextElements(s);
// Show the index in the string where each 'character' starts.
EnumTextElementIndexes(s);
}
// Show how to enumerate each real character (honoring surrogates) in a string.
static void EnumTextElements(String s) {
// This StringBuilder holds the output results.
StringBuilder sb = new StringBuilder();
// Use the enumerator returned from GetTextElementEnumerator
// method to examine each real character.
TextElementEnumerator charEnum = StringInfo.GetTextElementEnumerator(s);
while (charEnum.MoveNext()) {
sb.AppendFormat(
"Character at index {0} is '{1}'{2}",
charEnum.ElementIndex, charEnum.GetTextElement(),
Environment.NewLine);
}
// Show the results.
Console.WriteLine("Result of GetTextElementEnumerator:");
Console.WriteLine(sb);
}
// Show how to discover the index of each real character (honoring surrogates) in a string.
static void EnumTextElementIndexes(String s) {
// This StringBuilder holds the output results.
StringBuilder sb = new StringBuilder();
// Use the ParseCombiningCharacters method to
// get the index of each real character in the string.
Int32[] textElemIndex = StringInfo.ParseCombiningCharacters(s);
// Iterate through each real character showing the character and the index where it was found.
for (Int32 i = 0; i < textElemIndex.Length; i++) {
sb.AppendFormat(
"Character {0} starts at index {1}{2}",
i, textElemIndex[i], Environment.NewLine);
}
// Show the results.
Console.WriteLine("Result of ParseCombiningCharacters:");
Console.WriteLine(sb);
}
}
// This code produces the following output:
//
// Result of GetTextElementEnumerator:
// Character at index 0 is 'ā̈'
// Character at index 3 is 'b'
// Character at index 4 is 'ç'
//
// Result of ParseCombiningCharacters:
// Character 0 starts at index 0
// Character 1 starts at index 3
// Character 2 starts at index 4
Imports System.Text
Imports System.Globalization
Public Module Example
Public Sub Main()
' The string below contains combining characters.
Dim s As String = "a" + ChrW(&h0304) + ChrW(&h0308) + "bc" + ChrW(&h0327)
' Show each 'character' in the string.
EnumTextElements(s)
' Show the index in the string where each 'character' starts.
EnumTextElementIndexes(s)
End Sub
' Show how to enumerate each real character (honoring surrogates) in a string.
Sub EnumTextElements(s As String)
' This StringBuilder holds the output results.
Dim sb As New StringBuilder()
' Use the enumerator returned from GetTextElementEnumerator
' method to examine each real character.
Dim charEnum As TextElementEnumerator = StringInfo.GetTextElementEnumerator(s)
Do While charEnum.MoveNext()
sb.AppendFormat("Character at index {0} is '{1}'{2}",
charEnum.ElementIndex,
charEnum.GetTextElement(),
Environment.NewLine)
Loop
' Show the results.
Console.WriteLine("Result of GetTextElementEnumerator:")
Console.WriteLine(sb)
End Sub
' Show how to discover the index of each real character (honoring surrogates) in a string.
Sub EnumTextElementIndexes(s As String)
' This StringBuilder holds the output results.
Dim sb As New StringBuilder()
' Use the ParseCombiningCharacters method to
' get the index of each real character in the string.
Dim textElemIndex() As Integer = StringInfo.ParseCombiningCharacters(s)
' Iterate through each real character showing the character and the index where it was found.
For i As Int32 = 0 To textElemIndex.Length - 1
sb.AppendFormat("Character {0} starts at index {1}{2}",
i, textElemIndex(i), Environment.NewLine)
Next
' Show the results.
Console.WriteLine("Result of ParseCombiningCharacters:")
Console.WriteLine(sb)
End Sub
End Module
' The example displays the following output:
'
' Result of GetTextElementEnumerator:
' Character at index 0 is 'ā̈'
' Character at index 3 is 'b'
' Character at index 4 is 'ç'
'
' Result of ParseCombiningCharacters:
' Character 0 starts at index 0
' Character 1 starts at index 3
' Character 2 starts at index 4
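Remarks
.NET defines a text element as a unit of text that is displayed as a single character, that is, a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The Unicode Standard defines a surrogate pair as a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. The Unicode Standard defines a combining character sequence as a combination of a base character and one or more combining characters. A surrogate pair can represent a base character or a combining character.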
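As a rough illustration of the difference between Char objects and text elements, the following sketch compares String.Length with the LengthInTextElements property. The class name TextElementCountDemo, the helper CountTextElements, and the sample strings are chosen here for illustration only and are not part of the API.

```csharp
using System;
using System.Globalization;

public static class TextElementCountDemo
{
    // Hypothetical helper: returns the number of text elements in s,
    // as reported by StringInfo.LengthInTextElements.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        // A base character followed by one combining character
        // (U+0301 COMBINING ACUTE ACCENT).
        string combining = "e\u0301";
        // A single supplementary code point, stored as a surrogate pair.
        string surrogate = Char.ConvertFromUtf32(0x10148);

        Console.WriteLine("{0} Char objects, {1} text element(s)",
            combining.Length, CountTextElements(combining));
        Console.WriteLine("{0} Char objects, {1} text element(s)",
            surrogate.Length, CountTextElements(surrogate));
    }
}
```

Each sample string holds two Char objects that together form a single text element.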
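The StringInfo class enables you to work with a string as a series of text elements rather than as individual Char objects.
To instantiate a StringInfo object that represents a specified string, you can do either of the following:
- Call the StringInfo(String) constructor and pass it the string that the StringInfo object is to represent as an argument.
- Call the parameterless StringInfo() constructor, and then assign the string that the StringInfo object is to represent to the String property.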
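The two options above can be sketched as follows. The class name StringInfoCreationDemo and the helpers FromConstructor and FromProperty are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Globalization;

public static class StringInfoCreationDemo
{
    // Option 1: pass the string to the StringInfo(String) constructor.
    public static StringInfo FromConstructor(string s)
    {
        return new StringInfo(s);
    }

    // Option 2: call the parameterless constructor,
    // then assign the string to the String property.
    public static StringInfo FromProperty(string s)
    {
        StringInfo si = new StringInfo();
        si.String = s;
        return si;
    }

    public static void Main()
    {
        string s = "a\u0304\u0308bc\u0327";
        StringInfo first = FromConstructor(s);
        StringInfo second = FromProperty(s);

        // Both objects represent the same string, so they report
        // the same number of text elements and compare as equal.
        Console.WriteLine(first.LengthInTextElements);
        Console.WriteLine(second.LengthInTextElements);
        Console.WriteLine(first.Equals(second));
    }
}
```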
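You can work with the individual text elements in a string in two ways:
- By enumerating each text element. To do this, you call the GetTextElementEnumerator method, and then repeatedly call the MoveNext method on the returned TextElementEnumerator object until the method returns false.
- By calling the ParseCombiningCharacters method to retrieve an array that contains the starting index of each text element. You can then retrieve individual text elements with the SubstringByTextElements method, which takes text-element positions rather than Char indexes.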
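A minimal sketch of the second approach is shown here. It assumes that the position of an entry in the array returned by ParseCombiningCharacters can serve as the text-element position expected by SubstringByTextElements, while the entry's value is a Char index into the string. The class name TextElementAccessDemo and the helper ElementAt are hypothetical.

```csharp
using System;
using System.Globalization;

public static class TextElementAccessDemo
{
    // Hypothetical helper: returns the text element at the given
    // text-element position (not Char index).
    public static string ElementAt(string s, int position)
    {
        return new StringInfo(s).SubstringByTextElements(position, 1);
    }

    public static void Main()
    {
        string s = "a\u0304\u0308bc\u0327";
        int[] starts = StringInfo.ParseCombiningCharacters(s);

        for (int i = 0; i < starts.Length; i++)
        {
            // starts[i] is a Char index; i is the text-element position.
            string element = ElementAt(s, i);
            Console.WriteLine(
                "Text element {0} starts at Char index {1} and spans {2} Char object(s)",
                i, starts[i], element.Length);
        }
    }
}
```

Creating a new StringInfo on every call keeps the helper short; code that retrieves many elements from the same string would more likely create the StringInfo object once and reuse it.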
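The following example illustrates both ways of working with the text elements in a string. It creates two strings:
- strCombining, which is a string of Arabic characters that includes three text elements with multiple Char objects. The first of these text elements is the base character ARABIC LETTER ALEF (U+0627) followed by ARABIC HAMZA BELOW (U+0655) and ARABIC KASRA (U+0650). The second is ARABIC LETTER HEH (U+0647) followed by ARABIC FATHA (U+064E). The third is ARABIC LETTER BEH (U+0628) followed by ARABIC DAMMATAN (U+064C).
- strSurrogates, which is a string that includes three surrogate pairs: GREEK ACROPHONIC FIVE TALENTS (U+10148) from the Supplementary Multilingual Plane, U+20026 from the Supplementary Ideographic Plane, and U+F1001 from the private use area. The UTF-16 encoding of each character is a surrogate pair that consists of a high surrogate followed by a low surrogate.
Each string is parsed once by the ParseCombiningCharacters method and then once by the GetTextElementEnumerator method. Both methods correctly parse the text elements in the two strings and display the results of the parsing operation.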
using System;
using System.Globalization;
public class Example
{
public static void Main()
{
// The Unicode code points specify Arabic base characters and
// combining character sequences.
string strCombining = "\u0627\u0655\u0650\u064A\u0647\u064E" +
"\u0627\u0628\u064C";
// The Unicode code points specify private surrogate pairs.
string strSurrogates = Char.ConvertFromUtf32(0x10148) +
Char.ConvertFromUtf32(0x20026) + "a" +
Char.ConvertFromUtf32(0xF1001);
EnumerateTextElements(strCombining);
EnumerateTextElements(strSurrogates);
}
public static void EnumerateTextElements(string str)
{
// Get the Enumerator.
TextElementEnumerator teEnum = null;
// Parse the string using the ParseCombiningCharacters method.
Console.WriteLine("\nParsing with ParseCombiningCharacters:");
int[] teIndices = StringInfo.ParseCombiningCharacters(str);
for (int i = 0; i < teIndices.Length; i++) {
if (i < teIndices.Length - 1)
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
teIndices[i], teIndices[i + 1] - 1,
ShowHexValues(str.Substring(teIndices[i], teIndices[i + 1] -
teIndices[i])));
else
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
teIndices[i], str.Length - 1,
ShowHexValues(str.Substring(teIndices[i])));
}
Console.WriteLine();
// Parse the string with the GetTextElementEnumerator method.
Console.WriteLine("Parsing with TextElementEnumerator:");
teEnum = StringInfo.GetTextElementEnumerator(str);
int teCount = - 1;
while (teEnum.MoveNext()) {
// Displays the current element.
// Both GetTextElement() and Current retrieve the current
// text element. The latter returns it as an Object.
teCount++;
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", teCount,
teEnum.ElementIndex, teEnum.ElementIndex +
teEnum.GetTextElement().Length - 1, ShowHexValues((string)(teEnum.Current)));
}
}
private static string ShowHexValues(string s)
{
string hexString = "";
foreach (var ch in s)
hexString += $"{(ushort)ch:X4} ";
return hexString;
}
}
// The example displays the following output:
// Parsing with ParseCombiningCharacters:
// Text Element 0 (0..2)= 0627 0655 0650
// Text Element 1 (3..3)= 064A
// Text Element 2 (4..5)= 0647 064E
// Text Element 3 (6..6)= 0627
// Text Element 4 (7..8)= 0628 064C
//
// Parsing with TextElementEnumerator:
// Text Element 0 (0..2)= 0627 0655 0650
// Text Element 1 (3..3)= 064A
// Text Element 2 (4..5)= 0647 064E
// Text Element 3 (6..6)= 0627
// Text Element 4 (7..8)= 0628 064C
//
// Parsing with ParseCombiningCharacters:
// Text Element 0 (0..1)= D800 DD48
// Text Element 1 (2..3)= D840 DC26
// Text Element 2 (4..4)= 0061
// Text Element 3 (5..6)= DB84 DC01
//
// Parsing with TextElementEnumerator:
// Text Element 0 (0..1)= D800 DD48
// Text Element 1 (2..3)= D840 DC26
// Text Element 2 (4..4)= 0061
// Text Element 3 (5..6)= DB84 DC01
Imports System.Globalization
Public Module Example
Public Sub Main()
' The Unicode code points specify Arabic base characters and
' combining character sequences.
Dim strCombining As String = ChrW(&H627) & ChrW(&h0655) + ChrW(&H650) &
ChrW(&H64A) & ChrW(&H647) & ChrW(&H64E) & ChrW(&H627) &
ChrW(&H628) & ChrW(&H64C)
' The Unicode code points specify private surrogate pairs.
Dim strSurrogates As String = Char.ConvertFromUtf32(&h10148) +
Char.ConvertFromUtf32(&h20026) + "a" +
Char.ConvertFromUtf32(&hF1001)
EnumerateTextElements(strCombining)
EnumerateTextElements(strSurrogates)
End Sub
Public Sub EnumerateTextElements(str As String)
' Get the Enumerator.
Dim teEnum As TextElementEnumerator = Nothing
' Parse the string using the ParseCombiningCharacters method.
Console.WriteLine()
Console.WriteLine("Parsing with ParseCombiningCharacters:")
Dim teIndices As Integer() = StringInfo.ParseCombiningCharacters(str)
For i As Integer = 0 To teIndices.Length - 1
If i < teIndices.Length - 1 Then
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
TEIndices(i), TEIndices((i + 1)) - 1,
ShowHexValues(str.Substring(TEIndices(i), TEIndices((i + 1)) -
teIndices(i))))
Else
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
teIndices(i), str.Length - 1,
ShowHexValues(str.Substring(teIndices(i))))
End If
Next
Console.WriteLine()
' Parse the string with the GetTextElementEnumerator method.
Console.WriteLine("Parsing with TextElementEnumerator:")
teEnum = StringInfo.GetTextElementEnumerator(str)
Dim TECount As Integer = - 1
While teEnum.MoveNext()
' Prints the current element.
' Both GetTextElement() and Current retrieve the current
' text element. The latter returns it as an Object.
TECount += 1
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", teCount,
teEnum.ElementIndex, teEnum.ElementIndex +
teEnum.GetTextElement().Length - 1, ShowHexValues(CStr(teEnum.Current)))
End While
End Sub
Private Function ShowHexValues(s As String) As String
Dim hexString As String = ""
For Each ch In s
hexString += String.Format("{0:X4} ", Convert.ToUInt16(ch))
Next
Return hexString
End Function
End Module
' The example displays the following output:
' Parsing with ParseCombiningCharacters:
' Text Element 0 (0..2)= 0627 0655 0650
' Text Element 1 (3..3)= 064A
' Text Element 2 (4..5)= 0647 064E
' Text Element 3 (6..6)= 0627
' Text Element 4 (7..8)= 0628 064C
'
' Parsing with TextElementEnumerator:
' Text Element 0 (0..2)= 0627 0655 0650
' Text Element 1 (3..3)= 064A
' Text Element 2 (4..5)= 0647 064E
' Text Element 3 (6..6)= 0627
' Text Element 4 (7..8)= 0628 064C
'
' Parsing with ParseCombiningCharacters:
' Text Element 0 (0..1)= D800 DD48
' Text Element 1 (2..3)= D840 DC26
' Text Element 2 (4..4)= 0061
' Text Element 3 (5..6)= DB84 DC01
'
' Parsing with TextElementEnumerator:
' Text Element 0 (0..1)= D800 DD48
' Text Element 1 (2..3)= D840 DC26
' Text Element 2 (4..4)= 0061
' Text Element 3 (5..6)= DB84 DC01
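Notes to Callers
Internally, the methods of the StringInfo class call the methods of the CharUnicodeInfo class to determine character categories. Starting with .NET Framework 4.6.2, character classification is based on The Unicode Standard, Version 8.0.0. For .NET Framework 4 through .NET Framework 4.6.1, it is based on The Unicode Standard, Version 6.3.0. In .NET Core, it is based on The Unicode Standard, Version 8.0.0.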
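To see the kind of category information that CharUnicodeInfo exposes, the following sketch prints the Unicode general category of each Char in a short string. It does not reproduce the internal logic of StringInfo; it only calls the public CharUnicodeInfo.GetUnicodeCategory method. The class name CharCategoryDemo and the helper Categories are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CharCategoryDemo
{
    // Hypothetical helper: returns the Unicode general category
    // of each Char in s, in order.
    public static UnicodeCategory[] Categories(string s)
    {
        UnicodeCategory[] result = new UnicodeCategory[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            result[i] = CharUnicodeInfo.GetUnicodeCategory(s, i);
        }
        return result;
    }

    public static void Main()
    {
        // A lowercase letter followed by U+0304 COMBINING MACRON.
        string s = "a\u0304";
        UnicodeCategory[] categories = Categories(s);

        for (int i = 0; i < s.Length; i++)
        {
            Console.WriteLine("U+{0:X4}: {1}", (int)s[i], categories[i]);
        }
    }
}
```

The combining mark is reported as NonSpacingMark, which is the kind of character that attaches to the preceding base character to form a single text element.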
Constructors
| Name | Description |
|---|---|
| StringInfo() | Initializes a new instance of the StringInfo class. |
| StringInfo(String) | Initializes a new instance of the StringInfo class to a specified string. |
Properties
| Name | Description |
|---|---|
| LengthInTextElements | Gets the number of text elements in the current StringInfo object. |
| String | Gets or sets the value of the current StringInfo object. |
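Methods
| Name | Description |
|---|---|
| Equals(Object) | Indicates whether the current StringInfo object is equal to a specified object. |
| Equals(Object) | Determines whether the specified object is equal to the current object. (Inherited from Object) |
| GetHashCode() | Calculates a hash code for the value of the current StringInfo object. |
| GetHashCode() | Serves as the default hash function. (Inherited from Object) |
| GetNextTextElement(String, Int32) | Gets the text element at the specified index of the specified string. |
| GetNextTextElement(String) | Gets the first text element in a specified string. |
| GetNextTextElementLength(ReadOnlySpan<Char>) | Returns the length of the first text element (extended grapheme cluster) that occurs in the input span. |
| GetNextTextElementLength(String, Int32) | Returns the length of the first text element (extended grapheme cluster) that occurs in the input string, starting at the specified index. |
| GetNextTextElementLength(String) | Returns the length of the first text element (extended grapheme cluster) that occurs in the input string. |
| GetTextElementEnumerator(String, Int32) | Returns an enumerator that iterates through the text elements of the string, starting at the specified index. |
| GetTextElementEnumerator(String) | Returns an enumerator that iterates through the text elements of the entire string. |
| GetType() | Gets the Type of the current instance. (Inherited from Object) |
| MemberwiseClone() | Creates a shallow copy of the current Object. (Inherited from Object) |
| ParseCombiningCharacters(String) | Returns the indexes of each base character, high surrogate, or control character within the specified string. |
| SubstringByTextElements(Int32, Int32) | Retrieves a substring of text elements from the current StringInfo object, starting from a specified text element and continuing through the specified number of text elements. |
| SubstringByTextElements(Int32) | Retrieves a substring of text elements from the current StringInfo object, starting from a specified text element and continuing through the last text element. |
| ToString() | Returns a string that represents the current object. (Inherited from Object) |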
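As a brief sketch of the static methods listed in the table, the following code reads a text element at a given Char index with GetNextTextElement(String, Int32) and then enumerates the rest of the string with GetTextElementEnumerator(String, Int32). The class name StaticMethodsDemo and the helper ElementAtCharIndex are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Globalization;

public static class StaticMethodsDemo
{
    // Hypothetical helper: returns the text element that starts
    // at the given Char index of s.
    public static string ElementAtCharIndex(string s, int index)
    {
        return StringInfo.GetNextTextElement(s, index);
    }

    public static void Main()
    {
        string s = "a\u0304\u0308bc\u0327";

        // The first text element is "a" plus two combining marks.
        string first = StringInfo.GetNextTextElement(s);
        Console.WriteLine("First text element spans {0} Char object(s)", first.Length);

        // Continue from the Char index just past the first text element.
        TextElementEnumerator rest =
            StringInfo.GetTextElementEnumerator(s, first.Length);
        while (rest.MoveNext())
        {
            Console.WriteLine("Next text element spans {0} Char object(s)",
                rest.GetTextElement().Length);
        }
    }
}
```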