StringInfo 类
定义
重要
一些信息与预发行产品相关,相应产品在发行之前可能会进行重大修改。 对于此处提供的信息,Microsoft 不作任何明示或暗示的担保。
提供将字符串拆分为文本元素并循环访问这些文本元素的功能。
public ref class StringInfo
public class StringInfo
[System.Serializable]
public class StringInfo
[System.Serializable]
[System.Runtime.InteropServices.ComVisible(true)]
public class StringInfo
type StringInfo = class
[<System.Serializable>]
type StringInfo = class
[<System.Serializable>]
[<System.Runtime.InteropServices.ComVisible(true)>]
type StringInfo = class
Public Class StringInfo
- 继承
-
StringInfo
- 属性
示例
此示例使用GetTextElementEnumerator类的StringInfo和ParseCombiningCharacters方法操作包含代理项和组合字符的字符串。
using System;
using System.Text;
using System.Globalization;
public sealed class App {
static void Main() {
// The string below contains combining characters.
String s = "a\u0304\u0308bc\u0327";
// Show each 'character' in the string.
EnumTextElements(s);
// Show the index in the string where each 'character' starts.
EnumTextElementIndexes(s);
}
// Show how to enumerate each real character (honoring surrogates) in a string.
static void EnumTextElements(String s) {
// This StringBuilder holds the output results.
StringBuilder sb = new StringBuilder();
// Use the enumerator returned from GetTextElementEnumerator
// method to examine each real character.
TextElementEnumerator charEnum = StringInfo.GetTextElementEnumerator(s);
while (charEnum.MoveNext()) {
sb.AppendFormat(
"Character at index {0} is '{1}'{2}",
charEnum.ElementIndex, charEnum.GetTextElement(),
Environment.NewLine);
}
// Show the results.
Console.WriteLine("Result of GetTextElementEnumerator:");
Console.WriteLine(sb);
}
// Show how to discover the index of each real character (honoring surrogates) in a string.
static void EnumTextElementIndexes(String s) {
// This StringBuilder holds the output results.
StringBuilder sb = new StringBuilder();
// Use the ParseCombiningCharacters method to
// get the index of each real character in the string.
Int32[] textElemIndex = StringInfo.ParseCombiningCharacters(s);
// Iterate through each real character showing the character and the index where it was found.
for (Int32 i = 0; i < textElemIndex.Length; i++) {
sb.AppendFormat(
"Character {0} starts at index {1}{2}",
i, textElemIndex[i], Environment.NewLine);
}
// Show the results.
Console.WriteLine("Result of ParseCombiningCharacters:");
Console.WriteLine(sb);
}
}
// This code produces the following output:
//
// Result of GetTextElementEnumerator:
// Character at index 0 is 'ā̈'
// Character at index 3 is 'b'
// Character at index 4 is 'ç'
//
// Result of ParseCombiningCharacters:
// Character 0 starts at index 0
// Character 1 starts at index 3
// Character 2 starts at index 4
Imports System.Text
Imports System.Globalization
Public Module Example
Public Sub Main()
' The string below contains combining characters.
Dim s As String = "a" + ChrW(&h0304) + ChrW(&h0308) + "bc" + ChrW(&h0327)
' Show each 'character' in the string.
EnumTextElements(s)
' Show the index in the string where each 'character' starts.
EnumTextElementIndexes(s)
End Sub
' Show how to enumerate each real character (honoring surrogates) in a string.
Sub EnumTextElements(s As String)
' This StringBuilder holds the output results.
Dim sb As New StringBuilder()
' Use the enumerator returned from GetTextElementEnumerator
' method to examine each real character.
Dim charEnum As TextElementEnumerator = StringInfo.GetTextElementEnumerator(s)
Do While charEnum.MoveNext()
sb.AppendFormat("Character at index {0} is '{1}'{2}",
charEnum.ElementIndex,
charEnum.GetTextElement(),
Environment.NewLine)
Loop
' Show the results.
Console.WriteLine("Result of GetTextElementEnumerator:")
Console.WriteLine(sb)
End Sub
' Show how to discover the index of each real character (honoring surrogates) in a string.
Sub EnumTextElementIndexes(s As String)
' This StringBuilder holds the output results.
Dim sb As New StringBuilder()
' Use the ParseCombiningCharacters method to
' get the index of each real character in the string.
Dim textElemIndex() As Integer = StringInfo.ParseCombiningCharacters(s)
' Iterate through each real character showing the character and the index where it was found.
For i As Int32 = 0 To textElemIndex.Length - 1
sb.AppendFormat("Character {0} starts at index {1}{2}",
i, textElemIndex(i), Environment.NewLine)
Next
' Show the results.
Console.WriteLine("Result of ParseCombiningCharacters:")
Console.WriteLine(sb)
End Sub
End Module
' The example displays the following output:
'
' Result of GetTextElementEnumerator:
' Character at index 0 is 'ā̈'
' Character at index 3 is 'b'
' Character at index 4 is 'ç'
'
' Result of ParseCombiningCharacters:
' Character 0 starts at index 0
' Character 1 starts at index 3
' Character 2 starts at index 4
注解
.NET 将文本元素定义为显示为单个字符(即图形体)的文本单元。 文本元素可以是基字符、代理项对或组合字符序列。 Unicode 标准将代理项对定义为由两个代码单元组成的单个抽象字符的编码字符表示形式,其中第一个代理项是高代理项,第二个是低代理项。 Unicode 标准将组合字符序列定义为基字符和一个或多个组合字符的组合。 代理项对可以表示基字符或组合字符。
类 StringInfo 使你可以将字符串用作一系列文本元素,而不是单个 Char 对象。
若要实例化 StringInfo 表示指定字符串的对象,可以执行下列操作之一:
StringInfo(String)调用构造函数并向其传递对象StringInfo要表示为参数的字符串。
调用默认 StringInfo() 构造函数,并分配对象 StringInfo 要表示给属性的 String 字符串。
可以通过两种方式在字符串中使用单个文本元素:
通过枚举每个文本元素。 为此,请调用该方法GetTextElementEnumerator,然后在返回TextElementEnumerator的对象上重复调用MoveNext该方法,直到方法返回
false。通过调用 ParseCombiningCharacters 该方法来检索包含每个文本元素起始索引的数组。 然后,可以通过将这些索引 SubstringByTextElements 传递给方法来检索各个文本元素。
下面的示例演示了使用字符串中的文本元素的两种方法。 它创建两个字符串:
strCombining,它是一个包含三个具有多个 Char 对象的文本元素的阿拉伯字符字符串。 第一个文本元素是基字符阿拉伯文字母 ALEF (U+0627),后跟下方的阿拉伯文 HAMZA (U+0655) 和阿拉伯文 KASRA (U+0650)。 第二个文本元素是阿拉伯文字母 HEH(U+0647),后跟阿拉伯 FATHA(U+064E)。 第三个文本元素是阿拉伯文字母 BEH(U+0628),后跟阿拉伯文 DAMMATAN (U+064C)。strSurrogates,这是一个字符串,包括三个代理项对:希腊杂技五人才(U+10148)从补充多语言平面,U+20026 从补充象形图平面,以及来自私人用户区域的 U+F1001。 每个字符的 UTF-16 编码是一个代理项对,由高代理项后跟低代理项组成。
每一个字符串由 ParseCombiningCharacters 方法分析一次,然后由 GetTextElementEnumerator 该方法分析。 这两种方法正确分析两个字符串中的文本元素,并显示分析操作的结果。
using System;
using System.Globalization;
public class Example
{
public static void Main()
{
// The Unicode code points specify Arabic base characters and
// combining character sequences.
string strCombining = "\u0627\u0655\u0650\u064A\u0647\u064E" +
"\u0627\u0628\u064C";
// The Unicode code points specify private surrogate pairs.
string strSurrogates = Char.ConvertFromUtf32(0x10148) +
Char.ConvertFromUtf32(0x20026) + "a" +
Char.ConvertFromUtf32(0xF1001);
EnumerateTextElements(strCombining);
EnumerateTextElements(strSurrogates);
}
public static void EnumerateTextElements(string str)
{
// Get the Enumerator.
TextElementEnumerator teEnum = null;
// Parse the string using the ParseCombiningCharacters method.
Console.WriteLine("\nParsing with ParseCombiningCharacters:");
int[] teIndices = StringInfo.ParseCombiningCharacters(str);
for (int i = 0; i < teIndices.Length; i++) {
if (i < teIndices.Length - 1)
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
teIndices[i], teIndices[i + 1] - 1,
ShowHexValues(str.Substring(teIndices[i], teIndices[i + 1] -
teIndices[i])));
else
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
teIndices[i], str.Length - 1,
ShowHexValues(str.Substring(teIndices[i])));
}
Console.WriteLine();
// Parse the string with the GetTextElementEnumerator method.
Console.WriteLine("Parsing with TextElementEnumerator:");
teEnum = StringInfo.GetTextElementEnumerator(str);
int teCount = - 1;
while (teEnum.MoveNext()) {
// Displays the current element.
// Both GetTextElement() and Current retrieve the current
// text element. The latter returns it as an Object.
teCount++;
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", teCount,
teEnum.ElementIndex, teEnum.ElementIndex +
teEnum.GetTextElement().Length - 1, ShowHexValues((string)(teEnum.Current)));
}
}
private static string ShowHexValues(string s)
{
string hexString = "";
foreach (var ch in s)
hexString += $"{(ushort)ch:X4} ";
return hexString;
}
}
// The example displays the following output:
// Parsing with ParseCombiningCharacters:
// Text Element 0 (0..2)= 0627 0655 0650
// Text Element 1 (3..3)= 064A
// Text Element 2 (4..5)= 0647 064E
// Text Element 3 (6..6)= 0627
// Text Element 4 (7..8)= 0628 064C
//
// Parsing with TextElementEnumerator:
// Text Element 0 (0..2)= 0627 0655 0650
// Text Element 1 (3..3)= 064A
// Text Element 2 (4..5)= 0647 064E
// Text Element 3 (6..6)= 0627
// Text Element 4 (7..8)= 0628 064C
//
// Parsing with ParseCombiningCharacters:
// Text Element 0 (0..1)= D800 DD48
// Text Element 1 (2..3)= D840 DC26
// Text Element 2 (4..4)= 0061
// Text Element 3 (5..6)= DB84 DC01
//
// Parsing with TextElementEnumerator:
// Text Element 0 (0..1)= D800 DD48
// Text Element 1 (2..3)= D840 DC26
// Text Element 2 (4..4)= 0061
// Text Element 3 (5..6)= DB84 DC01
Imports System.Globalization
Public Module Example
Public Sub Main()
' The Unicode code points specify Arabic base characters and
' combining character sequences.
Dim strCombining As String = ChrW(&H627) & ChrW(&h0655) + ChrW(&H650) &
ChrW(&H64A) & ChrW(&H647) & ChrW(&H64E) & ChrW(&H627) &
ChrW(&H628) & ChrW(&H64C)
' The Unicode code points specify private surrogate pairs.
Dim strSurrogates As String = Char.ConvertFromUtf32(&h10148) +
Char.ConvertFromUtf32(&h20026) + "a" +
Char.ConvertFromUtf32(&hF1001)
EnumerateTextElements(strCombining)
EnumerateTextElements(strSurrogates)
End Sub
Public Sub EnumerateTextElements(str As String)
' Get the Enumerator.
Dim teEnum As TextElementEnumerator = Nothing
' Parse the string using the ParseCombiningCharacters method.
Console.WriteLine()
Console.WriteLine("Parsing with ParseCombiningCharacters:")
Dim teIndices As Integer() = StringInfo.ParseCombiningCharacters(str)
For i As Integer = 0 To teIndices.Length - 1
If i < teIndices.Length - 1 Then
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
TEIndices(i), TEIndices((i + 1)) - 1,
ShowHexValues(str.Substring(TEIndices(i), TEIndices((i + 1)) -
teIndices(i))))
Else
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i,
teIndices(i), str.Length - 1,
ShowHexValues(str.Substring(teIndices(i))))
End If
Next
Console.WriteLine()
' Parse the string with the GetTextElementEnumerator method.
Console.WriteLine("Parsing with TextElementEnumerator:")
teEnum = StringInfo.GetTextElementEnumerator(str)
Dim TECount As Integer = - 1
While teEnum.MoveNext()
' Prints the current element.
' Both GetTextElement() and Current retrieve the current
' text element. The latter returns it as an Object.
TECount += 1
Console.WriteLine("Text Element {0} ({1}..{2})= {3}", teCount,
teEnum.ElementIndex, teEnum.ElementIndex +
teEnum.GetTextElement().Length - 1, ShowHexValues(CStr(teEnum.Current)))
End While
End Sub
Private Function ShowHexValues(s As String) As String
Dim hexString As String = ""
For Each ch In s
hexString += String.Format("{0:X4} ", Convert.ToUInt16(ch))
Next
Return hexString
End Function
End Module
' The example displays the following output:
' Parsing with ParseCombiningCharacters:
' Text Element 0 (0..2)= 0627 0655 0650
' Text Element 1 (3..3)= 064A
' Text Element 2 (4..5)= 0647 064E
' Text Element 3 (6..6)= 0627
' Text Element 4 (7..8)= 0628 064C
'
' Parsing with TextElementEnumerator:
' Text Element 0 (0..2)= 0627 0655 0650
' Text Element 1 (3..3)= 064A
' Text Element 2 (4..5)= 0647 064E
' Text Element 3 (6..6)= 0627
' Text Element 4 (7..8)= 0628 064C
'
' Parsing with ParseCombiningCharacters:
' Text Element 0 (0..1)= D800 DD48
' Text Element 1 (2..3)= D840 DC26
' Text Element 2 (4..4)= 0061
' Text Element 3 (5..6)= DB84 DC01
'
' Parsing with TextElementEnumerator:
' Text Element 0 (0..1)= D800 DD48
' Text Element 1 (2..3)= D840 DC26
' Text Element 2 (4..4)= 0061
' Text Element 3 (5..6)= DB84 DC01
调用方说明
在内部,类的方法 StringInfo 调用类的方法 CharUnicodeInfo 来确定字符类别。 从 .NET Framework 4.6.2 开始,字符分类基于 Unicode 标准版本 8.0.0。 对于 .NET Framework 4 到 .NET Framework 4.6.1,它基于 Unicode 标准版本 6.3.0。 在 .NET Core 中,它基于 Unicode 标准版本 8.0.0。
构造函数
| 名称 | 说明 |
|---|---|
| StringInfo() |
初始化 StringInfo 类的新实例。 |
| StringInfo(String) |
将类的新实例 StringInfo 初始化为指定的字符串。 |
属性
| 名称 | 说明 |
|---|---|
| LengthInTextElements |
获取当前 StringInfo 对象中的文本元素数。 |
| String |
获取或设置当前 StringInfo 对象的值。 |
方法
| 名称 | 说明 |
|---|---|
| Equals(Object) |
指示当前 StringInfo 对象是否等于指定对象。 |
| Equals(Object) |
确定指定的对象是否等于当前对象。 (继承自 Object) |
| GetHashCode() |
计算当前 StringInfo 对象的值的哈希代码。 |
| GetHashCode() |
用作默认哈希函数。 (继承自 Object) |
| GetNextTextElement(String, Int32) |
获取指定字符串的指定索引处的文本元素。 |
| GetNextTextElement(String) |
获取指定字符串中的第一个文本元素。 |
| GetNextTextElementLength(ReadOnlySpan<Char>) |
返回输入范围中发生的第一个文本元素(扩展图形群集)的长度。 |
| GetNextTextElementLength(String, Int32) |
返回从指定索引开始的输入字符串中发生的第一个文本元素(扩展图形群集)的长度。 |
| GetNextTextElementLength(String) |
返回输入字符串中发生的第一个文本元素(扩展 grapheme 群集)的长度。 |
| GetTextElementEnumerator(String, Int32) |
返回从指定索引开始循环访问字符串的文本元素的枚举数。 |
| GetTextElementEnumerator(String) |
返回一个枚举器,该枚举器循环访问整个字符串的文本元素。 |
| GetType() |
获取当前实例的 Type。 (继承自 Object) |
| MemberwiseClone() |
创建当前 Object的浅表副本。 (继承自 Object) |
| ParseCombiningCharacters(String) |
返回指定字符串中每个基字符、高代理项或控制字符的索引。 |
| SubstringByTextElements(Int32, Int32) |
从指定的文本元素开始,从当前 StringInfo 对象中检索文本元素的子字符串,然后继续执行指定的文本元素数。 |
| SubstringByTextElements(Int32) |
从指定的文本元素开始,从当前 StringInfo 对象中检索文本元素的子字符串,并继续执行最后一个文本元素。 |
| ToString() |
返回一个表示当前对象的字符串。 (继承自 Object) |