StringInfo.ParseCombiningCharacters(String) Method
Definition
Returns the indexes of each base character, high surrogate, or control character within the specified string.
public:
static cli::array <int> ^ ParseCombiningCharacters(System::String ^ str);
public static int[] ParseCombiningCharacters(string str);
static member ParseCombiningCharacters : string -> int[]
Public Shared Function ParseCombiningCharacters (str As String) As Integer()
Parameters
- str
- String
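The string to search.
Returns
An array of integers that contains the zero-based index of each base character, high surrogate, or control character within the specified string.
Exceptions
ArgumentNullException
str is null.
Examples
The following example demonstrates how to call the ParseCombiningCharacters method. This code example is part of a larger example provided for the StringInfo class.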
using System;
using System.Text;
using System.Globalization;

public sealed class App {
    static void Main() {
        // The string below contains combining characters.
        String s = "a\u0304\u0308bc\u0327";

        // Show each 'character' in the string.
        EnumTextElements(s);

        // Show the index in the string where each 'character' starts.
        EnumTextElementIndexes(s);
    }

    // Show how to enumerate each real character (honoring surrogates) in a string.
    static void EnumTextElements(String s) {
        // This StringBuilder holds the output results.
        StringBuilder sb = new StringBuilder();

        // Use the enumerator returned from GetTextElementEnumerator
        // method to examine each real character.
        TextElementEnumerator charEnum = StringInfo.GetTextElementEnumerator(s);
        while (charEnum.MoveNext()) {
            sb.AppendFormat(
                "Character at index {0} is '{1}'{2}",
                charEnum.ElementIndex, charEnum.GetTextElement(),
                Environment.NewLine);
        }

        // Show the results.
        Console.WriteLine("Result of GetTextElementEnumerator:");
        Console.WriteLine(sb);
    }

    // Show how to discover the index of each real character (honoring surrogates) in a string.
    static void EnumTextElementIndexes(String s) {
        // This StringBuilder holds the output results.
        StringBuilder sb = new StringBuilder();

        // Use the ParseCombiningCharacters method to
        // get the index of each real character in the string.
        Int32[] textElemIndex = StringInfo.ParseCombiningCharacters(s);

        // Iterate through each real character showing the character and the index where it was found.
        for (Int32 i = 0; i < textElemIndex.Length; i++) {
            sb.AppendFormat(
                "Character {0} starts at index {1}{2}",
                i, textElemIndex[i], Environment.NewLine);
        }

        // Show the results.
        Console.WriteLine("Result of ParseCombiningCharacters:");
        Console.WriteLine(sb);
    }
}

// This code produces the following output:
//
// Result of GetTextElementEnumerator:
// Character at index 0 is 'ā̈'
// Character at index 3 is 'b'
// Character at index 4 is 'ç'
//
// Result of ParseCombiningCharacters:
// Character 0 starts at index 0
// Character 1 starts at index 3
// Character 2 starts at index 4

Imports System.Text
Imports System.Globalization

Public Module Example
    Public Sub Main()
        ' The string below contains combining characters.
        Dim s As String = "a" + ChrW(&h0304) + ChrW(&h0308) + "bc" + ChrW(&h0327)

        ' Show each 'character' in the string.
        EnumTextElements(s)

        ' Show the index in the string where each 'character' starts.
        EnumTextElementIndexes(s)
    End Sub

    ' Show how to enumerate each real character (honoring surrogates) in a string.
    Sub EnumTextElements(s As String)
        ' This StringBuilder holds the output results.
        Dim sb As New StringBuilder()

        ' Use the enumerator returned from GetTextElementEnumerator
        ' method to examine each real character.
        Dim charEnum As TextElementEnumerator = StringInfo.GetTextElementEnumerator(s)
        Do While charEnum.MoveNext()
            sb.AppendFormat("Character at index {0} is '{1}'{2}",
                            charEnum.ElementIndex,
                            charEnum.GetTextElement(),
                            Environment.NewLine)
        Loop

        ' Show the results.
        Console.WriteLine("Result of GetTextElementEnumerator:")
        Console.WriteLine(sb)
    End Sub

    ' Show how to discover the index of each real character (honoring surrogates) in a string.
    Sub EnumTextElementIndexes(s As String)
        ' This StringBuilder holds the output results.
        Dim sb As New StringBuilder()

        ' Use the ParseCombiningCharacters method to
        ' get the index of each real character in the string.
        Dim textElemIndex() As Integer = StringInfo.ParseCombiningCharacters(s)

        ' Iterate through each real character showing the character and the index where it was found.
        For i As Int32 = 0 To textElemIndex.Length - 1
            sb.AppendFormat("Character {0} starts at index {1}{2}",
                            i, textElemIndex(i), Environment.NewLine)
        Next

        ' Show the results.
        Console.WriteLine("Result of ParseCombiningCharacters:")
        Console.WriteLine(sb)
    End Sub
End Module

' The example displays the following output:
'
' Result of GetTextElementEnumerator:
' Character at index 0 is 'ā̈'
' Character at index 3 is 'b'
' Character at index 4 is 'ç'
'
' Result of ParseCombiningCharacters:
' Character 0 starts at index 0
' Character 1 starts at index 3
' Character 2 starts at index 4
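Remarks
The Unicode Standard defines a surrogate pair as a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. A high surrogate is a Unicode code point in the range U+D800 through U+DBFF, and a low surrogate is a Unicode code point in the range U+DC00 through U+DFFF.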
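As a rough illustration of these ranges, the following C# sketch classifies the two code units of the pair U+D800 U+DC00 with the existing Char.IsHighSurrogate and Char.IsLowSurrogate methods and then combines them with Char.ConvertToUtf32. The SurrogateDemo class and its Describe helper are names invented for this illustration only.

```csharp
using System;

public static class SurrogateDemo
{
    // Hypothetical helper: labels a UTF-16 code unit by the surrogate range it falls in.
    public static string Describe(char c)
    {
        if (char.IsHighSurrogate(c)) return "high surrogate"; // U+D800 through U+DBFF
        if (char.IsLowSurrogate(c)) return "low surrogate";   // U+DC00 through U+DFFF
        return "not a surrogate";
    }

    public static void Main()
    {
        string pair = "\ud800\udc00";

        Console.WriteLine(Describe(pair[0])); // high surrogate
        Console.WriteLine(Describe(pair[1])); // low surrogate

        // ConvertToUtf32 combines the two code units into a single code point.
        Console.WriteLine("U+{0:X}", char.ConvertToUtf32(pair[0], pair[1])); // U+10000
    }
}
```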
A control character is a character whose Unicode value is U+007F, or is in the range U+0000 through U+001F or the range U+0080 through U+009F.
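The ranges above can be checked directly. The sketch below compares a hand-written range test against the existing Char.IsControl method for a few sample characters. The ControlCharDemo class and the InControlRanges helper are hypothetical names used only here.

```csharp
using System;

public static class ControlCharDemo
{
    // Hypothetical helper: true when c is U+0000..U+001F, U+007F, or U+0080..U+009F.
    // U+007F and U+0080..U+009F are adjacent, so they are tested as one range.
    public static bool InControlRanges(char c)
    {
        return c <= '\u001F' || (c >= '\u007F' && c <= '\u009F');
    }

    public static void Main()
    {
        char[] samples = { '\u0000', '\u001F', 'a', '\u007F', '\u0080', '\u009F', '\u00A0' };

        foreach (char c in samples)
        {
            // Print the range test next to the framework's own classification.
            Console.WriteLine("U+{0:X4}: in ranges = {1}, Char.IsControl = {2}",
                (int)c, InControlRanges(c), char.IsControl(c));
        }
    }
}
```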
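.NET defines a text element as a unit of text that is displayed as a single character, that is, a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The Unicode Standard defines a combining character sequence as a combination of a base character and one or more combining characters. A surrogate pair can represent a base character or a combining character.
If a combining character sequence is invalid, every combining character in that sequence is also returned.
Each index in the resulting array is the beginning of a text element, that is, the index of the base character or the high surrogate.
The length of each element is easily computed as the difference between successive indexes; for the last element, subtract its index from the length of the string. The length of the array is always less than or equal to the length of the string. For example, given the string "\u4f00\u302a\ud800\udc00\u4f01", this method returns the indexes 0, 2, and 4.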
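The length calculation described above can be sketched in C# as follows. This is a minimal illustration, not part of the API: the TextElementLengths class and its GetLengths helper are hypothetical names, and the last element's length is taken as the distance from its index to the end of the string.

```csharp
using System;
using System.Globalization;

public static class TextElementLengths
{
    // Hypothetical helper: returns the length, in UTF-16 code units,
    // of each text element reported by ParseCombiningCharacters.
    public static int[] GetLengths(string s)
    {
        int[] starts = StringInfo.ParseCombiningCharacters(s);
        int[] lengths = new int[starts.Length];

        for (int i = 0; i < starts.Length; i++)
        {
            // The element ends where the next one starts, or at the end of the string.
            int end = (i + 1 < starts.Length) ? starts[i + 1] : s.Length;
            lengths[i] = end - starts[i];
        }
        return lengths;
    }

    public static void Main()
    {
        string s = "\u4f00\u302a\ud800\udc00\u4f01";

        // Start indexes of the text elements: 0, 2, 4
        Console.WriteLine(string.Join(", ", StringInfo.ParseCombiningCharacters(s)));

        // Lengths derived from those indexes: 2, 2, 1
        Console.WriteLine(string.Join(", ", GetLengths(s)));
    }
}
```

The first element is a base character followed by a combining character, the second is a surrogate pair, and the third is a single base character, which is why the lengths come out as 2, 2, and 1.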
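Equivalent Members
Starting in version 2.0 of the .NET Framework, the SubstringByTextElements method and the LengthInTextElements property provide an easy-to-use implementation of the functionality offered by the ParseCombiningCharacters method.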
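A short sketch of those two members, using the same string as the example earlier on this page, might look like the following. The EquivalentMembersDemo class and its ElementAt helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Globalization;

public static class EquivalentMembersDemo
{
    // Hypothetical helper: returns the n-th text element of s (zero-based).
    public static string ElementAt(string s, int n)
    {
        return new StringInfo(s).SubstringByTextElements(n, 1);
    }

    public static void Main()
    {
        string s = "a\u0304\u0308bc\u0327";
        StringInfo info = new StringInfo(s);

        // Number of text elements, without indexing into the string by hand.
        Console.WriteLine(info.LengthInTextElements); // 3

        // The second text element is the plain letter 'b'.
        Console.WriteLine(ElementAt(s, 1)); // b

        // The first text element spans three UTF-16 code units.
        Console.WriteLine(ElementAt(s, 0).Length); // 3
    }
}
```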