String.Normalize 方法
定义
重要
一些信息与预发行产品相关,相应产品在发行之前可能会进行重大修改。 对于此处提供的信息,Microsoft 不作任何明示或暗示的担保。
返回一个新字符串,其二进制表示形式符合特定的 Unicode 范式。
重载
Normalize() |
返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合 Unicode 范式 C。 |
Normalize(NormalizationForm) |
返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合指定的 Unicode 范式。 |
示例
以下示例将字符串规范化为四种规范化形式中的每一种,确认字符串已规范化为指定的规范化形式,然后列出规范化字符串中的代码点。
using namespace System;
using namespace System::Text;
void Show( String^ title, String^ s )
{
Console::Write( "Characters in string {0} = ", title );
for each (short x in s) {
Console::Write("{0:X4} ", x);
}
Console::WriteLine();
}
int main()
{
// Character c; combining characters acute and cedilla; character 3/4
array<Char>^temp0 = {L'c',L'\u0301',L'\u0327',L'\u00BE'};
String^ s1 = gcnew String( temp0 );
String^ s2 = nullptr;
String^ divider = gcnew String( '-',80 );
divider = String::Concat( Environment::NewLine, divider, Environment::NewLine );
Show( "s1", s1 );
Console::WriteLine();
Console::WriteLine( "U+0063 = LATIN SMALL LETTER C" );
Console::WriteLine( "U+0301 = COMBINING ACUTE ACCENT" );
Console::WriteLine( "U+0327 = COMBINING CEDILLA" );
Console::WriteLine( "U+00BE = VULGAR FRACTION THREE QUARTERS" );
Console::WriteLine( divider );
Console::WriteLine( "A1) Is s1 normalized to the default form (Form C)?: {0}", s1->IsNormalized() );
Console::WriteLine( "A2) Is s1 normalized to Form C?: {0}", s1->IsNormalized( NormalizationForm::FormC ) );
Console::WriteLine( "A3) Is s1 normalized to Form D?: {0}", s1->IsNormalized( NormalizationForm::FormD ) );
Console::WriteLine( "A4) Is s1 normalized to Form KC?: {0}", s1->IsNormalized( NormalizationForm::FormKC ) );
Console::WriteLine( "A5) Is s1 normalized to Form KD?: {0}", s1->IsNormalized( NormalizationForm::FormKD ) );
Console::WriteLine( divider );
Console::WriteLine( "Set string s2 to each normalized form of string s1." );
Console::WriteLine();
Console::WriteLine( "U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE" );
Console::WriteLine( "U+0033 = DIGIT THREE" );
Console::WriteLine( "U+2044 = FRACTION SLASH" );
Console::WriteLine( "U+0034 = DIGIT FOUR" );
Console::WriteLine( divider );
s2 = s1->Normalize();
Console::Write( "B1) Is s2 normalized to the default form (Form C)?: " );
Console::WriteLine( s2->IsNormalized() );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormC );
Console::Write( "B2) Is s2 normalized to Form C?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormC ) );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormD );
Console::Write( "B3) Is s2 normalized to Form D?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormD ) );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormKC );
Console::Write( "B4) Is s2 normalized to Form KC?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKC ) );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormKD );
Console::Write( "B5) Is s2 normalized to Form KD?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKD ) );
Show( "s2", s2 );
Console::WriteLine();
}
/*
This example produces the following results:
Characters in string s1 = 0063 0301 0327 00BE
U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS
--------------------------------------------------------------------------------
A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False
--------------------------------------------------------------------------------
Set string s2 to each normalized form of string s1.
U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR
--------------------------------------------------------------------------------
B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE
B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE
B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE
B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034
B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034
*/
using System;
using System.Text;
class Example
{
public static void Main()
{
// Character c; combining characters acute and cedilla; character 3/4
string s1 = new String( new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
string s2 = null;
string divider = new String('-', 80);
divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);
Show("s1", s1);
Console.WriteLine();
Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
Console.WriteLine("U+0327 = COMBINING CEDILLA");
Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
Console.WriteLine(divider);
Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}",
s1.IsNormalized());
Console.WriteLine("A2) Is s1 normalized to Form C?: {0}",
s1.IsNormalized(NormalizationForm.FormC));
Console.WriteLine("A3) Is s1 normalized to Form D?: {0}",
s1.IsNormalized(NormalizationForm.FormD));
Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}",
s1.IsNormalized(NormalizationForm.FormKC));
Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}",
s1.IsNormalized(NormalizationForm.FormKD));
Console.WriteLine(divider);
Console.WriteLine("Set string s2 to each normalized form of string s1.");
Console.WriteLine();
Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
Console.WriteLine("U+0033 = DIGIT THREE");
Console.WriteLine("U+2044 = FRACTION SLASH");
Console.WriteLine("U+0034 = DIGIT FOUR");
Console.WriteLine(divider);
s2 = s1.Normalize();
Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
Console.WriteLine(s2.IsNormalized());
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormC);
Console.Write("B2) Is s2 normalized to Form C?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormD);
Console.Write("B3) Is s2 normalized to Form D?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormKC);
Console.Write("B4) Is s2 normalized to Form KC?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormKD);
Console.Write("B5) Is s2 normalized to Form KD?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
Show("s2", s2);
Console.WriteLine();
}
private static void Show(string title, string s)
{
Console.Write("Characters in string {0} = ", title);
foreach(short x in s) {
Console.Write("{0:X4} ", x);
}
Console.WriteLine();
}
}
/*
This example produces the following results:
Characters in string s1 = 0063 0301 0327 00BE
U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS
--------------------------------------------------------------------------------
A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False
--------------------------------------------------------------------------------
Set string s2 to each normalized form of string s1.
U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR
--------------------------------------------------------------------------------
B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE
B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE
B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE
B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034
B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034
*/
open System
open System.Text
let show title (s: string) =
printf $"Characters in string %s{title} = "
for x in s do
printf $"{int16 x:X4} "
printfn ""
[<EntryPoint>]
let main _ =
// Character c; combining characters acute and cedilla; character 3/4
let s1 = String [| '\u0063'; '\u0301'; '\u0327'; '\u00BE' |]
let divider = String('-', 80)
let divider = String.Concat(Environment.NewLine, divider, Environment.NewLine)
show "s1" s1
printfn "\nU+0063 = LATIN SMALL LETTER C"
printfn "U+0301 = COMBINING ACUTE ACCENT"
printfn "U+0327 = COMBINING CEDILLA"
printfn "U+00BE = VULGAR FRACTION THREE QUARTERS"
printfn $"{divider}"
printfn $"A1) Is s1 normalized to the default form (Form C)?: {s1.IsNormalized()}"
printfn $"A2) Is s1 normalized to Form C?: {s1.IsNormalized NormalizationForm.FormC}"
printfn $"A3) Is s1 normalized to Form D?: {s1.IsNormalized NormalizationForm.FormD}"
printfn $"A4) Is s1 normalized to Form KC?: {s1.IsNormalized NormalizationForm.FormKC}"
printfn $"A5) Is s1 normalized to Form KD?: {s1.IsNormalized NormalizationForm.FormKD}"
printfn $"{divider}"
printfn "Set string s2 to each normalized form of string s1.\n"
printfn "U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE"
printfn"U+0033 = DIGIT THREE"
printfn"U+2044 = FRACTION SLASH"
printfn"U+0034 = DIGIT FOUR"
printfn $"{divider}"
let s2 = s1.Normalize()
printf "B1) Is s2 normalized to the default form (Form C)?: "
printfn $"{s2.IsNormalized()}"
show "s2" s2
printfn ""
let s2 = s1.Normalize NormalizationForm.FormC
printf "B2) Is s2 normalized to Form C?: "
printfn $"{s2.IsNormalized NormalizationForm.FormC}"
show "s2" s2
printfn ""
let s2 = s1.Normalize NormalizationForm.FormD
printf "B3) Is s2 normalized to Form D?: "
printfn $"{s2.IsNormalized NormalizationForm.FormD}"
show "s2" s2
printfn ""
let s2 = s1.Normalize(NormalizationForm.FormKC)
printf "B4) Is s2 normalized to Form KC?: "
printfn $"{s2.IsNormalized NormalizationForm.FormKC}"
show "s2" s2
printfn ""
let s2 = s1.Normalize(NormalizationForm.FormKD)
printf "B5) Is s2 normalized to Form KD?: "
printfn $"{s2.IsNormalized NormalizationForm.FormKD}"
show "s2" s2
0
(*
This example produces the following results:
Characters in string s1 = 0063 0301 0327 00BE
U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS
--------------------------------------------------------------------------------
A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False
--------------------------------------------------------------------------------
Set string s2 to each normalized form of string s1.
U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR
--------------------------------------------------------------------------------
B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE
B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE
B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE
B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034
B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034
*)
Imports System.Text
Class Example
Public Shared Sub Main()
' Character c; combining characters acute and cedilla; character 3/4
Dim s1 = New [String](New Char() {ChrW(&H0063), ChrW(&H0301), ChrW(&H0327), ChrW(&H00BE)})
Dim s2 As String = Nothing
Dim divider = New [String]("-"c, 80)
divider = [String].Concat(Environment.NewLine, divider, Environment.NewLine)
Show("s1", s1)
Console.WriteLine()
Console.WriteLine("U+0063 = LATIN SMALL LETTER C")
Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT")
Console.WriteLine("U+0327 = COMBINING CEDILLA")
Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS")
Console.WriteLine(divider)
Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", s1.IsNormalized())
Console.WriteLine("A2) Is s1 normalized to Form C?: {0}", s1.IsNormalized(NormalizationForm.FormC))
Console.WriteLine("A3) Is s1 normalized to Form D?: {0}", s1.IsNormalized(NormalizationForm.FormD))
Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", s1.IsNormalized(NormalizationForm.FormKC))
Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", s1.IsNormalized(NormalizationForm.FormKD))
Console.WriteLine(divider)
Console.WriteLine("Set string s2 to each normalized form of string s1.")
Console.WriteLine()
Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE")
Console.WriteLine("U+0033 = DIGIT THREE")
Console.WriteLine("U+2044 = FRACTION SLASH")
Console.WriteLine("U+0034 = DIGIT FOUR")
Console.WriteLine(divider)
s2 = s1.Normalize()
Console.Write("B1) Is s2 normalized to the default form (Form C)?: ")
Console.WriteLine(s2.IsNormalized())
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormC)
Console.Write("B2) Is s2 normalized to Form C?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC))
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormD)
Console.Write("B3) Is s2 normalized to Form D?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD))
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormKC)
Console.Write("B4) Is s2 normalized to Form KC?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC))
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormKD)
Console.Write("B5) Is s2 normalized to Form KD?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD))
Show("s2", s2)
Console.WriteLine()
End Sub
Private Shared Sub Show(title As String, s As String)
Console.Write("Characters in string {0} = ", title)
For Each x As Char In s
Console.Write("{0:X4} ", AscW(x))
Next
Console.WriteLine()
End Sub
End Class
'This example produces the following results:
'
'Characters in string s1 = 0063 0301 0327 00BE
'
'U+0063 = LATIN SMALL LETTER C
'U+0301 = COMBINING ACUTE ACCENT
'U+0327 = COMBINING CEDILLA
'U+00BE = VULGAR FRACTION THREE QUARTERS
'
'--------------------------------------------------------------------------------
'
'A1) Is s1 normalized to the default form (Form C)?: False
'A2) Is s1 normalized to Form C?: False
'A3) Is s1 normalized to Form D?: False
'A4) Is s1 normalized to Form KC?: False
'A5) Is s1 normalized to Form KD?: False
'
'--------------------------------------------------------------------------------
'
'Set string s2 to each normalized form of string s1.
'
'U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
'U+0033 = DIGIT THREE
'U+2044 = FRACTION SLASH
'U+0034 = DIGIT FOUR
'
'--------------------------------------------------------------------------------
'
'B1) Is s2 normalized to the default form (Form C)?: True
'Characters in string s2 = 1E09 00BE
'
'B2) Is s2 normalized to Form C?: True
'Characters in string s2 = 1E09 00BE
'
'B3) Is s2 normalized to Form D?: True
'Characters in string s2 = 0063 0327 0301 00BE
'
'B4) Is s2 normalized to Form KC?: True
'Characters in string s2 = 1E09 0033 2044 0034
'
'B5) Is s2 normalized to Form KD?: True
'Characters in string s2 = 0063 0327 0301 0033 2044 0034
'
Normalize()
- Source:
- String.cs
- Source:
- String.cs
- Source:
- String.cs
返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合 Unicode 范式 C。
public:
System::String ^ Normalize();
public string Normalize ();
member this.Normalize : unit -> string
Public Function Normalize () As String
返回
一个新的规范化字符串,其文本值与此字符串相同,但其二进制表示形式符合范式 C。
例外
当前实例包含无效的 Unicode 字符。
注解
某些 Unicode 字符具有多个等效的二进制表示形式,由组合和/或复合 Unicode 字符集组成。 例如,以下任何码位都可以表示字母“ắ”:
U+1EAF
U+0103 U+0301
U+0061 U+0306 U+0301
单个字符存在多个表示形式,使搜索、排序、匹配和其他操作复杂化。
Unicode 标准定义了一个称为规范化的过程,当给定字符的任何等效二进制表示形式时,该流程将返回一个二进制表示形式。 可以使用多种算法(称为规范化形式)来执行规范化,这些算法遵循不同的规则。 .NET 支持由 Unicode 标准定义的四种规范化形式 (C、D、KC 和 KD) 。 当两个字符串以相同的规范化形式表示时,可以使用序号比较来比较它们。
若要规范化和比较两个字符串,请执行以下操作:
获取要从输入源(例如文件或用户输入设备)进行比较的字符串。
调用 方法, Normalize() 将字符串规范化为规范化形式 C。
若要比较两个字符串,请调用支持序号字符串比较的方法(如 Compare(String, String, StringComparison) 方法),并提供 值 StringComparison.Ordinal 或 StringComparison.OrdinalIgnoreCase 作为 StringComparison 参数。 若要对规范化字符串数组进行排序,请将 或 的值传递给
comparer
的相应重载Array.Sort。StringComparer.OrdinalStringComparer.OrdinalIgnoreCase根据上一步指示的顺序,在排序的输出中发出字符串。
有关支持的 Unicode 规范化形式的说明,请参阅 System.Text.NormalizationForm。
调用方说明
该方法 IsNormalized 在遇到字符串中的第一个非规范化字符时立即返回 false
。 因此,如果字符串包含非规范化字符,后跟无效的 Unicode 字符,该方法 Normalize 将引发 , ArgumentException 尽管 IsNormalized 返回 false
。
另请参阅
适用于
Normalize(NormalizationForm)
- Source:
- String.cs
- Source:
- String.cs
- Source:
- String.cs
返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合指定的 Unicode 范式。
public:
System::String ^ Normalize(System::Text::NormalizationForm normalizationForm);
public string Normalize (System.Text.NormalizationForm normalizationForm);
member this.Normalize : System.Text.NormalizationForm -> string
Public Function Normalize (normalizationForm As NormalizationForm) As String
参数
- normalizationForm
- NormalizationForm
一个 Unicode 范式。
返回
一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合由 normalizationForm
参数指定的范式。
例外
当前实例包含无效的 Unicode 字符。
注解
某些 Unicode 字符具有多个等效的二进制表示形式,由组合和/或复合 Unicode 字符集组成。 单个字符存在多个表示形式,使搜索、排序、匹配和其他操作复杂化。
Unicode 标准定义了一个称为规范化的过程,当给定字符的任何等效二进制表示形式时,该流程将返回一个二进制表示形式。 可以使用多种算法(称为规范化形式)来执行规范化,这些算法遵循不同的规则。 .NET 支持由 Unicode 标准定义的四种规范化形式 (C、D、KC 和 KD) 。 当两个字符串以相同的规范化形式表示时,可以使用序号比较来比较它们。
若要规范化和比较两个字符串,请执行以下操作:
获取要从输入源(例如文件或用户输入设备)进行比较的字符串。
调用 方法以 Normalize(NormalizationForm) 将字符串规范化为指定的规范化形式。
若要比较两个字符串,请调用支持序号字符串比较的方法(如 Compare(String, String, StringComparison) 方法),并提供 值 StringComparison.Ordinal 或 StringComparison.OrdinalIgnoreCase 作为 StringComparison 参数。 若要对规范化字符串数组进行排序,请将 或 的值传递给
comparer
的相应重载Array.Sort。StringComparer.OrdinalStringComparer.OrdinalIgnoreCase根据上一步指示的顺序,在排序的输出中发出字符串。
有关支持的 Unicode 规范化形式的说明,请参阅 System.Text.NormalizationForm。
调用方说明
该方法 IsNormalized 在遇到字符串中的第一个非规范化字符时立即返回 false
。 因此,如果字符串包含非规范化字符,后跟无效的 Unicode 字符,该方法 Normalize 可能会引发 , ArgumentException 尽管 IsNormalized 返回 false
。