Normalización
Actualización: noviembre 2007
Algunos caracteres Unicode tienen múltiples representaciones binarias equivalentes, que consisten en conjuntos de caracteres Unicode combinables o compuestos. La existencia de varias representaciones para un carácter individual complica la búsqueda, clasificación, coincidencia y otras operaciones.
El estándar Unicode define un proceso denominado normalización, que devuelve una representación binaria al proporcionar cualquiera de las representaciones binarias equivalentes de un carácter. La normalización puede utilizar varios algoritmos, denominados formas de normalización, que obedecen a reglas diferentes. .NET Framework admite actualmente las formas de normalización Unicode C, D, KC y KD.
Nota: |
---|
Se pueden comparar dos cadenas normalizadas con la misma forma de normalización utilizando una comparación ordinal, es decir, una comparación binaria, carácter por carácter. |
Para obtener más información sobre las formas de normalización admitidas por .NET Framework, vea NormalizationForm. Para obtener más información sobre la normalización, la descomposición de caracteres y la equivalencia, vea Unicode Standard Annex #15, "Unicode Normalization Forms", en la página principal de Unicode.
Normalizar una cadena
La aplicación debe utilizar el método String.Normalize de un objeto String para devolver una nueva cadena que, de forma predeterminada, se normaliza según la forma de normalización C. Alternativamente, la aplicación puede utilizar el método String.Normalize de un objeto String, especificando un valor NormalizationForm, para devolver una cadena nueva normalizada específicamente según la forma de normalización C, D, KC o KD.
Probar si una cadena está normalizada
La aplicación puede utilizar el método String.IsNormalized de un objeto String para determinar si el valor de cadena del objeto se ha normalizado según la forma de normalización C. Alternativamente, la aplicación puede utilizar el método String.IsNormalized de un objeto String, especificando un valor NormalizationForm concreto, para determinar si el valor de cadena del objeto se ha normalizado específicamente según la forma de normalización C, D, KC o KD.
Ejemplo
En el siguiente ejemplo de código se muestran los métodos IsNormalized y Normalize. En el ejemplo de código se comprueba si una cadena original se encuentra en alguna de las cuatro formas de normalización, crea una versión de la cadena original en cada forma de normalización, comprueba si cada cadena normalizada se encuentra en la forma de normalización prevista y, luego, muestra el punto de código hexadecimal de cada carácter en cada cadena normalizada.
' This example demonstrates the String.Normalize method
' and the String.IsNormalized method
Imports System
Imports System.Text
Imports Microsoft.VisualBasic
Class Sample
Public Shared Sub Main()
' Character c; combining characters acute and cedilla; character 3/4
Dim s1 = New [String](New Char() {ChrW(&H0063), ChrW(&H0301), ChrW(&H0327), ChrW(&H00BE)})
Dim s2 As String = Nothing
Dim divider = New [String]("-"c, 80)
divider = [String].Concat(Environment.NewLine, divider, Environment.NewLine)
Try
Show("s1", s1)
Console.WriteLine()
Console.WriteLine("U+0063 = LATIN SMALL LETTER C")
Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT")
Console.WriteLine("U+0327 = COMBINING CEDILLA")
Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS")
Console.WriteLine(divider)
Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", s1.IsNormalized())
Console.WriteLine("A2) Is s1 normalized to Form C?: {0}", s1.IsNormalized(NormalizationForm.FormC))
Console.WriteLine("A3) Is s1 normalized to Form D?: {0}", s1.IsNormalized(NormalizationForm.FormD))
Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", s1.IsNormalized(NormalizationForm.FormKC))
Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", s1.IsNormalized(NormalizationForm.FormKD))
Console.WriteLine(divider)
Console.WriteLine("Set string s2 to each normalized form of string s1.")
Console.WriteLine()
Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE")
Console.WriteLine("U+0033 = DIGIT THREE")
Console.WriteLine("U+2044 = FRACTION SLASH")
Console.WriteLine("U+0034 = DIGIT FOUR")
Console.WriteLine(divider)
s2 = s1.Normalize()
Console.Write("B1) Is s2 normalized to the default form (Form C)?: ")
Console.WriteLine(s2.IsNormalized())
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormC)
Console.Write("B2) Is s2 normalized to Form C?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC))
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormD)
Console.Write("B3) Is s2 normalized to Form D?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD))
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormKC)
Console.Write("B4) Is s2 normalized to Form KC?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC))
Show("s2", s2)
Console.WriteLine()
s2 = s1.Normalize(NormalizationForm.FormKD)
Console.Write("B5) Is s2 normalized to Form KD?: ")
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD))
Show("s2", s2)
Console.WriteLine()
Catch e As Exception
Console.WriteLine(e.Message)
End Try
End Sub 'Main
Private Shared Sub Show(title As String, s As String)
Console.Write("Characters in string {0} = ", title)
Dim x As Char
For Each x In s.ToCharArray()
Console.Write("{0:X4} ", AscW(x))
Next x
Console.WriteLine()
End Sub 'Show
End Class 'Sample
'
'This example produces the following results:
'
'Characters in string s1 = 0063 0301 0327 00BE
'
'U+0063 = LATIN SMALL LETTER C
'U+0301 = COMBINING ACUTE ACCENT
'U+0327 = COMBINING CEDILLA
'U+00BE = VULGAR FRACTION THREE QUARTERS
'
'--------------------------------------------------------------------------------
'
'A1) Is s1 normalized to the default form (Form C)?: False
'A2) Is s1 normalized to Form C?: False
'A3) Is s1 normalized to Form D?: False
'A4) Is s1 normalized to Form KC?: False
'A5) Is s1 normalized to Form KD?: False
'
'--------------------------------------------------------------------------------
'
'Set string s2 to each normalized form of string s1.
'
'U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
'U+0033 = DIGIT THREE
'U+2044 = FRACTION SLASH
'U+0034 = DIGIT FOUR
'
'--------------------------------------------------------------------------------
'
'B1) Is s2 normalized to the default form (Form C)?: True
'Characters in string s2 = 1E09 00BE
'
'B2) Is s2 normalized to Form C?: True
'Characters in string s2 = 1E09 00BE
'
'B3) Is s2 normalized to Form D?: True
'Characters in string s2 = 0063 0327 0301 00BE
'
'B4) Is s2 normalized to Form KC?: True
'Characters in string s2 = 1E09 0033 2044 0034
'
'B5) Is s2 normalized to Form KD?: True
'Characters in string s2 = 0063 0327 0301 0033 2044 0034
'
// This example demonstrates the String.Normalize method
// and the String.IsNormalized method
using System;
using System.Text;
class Sample
{
public static void Main()
{
// Character c; combining characters acute and cedilla; character 3/4
string s1 = new String( new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
string s2 = null;
string divider = new String('-', 80);
divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);
try
{
Show("s1", s1);
Console.WriteLine();
Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
Console.WriteLine("U+0327 = COMBINING CEDILLA");
Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
Console.WriteLine(divider);
Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}",
s1.IsNormalized());
Console.WriteLine("A2) Is s1 normalized to Form C?: {0}",
s1.IsNormalized(NormalizationForm.FormC));
Console.WriteLine("A3) Is s1 normalized to Form D?: {0}",
s1.IsNormalized(NormalizationForm.FormD));
Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}",
s1.IsNormalized(NormalizationForm.FormKC));
Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}",
s1.IsNormalized(NormalizationForm.FormKD));
Console.WriteLine(divider);
Console.WriteLine("Set string s2 to each normalized form of string s1.");
Console.WriteLine();
Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
Console.WriteLine("U+0033 = DIGIT THREE");
Console.WriteLine("U+2044 = FRACTION SLASH");
Console.WriteLine("U+0034 = DIGIT FOUR");
Console.WriteLine(divider);
s2 = s1.Normalize();
Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
Console.WriteLine(s2.IsNormalized());
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormC);
Console.Write("B2) Is s2 normalized to Form C?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormD);
Console.Write("B3) Is s2 normalized to Form D?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormKC);
Console.Write("B4) Is s2 normalized to Form KC?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormKD);
Console.Write("B5) Is s2 normalized to Form KD?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
Show("s2", s2);
Console.WriteLine();
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
private static void Show(string title, string s)
{
Console.Write("Characters in string {0} = ", title);
foreach(short x in s.ToCharArray())
{
Console.Write("{0:X4} ", x);
}
Console.WriteLine();
}
}
/*
This example produces the following results:
Characters in string s1 = 0063 0301 0327 00BE
U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS
--------------------------------------------------------------------------------
A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False
--------------------------------------------------------------------------------
Set string s2 to each normalized form of string s1.
U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR
--------------------------------------------------------------------------------
B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE
B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE
B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE
B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034
B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034
*/
// This example demonstrates the String.Normalize method
// and the String.IsNormalized method
using namespace System;
using namespace System::Text;
void Show( String^ title, String^ s )
{
Console::Write( "Characters in string {0} = ", title );
System::Collections::IEnumerator^ myEnum = s->ToCharArray()->GetEnumerator();
while ( myEnum->MoveNext() )
{
/*) * __try_cast < Char * > ( myEnum -> Current );*/
int x;
Console::Write( "{0:X4} ", x );
}
Console::WriteLine();
}
int main()
{
// Character c; combining characters acute and cedilla; character 3/4
array<Char>^temp0 = {L'c',L'\u0301',L'\u0327',L'\u00BE'};
String^ s1 = gcnew String( temp0 );
String^ s2 = nullptr;
String^ divider = gcnew String( '-',80 );
divider = String::Concat( Environment::NewLine, divider, Environment::NewLine );
try
{
Show( "s1", s1 );
Console::WriteLine();
Console::WriteLine( "U+0063 = LATIN SMALL LETTER C" );
Console::WriteLine( "U+0301 = COMBINING ACUTE ACCENT" );
Console::WriteLine( "U+0327 = COMBINING CEDILLA" );
Console::WriteLine( "U+00BE = VULGAR FRACTION THREE QUARTERS" );
Console::WriteLine( divider );
Console::WriteLine( "A1) Is s1 normalized to the default form (Form C)?: {0}", s1->IsNormalized() );
Console::WriteLine( "A2) Is s1 normalized to Form C?: {0}", s1->IsNormalized( NormalizationForm::FormC ) );
Console::WriteLine( "A3) Is s1 normalized to Form D?: {0}", s1->IsNormalized( NormalizationForm::FormD ) );
Console::WriteLine( "A4) Is s1 normalized to Form KC?: {0}", s1->IsNormalized( NormalizationForm::FormKC ) );
Console::WriteLine( "A5) Is s1 normalized to Form KD?: {0}", s1->IsNormalized( NormalizationForm::FormKD ) );
Console::WriteLine( divider );
Console::WriteLine( "Set string s2 to each normalized form of string s1." );
Console::WriteLine();
Console::WriteLine( "U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE" );
Console::WriteLine( "U+0033 = DIGIT THREE" );
Console::WriteLine( "U+2044 = FRACTION SLASH" );
Console::WriteLine( "U+0034 = DIGIT FOUR" );
Console::WriteLine( divider );
s2 = s1->Normalize();
Console::Write( "B1) Is s2 normalized to the default form (Form C)?: " );
Console::WriteLine( s2->IsNormalized() );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormC );
Console::Write( "B2) Is s2 normalized to Form C?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormC ) );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormD );
Console::Write( "B3) Is s2 normalized to Form D?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormD ) );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormKC );
Console::Write( "B4) Is s2 normalized to Form KC?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKC ) );
Show( "s2", s2 );
Console::WriteLine();
s2 = s1->Normalize( NormalizationForm::FormKD );
Console::Write( "B5) Is s2 normalized to Form KD?: " );
Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKD ) );
Show( "s2", s2 );
Console::WriteLine();
}
catch ( Exception^ e )
{
Console::WriteLine( e->Message );
}
}
/*
This example produces the following results:
Characters in string s1 = 0063 0301 0327 00BE
U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS
--------------------------------------------------------------------------------
A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False
--------------------------------------------------------------------------------
Set string s2 to each normalized form of string s1.
U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR
--------------------------------------------------------------------------------
B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE
B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE
B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE
B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034
B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034
*/
// This example demonstrates the String.Normalize method
// and the String.IsNormalized method
import System.*;
import System.Text.*;
class Sample
{
public static void main(String[] args)
{
// Character c; combining characters acute and cedilla; character 3/4
String s1 = new String(new char[] { '\u0063', '\u0301', '\u0327',
'\u00BE' });
String s2 = null;
String divider = new String('-', 80);
divider = String.Concat(Environment.get_NewLine(), divider,
Environment.get_NewLine());
try {
Show("s1", s1);
Console.WriteLine();
Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
Console.WriteLine("U+0327 = COMBINING CEDILLA");
Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
Console.WriteLine(divider);
Console.WriteLine("A1) Is s1 normalized to the default form "
+ "(Form C)?: {0}", System.Convert.ToString(s1.IsNormalized()));
Console.WriteLine("A2) Is s1 normalized to Form C?: {0}",
System.Convert.ToString(s1.
IsNormalized(NormalizationForm.FormC)));
Console.WriteLine("A3) Is s1 normalized to Form D?: {0}",
System.Convert.ToString(s1.
IsNormalized(NormalizationForm.FormD)));
Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}",
System.Convert.ToString(s1.
IsNormalized(NormalizationForm.FormKC)));
Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}",
System.Convert.ToString(s1.
IsNormalized(NormalizationForm.FormKD)));
Console.WriteLine(divider);
Console.WriteLine("Set string s2 to each normalized form of "
+ "string s1.");
Console.WriteLine();
Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA "
+ "AND ACUTE");
Console.WriteLine("U+0033 = DIGIT THREE");
Console.WriteLine("U+2044 = FRACTION SLASH");
Console.WriteLine("U+0034 = DIGIT FOUR");
Console.WriteLine(divider);
s2 = s1.Normalize();
Console.Write("B1) Is s2 normalized to the default form "
+ "(Form C)?: ");
Console.WriteLine(s2.IsNormalized());
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormC);
Console.Write("B2) Is s2 normalized to Form C?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormD);
Console.Write("B3) Is s2 normalized to Form D?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormKC);
Console.Write("B4) Is s2 normalized to Form KC?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
Show("s2", s2);
Console.WriteLine();
s2 = s1.Normalize(NormalizationForm.FormKD);
Console.Write("B5) Is s2 normalized to Form KD?: ");
Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
Show("s2", s2);
Console.WriteLine();
}
catch (System.Exception e) {
Console.WriteLine(e.get_Message());
}
} //main
private static void Show(String title, String s)
{
Console.Write("Characters in string {0} = ", title);
char myCharArray[] = s.ToCharArray();
for (int iCtr = 0; iCtr < myCharArray.length; iCtr++) {
char c = myCharArray[iCtr];
Console.Write(((System.Int32)c).ToString("X4") + " ");
}
Console.WriteLine();
} //Show
} //Sample
/*
This example produces the following results:
Characters in string s1 = 0063 0301 0327 00BE
U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS
--------------------------------------------------------------------------------
A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False
--------------------------------------------------------------------------------
Set string s2 to each normalized form of string s1.
U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR
--------------------------------------------------------------------------------
B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE
B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE
B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE
B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034
B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034
*/