Custom Case Mappings and Sorting Rules
Updated: May 2012
Case mappings, alphabetical order, and conventions for sequencing items vary from culture to culture. You should be aware of these variations and understand that they can cause the results of string operations to vary depending on culture.
The unique case-mapping rules for the Turkish alphabet illustrate how uppercase and lowercase mappings differ from language to language even when they use most of the same letters. In most Latin alphabets, the character "i" (U+0069) is the lowercase version of the character "I" (U+0049). However, the Turkish alphabet has two versions of both the uppercase and lowercase "I": one with a dot and one without a dot. In Turkish, the character "I" (U+0049) is considered the uppercase version of the character "ı" (U+0131), whereas "İ" (U+0130) is considered the uppercase version of the character "i" (U+0069). As a result, a case-insensitive string comparison of the characters "i" (U+0069) and "I" (U+0049) that succeeds for most cultures fails for the culture Turkish (Turkey), designated tr-TR.
Note
The culture Azerbaijani (Azerbaijan, Latin), designated az-Latn-AZ, also uses this case-mapping rule.
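The casing rules described above can be observed directly by converting single characters with an explicit culture. The following is a minimal sketch, not part of the original sample: the class name `TurkishCasingSketch` and its helper methods are hypothetical, and the output assumes a runtime with full culture data installed.

```csharp
using System;
using System.Globalization;

public static class TurkishCasingSketch
{
    // Uppercases text using the casing rules of the named culture.
    public static string Upper(string text, string cultureName)
    {
        return text.ToUpper(new CultureInfo(cultureName));
    }

    // Lowercases text using the casing rules of the named culture.
    public static string Lower(string text, string cultureName)
    {
        return text.ToLower(new CultureInfo(cultureName));
    }

    public static void Main()
    {
        foreach (string name in new[] { "en-US", "tr-TR", "az-Latn-AZ" })
        {
            // Print code points rather than glyphs so the result does not
            // depend on the console's encoding.
            Console.WriteLine("{0}: upper(i) = U+{1:X4}, lower(I) = U+{2:X4}",
                              name,
                              (int)Upper("i", name)[0],
                              (int)Lower("I", name)[0]);
        }
    }
}
```

Under the rules described above, the tr-TR line should report U+0130 and U+0131, whereas en-US should report U+0049 and U+0069.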
The following example demonstrates how the result of a case-insensitive String.Compare operation performed on the strings "FILE" and "file" differs depending on culture. When the Thread.CurrentThread.CurrentCulture property is set to the culture English (United States), designated en-US, String.Compare returns 0 and the strings are considered equal. When the current culture is set to Turkish (Turkey), designated tr-TR, String.Compare returns a nonzero value and the strings are considered unequal. The example is shown first in Visual Basic and then in C#.
Imports System.Globalization
Imports System.Threading

Public Class Example
    Public Shared Sub Main()
        ' Set the CurrentCulture property to English in the U.S.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
        Console.WriteLine("Culture = {0}", _
                          Thread.CurrentThread.CurrentCulture.DisplayName)
        Console.WriteLine("(file == FILE) = {0}", String.Compare("file", _
                          "FILE", True) = 0)
        Console.WriteLine()

        ' Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("tr-TR")
        Console.WriteLine("Culture = {0}", _
                          Thread.CurrentThread.CurrentCulture.DisplayName)
        Console.WriteLine("(file == FILE) = {0}", String.Compare("file", _
                          "FILE", True) = 0)
    End Sub
End Class
' The example displays the following output:
'    Culture = English (United States)
'    (file == FILE) = True
'
'    Culture = Turkish (Turkey)
'    (file == FILE) = False
using System;
using System.Globalization;
using System.Threading;

public class Example
{
    public static void Main()
    {
        // Set the CurrentCulture property to English in the U.S.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        Console.WriteLine("Culture = {0}",
                          Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}\n", (string.Compare("file",
                          "FILE", true) == 0));

        // Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        Console.WriteLine("Culture = {0}",
                          Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}", (string.Compare("file",
                          "FILE", true) == 0));
    }
}
// The example displays the following output:
//    Culture = English (United States)
//    (file == FILE) = True
//
//    Culture = Turkish (Turkey)
//    (file == FILE) = False
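When the strings being compared are identifiers or file names rather than text shown to the user, one common way to keep the result independent of the current culture is an ordinal, case-insensitive comparison. The sketch below illustrates that approach under stated assumptions: the class `OrdinalComparisonSketch` and the method `SameIdentifier` are hypothetical names, and this article itself does not prescribe the technique.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class OrdinalComparisonSketch
{
    // Compares two identifiers without consulting the current culture.
    public static bool SameIdentifier(string left, string right)
    {
        return string.Equals(left, right, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        foreach (string name in new[] { "en-US", "tr-TR" })
        {
            Thread.CurrentThread.CurrentCulture = new CultureInfo(name);

            // The first value uses the current culture's casing rules;
            // the second ignores the current culture entirely.
            Console.WriteLine("{0}: culture-sensitive = {1}, ordinal = {2}",
                              name,
                              string.Compare("file", "FILE", true) == 0,
                              SameIdentifier("file", "FILE"));
        }
    }
}
```

The ordinal result is the same for both cultures because StringComparison.OrdinalIgnoreCase does not use the casing rules of the current culture.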
Additional Custom Case Mappings and Sorting Rules
In addition to the unique case mappings used in the Turkish and Azerbaijani alphabets, there are other custom case mappings and sorting rules that you should be aware of when considering string operations. For nine cultures, the alphabet contains two-letter pairs within the ASCII range (U+0000 through U+007F) for which a case-insensitive comparison, for example one that uses String.Compare, does not evaluate to equal when the case is mixed. These cultures are:
Croatian (Croatia), hr-HR
Czech (Czech Republic), cs-CZ
Slovak (Slovakia), sk-SK
Danish (Denmark), da-DK
Norwegian (Bokmål, Norway), nb-NO
Norwegian (Nynorsk, Norway), nn-NO
Hungarian (Hungary), hu-HU
Vietnamese (Vietnam), vi-VN
Spanish (Spain, Traditional Sort), es-ES_tradnl
For example, in Danish, a case-insensitive comparison does not consider the two-letter pairs "aA" and "AA" equal. In Vietnamese, a case-insensitive comparison does not consider the two-letter pairs "nG" and "NG" equal. Although you should be aware that these rules exist, in practice it is unusual for a culture-sensitive comparison of these pairs to create problems, because the pairs are uncommon in fixed strings or identifiers.
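To check how a particular runtime treats these pairs, the comparison can be run with an explicit culture instead of the current one. This is a minimal sketch with a hypothetical class name and helper. No expected output is given for da-DK or vi-VN, because the result depends on the culture data that ships with the runtime and operating system.

```csharp
using System;
using System.Globalization;

public static class DigraphComparisonSketch
{
    // Case-insensitive, culture-sensitive equality for the named culture.
    public static bool EqualIgnoringCase(string left, string right, string cultureName)
    {
        CultureInfo culture = new CultureInfo(cultureName);
        return string.Compare(left, right, true, culture) == 0;
    }

    public static void Main()
    {
        // Each row: culture name, then the mixed-case and uppercase pair.
        string[][] cases =
        {
            new[] { "en-US", "aA", "AA" },
            new[] { "da-DK", "aA", "AA" },
            new[] { "en-US", "nG", "NG" },
            new[] { "vi-VN", "nG", "NG" }
        };

        foreach (string[] c in cases)
        {
            Console.WriteLine("{0}: \"{1}\" equals \"{2}\" ignoring case = {3}",
                              c[0], c[1], c[2],
                              EqualIgnoringCase(c[1], c[2], c[0]));
        }
    }
}
```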
For six cultures, letters within the ASCII range follow standard casing rules but different sorting rules. These cultures are:
Estonian (Estonia), et-EE
Finnish (Finland), fi-FI
Hungarian (Hungary, Technical Sort Order), hu-HU_technl
Lithuanian (Lithuania), lt-LT
Swedish (Finland), sv-FI
Swedish (Sweden), sv-SE
For example, in the Swedish alphabet, the letter "w" sorts as if it were the letter "v". In application code, sorting operations tend to be used less frequently than equality comparisons and are therefore less likely to create problems.
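The effect of culture-specific sorting rules can be seen by sorting the same words with comparers built for different cultures. The sketch below uses a hypothetical class `CultureSortSketch` and an illustrative word list. The sv-SE ordering is deliberately not stated, because it depends on the version of the culture data available to the runtime.

```csharp
using System;
using System.Globalization;

public static class CultureSortSketch
{
    // Returns a sorted copy of words using the named culture's sorting rules.
    public static string[] SortFor(string[] words, string cultureName)
    {
        string[] copy = (string[])words.Clone();
        // The second argument (false) requests a case-sensitive comparer.
        Array.Sort(copy, StringComparer.Create(new CultureInfo(cultureName), false));
        return copy;
    }

    public static void Main()
    {
        string[] words = { "wa", "vb", "va", "wb" };

        foreach (string name in new[] { "en-US", "sv-SE" })
        {
            Console.WriteLine("{0}: {1}", name,
                              string.Join(", ", SortFor(words, name)));
        }
    }
}
```

If "w" is treated as equivalent to "v" for sorting, the words beginning with "v" and "w" interleave by their second letter instead of forming two separate groups.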
An additional 35 cultures have custom case mappings and sorting rules outside of the ASCII range. These rules are generally confined to the alphabets used by the specific cultures. Therefore, the likelihood that they will cause problems is low.
For details about the custom case mappings and sorting rules that apply to specific cultures, see The Unicode Standard at the Unicode home page.
See Also
Concepts
Culture-Insensitive String Operations
Other Resources
Performing Culture-Insensitive String Operations
Change History
Date | History | Reason |
---|---|---|
May 2012 | Corrected casing of "I" characters. | Customer feedback. |