Dev Info: Which StringComparison to use?
This post continues the Dev Info series from the previous post.
Confusion over which string comparison to use is a close second to the #1 coding confusion mentioned earlier (mixing up CultureInfo.InvariantCulture and CultureInfo.CurrentCulture). MSDN has a good, detailed article on this here; below is just my summary of it, to increase awareness.
What is StringComparison?
StringComparison specifies the culture, case, and sort rules to be used by certain overloads of the String.Compare and String.Equals methods. For each StringComparison value there is also a corresponding StringComparer, which can be passed to APIs that expect a comparer object.
For example:
// Here the result will always be true.
bool ordinalResult = string.Equals("string", "STRING", StringComparison.OrdinalIgnoreCase);
// Here the result will be true for most cultures, but for a culture like Turkish
// it will be false. Turkish has four i's (dotted and dotless, each in upper and lower case).
bool cultureResult = string.Equals("string", "STRING", StringComparison.CurrentCultureIgnoreCase);
English vs. Turkish Case Mappings
| Language | Letter | Lowercase Map | Uppercase Map |
|---|---|---|---|
| English | i | i | I |
| Turkish | dotted i | i | İ |
| Turkish | dotless ı | ı | I |
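The mappings in the table can be observed directly by passing an explicit culture instead of relying on the current one. The following is a minimal sketch, assuming the runtime has Turkish ("tr-TR") and English ("en-US") culture data available; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Globalization;

public static class TurkishCasingDemo
{
    // Hypothetical helper: upper-cases text using the rules of the given culture.
    public static string UpperIn(CultureInfo culture, string text)
    {
        return text.ToUpper(culture);
    }

    // Hypothetical helper: case-insensitive equality under the given culture.
    // String.Compare(String, String, Boolean, CultureInfo) returns 0 for equal strings.
    public static bool EqualsIgnoreCaseIn(CultureInfo culture, string a, string b)
    {
        return string.Compare(a, b, true, culture) == 0;
    }

    public static void Main()
    {
        // Assumption: both cultures are installed and globalization is not
        // running in invariant mode.
        CultureInfo english = new CultureInfo("en-US");
        CultureInfo turkish = new CultureInfo("tr-TR");

        // English maps i to I; Turkish is expected to map i to the dotted capital (U+0130).
        Console.WriteLine(UpperIn(english, "i"));
        Console.WriteLine(UpperIn(turkish, "i"));

        // Expected to differ: equal under English rules, not equal under Turkish rules.
        Console.WriteLine(EqualsIgnoreCaseIn(english, "string", "STRING"));
        Console.WriteLine(EqualsIgnoreCaseIn(turkish, "string", "STRING"));

        // Ordinal comparison ignores culture entirely, so this is always true.
        Console.WriteLine(string.Equals("string", "STRING", StringComparison.OrdinalIgnoreCase));
    }
}
```

Passing the culture explicitly makes the behaviour reproducible on any machine, which is handy when testing code that normally runs under the user's current culture.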
What are the various StringComparisons?
| StringComparison member | Description | When to use |
|---|---|---|
| Ordinal | Performs an ordinal comparison. | Case-sensitive identifiers in standards such as XML and HTTP; case-sensitive security-related settings; other case-sensitive system/OS resources. |
| OrdinalIgnoreCase | Performs a case-insensitive ordinal comparison. | Case-insensitive identifiers in standards such as XML and HTTP; case-insensitive security-related settings; file paths; registry keys and values; environment variables; resource identifiers (for example, display names); command-line arguments; other case-insensitive system/OS resources. |
| CurrentCulture | Performs a case-sensitive comparison using the current culture. | Working with most user data (and not system/OS data); showing or sorting the system/OS data mentioned above for the user (but not when comparing for equality). |
| CurrentCultureIgnoreCase | Performs a case-insensitive comparison using the current culture. | Same as CurrentCulture. |
| InvariantCulture | Performs a case-sensitive comparison using the invariant culture. | Valid in very rare cases; best is to forget these exist. |
| InvariantCultureIgnoreCase | Performs a case-insensitive comparison using the invariant culture. | Same as InvariantCulture. |
Some Examples!
| Use Case | StringComparison to use | Reason |
|---|---|---|
| Check a file name, URL, registry value, etc. for equality | OrdinalIgnoreCase | These system resources are case-insensitive (at least in the .NET/Win32 context). |
| Sort file names (or any other system resource) to show to the user in the UI | CurrentCultureIgnoreCase | The user will want the sorting to follow their own culture, the way they understand it. |
| Compare PropertyName in a WPF PropertyChanged handler against a const string | Ordinal | The only reason for using Ordinal (over OrdinalIgnoreCase) is that you know the data will not vary in case, and Ordinal is faster. |
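The three use cases in the table might look roughly like this in code. This is only a sketch: the class, the method names, and the "Title" property name are hypothetical, chosen just to make each row concrete.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonUseCases
{
    // Hypothetical property name, standing in for a const used by a view model.
    public const string TitleProperty = "Title";

    // Row 1: equality check on a system resource such as a file name.
    public static bool IsSameFileName(string left, string right)
    {
        return string.Equals(left, right, StringComparison.OrdinalIgnoreCase);
    }

    // Row 2: sorting for display follows the user's current culture.
    // Returns a sorted copy and leaves the input untouched.
    public static List<string> SortForDisplay(IEnumerable<string> names)
    {
        List<string> sorted = new List<string>(names);
        sorted.Sort(StringComparer.CurrentCultureIgnoreCase);
        return sorted;
    }

    // Row 3: the casing is known not to vary, so the cheaper Ordinal comparison is enough.
    public static bool IsTitleChange(string propertyName)
    {
        return string.Equals(propertyName, TitleProperty, StringComparison.Ordinal);
    }

    public static void Main()
    {
        Console.WriteLine(IsSameFileName("Report.TXT", "report.txt")); // True
        Console.WriteLine(IsTitleChange("Title"));                     // True
        Console.WriteLine(IsTitleChange("title"));                     // False

        // The resulting order depends on the current culture of the machine.
        foreach (string name in SortForDisplay(new[] { "banana", "Apple", "cherry" }))
        {
            Console.WriteLine(name);
        }
    }
}
```

Note that the sort uses a StringComparer while the equality checks use a StringComparison value; both express the same choice of rules, just in the form each API expects.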
Other notes
- Most String APIs that have overloads without a StringComparison parameter (String.Equals and String.Contains are the likely exceptions) use StringComparison.CurrentCulture internally. String.Equals uses StringComparison.Ordinal. Even if your intention matches the default, it is better to pass the appropriate StringComparison explicitly to make things clear. FxCop will also warn you if you don't pass a StringComparison.
- In the .NET Framework, String.Contains does not have any overload that takes a StringComparison (later .NET Core versions add one), so it is best to avoid it there. Use String.IndexOf(value, comparison) >= 0 instead, with the appropriate comparison.
- Be careful with the string.ToLower/ToUpper functions. Because of cases like the "Turkish i" mentioned above, ToLower() can give a completely different result in a different culture. If you need to lowercase something in a culture-invariant manner, use ToLowerInvariant().
- In C#, you can use strings in switch/case statements. This, too, internally compares with ordinal semantics, which may not be what you want. It is best to avoid this construct.
- When you create objects such as a Dictionary with a string key, there is an implied comparison internally. In such cases, pass the comparison/comparer when creating the object:
Dictionary<string, int> stringDictionary = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
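A few of the notes above can be sketched together as follows. The helper names ContainsText and NormalizeKey are hypothetical, and the sample strings are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class StringNotesDemo
{
    // Hypothetical substitute for String.Contains that makes the comparison explicit.
    public static bool ContainsText(string source, string value, StringComparison comparison)
    {
        return source.IndexOf(value, comparison) >= 0;
    }

    // Hypothetical helper: lower-cases a key without depending on the current culture.
    public static string NormalizeKey(string key)
    {
        return key.ToLowerInvariant();
    }

    public static void Main()
    {
        string path = @"C:\Temp\Report.TXT";
        Console.WriteLine(ContainsText(path, ".txt", StringComparison.OrdinalIgnoreCase)); // True
        Console.WriteLine(ContainsText(path, ".txt", StringComparison.Ordinal));           // False

        Console.WriteLine(NormalizeKey("TITLE")); // title

        // With an explicit comparer, keys that differ only in case share one entry.
        Dictionary<string, int> counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        counts["Path"] = 1;
        counts["PATH"] = 2; // overwrites the entry created above
        Console.WriteLine(counts.Count); // 1
    }
}
```

A dictionary built with an explicit comparer can also stand in for a switch over strings when case-insensitive matching is needed, since the comparison rule is then stated in one visible place.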
Comments
Anonymous
February 26, 2011
Your Turkish dotted and dotless "İ" and "ı" shows wrong above, see: en.wikipedia.org/.../Dotted_and_dotless_I