.NET globalization and ICU

Before .NET 5, the .NET globalization APIs used different underlying libraries on different platforms. On Unix, the APIs used International Components for Unicode (ICU), and on Windows, they used National Language Support (NLS). This resulted in some behavioral differences in a handful of globalization APIs when running applications on different platforms. Behavior differences were evident in these areas:

  • Cultures and culture data
  • String casing
  • String sorting and searching
  • Sort keys
  • String normalization
  • Internationalized Domain Names (IDN) support
  • Time zone display name on Linux

Starting with .NET 5, developers have more control over which underlying library is used, enabling applications to avoid differences across platforms.

ICU on Windows

Windows 10 May 2019 Update and later versions include icu.dll as part of the OS, and .NET 5 and later versions use ICU by default. When running on Windows, .NET 5 and later versions try to load icu.dll and, if it's available, use it for the globalization implementation. If the ICU library can't be found or loaded, such as when running on older versions of Windows, .NET 5 and later versions fall back to the NLS-based implementation.

Note

Even when using ICU, the CurrentCulture, CurrentUICulture, and CurrentRegion members still use Windows operating system APIs to honor user settings.

Behavioral differences

If you upgrade your app to target .NET 5, you might see changes in your app even if you don't realize you're using globalization facilities. This section lists one of the behavioral changes you might see, but there are others too.

String.IndexOf

Consider the following code that calls String.IndexOf(String) to find the index of the null character \0 in a string.

const string greeting = "Hel\0lo";
Console.WriteLine($"{greeting.IndexOf("\0")}");
Console.WriteLine($"{greeting.IndexOf("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"{greeting.IndexOf("\0", StringComparison.Ordinal)}");
  • In .NET Core 3.1 and earlier versions on Windows, the snippet prints 3 on each of the three lines.
  • In .NET 5 and later versions on Windows 19H1 and later versions, the snippet prints 0, 0, and 3 (for the ordinal search).

By default, String.IndexOf(String) performs a culture-aware linguistic search. ICU considers the null character \0 to be a zero-weight character, and thus the character isn't found in the string when using a linguistic search on .NET 5 and later. However, NLS doesn't consider the null character \0 to be a zero-weight character, and a linguistic search on .NET Core 3.1 and earlier locates the character at position 3. An ordinal search finds the character at position 3 on all .NET versions.

You can run code analysis rules CA1307: Specify StringComparison for clarity and CA1309: Use ordinal StringComparison to find call sites in your code where the string comparison isn't specified or it is not ordinal.

For more information, see Behavior changes when comparing strings on .NET 5+.

Use NLS instead of ICU

Using ICU instead of NLS may result in behavioral differences with some globalization-related operations. To revert back to using NLS, a developer can opt out of the ICU implementation. Applications can enable NLS mode in any of the following ways:

  • In the project file:

    <ItemGroup>
      <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
    </ItemGroup>
    
  • In the runtimeconfig.json file:

    {
      "runtimeOptions": {
         "configProperties": {
           "System.Globalization.UseNls": true
          }
      }
    }
    
  • By setting the environment variable DOTNET_SYSTEM_GLOBALIZATION_USENLS to the value true or 1.

Note

A value set in the project or in the runtimeconfig.json file takes precedence over the environment variable.

For more information, see Runtime config settings.

Determine if your app is using ICU

The following code snippet can help you determine if your app is running with ICU libraries (and not NLS).

public static bool ICUMode()
{
    SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
    byte[] bytes = sortVersion.SortId.ToByteArray();
    int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
    return version != 0 && version == sortVersion.FullVersion;
}

To determine the version of .NET, use RuntimeInformation.FrameworkDescription.

App-local ICU

Each release of ICU may bring with it bug fixes as well as updated Common Locale Data Repository (CLDR) data that describes the world's languages. Moving between versions of ICU can subtly impact app behavior when it comes to globalization-related operations. To help application developers ensure consistency across all deployments, .NET 5 and later versions enable apps on both Windows and Unix to carry and use their own copy of ICU.

Applications can opt-in to an app-local ICU implementation mode in one of the following ways:

  • In the project file, set the appropriate RuntimeHostConfigurationOption value:

    <ItemGroup>
      <RuntimeHostConfigurationOption Include="System.Globalization.AppLocalIcu" Value="<suffix>:<version> or <version>" />
    </ItemGroup>
    
  • Or in the runtimeconfig.json file, set the appropriate runtimeOptions.configProperties value:

    {
      "runtimeOptions": {
         "configProperties": {
           "System.Globalization.AppLocalIcu": "<suffix>:<version> or <version>"
         }
      }
    }
    
  • Or by setting the environment variable DOTNET_SYSTEM_GLOBALIZATION_APPLOCALICU to the value <suffix>:<version> or <version>.

    <suffix>: Optional suffix of fewer than 36 characters in length, following the public ICU packaging conventions. When building a custom ICU, you can customize it to produce the lib names and exported symbol names to contain a suffix, for example, libicuucmyapp, where myapp is the suffix.

    <version>: A valid ICU version, for example, 67.1. This version is used to load the binaries and to get the exported symbols.

When either of these options is set, you can add a Microsoft.ICU.ICU4C.Runtime PackageReference to your project that corresponds to the configured version and that's all that is needed.

Alternatively, to load ICU when the app-local switch is set, .NET uses the NativeLibrary.TryLoad method, which probes multiple paths. The method first tries to find the library in the NATIVE_DLL_SEARCH_DIRECTORIES property, which is created by the dotnet host based on the deps.json file for the app. For more information, see Default probing.

For self-contained apps, no special action is required by the user, other than making sure ICU is in the app directory (for self-contained apps, the working directory defaults to NATIVE_DLL_SEARCH_DIRECTORIES).

If you're consuming ICU via a NuGet package, this works in framework-dependent applications. NuGet resolves the native assets and includes them in the deps.json file and in the output directory for the application under the runtimes directory. .NET loads it from there.

For framework-dependent apps (not self-contained) where ICU is consumed from a local build, you must take additional steps. The .NET SDK doesn't yet have a feature for "loose" native binaries to be incorporated into deps.json (see this SDK issue). Instead, you can enable this by adding additional information into the application's project file. For example:

<ItemGroup>
  <IcuAssemblies Include="icu\*.so*" />
  <RuntimeTargetsCopyLocalItems Include="@(IcuAssemblies)" AssetType="native" CopyLocal="true" 
    DestinationSubDirectory="runtimes/linux-x64/native/" DestinationSubPath="%(FileName)%(Extension)" 
    RuntimeIdentifier="linux-x64" NuGetPackageId="System.Private.Runtime.UnicodeData" />
</ItemGroup>

This must be done for all the ICU binaries for the supported runtimes. Also, the NuGetPackageId metadata in the RuntimeTargetsCopyLocalItems item group needs to match a NuGet package that the project actually references.

macOS behavior

macOS has a different behavior for resolving dependent dynamic libraries from the load commands specified in the Mach-O file than the Linux loader. In the Linux loader, .NET can try libicudata, libicuuc, and libicui18n (in that order) to satisfy ICU dependency graph. However, on macOS, this doesn't work. When building ICU on macOS, you, by default, get a dynamic library with these load commands in libicuuc. The following snippet shows an example.

~/ % otool -L /Users/santifdezm/repos/icu-build/icu/install/lib/libicuuc.67.1.dylib
/Users/santifdezm/repos/icu-build/icu/install/lib/libicuuc.67.1.dylib:
 libicuuc.67.dylib (compatibility version 67.0.0, current version 67.1.0)
 libicudata.67.dylib (compatibility version 67.0.0, current version 67.1.0)
 /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
 /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.1.0)

These commands just reference the name of the dependent libraries for the other components of ICU. The loader performs the search following the dlopen conventions, which involves having these libraries in the system directories or setting the LD_LIBRARY_PATH env vars, or having ICU at the app-level directory. If you can't set LD_LIBRARY_PATH or ensure that ICU binaries are at the app-level directory, you will need to do some extra work.

There are some directives for the loader, like @loader_path, which tell the loader to search for that dependency in the same directory as the binary with that load command. There are two ways to achieve this:

  • install_name_tool -change

    Run the following commands:

    install_name_tool -change "libicudata.67.dylib" "@loader_path/libicudata.67.dylib" /path/to/libicuuc.67.1.dylib
    install_name_tool -change "libicudata.67.dylib" "@loader_path/libicudata.67.dylib" /path/to/libicui18n.67.1.dylib
    install_name_tool -change "libicuuc.67.dylib" "@loader_path/libicuuc.67.dylib" /path/to/libicui18n.67.1.dylib
    
  • Patch ICU to produce the install names with @loader_path

    Before running autoconf (./runConfigureICU), change these lines to:

    LD_SONAME = -Wl,-compatibility_version -Wl,$(SO_TARGET_VERSION_MAJOR) -Wl,-current_version -Wl,$(SO_TARGET_VERSION) -install_name @loader_path/$(notdir $(MIDDLE_SO_TARGET))
    

ICU on WebAssembly

A version of ICU is available that's specifically for WebAssembly workloads. This version provides globalization compatibility with desktop profiles. To reduce the ICU data file size from 24 MB to 1.4 MB (or ~0.3 MB if compressed with Brotli), this workload has a handful of limitations.

The following APIs are not supported:

The following APIs are supported with limitations:

In addition, fewer locales are supported. The supported list can be found in the dotnet/icu repo.