Handling Internationalized Domain Names (IDNs)
This topic describes how to work with internationalized domain names (IDNs) in your applications. IDNs are specified by Network Working Group RFC 3490: Internationalizing Domain Names in Applications (IDNA). Before IDNA, domain names were limited to Latin characters without diacritics. IDNA allows IDNs to include Latin characters with diacritics, along with characters from non-Latin scripts, such as Cyrillic, Arabic, and Chinese. The standard also establishes rules for mapping IDNs to ASCII-only domain names, so IDNA processing can be handled on the client side without requiring any changes to the Domain Name System (DNS) infrastructure.
The use of IDNs introduces a number of security issues, which RFC 3490 describes. For more information, see the related section of Security Considerations: International Features.
IDNA, as defined in RFC 3490, is based on Unicode 3.2.
NLS API Functions for Handling IDNs
NLS includes the following conversion functions that your application can use to convert an IDN to different representations. For an example of the use of these functions, see NLS: Internationalized Domain Name (IDN) Conversion Sample.
- IdnToAscii. Converts an IDN to its ASCII representation, in which each label that contains non-ASCII characters is encoded as Punycode.
- IdnToNameprepUnicode. Performs the NamePrep portion of the conversion of an IDN to an ASCII name. This function creates a canonical Unicode representation of a string.
- IdnToUnicode. Converts a Punycode string to a normal UTF-16 string.
NLS also defines several API functions that can be used to mitigate some of the security risks presented by the IDN technology. On Windows Vista and later, the following functions are used to verify that the characters in a given IDN are drawn entirely from the scripts associated with a particular locale or locales. For an example of the use of these functions, see NLS: Internationalized Domain Name (IDN) Mitigation Sample.
- GetStringScripts. Provides a list of scripts used in a particular string.
- GetLocaleInfo, GetLocaleInfoEx. Retrieve locale information. Calling either function with LCType set to LOCALE_SSCRIPTS retrieves a list of the scripts normally used for a particular locale.
- VerifyScripts. Compares lists of scripts. To verify against multiple locales, the application can make multiple calls to GetLocaleInfo or GetLocaleInfoEx and VerifyScripts.
For applications that run on Windows XP and Windows Server 2003, the functions DownlevelGetLocaleScripts, DownlevelGetStringScripts, and DownlevelVerifyScripts play a similar role to the functions listed above in mitigating security risk. These functions are provided in the "Microsoft Internationalized Domain Name (IDN) Mitigation APIs" download, which is available from archive.org.
Handle Unicode Strings
IDNA supports the transformation of Unicode strings into valid host name labels, except for strings that contain prohibited characters, such as control characters and characters from the private use area (PUA). Your application can pass the IDN_USE_STD3_ASCII_RULES flag to the NLS conversion functions that accept it, making them fail if they encounter ASCII characters other than letters, digits, or the hyphen-minus (-) character, or if a label begins or ends with a hyphen-minus. These characters have always been prohibited in host names, and they remain prohibited under IDNA.
Handle Unassigned Code Points
IDNs cannot contain unassigned code points. Code points that were not assigned to a character as of Unicode 3.2 have no defined IDNA mappings, even though the IDN_ALLOW_UNASSIGNED flag of certain conversion functions allows them to be encoded as Punycode. RFC 3454 lists the unassigned code points.
If your application encodes unassigned code points as Punycode, it should treat the resulting domain names as illegal. Security can be compromised if a later version of IDNA makes these names legal, or if the application filters out the illegal characters in an attempt to create a legal domain name.
Unassigned code points are not allowed in the stored strings used in protocol identifiers and named entities, such as names in digital certificates and DNS domain name parts. However, the code points are allowed in query strings, for example, user-entered names for digital certificate authorities and DNS lookups, which are used to match against stored identifiers.
Although query strings can contain unassigned code points, your applications should not use such code points. Even a user-supplied query string presents a risk of a "spoofing" attack. In this type of attack, a malicious site diverts users from the site they intend to visit to another site that might pass sensitive information to a third party. For example, copying a string from an incoming e-mail message can present the same risks as clicking a link in a browser.
Convert Domain Names to ASCII Names
Your application can use the IdnToAscii function and certain mitigation functions to convert IDNs to ASCII.
Because strings with very different binary representations can map to the same ASCII name, IdnToAscii can raise certain security concerns. For more information, see the discussion of comparison functions in Security Considerations: International Features.
NLS: Internationalized Domain Name (IDN) Conversion Sample demonstrates the use of the IDN conversion functions. NLS: Internationalized Domain Name (IDN) Mitigation Sample demonstrates the use of the IDN mitigation functions.