Writing, Mapping, and Sorting EUDC and PUA Characters

Applications write end-user-defined characters (EUDCs) and private use area (PUA) characters to the screen or printer just as they write other characters, by using output functions such as TextOut and ExtTextOut. If EUDC is enabled, these functions automatically retrieve character information from EUDC or PUA character fonts. For more information, see End-User-Defined and Private Use Area Characters.

When writing EUDCs or PUA characters, the operation of the text output function depends on the currently selected font. If the selected font is an integrated EUDC or PUA character font, the function retrieves character information from that font. If the selected font is a double-byte character set (DBCS) TrueType font that has an associated separate EUDC font, the function retrieves information from the specified EUDC font. Similarly, if the selected font is a Unicode TrueType font that has an associated separate PUA character font, the function retrieves information from the PUA character font. If the selected font does not have an associated EUDC or PUA character font, the function retrieves information from the system default EUDC font. If the character is not in the system default EUDC font or there is no system default EUDC font, the function writes the default character defined by the selected font.
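The lookup order described above can be summarized as a small decision function. The following C sketch is only a model of that order. The GlyphSource and FontState types and the resolve_glyph_source function are hypothetical names introduced for illustration; they are not part of the Windows API, which performs this selection internally.

```c
#include <stdbool.h>

/* Hypothetical model of the lookup order; not a Windows API type. */
typedef enum {
    GLYPH_FROM_SELECTED_FONT,       /* integrated EUDC or PUA character font */
    GLYPH_FROM_ASSOCIATED_FONT,     /* separate EUDC or PUA font associated with the selected font */
    GLYPH_FROM_SYSTEM_DEFAULT_EUDC, /* system default EUDC font */
    GLYPH_DEFAULT_CHARACTER         /* default character of the selected font */
} GlyphSource;

/* Hypothetical description of the state the text output function sees. */
typedef struct {
    bool is_integrated;           /* selected font itself contains the EUDC/PUA glyphs */
    bool has_associated_font;     /* a separate EUDC or PUA font is associated with it */
    bool system_default_exists;   /* a system default EUDC font is available */
    bool system_default_has_char; /* that font defines the requested character */
} FontState;

/* Walks the cases in the same order as the paragraph above. */
GlyphSource resolve_glyph_source(const FontState *f)
{
    if (f->is_integrated)
        return GLYPH_FROM_SELECTED_FONT;
    if (f->has_associated_font)
        return GLYPH_FROM_ASSOCIATED_FONT;
    if (f->system_default_exists && f->system_default_has_char)
        return GLYPH_FROM_SYSTEM_DEFAULT_EUDC;
    return GLYPH_DEFAULT_CHARACTER;
}
```

The model covers only the cases that the paragraph names. It makes no claim about what happens when an associated font exists but lacks the requested character.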

Applications can map EUDCs to and from Unicode by using the MultiByteToWideChar and WideCharToMultiByte functions. The MultiByteToWideChar function maps most EUDCs to characters in the Unicode PUA. However, to support certain national or regional standards, some EUDCs can be mapped to non-PUA Unicode code points. The WideCharToMultiByte function maps a character in the PUA to its EUDC counterpart, if such a mapping exists and if the code point does not have a valid non-PUA mapping in Unicode. Not all code pages have an EUDC range. The code page specified in a call to WideCharToMultiByte must contain an EUDC code range for the mapping to the EUDC range to occur. If the code page does not contain an EUDC code range, the function retrieves the default character for any characters in the Unicode PUA.
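As a rough illustration of the mapping described above, the following sketch decodes a multibyte string with MultiByteToWideChar and counts how many of the resulting UTF-16 code units fall in the Basic Multilingual Plane PUA (U+E000 through U+F8FF). The helper names is_bmp_pua and count_eudc_mapped_to_pua and the fixed 256-element buffer are assumptions made for this example. The Windows-specific part is guarded by _WIN32 so that the range check alone can be compiled on any platform.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if a UTF-16 code unit lies in the BMP Private Use Area
   (U+E000 through U+F8FF), otherwise 0. */
int is_bmp_pua(unsigned int code_unit)
{
    return code_unit >= 0xE000u && code_unit <= 0xF8FFu;
}

#ifdef _WIN32
/* Hypothetical helper: decodes src from code_page and returns the number of
   resulting code units that landed in the PUA. Returns -1 if the conversion
   fails or the result does not fit in the illustrative fixed-size buffer. */
int count_eudc_mapped_to_pua(UINT code_page, const char *src, int src_len)
{
    WCHAR wide[256];
    int wide_len, i, count = 0;

    if (src == NULL || src_len <= 0)
        return -1;

    wide_len = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                   src, src_len, wide, 256);
    if (wide_len == 0)
        return -1; /* call GetLastError for the reason */

    for (i = 0; i < wide_len; ++i) {
        if (is_bmp_pua(wide[i]))
            ++count;
    }
    return count;
}
#endif
```

Because some EUDCs can be mapped to non-PUA code points, a count lower than the number of EUDCs in the input does not by itself indicate an error.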

MultiByteToWideChar and WideCharToMultiByte do not guarantee round-trip mapping. In other words, it is possible to start with a particular multibyte string containing EUDCs, map the string to Unicode with MultiByteToWideChar and map it back to the original DBCS with WideCharToMultiByte, and end up with a result that is not identical to the original string. Applications relying on mapping EUDCs to Unicode should ensure that all necessary characters can round-trip between the appropriate code page EUDC area and the Unicode PUA.
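One way an application might verify round-tripping is to convert each string it depends on in both directions and compare the bytes. The following is a minimal sketch under stated assumptions: same_bytes and eudc_round_trips are hypothetical helper names, the buffers are fixed-size for brevity, and the Windows-specific part is guarded by _WIN32.

```c
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 when both buffers have the same non-negative length and
   identical content, otherwise 0. */
int same_bytes(const char *a, int a_len, const char *b, int b_len)
{
    if (a_len < 0 || a_len != b_len)
        return 0;
    return memcmp(a, b, (size_t)a_len) == 0;
}

#ifdef _WIN32
/* Hypothetical helper: returns 1 if src survives the conversion
   code page -> Unicode -> same code page unchanged, 0 if it does not,
   and -1 if either conversion fails or a buffer is too small. */
int eudc_round_trips(UINT code_page, const char *src, int src_len)
{
    WCHAR wide[256];
    char back[512];
    BOOL used_default = FALSE;
    int wide_len, back_len;

    if (src == NULL || src_len <= 0)
        return -1;

    wide_len = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                   src, src_len, wide, 256);
    if (wide_len == 0)
        return -1;

    back_len = WideCharToMultiByte(code_page, 0, wide, wide_len,
                                   back, (int)sizeof back,
                                   NULL, &used_default);
    if (back_len == 0)
        return -1;

    /* A substituted default character means at least one code point
       had no mapping back into the code page. */
    if (used_default)
        return 0;

    return same_bytes(src, src_len, back, back_len);
}
#endif
```

An application could run such a check once over every EUDC it uses and refuse to rely on Unicode storage for any character that fails it.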

Applications should not attempt to map EUDCs from one code page to another. If an application starts with an EUDC from one code page, maps it to Unicode with MultiByteToWideChar, and then maps the result to a different DBCS code page with WideCharToMultiByte, there are no guarantees about the results. The original character might be mapped to a different EUDC in the destination code page, or it might be mapped to an undefined character. Similarly, mapping a Unicode string to a code page that has an EUDC range can have unintended results. If the Unicode string contains a PUA code point, the code point might be mapped to an EUDC that does not represent the same character.

Applications can compare DBCS strings that contain EUDCs by using the ANSI version of the CompareString function. The function effectively maps the characters to Unicode before comparing character values. Applications can create a sort key for the string by using the ANSI version of the LCMapString function and the LCMAP_SORTKEY value. This function effectively maps characters to Unicode first. All characters in the PUA are sorted after all other Unicode characters. Within the area, characters are sorted in numerical order. If an application attempts to retrieve CTYPE information for an EUDC by using the GetStringTypeA function, the function retrieves NULL for each character.
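The comparison and sort key calls described above might be wrapped as follows. This is a sketch, not a definitive implementation: compare_result_to_order, compare_eudc_strings, and make_sort_key are hypothetical helper names, and both strings are assumed to be null-terminated. CompareString returns 1, 2, or 3 for less than, equal, or greater than, and 0 on failure, so subtracting 2 from a successful result gives the usual C ordering convention.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Converts a successful CompareString result (1, 2, or 3) into the
   C runtime convention (-1, 0, or 1). */
int compare_result_to_order(int compare_string_result)
{
    return compare_string_result - 2;
}

#ifdef _WIN32
/* Hypothetical helper: compares two null-terminated DBCS strings with the
   ANSI version of CompareString. Returns 1 and stores the ordering in
   *order on success, or returns 0 if the comparison fails. */
int compare_eudc_strings(LCID locale, const char *a, const char *b, int *order)
{
    int result = CompareStringA(locale, 0, a, -1, b, -1);
    if (result == 0)
        return 0; /* call GetLastError for the reason */
    *order = compare_result_to_order(result);
    return 1;
}

/* Hypothetical helper: builds a sort key for a null-terminated DBCS string.
   Returns the number of bytes written to key, or 0 on failure. Passing
   key_size as 0 returns the required buffer size in bytes instead. */
int make_sort_key(LCID locale, const char *text, unsigned char *key, int key_size)
{
    return LCMapStringA(locale, LCMAP_SORTKEY, text, -1, (LPSTR)key, key_size);
}
#endif
```

Sort keys produced this way are byte arrays, so two keys built for the same locale with the same flags can be compared with a byte-wise comparison such as strcmp.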

Using Unicode and Character Sets