Unicode and Character Sets

Microsoft Windows provides support for the many different written languages of the international marketplace through Unicode and traditional character sets.

Unicode is a worldwide character encoding standard that assigns a unique number to each character used in modern computing, including technical symbols and special characters used in publishing. Unicode is required by modern standards, such as XML and ECMAScript (JavaScript), and is the official mechanism for implementing ISO/IEC 10646. It is supported by many operating systems, all modern browsers, and many other products. New Windows applications should use Unicode to avoid the inconsistencies of varied code pages and to simplify localization.

Traditional character sets are the earlier character encoding standards, such as Windows code pages, which use 8-bit code values or combinations of 8-bit values to represent the characters used in a specific language or geographical region.
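The difference between the two approaches can be illustrated with a short sketch. It is written in Python rather than against the Windows API, and it assumes Python's built-in codecs named cp1252, cp1251, and cp932 as stand-ins for the corresponding Windows code pages; the helper decode_byte is a hypothetical name introduced only for this illustration. The same 8-bit value decodes to different characters under different code pages, while each resulting character has its own Unicode number.

```python
# Illustrative sketch only: Python's "cp1252", "cp1251", and "cp932" codecs
# are assumed here to stand in for the matching Windows code pages.

def decode_byte(value: int, code_page: str) -> str:
    """Hypothetical helper: interpret one 8-bit value under a given code page."""
    return bytes([value]).decode(code_page)

# One 8-bit value, two code pages, two different characters.
western = decode_byte(0xE9, "cp1252")   # Western European code page
cyrillic = decode_byte(0xE9, "cp1251")  # Cyrillic code page

# Each character has exactly one Unicode code point, whatever the code page.
print(western, f"U+{ord(western):04X}")    # é U+00E9
print(cyrillic, f"U+{ord(cyrillic):04X}")  # й U+0439

# Some code pages use combinations of 8-bit values for a single character.
hiragana_a = "\u3042".encode("cp932")      # HIRAGANA LETTER A
print(hiragana_a.hex(), len(hiragana_a))   # 82a0 2
```

Because the byte 0xE9 is ambiguous without knowing which code page produced it, text exchanged between systems that use different code pages can be misread; a Unicode code point carries no such ambiguity.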

This overview describes the character set functions and explains how to use them in your applications.

Handling Internationalized Domain Names (IDNs)

Using Unicode Normalization to Represent Strings