Character Sets
A "character set" is a mapping of characters to their identifying code values. The character set most commonly used in computers today is Unicode, a global standard for character encoding. Internally, Windows applications use the UTF-16 implementation of Unicode. In UTF-16, most characters are identified by a single two-byte (16-bit) code. Each of the less commonly used supplementary characters is represented by a surrogate pair, which is a pair of two-byte codes (four bytes in total). For more information, see Surrogates and Supplementary Characters.
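To make the surrogate pair idea concrete, the following is a minimal Python sketch, not Windows API code. The helper names `to_surrogate_pair` and `utf16_code_units` are hypothetical and introduced only for illustration. The first computes the two 16-bit codes for a supplementary character by hand, and the second asks Python's built-in UTF-16 encoder for the same information so the two can be compared.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    (high, low) surrogate pair. Hypothetical helper for illustration."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary character")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_code_units(text: str) -> list:
    """Return the 16-bit UTF-16 codes for text, using Python's encoder."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # "A" fits in one two-byte code; U+1F600 needs a surrogate pair.
    print([hex(u) for u in utf16_code_units("A")])
    print([hex(u) for u in utf16_code_units("\U0001F600")])
    print([hex(u) for u in to_surrogate_pair(0x1F600)])
```

Under these assumptions, a character such as "A" yields one 16-bit code, while a supplementary character yields two, and the hand-computed pair should match what the encoder produces.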
Some Windows applications must work with the older character sets that are native to Windows Me/98/95. Windows code pages allow your application to work with these character sets. These character sets can be divided into:
- Single-byte character sets (SBCS). In an SBCS, each character is identified by a value one byte wide.
- Multibyte character sets, in particular the double-byte character sets (DBCS). In a DBCS, each character is identified by a value one or two bytes wide. Multibyte character sets provide a means to represent the large number of characters in many Asian languages.
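The difference between the two groups can be seen by counting bytes per character. The sketch below uses Python's standard codecs rather than the Windows code page APIs, and it assumes that Python's `cp1252` and `cp932` codecs are acceptable stand-ins for Windows code page 1252 (an SBCS) and code page 932 (a Japanese DBCS). The helper name `bytes_per_char` is hypothetical.

```python
def bytes_per_char(text: str, code_page: str) -> list:
    """Return the encoded byte length of each character in text.
    Hypothetical helper; code_page is a Python codec name."""
    return [len(ch.encode(code_page)) for ch in text]


if __name__ == "__main__":
    # SBCS: every character occupies exactly one byte.
    print(bytes_per_char("caf\u00e9", "cp1252"))
    # DBCS: ASCII letters take one byte, kana take two.
    print(bytes_per_char("A\u3042", "cp932"))
```

Because a DBCS mixes one-byte and two-byte characters, code that walks such a string cannot assume a fixed width per character, which is the main practical difference from an SBCS.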
For more information, see the following topics:
- Code Pages
- Double-byte Character Sets
- Single-byte Character Sets
- Surrogates and Supplementary Characters
- Unicode