char, wchar_t, char8_t, char16_t, char32_t

The types char, wchar_t, char8_t, char16_t, and char32_t are built-in types that represent alphanumeric characters, nonalphanumeric glyphs, and nonprinting characters.

Syntax

char     ch1{ 'a' };  // or { u8'a' }
wchar_t  ch2{ L'a' };
char16_t ch3{ u'a' };
char32_t ch4{ U'a' };

Remarks

The char type was the original character type in C and C++. It stores characters from the ASCII character set or any of the ISO-8859 character sets, and individual bytes of multi-byte encodings such as Shift-JIS or the UTF-8 encoding of the Unicode character set. In the Microsoft compiler, char is an 8-bit type. It's a distinct type from both signed char and unsigned char. By default, variables of type char are promoted to int as if from type signed char. If the /J compiler option is used, they're treated as type unsigned char and are promoted to int without sign extension.
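The following is a minimal sketch of the two points above: char is its own type, and the int it promotes to depends on whether char is signed. The names promote_byte and demo_char_promotion are hypothetical helpers introduced for illustration, and the negative result assumes a two's complement implementation.

```cpp
#include <cassert>
#include <type_traits>

// char is a distinct type, even though it shares its representation
// with either signed char or unsigned char.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Hypothetical helper: stores a byte value in a char and returns the
// int that the char is promoted to.
int promote_byte(unsigned char byte)
{
    char c = static_cast<char>(byte);
    return c; // integral promotion from char to int
}

// Hypothetical demo: the promoted value depends on whether char is signed.
void demo_char_promotion()
{
    assert(promote_byte(0x41) == 65); // ASCII 'A' is unaffected by signedness

    if (std::is_signed<char>::value) {
        assert(promote_byte(0xE9) == -23); // sign-extended (the default)
    } else {
        assert(promote_byte(0xE9) == 233); // not sign-extended (for example, under /J)
    }
}
```

Because the result changes with the compiler setting, code that treats char values above 0x7F as array indexes or compares them with int constants is usually safer when it converts to unsigned char first.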

The type unsigned char is often used to represent a byte, which isn't a built-in type in C++.

The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it's a 16-bit type used to store Unicode encoded as UTF-16LE, and it's the native character type on Windows operating systems. The wide character versions of the Universal C Runtime (UCRT) library functions use wchar_t and its pointer and array types as parameters and return values, as do the wide character versions of the native Windows API.
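As an illustration of what "implementation-defined" means in practice, the sketch below measures wchar_t on the current compiler and passes wide string literals to std::wcslen, the wide character counterpart of std::strlen. The names wchar_bits and demo_wide_length are hypothetical. The example assumes that a compiler with a 16-bit wch_t-style wide character encodes characters outside the Basic Multilingual Plane as a surrogate pair, as the Microsoft compiler does.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Width of wchar_t in bits on the current compiler
// (16 with the Microsoft compiler, often 32 elsewhere).
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;

// Hypothetical demo: wide functions count wchar_t elements, not characters.
void demo_wide_length()
{
    assert(std::wcslen(L"abc") == 3);

    // U+1F600 lies outside the Basic Multilingual Plane. With a 16-bit
    // wchar_t it needs two elements (a surrogate pair); with a 32-bit
    // wchar_t it needs one.
    const std::size_t expected = (wchar_bits == 16) ? 2 : 1;
    assert(std::wcslen(L"\U0001F600") == expected);
}
```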

The char8_t, char16_t, and char32_t types represent 8-bit, 16-bit, and 32-bit wide characters, respectively. (char8_t is new in C++20 and requires the /std:c++20 or /std:c++latest compiler option.) Unicode encoded as UTF-8 can be stored in the char8_t type. Strings of char8_t and char type are referred to as narrow strings, even when used to encode Unicode or multi-byte characters. Unicode encoded as UTF-16 can be stored in the char16_t type, and Unicode encoded as UTF-32 can be stored in the char32_t type. Strings of char16_t, char32_t, and wchar_t type are all referred to as wide strings, though the term often refers specifically to strings of wchar_t type.
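To make the difference between the three encodings concrete, this sketch counts how many elements (code units) the same character occupies in a u8, u, and U string literal. The helper code_units and the function demo_code_units are hypothetical names. The helper deduces the element type, so it works whether a u8 literal has elements of type char (before C++20) or char8_t (C++20 and later).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N])
{
    return N - 1;
}

void demo_code_units()
{
    // U+00E9 (LATIN SMALL LETTER E WITH ACUTE)
    assert(code_units(u8"\u00E9") == 2); // UTF-8: two 8-bit units
    assert(code_units(u"\u00E9") == 1);  // UTF-16: one 16-bit unit
    assert(code_units(U"\u00E9") == 1);  // UTF-32: one 32-bit unit

    // U+1F600 (outside the Basic Multilingual Plane)
    assert(code_units(u8"\U0001F600") == 4); // UTF-8: four 8-bit units
    assert(code_units(u"\U0001F600") == 2);  // UTF-16: a surrogate pair
    assert(code_units(U"\U0001F600") == 1);  // UTF-32: one 32-bit unit
}
```

Only UTF-32 guarantees one element per code point, which is why the element count of a narrow or UTF-16 string can't be treated as a character count.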

In the C++ standard library, the basic_string type is specialized for both narrow and wide strings. Use std::string when the characters are of type char, std::u8string when the characters are of type char8_t, std::u16string when the characters are of type char16_t, std::u32string when the characters are of type char32_t, and std::wstring when the characters are of type wchar_t.
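The mapping between string types and character types can be checked at compile time. The sketch below is one way to do that, with std::u8string guarded by the __cpp_lib_char8_t feature-test macro so that the code also compiles before C++20. The function demo_string_types is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Each string type is basic_string specialized for one character type.
static_assert(std::is_same<std::string::value_type, char>::value, "string holds char");
static_assert(std::is_same<std::wstring::value_type, wchar_t>::value, "wstring holds wchar_t");
static_assert(std::is_same<std::u16string::value_type, char16_t>::value, "u16string holds char16_t");
static_assert(std::is_same<std::u32string::value_type, char32_t>::value, "u32string holds char32_t");
#if defined(__cpp_lib_char8_t)
static_assert(std::is_same<std::u8string::value_type, char8_t>::value, "u8string holds char8_t");
#endif

void demo_string_types()
{
    const std::string    narrow = "abc";
    const std::wstring   wide   = L"abc";
    const std::u16string utf16  = u"\U0001F600";
    const std::u32string utf32  = U"\U0001F600";

    assert(narrow.size() == 3);
    assert(wide.size() == 3);

    // size() counts elements of the character type, not Unicode characters.
    assert(utf16.size() == 2);
    assert(utf32.size() == 1);
}
```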

Other standard library components that represent text, including std::stringstream and std::cout, have versions for both narrow and wide strings.
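For example, std::stringstream has the wide counterpart std::wstringstream, and std::cout has the wide counterpart std::wcout. The following sketch formats the same value through the narrow and the wide string stream. The function names narrow_label, wide_label, and demo_labels are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: formats a value through the narrow stream type.
std::string narrow_label(int value)
{
    std::stringstream out;
    out << "value=" << value;
    return out.str();
}

// Hypothetical helper: the same formatting through the wide stream type.
std::wstring wide_label(int value)
{
    std::wstringstream out;
    out << L"value=" << value;
    return out.str();
}

void demo_labels()
{
    assert(narrow_label(42) == "value=42");
    assert(wide_label(42) == L"value=42");
}
```

The two helpers differ only in the stream type and the literal prefix, which reflects how the narrow and wide versions share one underlying class template.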