WideCharToMultiByte function (stringapiset.h)
Maps a UTF-16 (wide character) string to a new character string. The new character string is not necessarily from a multibyte character set.
Data converted from UTF-16 to non-Unicode encodings is subject to data loss, because a code page might not be able to represent every character used in the specific Unicode data. For more information, see Security Considerations: International Features.
Syntax
int WideCharToMultiByte(
[in] UINT CodePage,
[in] DWORD dwFlags,
[in] _In_NLS_string_(cchWideChar)LPCWCH lpWideCharStr,
[in] int cchWideChar,
[out, optional] LPSTR lpMultiByteStr,
[in] int cbMultiByte,
[in, optional] LPCCH lpDefaultChar,
[out, optional] LPBOOL lpUsedDefaultChar
);
Parameters
[in] CodePage
Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.
[in] dwFlags
Flags indicating the conversion type. The application can specify a combination of the following values. The function performs more quickly when none of these flags is set. The application should specify WC_NO_BEST_FIT_CHARS and WC_COMPOSITECHECK with the specific value WC_DEFAULTCHAR to retrieve all possible conversion results. If all three values are not provided, some results will be missing.
Value | Meaning | ||||||
---|---|---|---|---|---|---|---|
|
Convert composite characters, consisting of a base character and a nonspacing character, each with different character values. Translate these characters to precomposed characters, which have a single character value for a base-nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character. Note Windows normally represents Unicode strings with precomposed data, making the use of the WC_COMPOSITECHECK flag unnecessary.
Your application can combine WC_COMPOSITECHECK with any one of the following flags, with the default being WC_SEPCHARS. These flags determine the behavior of the function when no precomposed mapping for a base-nonspacing character combination in a Unicode string is available. If none of these flags is supplied, the function behaves as if the WC_SEPCHARS flag is set. For more information, see WC_COMPOSITECHECK and related flags in the Remarks section.
|
||||||
|
Windows Vista and later: Fail (by returning 0 and setting the last-error code to ERROR_NO_UNICODE_TRANSLATION) if an invalid input character is encountered. You can retrieve the last-error code with a call to GetLastError. If this flag is not set, the function replaces illegal sequences with U+FFFD (encoded as appropriate for the specified codepage) and succeeds by returning the length of the converted string. Note that this flag only applies when CodePage is specified as CP_UTF8 or 54936. It cannot be used with other code page values. | ||||||
|
Translate any Unicode characters that do not translate directly to multibyte equivalents to the default character specified by lpDefaultChar. In other words, if translating from Unicode to multibyte and back to Unicode again does not yield the same Unicode character, the function uses the default character. This flag can be used by itself or in combination with the other defined flags.
For strings that require validation, such as file, resource, and user names, the application should always use the WC_NO_BEST_FIT_CHARS flag. This flag prevents the function from mapping characters to characters that appear similar but have very different semantics. In some cases, the semantic change can be extreme. For example, the symbol for "∞" (infinity) maps to 8 (eight) in some code pages. |
For the code pages listed below, dwFlags must be 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.
- 50220
- 50221
- 50222
- 50225
- 50227
- 50229
- 57002 through 57011
- 65000 (UTF-7)
- 42 (Symbol)
[in] lpWideCharStr
Pointer to the Unicode string to convert.
[in] cchWideChar
Size, in characters, of the string indicated by lpWideCharStr. Alternatively, this parameter can be set to -1 if the string is null-terminated. If cchWideChar is set to 0, the function fails.
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes exactly the specified number of characters. If the provided size does not include a terminating null character, the resulting character string is not null-terminated, and the returned length does not include this character.
[out, optional] lpMultiByteStr
Pointer to a buffer that receives the converted string.
[in] cbMultiByte
Size, in bytes, of the buffer indicated by lpMultiByteStr. If this value is 0, the function returns the required buffer size, in bytes, including any terminating null character, and makes no use of the lpMultiByteStr buffer.
[in, optional] lpDefaultChar
Pointer to the character to use if a character cannot be represented in the specified code page. The application sets this parameter to NULL if the function is to use a system default value. To obtain the system default character, the application can call the GetCPInfo or GetCPInfoEx function.
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to NULL. Otherwise, the function fails with ERROR_INVALID_PARAMETER.
[out, optional] lpUsedDefaultChar
Pointer to a flag that indicates if the function has used a default character in the conversion. The flag is set to TRUE if one or more characters in the source string cannot be represented in the specified code page. Otherwise, the flag is set to FALSE. This parameter can be set to NULL.
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to NULL. Otherwise, the function fails with ERROR_INVALID_PARAMETER.
Return value
If successful, returns the number of bytes written to the buffer pointed to by lpMultiByteStr. If the function succeeds and cbMultiByte is 0, the return value is the required size, in bytes, for the buffer indicated by lpMultiByteStr. Also see dwFlags for info about how the WC_ERR_INVALID_CHARS flag affects the return value when invalid sequences are input.
The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:
- ERROR_INSUFFICIENT_BUFFER. A supplied buffer size was not large enough, or it was incorrectly set to NULL.
- ERROR_INVALID_FLAGS. The values supplied for flags were not valid.
- ERROR_INVALID_PARAMETER. Any of the parameter values was invalid.
- ERROR_NO_UNICODE_TRANSLATION. Invalid Unicode was found in a string.
Remarks
The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns ERROR_INVALID_PARAMETER.
WideCharToMultiByte does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the terminating null character for the input string.
If cbMultiByte is less than cchWideChar, this function writes the number of characters specified by cbMultiByte to the buffer indicated by lpMultiByteStr. However, if CodePage is set to CP_SYMBOL and cbMultiByte is less than cchWideChar, the function writes no characters to lpMultiByteStr.
The WideCharToMultiByte function operates most efficiently when both lpDefaultChar and lpUsedDefaultChar are set to NULL. The following table shows the behavior of the function for the four possible combinations of these parameters.
lpDefaultChar | lpUsedDefaultChar | Result |
---|---|---|
NULL | NULL | No default checking. These parameter settings are the most efficient ones for use with this function. |
Non-null character | NULL | Uses the specified default character, but does not set lpUsedDefaultChar. |
NULL | Non-null character | Uses the system default character and sets lpUsedDefaultChar if necessary. |
Non-null character | Non-null character | Uses the specified default character and sets lpUsedDefaultChar if necessary. |
Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function to produce valid UTF-8 strings will behave the same way as on earlier Windows operating systems.
Starting with Windows 8: WideCharToMultiByte is declared in Stringapiset.h. Before Windows 8, it was declared in Winnls.h.
WC_COMPOSITECHECK and related flags
As discussed in Using Unicode Normalization to Represent Strings, Unicode allows multiple representations of the same string (interpreted linguistically). For example, Capital A with dieresis (umlaut) can be represented either precomposed as a single Unicode code point "Ä" (U+00C4) or decomposed as the combination of Capital A and the combining dieresis character ("A" + "¨", that is U+0041 U+0308). However, most code pages provide only composed characters.The WC_COMPOSITECHECK flag causes the WideCharToMultiByte function to test for decomposed Unicode characters and attempts to compose them before converting them to the requested code page. This flag is only available for conversion to single byte (SBCS) or double byte (DBCS) code pages (code pages < 50000, excluding code page 42). If your application needs to convert decomposed Unicode data to single byte or double byte code pages, this flag might be useful. However, not all characters can be converted this way and it is more reliable to save and store such data as Unicode.
When an application is using WC_COMPOSITECHECK, some character combinations might remain incomplete or might have additional nonspacing characters left over. For example, A + ¨ + ¨ combines to Ä + ¨. Using the WC_DISCARDNS flag causes the function to discard additional nonspacing characters. Using the WC_DEFAULTCHAR flag causes the function to use the default replacement character (typically "?") instead. Using the WC_SEPCHARS flag causes the function to attempt to convert each additional nonspacing character to the target code page. Usually this flag also causes the use of the replacement character ("?"). However, for code page 1258 (Vietnamese) and 20269, nonspacing characters exist and can be used. The conversions for these code pages are not perfect. Some combinations do not convert correctly to code page 1258, and WC_COMPOSITECHECK corrupts data in code page 20269. As mentioned earlier, it is more reliable to design your application to save and store such data as Unicode.
Examples
ISDSC_STATUS DiscpUnicodeToAnsiSize(
IN __in PWCHAR UnicodeString,
OUT ULONG *AnsiSizeInBytes
)
/*++
Routine Description:
This routine will return the length needed to represent the unicode
string as ANSI
Arguments:
UnicodeString is the unicode string whose ansi length is returned
*AnsiSizeInBytes is number of bytes needed to represent unicode
string as ANSI
Return Value:
ERROR_SUCCESS or error code
--*/
{
_try
{
*AnsiSizeInBytes = WideCharToMultiByte(CP_ACP,
0,
UnicodeString,
-1,
NULL,
0, NULL, NULL);
} _except(EXCEPTION_EXECUTE_HANDLER) {
return(ERROR_NOACCESS);
}
return((*AnsiSizeInBytes == 0) ? GetLastError() : ERROR_SUCCESS);
}
Requirements
Requirement | Value |
---|---|
Minimum supported client | Windows 2000 Professional [desktop apps | UWP apps] |
Minimum supported server | Windows 2000 Server [desktop apps | UWP apps] |
Target Platform | Windows |
Header | stringapiset.h (include Windows.h) |
Library | Kernel32.lib |
DLL | Kernel32.dll |