_snprintf fails when given "%ls" format to display WCHAR string containing DBCS characters

Question

Alan 21

Short version: If I have a WCHAR string containing characters that require DBCS code page 936 to map successfully to ANSI output, the ANSI-based CRT _snprintf_s returns -1 instead of attempting to map them, regardless of whether the ANSI code page is currently 936. Does anyone know what the CRT functions are balking at here when asked to process a WCHAR string into ANSI output?

Okay, so I'm asking _snprintf_s, an ANSI-based string print, to include a WCHAR string as part of the output. I've done this by specifying the format string "%ls" (or "%ws", your choice) so that _snprintf_s knows the parameter points to a WCHAR string buffer.

My expectation was that I should get a result similar to calling WideCharToMultiByte( CP_ACP ) against the WCHAR string: _snprintf_s would print "as many characters as will actually map to the current ANSI code page", and a placeholder character like "?" would be substituted for every character that has no representation in the current ANSI code page.

What actually happens is that I get a -1 return from _snprintf_s instead, as though there were "insufficient buffer", even though my source WCHAR string is only 36 characters including the terminating NUL, would take 39 bytes to encode using ANSI (if using DBCS code page 936), and has a 255-character output buffer size indicated and available.

What's more unusual is that this happens "even if ANSI conversion of the entire WCHAR string is possible." Meaning, when I actually am on a machine with the ANSI code page set to 936, and all the characters in my WCHAR string are available in ANSI code page 936, _snprintf_s continues to return -1 instead of successfully processing the "%ls" format specification to include the WCHAR string.

For example, if my test string is "未更新 123456789012345678901234567.txt" ("\u672a\u66f4\u65b0" for those first three characters) and I'm on a machine with ANSI code page 1252 set, the _snprintf_s call fails with -1. "Maybe I understand that" since indeed the ANSI code page can't represent those characters. I might have "preferred" those characters be substituted with the default character like "?", but fine: Failure.

But on a machine where the ANSI code page (both current user and system default) is code page 936 (Windows 10 21H1 Chinese Simplified), the _snprintf_s call still fails with -1, as though "simply presence of these characters" is the real problem, and not "the ability to match them in the current ANSI code page for output." Yes, if I simply remove those characters, or replace them with "123" instead of "未更新", the same calls with the same buffers and format string work fine.

The "real problem" is that I encounter this issue with StringCchVPrintfA (and the entire strsafe.h family). I'm picking on the CRT _snprintf_s implementation only because the CRT's _vsnprintf_l is how StringCchVPrintfA is implemented (as seen under the debugger). StringCchVPrintfA gives me STRSAFE_E_INSUFFICIENT_BUFFER (0x8007007A, the HRESULT form of ERROR_INSUFFICIENT_BUFFER) in response to the -1 that _vsnprintf_l returned when "%ls" was specified to display the WCHAR string.

The kicker, of course, is that using a Windows API instead of the CRT or strsafe.h, the expected successful behavior occurs. Meaning through wsprintfA() on a machine using ANSI code page 936, the WCHAR string gets fully represented in the ANSI output. And on an ANSI code page 1252 machine where the characters aren't available, I get the default characters "???" in place of the characters which were impossible to represent.

But of course wsprintfA() doesn't allow controlling the output buffer extent, and is "unsafe".

The Windows console test application I'm using to demonstrate this issue is attached. Note I'm linking this against the debug CRT under Visual Studio 2019, but I do not seem to be getting any invocation of the invalid parameter handler when this issue occurs. Which I guess is "true", since "insufficient buffer" is not actually the problem here.

132442-stringcchtestcpp.txt

David Lowndes 4,726 Reputation points

2021-09-15T19:58:50.397+00:00

I suggest that you report this as a (potential) bug against VS using the Help, Send Feedback, Report a Problem facility. I say potential bug as there may be something in the C++ standards that makes this unexpected behaviour correct.

FWIW I get the same results as you with the latest VS2022 preview.

Please post a link to your bug report back here so that anyone interested can follow up on it.

Answer 1

Igor Tandetnik 1,116

Add setlocale(LC_ALL, ".936"); at the top of the program, or setlocale( LC_ALL, "" ); if on a machine where the system codepage is 936. The program starts with "C" locale initially, it doesn't automatically pick up the system locale.

The C standard does in fact require the printf family of functions to fail when a wide character cannot be converted to multibyte. errno should be set to EILSEQ.
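To illustrate the behavior described above, here is a minimal sketch in standard C (using snprintf rather than the Microsoft-specific _snprintf_s, on the assumption that both follow the same rule for "%ls"). The helper name format_wide and its INT_MIN sentinel are hypothetical, made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: switch to the named locale, then print a wide
 * string into a narrow buffer with "%ls".
 * Returns snprintf's result (negative on an encoding error), or INT_MIN
 * (a sentinel invented for this sketch) if the locale name is rejected. */
static int format_wide(const char *locale_name, char *out, size_t out_size,
                       const wchar_t *text)
{
    assert(locale_name != NULL && out != NULL && out_size > 0 && text != NULL);

    if (setlocale(LC_ALL, locale_name) == NULL) {
        return INT_MIN; /* locale not available on this system */
    }

    /* "%ls" converts each wide character to the locale's multibyte
     * encoding; a character with no representation makes the call fail. */
    return snprintf(out, out_size, "%ls", text);
}
```

With "C" as the locale name, which is what a program starts in, a call such as format_wide("C", buf, sizeof buf, L"\u672a\u66f4\u65b0 123.txt") is expected to return a negative value, while a plain ASCII string formats normally. Passing ".936" (Microsoft CRT locale naming, as suggested above) or "" on a machine whose system code page is 936 should let the same call succeed.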

Alan 21 Reputation points

2021-09-16T19:11:14.857+00:00

Thanks. That indeed appears to be the behavior being encountered here. So at least now I know why the strsafe.h family is failing this way, even if ultimately I still cannot achieve the desired behavior that wsprintfA() provided while using the ANSI versions of the strsafe.h functions.

Honestly it was a surprise to me that strsafe.h was implemented by the CRT. I had expected this was a set of Windows APIs for some reason. Documentation such as https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-wsprintfa recommended using strsafe.h as the replacement, without mentioning that it doesn't actually provide the same behavior.

But that the CRT locale behavior is different, and the CRT failure requirement is different, certainly makes sense. Thanks for explaining that.
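For anyone who wants the "?" substitution behavior with CRT-style formatting, one possible workaround is to convert the wide string yourself before formatting and pass the result as a plain "%s" argument. The following is only a sketch in standard C under that assumption; wide_to_narrow_lossy is a hypothetical helper, not part of the CRT or strsafe.h, and it ignores stateful encodings (code pages 1252 and 936 are not stateful).

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to the current locale's
 * multibyte encoding, writing '?' for every wide character that cannot
 * be represented. Always NUL-terminates. Returns the number of bytes
 * written, excluding the terminator; output is truncated at a character
 * boundary if the buffer is too small. */
static size_t wide_to_narrow_lossy(char *out, size_t out_size,
                                   const wchar_t *text)
{
    assert(out != NULL && out_size > 0 && text != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t used = 0;

    for (; *text != L'\0'; ++text) {
        char mb[MB_LEN_MAX];
        size_t n = wcrtomb(mb, *text, &state);
        if (n == (size_t)-1) {
            /* Not representable: substitute '?' and reset the state,
             * which is unspecified after an encoding error. */
            mb[0] = '?';
            n = 1;
            memset(&state, 0, sizeof state);
        }
        if (used + n >= out_size) {
            break; /* keep room for the terminator */
        }
        memcpy(out + used, mb, n);
        used += n;
    }

    out[used] = '\0';
    return used;
}
```

The converted buffer can then be handed to a narrow formatting call with "%s", so the formatting function itself never sees a wide character it cannot convert. The conversion still depends on the CRT locale, so setlocale needs to have been called for code page 936 characters to map to something other than "?".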
