Short version: if I have a WCHAR string containing characters that would require DBCS code page 936 to map successfully to ANSI output, the ANSI-based CRT function _snprintf_s returns -1 instead of attempting to map them, regardless of whether the current ANSI code page is 936. Does anyone know what the CRT functions are balking at when asked to format a WCHAR string into ANSI output?
Okay, so I'm asking _snprintf_s, an ANSI-based string print, to include a WCHAR string as part of the output. I've done this by specifying the format string "%ls" (or "%ws", your choice) so that _snprintf_s knows the argument points to a WCHAR string buffer.
My expectation was that I should get a result similar to calling WideCharToMultiByte( CP_ACP ) against the WCHAR string. That is, _snprintf_s would print "as many characters as will actually map to the current ANSI code page", and substitute a placeholder character like "?" for every character that has no representation in the current ANSI code page.
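To make that expectation concrete, here is a rough sketch of the substitution behavior written out by hand. It is a stand-in, not the Windows API: it uses standard C wcrtomb (which follows the C locale) rather than WideCharToMultiByte( CP_ACP ), and narrow_lossy is a hypothetical helper name invented for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of the CRT or strsafe.h): converts `src`
 * to the current C locale's multibyte encoding and writes '?' for every
 * wide character that has no representation there. Returns the number of
 * bytes written, excluding the terminator. Stops early if `out` fills up. */
size_t narrow_lossy(char *out, size_t cap, const wchar_t *src)
{
    assert(out != NULL && src != NULL && cap > 0);

    mbstate_t st;
    char mb[MB_LEN_MAX];
    size_t used = 0;

    memset(&st, 0, sizeof st);
    for (; *src != L'\0'; ++src) {
        size_t n = wcrtomb(mb, *src, &st);
        if (n == (size_t)-1) {
            /* Unmappable character: reset the state and substitute. */
            memset(&st, 0, sizeof st);
            mb[0] = '?';
            n = 1;
        }
        if (used + n >= cap)
            break; /* keep room for the terminating null */
        memcpy(out + used, mb, n);
        used += n;
    }
    out[used] = '\0';
    return used;
}
```

In a program that has not changed its C locale, passing the test string through this helper should give "???" followed by the unchanged ASCII tail, which is the shape of output I was hoping "%ls" would produce.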
What actually happens is that I get a -1 return from _snprintf_s instead, as though there were "insufficient buffer". Yet my source WCHAR string is only 36 characters including the terminating null, would take 39 bytes to encode in ANSI (if using DBCS code page 936), and the output buffer size indicated and available is 255 characters.
What's more unusual is that this happens "even if ANSI conversion of the entire WCHAR string is possible." That is, even when I am on a machine with the ANSI code page set to 936, and all the characters in my WCHAR string are available in code page 936, _snprintf_s continues to return -1 instead of successfully processing the "%ls" format specification to include the WCHAR string.
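The shape of the failing call can be sketched as below. This is an approximation, not the attached test program: it uses standard snprintf so that it compiles outside MSVC, whereas my real code calls _snprintf_s, and print_wide_as_narrow is a hypothetical wrapper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical wrapper: formats one wide string into a narrow buffer
 * through "%ls" and hands back the raw return value of the print call. */
int print_wide_as_narrow(char *out, size_t cap, const wchar_t *src)
{
    assert(out != NULL && src != NULL);
    return snprintf(out, cap, "%ls", src);
}
```

With a 255-byte buffer, the all-ASCII variant of the string ("123" in place of the three Chinese characters) formats normally, while the original string produces a negative return even though the buffer is far larger than the output could ever be.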
For example, if my test string is "未更新 123456789012345678901234567.txt" ("\u672a\u66f4\u65b0" for those first three characters) and I'm on a machine with ANSI code page 1252 set, the _snprintf_s call fails with -1. "Maybe I understand that" since indeed the ANSI code page can't represent those characters. I might have "preferred" those characters be substituted with the default character like "?", but fine: Failure.
But on a machine where the ANSI code page (both current user and system default) is code page 936 (Windows 10 21H1 Chinese Traditional), the _snprintf_s call still fails with -1. It is as though "the mere presence of these characters" is the real problem, and not "the ability to match them in the current ANSI code page for output." Yes, if I simply remove those characters, or replace "未更新" with "123", the same calls with the same buffers and format string work fine.
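One thing that may be worth ruling out: the C standard describes "%ls" in a narrow printf as converting each wide character as if by wcrtomb, which follows the LC_CTYPE category of the C locale, and a program starts in the "C" locale until it calls setlocale. Whether that is what is happening in my test is an assumption, not something I have confirmed. The sketch below uses standard snprintf in place of _snprintf_s, and print_wide_in_locale is a hypothetical helper name.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: switches LC_CTYPE to `locale_name`, formats `src`
 * through "%ls", then restores the "C" locale. Returns the print call's
 * result, or -2 if the requested locale is not available. */
int print_wide_in_locale(const char *locale_name, char *out, size_t cap,
                         const wchar_t *src)
{
    assert(locale_name != NULL && out != NULL && src != NULL);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2; /* locale not installed on this machine */

    int n = snprintf(out, cap, "%ls", src);

    setlocale(LC_CTYPE, "C");
    return n;
}
```

On Windows the interesting locale names to try would be "" (the user default) and ".936"; I'm listing those as candidates to test rather than as a confirmed fix.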
The "real problem" is that I encounter this issue with StringCchVPrintfA (and the entire strsafe.h family). I'm picking on the CRT _snprintf_s implementation only because the CRT's _vsnprintf_l is how StringCchVPrintfA is implemented (as seen under the debugger). StringCchVPrintfA gives me STRSAFE_E_INSUFFICIENT_BUFFER (0x8007007A, the HRESULT form of ERROR_INSUFFICIENT_BUFFER) in response to the -1 that _vsnprintf_l returned when "%ls" was specified to display the WCHAR string.
The kicker, of course, is that when I use a Windows API instead of the CRT or strsafe.h, the expected successful behavior occurs. Through wsprintfA() on a machine using ANSI code page 936, the WCHAR string gets fully represented in the ANSI output. And on an ANSI code page 1252 machine where the characters aren't available, I get the default characters "???" in place of the characters that were impossible to represent.
But of course wsprintfA() takes no output buffer size, so it is "unsafe".
The Windows console test application I'm using to demonstrate this issue is attached. Note that I'm linking against the debug CRT under Visual Studio 2019, but I do not seem to get any invocation of the invalid parameter handler when this issue occurs. I guess that is "correct", since "insufficient buffer" is not actually the problem here.
132442-stringcchtestcpp.txt