Bug in Win32 API ReadConsoleW: garbled echo for UTF-16 surrogate pairs?!

Florian Thake
2023-01-12T17:53:08.0966667+00:00

Hello,

I am developing a command-line application that should fully support Unicode.

To achieve this, the application:
a) runs in the modern Windows Terminal, and
b) uses ReadConsoleW / WriteConsoleW instead of the C and C++ standard library I/O functions.

After ensuring (b) for all code paths, everything works fine except for one thing:

I use ReadConsoleW to get an input line from the user. ReadConsoleW has some handy features that I want to use, such as a built-in history (Up/Down arrow keys) and proper handling of cursor movement, Backspace, Del, and so on.
Unfortunately, it struggles with the automatically echoed user input: Unicode characters that are encoded as a UTF-16 surrogate pair (i.e., two 16-bit UTF-16 code units) cannot be displayed correctly by that automatic echo.
More details:
Call ReadConsoleW on the STD_INPUT_HANDLE with ENABLE_LINE_INPUT and ENABLE_ECHO_INPUT enabled (the default) to read user input in the terminal. The echo it produces is garbage for every Unicode character that is encoded as a UTF-16 surrogate pair (two 16-bit code units), e.g. 🚀 🍀 🔥.
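To make the surrogate-pair case concrete: code points above U+FFFF, such as 🚀 (U+1F680), do not fit into one 16-bit wchar_t and are stored as a high surrogate followed by a low surrogate. The portable sketch below shows the standard UTF-16 encoding arithmetic; the function name `toSurrogatePair` is hypothetical and only used for illustration.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: encode a supplementary-plane code point
// (precondition: 0x10000 <= cp <= 0x10FFFF) as a UTF-16 surrogate pair {high, low}.
std::pair<char16_t, char16_t> toSurrogatePair( std::uint32_t cp )
{
    const std::uint32_t v = cp - 0x10000;                                 // 20-bit value
    const char16_t high = static_cast<char16_t>( 0xD800 + ( v >> 10 ) );   // top 10 bits
    const char16_t low  = static_cast<char16_t>( 0xDC00 + ( v & 0x3FF ) ); // bottom 10 bits
    return { high, low };
}
```

For example, `toSurrogatePair(0x1F680)` yields {0xD83D, 0xDE80}; 🍀 (U+1F340) becomes 0xD83C 0xDF40 and 🔥 (U+1F525) becomes 0xD83D 0xDD25. Since wchar_t is 16 bits on Windows, each of these emoji occupies two elements of a wchar_t buffer.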

It works if ENABLE_LINE_INPUT is cleared, but then all the handy input features are gone too (e.g. Backspace no longer deletes input, the Up/Down arrow keys no longer recall history, etc.).
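Without ENABLE_LINE_INPUT the application has to do its own line editing, and Backspace must then remove a whole surrogate pair rather than a single 16-bit code unit; otherwise a lone high surrogate is left in the buffer. A minimal portable sketch of just that rule, assuming the line is kept in a std::u16string (the function name `eraseLastChar` is hypothetical):

```cpp
#include <string>

// Hypothetical helper: remove the last code point from a UTF-16 line buffer.
// A trailing low surrogate preceded by a high surrogate is treated as one
// unit, so both code units are removed together.
void eraseLastChar( std::u16string &line )
{
    if( line.empty() ) {
        return;
    }
    const char16_t last = line.back();
    line.pop_back();
    const bool lastIsLow = last >= 0xDC00 && last <= 0xDFFF;
    if( lastIsLow && !line.empty() ) {
        const char16_t prev = line.back();
        if( prev >= 0xD800 && prev <= 0xDBFF ) {
            line.pop_back(); // drop the matching high surrogate too
        }
    }
}
```

This works on code points only; multi-code-point sequences such as emoji joined with U+200D (ZWJ) would need grapheme-cluster handling on top.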

Since ENABLE_LINE_INPUT is the default mode for STD_INPUT_HANDLE, I expect it to work with the full set of Unicode characters, not only with the first Unicode plane.

Is there a way to make it work?

Or is there an alternative way to get the handy input features that ENABLE_LINE_INPUT provides?

Do you consider this a bug?

Here is a screenshot:
[Screenshot: ReadConsoleW_garbage_echo]

Here is a minimal example program:

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <Windows.h> // console API, ARRAYSIZE, IS_HIGH_SURROGATE

int wmain( int /*argc*/, wchar_t ** /*argv*/ )
{
    HANDLE hOut = ::GetStdHandle( STD_OUTPUT_HANDLE );
    ::WriteConsoleW( hOut, L"Type a string: ", 15, nullptr, nullptr );

    HANDLE h = ::GetStdHandle( STD_INPUT_HANDLE );
    DWORD mode = 0;
    if( !::GetConsoleMode( h, &mode ) ) {
        return EXIT_FAILURE; // stdin is not a console (e.g. redirected)
    }
    // These flags are set by default already; they are named here explicitly.
    ::SetConsoleMode( h, mode | ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT );

    wchar_t wbuf[128] = {}; // zero-initialized, so no memset is needed
    DWORD read = 0;

    // The automatically produced echo is garbage for UTF-16 surrogate pairs.
    // Leave room for one extra code unit and the terminating L'\0'.
    if( !::ReadConsoleW( h, wbuf, ARRAYSIZE( wbuf ) - 2, &read, nullptr ) ) {
        ::SetConsoleMode( h, mode ); // restore the original mode on failure too
        return EXIT_FAILURE;
    }
    // If the buffer filled up between the two halves of a surrogate pair,
    // read the matching low surrogate as well.
    if( read > 0 && IS_HIGH_SURROGATE( wbuf[read - 1] ) ) {
        DWORD extra = 0;
        if( ::ReadConsoleW( h, wbuf + read, 1, &extra, nullptr ) && extra == 1 ) {
            ++read;
        }
    }

    wbuf[read] = L'\0'; // ensure zero termination

    // Output it via WriteConsoleW (this works correctly).
    ::WriteConsoleW( hOut, wbuf, read, nullptr, nullptr );

    ::SetConsoleMode( h, mode );

    return EXIT_SUCCESS;
}
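The extra read in the example covers the case where the buffer fills up between the two halves of a pair. A related portable check, sketched under the assumption that the input has been copied into a std::u16string (the names `Utf16Stats` and `analyzeUtf16` are hypothetical), counts code points and reports unpaired surrogates. It is a quick way to confirm that the data returned by ReadConsoleW is intact even when the on-screen echo is not:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: code points found and how many were unpaired surrogates.
struct Utf16Stats {
    std::size_t codePoints = 0;
    std::size_t loneSurrogates = 0;
};

// Hypothetical helper: walk a UTF-16 string, counting a well-formed surrogate
// pair as one code point and flagging any surrogate that has no partner.
Utf16Stats analyzeUtf16( const std::u16string &s )
{
    Utf16Stats stats;
    for( std::size_t i = 0; i < s.size(); ++i ) {
        const char16_t c = s[i];
        const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        const bool isLow  = c >= 0xDC00 && c <= 0xDFFF;
        if( isHigh && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF ) {
            ++i; // well-formed pair: consume both code units
        } else if( isHigh || isLow ) {
            ++stats.loneSurrogates; // unpaired surrogate, still counted as one code point
        }
        ++stats.codePoints;
    }
    return stats;
}
```

For the input "🚀🍀🔥" followed by "\r\n", the string holds 8 code units but 5 code points and no lone surrogates.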

Thank you very much in advance!
Kind Regards,
Florian Thake

C++

2 answers

  1. Minxin Yu (Microsoft Vendor)
    2023-01-13T07:37:22.2266667+00:00

    Hi @Florian Thake,
    You could report the problem to the Developer Community.
    I tested a simple command in Windows Terminal. The emoji is displayed correctly after the Windows Terminal window is resized:

    Before resizing: echo ??
    After resizing: echo 🚀

    Best regards,

    Minxin Yu




  2. Xiaopo Yang - MSFT (Microsoft Vendor)
    2023-02-02T09:43:21.8733333+00:00

    According to the documentation of ENABLE_ECHO_INPUT:

    Characters read by the ReadFile or ReadConsole function are written to the active screen buffer as they are typed into the console. This mode can be used only if the ENABLE_LINE_INPUT mode is also enabled.

    ENABLE_ECHO_INPUT may echo the input per byte (which would split characters that occupy more than one unit), but it is also possible that Windows Terminal uses UTF-16LE to display the screen buffer.


    In either case, you can implement the echo yourself by using the console APIs that manipulate the console screen buffer directly. See "Reading and Writing Blocks of Characters and Attributes" for a similar scenario.
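One way to follow this suggestion, sketched under assumptions: clear ENABLE_LINE_INPUT and ENABLE_ECHO_INPUT, read raw key events (for example with ReadConsoleInputW, taking KEY_EVENT_RECORD.uChar.UnicodeChar from key-down events), and echo each character yourself with WriteConsoleW, but only once a surrogate pair is complete. This assumes a character outside the first plane arrives as two key events, one per surrogate. The portable core of that idea is shown below; the class name `EchoAssembler` is hypothetical and the Windows calls are left out so the logic can be tested on its own:

```cpp
#include <optional>
#include <string>

// Hypothetical helper: feed it UTF-16 code units one at a time (e.g. from
// key-down events). It returns text that is safe to echo, holding back a
// high surrogate until its low surrogate arrives.
class EchoAssembler {
public:
    std::optional<std::u16string> feed( char16_t unit )
    {
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool isLow  = unit >= 0xDC00 && unit <= 0xDFFF;
        if( isHigh ) {
            pendingHigh_ = unit;     // wait for the second half (replaces an unmatched one)
            return std::nullopt;
        }
        if( isLow ) {
            if( !pendingHigh_ ) {
                return std::nullopt; // stray low surrogate: drop it
            }
            std::u16string both{ *pendingHigh_, unit };
            pendingHigh_.reset();
            return both;             // complete pair: echo both units at once
        }
        pendingHigh_.reset();        // a BMP character discards any unmatched high surrogate
        return std::u16string( 1, unit );
    }

private:
    std::optional<char16_t> pendingHigh_;
};
```

Each returned string would then be copied into a wchar_t buffer and passed to WriteConsoleW with its full length, so the terminal receives both halves of a pair in one call. Whether this alone avoids the garbled rendering in Windows Terminal is an assumption worth testing, and the history and editing features of ENABLE_LINE_INPUT would still have to be reimplemented.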

