Getting the wchar_t string length in C++

Question

thebluetropics 1,046

Win32 uses UTF-16 encoding for wchar_t strings.
Each character takes 1 or 2 wchar_t code units, depending on the character.

However, neither wcslen() nor lstrlenW() counts a character that takes 2 wchar_t's as a single character.

This Japanese Kanji uses only 1 code unit (1 wchar_t):

   wchar_t text[] = L"私";  
   int length = lstrlenW(text); // length is 1, as expected

However, when I use the Chinese character U+2070E, which takes 2 code units (2 wchar_t's), it is counted as two characters instead of one.
For some reason I can't put the code here; here is the link to the code.

So I assume that lstrlenW() and wcslen() count the number of wchar_t's, not the number of characters :I

Is there a way to get the correct character count of a wchar_t string in Win32 applications?

David Lowndes 4,726 Reputation points

2022-10-05T08:38:34.197+00:00

Maybe you're looking for something like this: https://github.com/andlabs/libui/blob/master/windows/graphemes.cpp
What are you going to do with this value once you've got it?
thebluetropics 1,046 Reputation points

2022-10-05T08:54:10.963+00:00

Character selection (indexing through characters), and for other useful purposes...
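
For the indexing use case, a minimal sketch of mapping a character index to a position in the buffer might look like the following. It assumes "character" means a Unicode code point, and it uses char16_t as a stand-in for the 16-bit Win32 wchar_t so that it compiles on any platform. The helper names (is_high_surrogate, is_low_surrogate, code_point_offset) are hypothetical and not part of any Win32 API.

```cpp
#include <cstddef>
#include <string_view>

// UTF-16 high (leading) surrogates are 0xD800..0xDBFF.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// UTF-16 low (trailing) surrogates are 0xDC00..0xDFFF.
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: returns the code-unit offset at which the
// index-th code point (0-based) starts, or npos if index is out of range.
std::size_t code_point_offset(std::u16string_view s, std::size_t index) {
    std::size_t i = 0;
    while (i < s.size()) {
        if (index == 0) return i;
        // A well-formed surrogate pair spans two code units; anything else spans one.
        const bool pair = is_high_surrogate(s[i]) && i + 1 < s.size() &&
                          is_low_surrogate(s[i + 1]);
        i += pair ? 2 : 1;
        --index;
    }
    return std::u16string_view::npos;
}
```

With the string u"a\U0002070Eb", the third code point ('b') starts at code-unit offset 3, because U+2070E occupies offsets 1 and 2. The lookup is linear in the string length, so repeated indexing into a long string would probably benefit from caching offsets.
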
Xiaopo Yang - MSFT 12,731 Reputation points Microsoft External Staff

2022-10-05T09:33:02.32+00:00

IS_SURROGATE_PAIR should work for this. IsDBCSLeadByte may also be OK.
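
As a rough illustration of the surrogate-pair approach, the sketch below counts code points by treating each well-formed surrogate pair as one character. It is written against char16_t rather than wchar_t so it stays portable, and it repeats the range checks by hand instead of calling the Win32 macros. The function names (is_high, is_low, count_code_points) are made up for this example.

```cpp
#include <cstddef>
#include <string_view>

// Range checks equivalent to what a surrogate-pair test needs:
// high (leading) surrogates are 0xD800..0xDBFF, low (trailing) are 0xDC00..0xDFFF.
constexpr bool is_high(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: counts code points in a UTF-16 string.
// An unpaired surrogate is counted as one code point on its own.
std::size_t count_code_points(std::u16string_view s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        ++count;
        // Skip the low surrogate that completes a pair so it is not counted twice.
        if (is_high(s[i]) && i + 1 < s.size() && is_low(s[i + 1])) {
            ++i;
        }
    }
    return count;
}
```

For U+2070E, which UTF-16 stores as the two code units 0xD841 0xDF0E, this returns 1 where wcslen() and lstrlenW() report 2. On Windows, where wchar_t is 16 bits, the same loop should work on a wchar_t buffer with IS_HIGH_SURROGATE and IS_LOW_SURROGATE (or IS_SURROGATE_PAIR) in place of the hand-written checks. Note that this counts code points, not user-perceived characters; combining sequences still count as several.
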
Viorel 122.5K Reputation points

2022-10-05T09:36:05.73+00:00

The solution already exists: _mbstrlen.
Xiaopo Yang - MSFT 12,731 Reputation points Microsoft External Staff

2022-10-05T09:44:23.63+00:00

But I haven't found a way to use _mbstrlen with the double-byte character, although a single-byte character like the one in the Japanese example is OK.

Accepted answer

Answer 1

Xiaopo Yang - MSFT 12,731 Microsoft External Staff

As the Double-byte Character Sets documentation points out, use the _mbs versions of the functions. To get the length based on the locale, use _mbstrlen; see the example.
