GetShortPathNameW returns ? in the shortened 8.3 name

Pablo Glomby 186 Reputation points
2022-03-14T16:36:41.407+00:00

Hi!
I have a special case. Suppose you create files in the file system using a DBCS.
For reference, suppose you have:
182899-image.png

In text (in case you want to reproduce it) this is the path:
C:\temp\UnicodePV\BIS\新しいフォルダー\喀 媾 彌 拿 杤 歃 濬\彌 拿 杤 歃 濬 畚 秉綵 臀 藹.xlsx

Note that I mix several DBCS languages.

Doing a dir /x gives me this:
182900-image.png

Notice that the short file name is also in DBCS. Remember that to have this short file name with Unicode characters you must use a DBCS in the "Language for non-Unicode programs" in the regional settings.

In this case I have "Korean (Korea)" set as my "Language for non-Unicode programs".
I then wrote a small program that takes a file path as a parameter, calls GetShortPathNameW, converts the result with WideCharToMultiByte, and displays it in a message box. In my case I get this:
182945-image.png

This is the part of the program I used:
182919-image.png

Why does this API return a Unicode character that cannot be converted to ANSI? I tried FindFirstFileW, but the result is the same.
The 8.3 file name is supposed to work with non-Unicode programs, and what I do is what many sites recommend. The code itself is OK: if I copy the content of the command prompt, paste it into Notepad, and save the .txt document as ANSI, I see the same ? characters when I open that file again.

Thanks

Windows API - Win32
C++

1 answer

  1. Xiaopo Yang - MSFT 12,231 Reputation points Microsoft Vendor
    2022-03-15T02:38:24.46+00:00

    According to Double-byte Character Sets,

    Each DBCS code page supports different characters, but no page supports the full breadth of characters provided by Unicode. Each DBCS code page supports a different subset, differently encoded. Data converted from one DBCS code page to another is subject to corruption because the same data value on different code pages can encode a different character. Data converted from Unicode to DBCS is subject to data loss, because a given code page might not be able to represent every character used in that particular Unicode data.

    Also, several Unicode and character set functions allow your applications to handle code pages. An application can use the GetCPInfo and GetCPInfoEx functions to obtain information about a code page. This information includes the default character used when a character in a converted string has no corresponding entry in the code page. You should also check what CP_OEMCP (the current system OEM code page) resolves to on your machine.
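
    As a rough illustration of the check suggested above, the sketch below wraps GetCPInfoExW in a hypothetical helper named `Describe` that returns a hypothetical `CodePageSummary` struct. Neither name is part of the Windows API. It reads the code page identifier, the maximum bytes per character, the first byte of the default character, and the display name.

    ```cpp
    #include <windows.h>
    #include <string>

    // Hypothetical summary of the fields of CPINFOEXW that matter here.
    struct CodePageSummary {
        bool ok = false;        // false if GetCPInfoExW failed
        UINT id = 0;            // code page identifier reported by the API
        UINT maxCharSize = 0;   // 1 for single-byte code pages, 2 for DBCS code pages
        char defaultChar = 0;   // first byte of the substitute for unmappable characters
        std::wstring name;      // display name of the code page
    };

    // Hypothetical helper: queries one code page with GetCPInfoExW.
    CodePageSummary Describe(UINT codePage)
    {
        CodePageSummary summary;
        CPINFOEXW info = {};
        // The second parameter is reserved and must be 0.
        if (!GetCPInfoExW(codePage, 0, &info)) return summary;

        summary.ok = true;
        summary.id = info.CodePage;
        summary.maxCharSize = info.MaxCharSize;
        summary.defaultChar = static_cast<char>(info.DefaultChar[0]);
        summary.name = info.CodePageName;
        return summary;
    }
    ```

    Comparing `Describe(GetACP())` with `Describe(GetOEMCP())` shows which ANSI and OEM code pages are currently in effect and which default character each one substitutes for characters it cannot represent.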