GetShortPathNameW returns ? in the shortened 8.3 name

Question

Hi!
I have a special case. Suppose you create files in the file system whose names use characters from a DBCS (double-byte character set).
For reference, here is the path as text, in case you want to reproduce it:

C:\temp\UnicodePV\BIS\新しいフォルダー\喀媾彌拿杤歃濬\彌拿杤歃濬畚秉綵臀藹.xlsx

Note that the path mixes characters from several DBCS languages.

Running dir /x gives me this:

Notice that the short file name also contains DBCS characters. To get short names like this, the "Language for non-Unicode programs" option in the regional settings must be set to a DBCS language.

In this case, my "Language for non-Unicode programs" is set to "Korean (Korea)".
I then wrote a small program that takes a file path as a parameter, calls GetShortPathNameW, converts the result with WideCharToMultiByte, and displays it in a message box. In my case I get this:

This is the part of the program I used:

Why does this API return Unicode characters that cannot be converted to ANSI? I tried FindFirstFileW as well, with the same result.
The 8.3 file name is supposed to work with non-Unicode programs, and what I am doing is what many sites recommend. The code itself seems fine: if I copy the output of the command prompt, paste it into Notepad, save the .txt document as ANSI, and open it again, I see the same ? characters.
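For reference, the steps described in the question (GetShortPathNameW, then WideCharToMultiByte, then display the result) can be sketched as below. This is not the original program: it is a minimal sketch assuming Windows and Python's ctypes module, the helper names get_short_path and to_ansi are hypothetical, and it prints instead of showing a message box. It also passes lpUsedDefaultChar so WideCharToMultiByte can report whether it had to substitute the code page's default character (usually ?).

```python
import ctypes
import sys

CP_ACP = 0       # the system ANSI code page ("Language for non-Unicode programs")
CP_UTF8 = 65001  # lpUsedDefaultChar must be NULL when the code page is UTF-8


def _kernel32():
    # Windows only: ctypes.WinDLL is not defined on other platforms.
    return ctypes.WinDLL("kernel32", use_last_error=True)


def get_short_path(long_path: str) -> str:
    """Return the 8.3 form of an existing path via GetShortPathNameW."""
    fn = _kernel32().GetShortPathNameW
    fn.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    fn.restype = ctypes.c_uint32
    # With a NULL buffer, the return value is the size needed, including the NUL.
    needed = fn(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(needed)
    if fn(long_path, buf, needed) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value


def to_ansi(text: str) -> tuple[bytes, bool]:
    """Convert to the ANSI code page; the flag is True if the default char was used."""
    k32 = _kernel32()
    fn = k32.WideCharToMultiByte
    fn.argtypes = [ctypes.c_uint, ctypes.c_uint32, ctypes.c_wchar_p, ctypes.c_int,
                   ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p,
                   ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int
    used_default = ctypes.c_int(0)
    used_ptr = None if k32.GetACP() == CP_UTF8 else ctypes.byref(used_default)
    # cchWideChar = -1 converts up to and including the terminating NUL.
    size = fn(CP_ACP, 0, text, -1, None, 0, None, None)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_string_buffer(size)
    if fn(CP_ACP, 0, text, -1, buf, size, None, used_ptr) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value, bool(used_default.value)


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) > 1:
    short = get_short_path(sys.argv[1])
    ansi, lossy = to_ansi(short)
    # The original program shows a message box; this sketch prints instead.
    print(ansi, "(default char used)" if lossy else "")
```

Usage, with a hypothetical script name: python shortname.py "C:\path\to\file.xlsx". If the flag comes back True, the ANSI code page could not represent at least one character of the short name. With dwFlags set to 0, Windows may also apply best-fit mappings, which are not reported through lpUsedDefaultChar; passing WC_NO_BEST_FIT_CHARS avoids that.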

Thanks

Answer

According to the Double-Byte Character Sets documentation:

Each DBCS code page supports different characters, but no page supports the full breadth of characters provided by Unicode. Each DBCS code page supports a different subset, differently encoded. Data converted from one DBCS code page to another is subject to corruption because the same data value on different code pages can encode a different character. Data converted from Unicode to DBCS is subject to data loss, because a given code page might not be able to represent every character used in that particular Unicode data.
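The data loss described in that quote can be reproduced without any Windows API. The sketch below is an approximation that uses Python's built-in cp949 (Korean) and cp932 (Japanese) codecs instead of WideCharToMultiByte; these codecs are assumed to be close to, but not necessarily identical to, the Windows tables (Windows can also apply best-fit mappings). The hypothetical helper to_code_page encodes a string, replaces unmappable characters with ?, and lists which characters were lost.

```python
def _encodable(ch: str, encoding: str) -> bool:
    """True if this single character has a mapping in the given codec."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def to_code_page(text: str, encoding: str) -> tuple[bytes, list[str]]:
    """Encode text, replacing unmappable characters with b"?".

    Also returns the characters that had no mapping, loosely similar to
    what lpUsedDefaultChar reports for WideCharToMultiByte.
    """
    lost = [ch for ch in text if not _encodable(ch, encoding)]
    return text.encode(encoding, errors="replace"), lost


if __name__ == "__main__":
    name = "彌拿杤歃濬畚秉綵臀藹"  # file name from the question
    for enc in ("cp949", "cp932"):
        data, lost = to_code_page(name, enc)
        # Print code points rather than the characters to stay console-safe.
        missing = " ".join(f"U+{ord(c):04X}" for c in lost) or "none"
        print(f"{enc}: {len(data)} bytes, unmappable: {missing}")
```

A Hangul syllable such as 한 survives in cp949 but becomes ? in cp932, which is the same kind of loss as a short name containing characters the active ANSI code page cannot represent.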

Also, several Unicode and character set functions allow your applications to handle code pages. An application can use the GetCPInfo and GetCPInfoEx functions to obtain information about a code page. This information includes the default character used when a character in a converted string has no corresponding entry in the code page. You should also check what CP_OEMCP (the current system OEM code page) is set to.
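The checks suggested above can be sketched with GetCPInfoExW, queried for both CP_ACP and CP_OEMCP. This is a minimal sketch assuming Windows and Python's ctypes; the CPINFOEXW field layout mirrors the Win32 structure, and code_page_info is a hypothetical helper name.

```python
import ctypes
import sys

CP_ACP, CP_OEMCP = 0, 1
MAX_DEFAULTCHAR, MAX_LEADBYTES, MAX_PATH = 2, 12, 260


class CPINFOEXW(ctypes.Structure):
    # Mirrors the Win32 CPINFOEXW structure.
    _fields_ = [
        ("MaxCharSize", ctypes.c_uint),
        ("DefaultChar", ctypes.c_ubyte * MAX_DEFAULTCHAR),
        ("LeadByte", ctypes.c_ubyte * MAX_LEADBYTES),
        ("UnicodeDefaultChar", ctypes.c_wchar),
        ("CodePage", ctypes.c_uint),
        ("CodePageName", ctypes.c_wchar * MAX_PATH),
    ]


def code_page_info(code_page: int) -> dict:
    """Query GetCPInfoExW (Windows only) and return the interesting fields."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = k32.GetCPInfoExW
    fn.argtypes = [ctypes.c_uint, ctypes.c_uint32, ctypes.POINTER(CPINFOEXW)]
    fn.restype = ctypes.c_int
    info = CPINFOEXW()
    # dwFlags is reserved and must be 0.
    if not fn(code_page, 0, ctypes.byref(info)):
        raise ctypes.WinError(ctypes.get_last_error())
    return {
        "code_page": info.CodePage,
        "name": info.CodePageName,
        "max_char_size": info.MaxCharSize,
        # DefaultChar is NUL-padded; it is typically b"?".
        "default_char": bytes(info.DefaultChar).rstrip(b"\0"),
    }


if __name__ == "__main__" and sys.platform == "win32":
    for label, cp in (("CP_ACP", CP_ACP), ("CP_OEMCP", CP_OEMCP)):
        print(label, code_page_info(cp))
```

Comparing the two entries shows which code pages are active on a given machine and which default character each one substitutes for unmappable characters.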
