(In)Security of MultiByteToWideChar and WideCharToMultiByte (Part 1)

There are a few well-known unsafe APIs in the standard C library, such as strcpy and memcpy.  These routines are unsafe because the size of the destination buffer is not taken into consideration.  A buffer overflow takes place when the destination buffer is not large enough to hold the incoming data.  The safe versions of these APIs check that the destination buffer has enough space to hold the source data.  In addition, strncpy carries another risk: it does not null-terminate the destination when the source is at least as long as the count passed in.

A couple of functions that are almost as dangerous are still lurking without corresponding safe versions.  They are MultiByteToWideChar and WideCharToMultiByte, which translate strings between Unicode and ANSI (see https://msdn.microsoft.com/en-us/library/cc500362.aspx).  Like strncpy, they are dangerous because they are not guaranteed to produce null-terminated strings.  Unlike strcpy, they return 0 when the destination buffer is not large enough.  Even though they perform this size check, they still have security pitfalls, which this series explains.

This first installment focuses on the possibility of producing strings that are not null-terminated.  Let's start with the return values of MultiByteToWideChar and WideCharToMultiByte.  They return 0 when the destination buffer is not large enough, or when they fail for another reason.  On success, they return the number of units written to the destination buffer: wide characters for MultiByteToWideChar, bytes for WideCharToMultiByte.  Strings that are not null-terminated can be produced by both successful and unsuccessful calls.  Let's illustrate this with a couple of examples.  WideCharToMultiByte is used in all of them; the same principle applies to MultiByteToWideChar.

    wchar_t wstrTest1[] = L"123456789";
    char strTest1[9];

    // Case 1: Convert 2 characters, with no null terminator in the input
    int result = WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest1, 2, strTest1, sizeof(strTest1), NULL, NULL);

    // Result: 2 (success). strTest1 holds 2 bytes and is not null-terminated,
    // so printing it with %s reads past the converted data
    printf("Result: %d\n", result);
    printf("Converted: %s\n", strTest1);

    // Case 2: Convert all 10 characters (including the null), which do not fit in the destination buffer
    result = WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest1, 10, strTest1, sizeof(strTest1), NULL, NULL);

    // Result: 0 (failure). strTest1 holds 9 bytes and is not null-terminated
    printf("Result: %d\n", result);
    printf("Converted: %s\n", strTest1);

These two examples show that both successful and unsuccessful calls can leave a string that is not null-terminated.  In the first example, WideCharToMultiByte is asked to process two characters without a trailing null character.  It successfully converts the first two characters, but does not terminate the output string with a null character.  This case is somewhat acceptable, as the "garbage in, garbage out" principle applies.

However, the second case is worse.  The destination buffer is not large enough to hold the translated data.  WideCharToMultiByte fails correctly by returning 0, but the destination buffer is nevertheless filled with data that has no trailing null character.  If the destination buffer is used without checking the return value, a string that is not null-terminated will propagate through every subsequent use of it.
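One way to avoid depending on a terminator at all is to treat the return value as the length of the converted data.  The sketch below is only an illustration and is not part of the Windows API: format_converted is a hypothetical helper name, and it assumes the caller passes the value returned by WideCharToMultiByte as result.  It copies at most result bytes by using a printf precision, so a failed call (0) yields an empty string and a successful call never reads beyond the converted bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a Windows API).
 * converted: buffer filled by the conversion call, possibly not terminated.
 * result:    the value the conversion call returned (0 means failure).
 * Writes a null-terminated copy into out and returns snprintf's result. */
static int format_converted(char *out, size_t out_size,
                            const char *converted, int result)
{
    assert(out != NULL && out_size > 0 && converted != NULL);

    /* A failed conversion contributes nothing to the output. */
    int len = (result > 0) ? result : 0;

    /* The precision limits how many bytes are read from converted,
     * so no terminator is needed in the source buffer. */
    return snprintf(out, out_size, "%.*s", len, converted);
}
```

Because the length travels with the data, the caller never has to guess whether the destination buffer was terminated.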

Some developers are aware of this issue, and they have an elegant way of handling the return value.  They add a null character to the destination buffer at the index indicated by the return value. 

    wchar_t wstrTest1[] = L"123456789";
    char strTest1[9];

    // Convert 2 characters, with no null terminator in the input
    int result = WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest1, 2, strTest1, sizeof(strTest1), NULL, NULL);

    // Some developers add this assignment
    strTest1[result] = '\0';

    // Result: 2 (success). strTest1 holds 2 characters and is null-terminated
    printf("Result: %d\n", result);
    printf("Converted: %s\n", strTest1);

When WideCharToMultiByte fails, it returns 0, and strTest1[0] is set to the null character.  That is a reasonable outcome, because a failed conversion should yield an empty string.  When WideCharToMultiByte succeeds, it returns the number of bytes written to the destination buffer, so writing to strTest1[result] appends a null character to the converted string.

The above code works elegantly in most cases, except for one boundary case.  It is always the boundary case, isn’t it?  The code has a subtle, non-exploitable buffer overflow bug.  What happens if the return value equals the number of characters the destination buffer can hold?

When that occurs, a null character is written one element past the end of the destination buffer.  This can destabilize the process and potentially cause a denial of service.  There are several ways to fix the issue.  One fix is to add a conditional statement that handles the case where the return value equals the size of the buffer.  Another is to make the destination buffer one element larger than the size passed to the API, which leaves room for the extra null character.  This example illustrates the first fix.

    wchar_t wstrTest1[] = L"123456789";
    char strTest1[10];

    // Convert all 10 characters, including the null terminator
    int result = WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest1, 10, strTest1, sizeof(strTest1), NULL, NULL);
    if (result == (int) sizeof(strTest1))
        strTest1[result - 1] = '\0';   // buffer is full: terminate in the last slot
    else
        strTest1[result] = '\0';       // room left, or failure (result == 0)

    // Result: 10 (success). strTest1 holds 9 characters and is null-terminated
    printf("Result: %d\n", result);
    printf("Converted: %s\n", strTest1);
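The second fix, reserving room for the terminator, can be sketched in the same spirit.  The names convert_fn, convert_reserving_terminator and ascii_narrow below are hypothetical.  ascii_narrow is only an ASCII stand-in that mimics the return contract described earlier (bytes written on success, 0 on failure) so that the sketch compiles without Windows headers; in real code the callback would wrap WideCharToMultiByte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type with the return contract described in the
 * article: bytes written on success, 0 on failure. */
typedef int (*convert_fn)(const wchar_t *src, int src_len,
                          char *dst, int dst_size);

/* Second fix: hide the last byte of dst from the converter, so that
 * dst[result] is always inside the buffer. */
static int convert_reserving_terminator(convert_fn convert,
                                        const wchar_t *src, int src_len,
                                        char *dst, int dst_size)
{
    assert(convert != NULL && dst != NULL && dst_size > 0);

    int result = (dst_size > 1)
        ? convert(src, src_len, dst, dst_size - 1)
        : 0;

    /* Defensive: treat an out-of-range return value as a failure. */
    if (result < 0 || result > dst_size - 1)
        result = 0;

    dst[result] = '\0';   /* empty string on failure */
    return result;
}

/* Stand-in converter (ASCII only) used so the sketch runs anywhere.
 * It is an assumption for illustration, not a replacement for
 * WideCharToMultiByte. */
static int ascii_narrow(const wchar_t *src, int src_len,
                        char *dst, int dst_size)
{
    if (src_len > dst_size)
        return 0;                     /* insufficient buffer */
    for (int i = 0; i < src_len; i++)
        dst[i] = (char) src[i];
    return src_len;
}
```

With a 10-byte buffer and the 9-character input from the examples, the converter sees only 9 bytes and returns 9, and the terminator lands at index 9, the reserved slot.  With a 9-byte buffer the converter reports failure and the caller gets an empty string.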

Developer Checklist

1. Always check the return values of MultiByteToWideChar and WideCharToMultiByte.  Code reviewers should file unchecked calls as bugs.
2. Adopt the suggested fixes if you don’t need any custom error handling.
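Both checklist items can be folded into one small helper that is called right after the conversion.  This is a minimal sketch under stated assumptions: finish_conversion is a hypothetical name, and result is assumed to be the value returned by WideCharToMultiByte for a destination of buf_size bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Windows API).
 * Leaves buf null-terminated in every case and reports whether the
 * conversion succeeded: returns 1 on success, 0 on failure. */
static int finish_conversion(char *buf, size_t buf_size, int result)
{
    assert(buf != NULL && buf_size > 0);

    if (result <= 0) {
        buf[0] = '\0';            /* failure: hand back an empty string */
        return 0;
    }

    size_t end = (size_t) result;
    if (end >= buf_size)
        end = buf_size - 1;       /* buffer is full: use the last slot */

    /* Note: when the buffer is full, this overwrites the final converted
     * byte, just as the conditional fix in the article does. */
    buf[end] = '\0';
    return 1;
}
```

A caller would pass the conversion result straight through, for example finish_conversion(strTest1, sizeof(strTest1), result), and branch on the returned flag for any custom error handling.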

 

There are more common pitfalls related to these two APIs.  They will be covered in the next installment.