An Overlooked Construct and an Integer Overflow Redux

 

Michael Howard
Secure Windows Initiative

September 19, 2003

Summary: Michael Howard investigates an often-ignored code construct that can lead to serious buffer overruns, and then looks an alternate way to perform arithmetic without overflow side effects. (7 printed pages)

A Look at Constructs

It's funny how many security guidance documents warn people about dangerous functions. In C and C++, there are very few dangerous functions, but one thing I can say for sure is that there are plenty of dangerous developers using C and C++.

So, you're probably asking, "Michael, what on earth are you talking about?"

I have to admit I'm tired of seeing documents that say certain functions are dangerous and you should replace them with safer versions. For example, "Don't use strcpy, it's evil. You should use strncpy instead, because it's safe." Nothing could be further from the truth. It's possible to have safe code that uses strcpy, and unsafe calls to strncpy.

Functions like strcpy are potentially dangerous because the source data is larger than the destination buffer and it came from an untrusted source. If the source data comes from a trusted source or is tested for validity before being copied, then the call to strcpy is safe:

void func(char *p) {
   const int MAX = 10;
   char buf[MAX + 1];
   memset(buf,0,sizeof(buf));

   if (p && strlen(p) <= MAX) 
      strcpy(buf,p);
}

Believe it or not, I'm going somewhere with this example. There's an often-overlooked construct that can lead to buffer overruns, and it's not a function call. It's this:

while (<some condition>) 
   *d++ = *s++;

No function call here—this is the coding construct in DCOM that led to the Blaster worm. You can read more about the bug fix in Buffer Overrun In RPC Interface Could Allow Code Execution.

The code looked like this:

HRESULT GetMachineName(WCHAR *pwszPath) {
    WCHAR  wszMachineName[N + 1]) 
    LPWSTR pwszServerName = wszMachineName;
    while (*pwszPath != L'\\' )
        *pwszServerName++ = *pwszPath++;   
    ...
}

The problem here is that the while loop is bounded by some character in the source string. It is not restricted by the size of the destination buffer. In other words, it's ripe for a buffer overrun if the source data is untrusted.

I wrote a simple Perl script to search for these kinds of constructs in C and C++ code. Be aware that every instance flagged by this script is not a bug and you need to determine if the source data is trusted or not.

use strict;
use File::Find;

my $RECURSE = 1;

###################################################
foreach(@ARGV) {
  next if /^-./;
  if ($RECURSE) {
      finddepth(\&processFile,$_);
  } else {
      find(\&processFile,$_);
  }
}
###################################################
sub processFile {
  my $FILE;
  my $filename = $_;
  
  if (!$RECURSE && ($File::Find::topdir ne $File::Find::dir)) {
    $File::Find::prune = 1;
    return;
  } 
  
  # Only accept C/C++ and header extensions
  return if (!(/\.[ch](?:pp|xx)?$/i));
     
  warn "$!\n" unless open FILE, "<" . $filename;
  
  # reset line number
  $. = 0;

  while (<FILE>) {
    chomp;
    s/^\s+//;
    s/\s+$//;

  if (/\*\w+\+\+\s{0,}=\s{0,}\*\w+\+\+) {
   print $filename . " " . $_ . "\n";
  }
}

Note This script only finds the *p++ construct, and not the *++p construct.

Assuming you find a bug, one way to make the code more secure is to limit the copied data to be no larger than the destination:

HRESULT GetMachineName(WCHAR *pwszPath) {
    WCHAR  wszMachineName[N + 1]) 
    LPWSTR pwszServerName = wszMachineName;

    size_t cbMachineName = N;
    while (*pwszPath != L'\\' && --cbMachineName)
        *pwszServerName++ = *pwszPath++;   
    ...
}

Finally, any memory copy function or construct that is not constrained by the size of the destination should be heavily scrutinized.

A Little More about Integer Overflows

In a previous article, Reviewing Code for Integer Manipulation Vulnerabilities, I discussed security defects associated with simple mathematical operations called integer overflows.

I recently gave a lecture to Microsoft engineers as part of the ongoing Trustworthy Computing Engineering Series about integer overflows. In the talk, I outlined how to find them and how to fix them. I was amazed at the number of e-mails I received saying my remedy was okay, but was fraught with danger. Allow me to explain.

In the column, I stated that code like this:

if (A + B > MAX) return -1;

Should be changed to this:

if (A + B >= A && A + B < MAX) {
   // cool!
}

Someone pointed out that three years from now, someone will look at this code, wonder what it's for, and remove the A+B >= A portion because it seems redundant, and now you have an integer overflow regression. Oops!

In response, I wrote the following header file, which is a little more obvious in its intent. Yes, it looks like gibberish, but that gibberish is x86 assembly language. I used assembly language because it gave me direct access to the jc operand, which is jump-on-carry. In other words, it detects if the math operation that led to an overflow.

#ifndef _INC_INTOVERFLOW_
#define _INC_INTOVERFLOW_

#ifdef _X86_

inline bool UAdd(size_t a, size_t b, size_t *r) {

   __asm {
      mov         eax,dword ptr [a] 
      add         eax,dword ptr [b] 
      mov         ecx,dword ptr [r] 
      mov         dword ptr [ecx],eax 
      jc         short j1
      mov         al,1 
      jmp         short j2
j1:

#ifdef _DEBUG
      int         3
#endif
      xor         al,al
j2:
   };
}

inline bool UMul(size_t a, size_t b, size_t *r) {

   __asm {
      mov         eax,dword ptr [a] 
      mul         dword ptr [b] 
      mov         ecx,dword ptr [r] 
      mov         dword ptr [ecx],eax 
      jc         short j1
      mov         al,1 
      jmp         short j2
j1:

#ifdef _DEBUG
      int         3
#endif
      xor         al,al
j2:
   };
}

inline bool UMulAdd(size_t mul1, size_t mul2, size_t add, size_t *r) {

   size_t t = 0;
   if (UMul(mul1,mul2,&t))
      return UAdd(t,add,r);
   return false;
}

#else
#   error \\"This code compiles only on 32-bit x86
#endif // _X86_

#endif // _INC_INTOVERFLOW_

View this file, which includes simple documentation explaining the functions.

Spot the Security Flaw

It took a while, but a number of people found the defect. Comparing strings, when used to make security decisions, should be culture invariant or be a byte-wise comparison. In the example, I painted the code that would have allowed access to sensitive data in Turkey, because the Turkish language has four instances of the letter "I"—two lowercase and two uppercase. You can read about it at https://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemstringclasscomparetopic5.asp.

Now let's move on to this month's error. What's up with this code?

void func(char *p) {
   char buf[10+1];
   memset(buf,0,sizeof(buf));
   
   // limit string to 10 chars
   sprintf(buf,"%10s",p);
   printf("Hello, %s\n",buf);
}

A Little Puzzle

This puzzle has absolutely nothing whatsoever to do with security, but I dragged this up from the depths of my memory when I thought about people being confused by the integer overflow detection code. What does this code do?

int a = 0x42;
int b = 0x69;

a ^= b;
b ^= a;
a ^= b;

The rules of this game are simple—you cannot compile or interpret this code. Try to determine what it does simply by looking at it.

Michael Howard is a Senior Security Program Manager in the Secure Windows Initiative group at Microsoft and is the coauthor of Writing Secure Code, now in its second edition, and the main author of Designing Secure Web-based Applications for Windows 2000. His main focus in life is making sure people design, build, test, and document nothing short of a secure system. His favorite line is "One person's feature is another's exploit."