Review It

Expert Tips for Finding Security Defects in Your Code

Michael Howard

This article assumes you're familiar with Security, C#, and Visual Basic .NET


SUMMARY

Reviewing code for security defects is a key ingredient in the software creation process, ranking alongside planning, design, and testing. Here the author reflects on his years of code security reviews to identify patterns and best practices that all developers can follow when tracking down potential security loopholes. The process begins by examining the environment the code runs in, considering the roles of the users who will run it, and studying the history of any security issues the code may have had. After gaining an understanding of these background issues, specific vulnerabilities can be hunted down, including SQL injection attacks, cross-site scripting, and buffer overruns. In addition, certain red flags, such as variable names like "password" and "secret," can be searched for, and other obvious but common security blunders remedied.

Contents

Allocating Time and Effort
Buffer Overruns in C and C++
Integer Overflows in C and C++
Database Access Code in Any Language
Web Page Code in Any Language
Secrets and Cryptography in Any Language
ActiveX Controls in Visual Basic and C++
Conclusion

A large part of my job involves reviewing other people's code, looking for security errors. Admittedly, it's not my number one task—that tends to be design review and threat modeling—but I sure get to see quite a lot of code.

Hopefully, you realize that reviewing other people's code, while a good thing to do, is not how you create secure software. You produce secure software by having a process to design, write, test, and document secure systems, and by building time into the schedule for security review, training, and the use of tools. Simply designing, writing, testing, and documenting a project and then looking for security bugs at the end doesn't create secure software; code review is just one part of the process, and by itself it does not yield secure code.

In this article I will not discuss the nature of code vulnerabilities such as integer overflow attacks, SQL injection, and buffer overruns; you can learn about these issues in books (such as my book Writing Secure Code, Microsoft Press®, 2002). Instead, I will be taking a high-level look at issues I watch for during a code review. Before I get started, though, I want to point out that this is how I review code for security bugs; it's not necessarily how you should review code, nor can I guarantee it to be the most complete form of review for certain kinds of defects. I want to document what goes on in my head when I look at code in the hopes that it will prove helpful to you as well.

As I see it, there are three ways to review code: in-depth analysis, rapid analysis, and a hybrid approach. I tend to focus on a hybrid approach since it has the advantage of covering a lot of ground quickly; if I see something that I think requires more in-depth analysis I will flag it for a future code review, possibly involving other experts in the area of concern. But for the moment, let me discuss the initial rapid code review or, as I like to call it, the Sweep 'n' Tag method—sweep the code quickly, and tag code requiring further review. The following outlines my process.

Allocating Time and Effort

I have a ranking system I use to determine how much relative time I need to spend reviewing the code. The system is based on the damage potential if a vulnerability is exploited and the potential for attack. The ranking is based on the following traits:

  • Does the code run by default?
  • Does the code run with elevated privileges?
  • Is the code listening on a network interface?
  • Is the network interface unauthenticated?
  • Is the code written in C/C++?
  • Does the code have a prior history of vulnerability?
  • Is this component under close scrutiny by security researchers?
  • Does the code handle sensitive or private data?
  • Is the code reusable (for example, a DLL, C++ class header, library, or assembly)?
  • Based on the threat model, is this component in a high-risk environment or subject to many high-risk threats?

If I get more than three or four positive hits from this list, then I'll review the code in more depth. In fact, if the code is listening on a Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) socket and is running by default, be prepared to spend a lot of time reviewing the code.

In looking for security errors, I tend to review three major categories of code: C/C++ code, Web server application code (such as ASP, ASP.NET, CGI, and Perl), and managed code (mainly C#, plus some Visual Basic® .NET).

There are some nuances of each language of which you should be aware. First, the number one issue with C and C++ is buffer overruns. Admittedly, there are other issues too, but when you hear the words "buffer" and "overrun" in the same sentence you can almost always guarantee that C or C++ is involved. Higher-level languages such as C#, Visual Basic .NET, and Perl should have no buffer overruns. If there is one, the bug is probably in the runtime environment and not in the code under review. However, such languages are often used in Web server application code and are subject to other kinds of defects. Buffer overruns are nasty because the attacker can inject code into the running process and hijack it. So let's look at buffer overruns first.

Buffer Overruns in C and C++

Buffer overruns are the scourge of the software industry and you should do everything in your power to remove them from your code. Better yet, make sure that they don't get into the code in the first place.

There are two ways I review for buffer overruns. The first is to identify all the entry points into the application, especially network entry points, tracing the data as it moves through the code and questioning how that data is being manipulated. I assume that all data is malformed. When looking at any code that touches (reads from or writes to) the data, I ask, "Is there a version of the data that can cause this code to fail?" This method is thorough, but very time consuming. Another technique is to look for known and potentially dangerous constructs and trace the data back to the point of entry. For example, take a look at this code:

void function(char *p) {
    char buff[16];
    // ...
    strcpy(buff, p);
    // ...
}

If I see code like this, I'll trace the variable p back to its source, and if it came from somewhere I don't trust, or is not checked for validity close to the point at which it's copied, then I know I've found a security bug. Note that strcpy by itself is not dangerous or insecure; rather, it's the data that makes functions like this scary. If you check that the data is well formed, then strcpy can be safe. Of course, if you are wrong, then you have a security bug. I also check the "n" string-manipulation functions, such as strncpy, because the buffer-size calculation passed to them can itself be wrong.

I'm wary of code that handles tagged file formats. By tagged I mean files that are made of chunks, where each chunk has a header that describes the next chunk of data. A good example is the MIDI music format. A serious security bug was found and fixed in a Windows component called quartz.dll that handles MIDI files. A malformed MIDI construct caused the code that was handling the file to fail, or worse. You can read more about this bug at Unchecked Buffer in DirectX Could Enable System Compromise.

Another construct I look out for is this:

while (*s != '\\') *d++ = *s++;

This loop is constrained by a character in the source; it is not constrained by the destination size. Basically, I scan for *x++ = *y++ using the following regular expression:

\*\w+\+\+\s?=\s?\*\w+\+\+

Of course, people can also use *++x = *++y, so you need to scan for this, too. I want to stress once again that this construct is not dangerous unless the source isn't trusted, so you need to determine the trustworthiness of the source data.
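This sweep is easy to mechanize. Here's a minimal sketch in C# (my own illustration, not a tool from this article) that scans a directory of C/C++ sources for both pointer-copy idioms and prints each hit so it can be tagged for review:

using System;
using System.IO;
using System.Text.RegularExpressions;

// Sweep 'n' Tag helper (illustrative sketch): flag both *x++ = *y++
// and *++x = *++y so a human can trace the source data later.
class SweepAndTag {
    static void Main(string[] args) {
        Regex copyIdiom = new Regex(
            @"\*\w+\+\+\s?=\s?\*\w+\+\+|\*\+\+\w+\s?=\s?\*\+\+\w+");
        foreach (string path in Directory.GetFiles(args[0], "*.c*")) {
            string[] lines = File.ReadAllLines(path);
            for (int i = 0; i < lines.Length; i++)
                if (copyIdiom.IsMatch(lines[i]))
                    Console.WriteLine("{0}({1}): {2}", path, i + 1, lines[i]);
        }
    }
}

A hit is not a bug by itself; it's a flag that says "trace this data back to its entry point."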

Next is another kind of issue related to buffer overruns that you should be aware of: the integer overflow vulnerability.

Integer Overflows in C and C++

If you don't know what integer overflow attacks are and how to fix them, you should read my article at Development Impacts of Security Changes in Windows Server 2003 first. The real security breach occurs with these defects when arithmetic is performed to calculate a buffer size and the calculation causes an overflow or underflow. Look at the following example:

void func(char *b1, size_t c1, char *b2, size_t c2) {
    const size_t MAX = 48;
    if (c1 + c2 > MAX) return;    // this addition can wrap
    char *pBuff = new char[MAX];
    memcpy(pBuff, b1, c1);
    memcpy(pBuff + c1, b2, c2);
}

The code looks fine until you realize that there's an issue if you add c1 and c2 together and the result wraps beyond 2^32-1. For example, the result of adding 0xFFFFFFF0 and 0x40 together is 0x30 (48 decimal) once the sum wraps around. When these values are used for c1 and c2, the addition passes the size check, and the first memcpy then tries to copy nearly 4GB into a 48-byte buffer. You just got a buffer overrun! Many bugs like this are exploitable, allowing an attacker to inject code into your process.

When reviewing C and C++ code for integer overflows, I look for all instances of the operator new and dynamic memory allocation functions (alloca, malloc, calloc, HeapAlloc, and so on) and then determine how the buffer size is calculated. I then ask myself questions like the following:

  • Can the values wrap beyond some maximum value?
  • Can the values wrap below zero?
  • Is the size truncated (for example, a 32-bit value copied into a 16-bit variable, while the full 32-bit size is used for the copy)?

There's a rule of thumb employed by a colleague of mine at Microsoft: if you perform a mathematical operation in an expression that is used in a comparison, then you have the potential for overflow or underflow. This can be doubly bad if the calculation is used to determine a buffer size, especially if one or more of the elements in the buffer-size calculation is controlled by an attacker.
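The cure for the example above is to rearrange the comparison so the arithmetic itself cannot wrap. Here's a minimal sketch of the pattern (shown in C#, where uint stands in for size_t; MAX, c1, and c2 mirror the earlier code):

// Wrap-proof version of the earlier size check (illustrative sketch).
// Instead of adding c1 and c2 -- a sum that can wrap past the maximum
// value and sneak under MAX -- compare c2 against the headroom left
// after c1. No operation here can overflow or underflow.
static bool SizesAreValid(uint c1, uint c2) {
    const uint MAX = 48;
    return c1 <= MAX && c2 <= MAX - c1;
}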

Database Access Code in Any Language

As a general rule, developers write database applications in higher-level languages like C#, scripting languages, and the like. Relatively speaking, very little database code is written in C and C++, but some people use various C/C++ class libraries such as the CDatabase class in MFC.

There are two kinds of issues that can be detected here. The first is connection strings that include hardcoded passwords or that connect using administrative accounts. The second is SQL injection vulnerabilities.

When I'm looking at managed code, the first thing I do is search all the code for the System.Data namespace, especially System.Data.SqlClient. The alarm bells ring when I see these! Next, I look for words like "connect" in the code (usually a connection string is close by). The connection string has two interesting properties to look for: the connection id (often, uid) and the password (often, pwd). Something like this is a potential security hole:

DRIVER={SQL Server};SERVER=hrserver;UID=sa;PWD=$esame

Actually, there are two bugs in this example. First, the connection is made as the sysadmin account, sa; this defeats the principle of granting the least privilege necessary. Code should never connect to a database as the sysadmin account because such an account can wreak havoc on the database when used by people with evil intentions. Second, the password is hardcoded. This is wrong for two reasons: first, it will be discovered, and second, what if the password changes? (You would have to update all the clients.)
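Both bugs have straightforward fixes. As a sketch of the shape I'd rather see (the "HrDb" setting name is invented for illustration), the connection string lives in a configuration file instead of the binary, and the connection uses Windows authentication under a low-privileged account rather than sa:

using System.Configuration;
using System.Data.SqlClient;

// Illustrative sketch: the connection string comes from configuration,
// where an administrator can change it, and uses integrated security
// under a least-privilege account -- no uid, no hardcoded pwd.
string connStr = ConfigurationSettings.AppSettings["HrDb"];
// e.g. "SERVER=hrserver;DATABASE=hr;Integrated Security=SSPI"
SqlConnection conn = new SqlConnection(connStr);
conn.Open();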

The next topic is SQL injection attacks. The crux of SQL injection is using string concatenation to build SQL statements. When scanning the code, I look to see where SQL statements are created. Generally speaking, this involves searching for words like "update", "select", "insert", "exec", and any table or database names I know are used. To help out, I run the following ildasm.exe command on the managed assembly under scrutiny:

ildasm /adv /metadata /out:file test.exe

I then look at the "User Strings" section in the resulting output. If I find any database query using string concatenation, it's a potential security defect and must be fixed by using parameterized queries instead.

Using string concatenation to build stored procedures is no cure for SQL injection either. In short, string concatenation plus SQL statements is bad, but string concatenation plus SQL statements plus sysadmin account equals disaster.
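For reference, here's a minimal sketch of the parameterized form (the query, the table, and the variables conn and name are mine, invented for illustration). The user-supplied value travels as typed data, never as SQL text, so it cannot terminate the statement and smuggle in commands of its own:

using System.Data;
using System.Data.SqlClient;

// Illustrative sketch of a parameterized query: name may be attacker
// supplied, but it is bound as data rather than concatenated into the
// SQL statement.
SqlCommand cmd = new SqlCommand(
    "SELECT * FROM employee WHERE name = @name", conn);
cmd.Parameters.Add("@name", SqlDbType.NVarChar, 64).Value = name;
SqlDataReader reader = cmd.ExecuteReader();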

Web Page Code in Any Language

The most common errors in Web-based applications are cross-site scripting (XSS) issues. Although there are other issues I look for, such as SQL injection and poor cryptography, XSS bugs are quite common. The core XSS vulnerability is the potential to display untrusted user input in the victim's browser, so first I search for any code construct that sends data to the user. For example, in ASP I look for Response.Write and <%= %> tags. Next, I look at the data being written to see where it came from. An XSS bug exists if the data came from an HTTP entity, such as a form or a query string, and is sent to the user's browser without being checked for validity. Here's a grossly simple, but not uncommon, XSS example:

Hello, <% Response.Write(Request.QueryString("Name")) %>

As you can see, the "Name" parameter is sent back to the user without first checking that it is valid and well-formed.
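The usual remedy, besides validating the input against what a name should look like, is to HTML-encode anything echoed back to the browser. A minimal sketch of the same page in ASP.NET with C# (the original example is classic ASP):

<%-- Illustrative sketch: HtmlEncode turns characters such as < and >
     into &lt; and &gt;, so injected script renders as inert text. --%>
Hello, <% Response.Write(HttpUtility.HtmlEncode(Request.QueryString["Name"])); %>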

Secrets and Cryptography in Any Language

Some developers have a penchant for storing secret data in code, such as passwords and cryptographic keys, and creating their own magic cryptographic algorithms. Don't do either!

I first look for variable and function names that include "key," "password," "pwd," "secret," "cipher," and "crypt." Any hit needs analysis. You can often get false positives on "key," but the others are interesting and may yield embedded secret data or "magic" cryptography systems. When searching for homegrown cryptographic algorithms, I also look for XOR operators, as they are commonly used in such crypto. The worst code of all uses an embedded key to XOR a data stream!
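If data genuinely needs encrypting, use a vetted algorithm from the platform and keep the key out of the source entirely. A minimal sketch (illustrative; how the key and IV are stored and exchanged is the hard part, and the answer is never "in the code"):

using System.IO;
using System.Security.Cryptography;

// Illustrative sketch: encrypt with Rijndael (AES) from the .NET
// Framework instead of a homegrown XOR cipher. The key and IV arrive
// from outside the source code -- protected storage, not a literal.
static byte[] Encrypt(byte[] plaintext, byte[] key, byte[] iv) {
    RijndaelManaged aes = new RijndaelManaged();
    MemoryStream ms = new MemoryStream();
    CryptoStream cs = new CryptoStream(ms,
        aes.CreateEncryptor(key, iv), CryptoStreamMode.Write);
    cs.Write(plaintext, 0, plaintext.Length);
    cs.FlushFinalBlock();
    return ms.ToArray();
}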

ActiveX Controls in Visual Basic and C++

I always ask one question when I review a new ActiveX® control: why isn't it written in managed code? I ask this question because managed code allows for partial-trust scenarios, which ActiveX does not.
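As a sketch of what partial trust buys you (the SendFile name anticipates the example below; the explicit demand is my illustration), a managed method can insist that every caller up the stack already holds the relevant permission:

using System.Security.Permissions;

// Illustrative sketch: under code access security this demand walks the
// call stack, so a caller from a partial-trust zone (the Internet, say)
// lacks FileIOPermission and the method throws before any file is read.
public void SendFile(string path) {
    new FileIOPermission(FileIOPermissionAccess.Read, path).Demand();
    // ...open and transmit the file only when all callers are trusted...
}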

Next, I look at all the methods and properties on the control (the .IDL file is the best source for this) and I put myself in the mindset of a bad guy. What nasty things could I do with each of these methods or properties? Generally, many methods are named with a VerbNoun format, such as ReadRegistry, WriteFile, GetUserName, and NukeKey, so I look for onerous-sounding verbs and for nouns that name sensitive resources.

For example, a SendFile method is potentially dangerous if the attacker can access any file on the user's hard disk, and send it to any location, such as a Web site under an attacker's control! Anything that accesses resources on the user's machine is subject to further scrutiny.

I do extra review work if the control is marked safe for scripting (SFS), because it can then be invoked in the Web browser without warning the user. You can determine whether the control is SFS by checking if it implements the ATL IObjectSafetyImpl interface or registers the following "safe for scripting" and "safe for initialization" implementation categories at install time:

[HKEY_CLASSES_ROOT\CLSID\<GUID>\Implemented Categories\{7DD95801-9882-11CF-9FA9-00AA006C42C4}]
[HKEY_CLASSES_ROOT\CLSID\<GUID>\Implemented Categories\{7DD95802-9882-11CF-9FA9-00AA006C42C4}]

Earlier I mentioned that accessing and sending a user's files through a SendFile method is bad. In fact, it's a privacy bug if I can call the SendFile method and determine that a file exists on the user's hard drive simply from the error code the method returns.

Conclusion

This is a very high-level look at the first pass I take when reviewing code. Many of these bugs are simple, and one could argue that developers should not make such errors, but they do. Knowing that your code will be reviewed for security, however, usually leads you to write more secure code in the first place.

You may also have noticed a common theme running through these defect types: many are caused by untrusted input. When reviewing code, always ask where the data comes from and whether you can trust it.

For background information see:
Security Developer Center

Michael Howard is a senior program manager, a founding member of the Secure Windows Initiative group at Microsoft, and a coauthor of Writing Secure Code (Microsoft Press, 2002). He focuses on designing, building, testing, and deploying applications designed to withstand attack, and yet still be usable by millions of non-technical users.