December 2017

Volume 32 Number 12

[C++]

Visual C++ Support for Stack-Based Buffer Protection

By Hadi Brais | December 2017

When software does something it’s not supposed to do according to its functional specification, it’s said to have defects or bugs. The rules in that specification that determine when accesses and modifications to data and other resources should be allowed collectively constitute a security policy. The security policy essentially defines what it means for the software to be secure, and when a particular defect should be considered a security flaw rather than just another bug.

Given various threats from around the world, security is more important than ever today and, as such, must be an integral part of the software development lifecycle (SDL). This includes choices such as where to store data, which C/C++ runtime APIs to use, and which tools can help make the software more secure. Following the C++ Core Guidelines (bit.ly/1LoeSRB) substantially helps in writing correct, maintainable code. In addition, the Visual C++ compiler offers many security features that are easily accessible through compiler switches. These can be classified as either static or dynamic security analyses. Examples of static security checks include using the /Wall and /analyze switches and the C++ Core Guidelines checkers. These checks are performed statically and don’t affect the generated code, though they do increase compilation time. In contrast, dynamic checks are inserted into the emitted executable binaries by the compiler or the linker. In this article, I’ll discuss one dynamic security analysis option in particular, namely /GS, which provides protection against stack-based buffer overflows. I’ll explain how the code is transformed when that switch is turned on and when it can or can’t secure your code. I’ll be using Visual Studio Community 2017.

You might wonder why you shouldn’t just turn on all these compiler switches and be done with it. In general, you should employ all the recommended switches, regardless of whether you understand how they work. However, knowing the details of how a particular technique works enables you to determine the impact it may have on your code and how to make better use of it. Consider, for example, buffer overflows. The compiler does offer a switch to deal with such defects, but it uses a detection mechanism that forces the program to crash when a buffer overflow is detected. Does that improve security? It depends. First, while all buffer overflows are bad, not all of them are security vulnerabilities, so a detected overflow doesn’t necessarily mean an exploitation took place. And even if it did, the damage might already have been done by the time the detection mechanism was triggered. Moreover, depending on how your application is designed, abruptly crashing the program may not be suitable, because the crash could itself be a denial-of-service (DoS) vulnerability or lead to a potentially worse situation involving data loss or corruption. As I’ll explain in this article, the only reasonable response is to make the application resilient to such crashes, rather than disabling or changing the protection mechanism.

I’ve written a number of articles on compiler optimizations for MSDN Magazine (you’ll find the first at msdn.com/magazine/dn904673). The goal of those optimizations was mainly to improve execution time. Security can also be viewed as an objective of compiler transformations: rather than optimizing execution time, the compiler optimizes security by reducing the number of potential security flaws. This perspective is useful because it suggests that when you specify multiple compiler switches to improve both execution time and security, the compiler might have multiple, potentially conflicting goals. In this case, it has to somehow balance or prioritize these goals. I’ll discuss the impact that /GS has on some aspects of your code, particularly speed, memory consumption, and executable file size. That’s another reason to understand what these switches do to your code.

In the next section, I’ll provide an introduction to control flow attacks with particular focus on stack buffer overflows. I’ll discuss how they occur and how an attacker can exploit them. Then I’ll look in detail at how /GS impacts your code and the degree to which it can mitigate such exploits. Finally, I’ll demonstrate how to use the BinSkim static binary analysis tool to perform a number of critical verification checks on a given executable binary without requiring the source code.

Control Flow Attacks

A buffer is a block of memory used to temporarily store data to be processed. Buffers can be allocated from the runtime heap, the thread stack, directly using the Windows VirtualAlloc API, or as a global variable. Buffers can be allocated from the runtime heap either using the C memory allocation functions (such as malloc) or the C++ new operator. Buffers can be allocated from the stack using either an automatic array variable or the _alloca function. The minimum size of a buffer can be zero bytes and the maximum size depends on the size of the largest free block.
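
To make these options concrete, here’s a minimal sketch of the allocation methods just mentioned (the sizes and names are arbitrary):

#include <malloc.h>    // malloc, free, _alloca
#include <windows.h>   // VirtualAlloc, VirtualFree

char g_buffer[64];                      // global buffer

void allocate_buffers() {
  char* heap1 = (char*)malloc(64);      // runtime heap, C style
  char* heap2 = new char[64];           // runtime heap, C++ style
  char stack1[64];                      // thread stack, automatic array
  char* stack2 = (char*)_alloca(64);    // thread stack, dynamically sized
  char* pages = (char*)VirtualAlloc(    // directly from the OS
    nullptr, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
  // ... use the buffers ...
  free(heap1);
  delete[] heap2;
  VirtualFree(pages, 0, MEM_RELEASE);
}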

Two particular features of the C and C++ programming languages that truly distinguish them from other languages, such as C#, are:

  • You can do arbitrary arithmetic on pointers.
  • You can successfully dereference any pointer anytime as long as it points to allocated memory (from the point of view of the OS), though the behavior of the application may be undefined if it doesn’t point at the memory it owns.

These features make the languages very powerful, but they constitute a great threat at the same time. In particular, a pointer that’s intended to be used to access or iterate over contents of a buffer might be erroneously or maliciously modified so that it points outside the bounds of the buffer to either read or write adjacent or other memory locations. Writing beyond the largest address of a buffer is called a buffer overflow. Writing before the smallest address of a buffer (which is the address of the buffer) is called a buffer underflow.
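
To make the terminology concrete, here’s a trivial example of both kinds of out-of-bounds writes:

int buffer[4];
int* p = buffer;
p[4] = 1;     // buffer overflow: writes just past the last element
p[-1] = 1;    // buffer underflow: writes just before the first element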

A stack-based buffer overflow vulnerability was recently discovered in an extremely popular piece of software (which I won’t name). It resulted from using the sprintf function unsafely, as shown in the following code:

sprintf(buffer, "A long format string %d, %d", var1, var2);

The buffer is allocated from the thread stack and has a fixed size. However, the size of the string to be written to the buffer depends on the number of characters required to represent the two specified integers. The size of the buffer isn’t sufficient to hold the largest possible string, resulting in a buffer overflow when large integers are specified. When an overflow occurs, adjacent memory locations higher up in the stack get corrupted.
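
The surrounding code presumably looked something like the following hypothetical reconstruction, where the buffer is large enough for small integers but not for the largest ones:

#include <stdio.h>

void format_pair(int var1, int var2) {
  char buffer[32];  // fixed size, chosen here for illustration
  // the format expands to up to 46 bytes: 23 characters of literal
  // text, two 11-character integers (such as -2147483648), and the
  // null terminator -- more than the buffer can hold
  sprintf(buffer, "A long format string %d, %d", var1, var2);
  // ...
}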

To demonstrate why this is dangerous, consider where a buffer allocated from the stack would typically be located in the stack frame of the declaring function according to standard x86 calling conventions and taking into consideration compiler optimizations, as shown in Figure 1.

Figure 1 A Typical x86 Stack Frame

First, the caller pushes any arguments that aren’t passed through registers onto the stack in a certain order. Then, the x86 CALL instruction pushes the return address onto the stack and jumps to the first instruction in the callee. If frame pointer omission (FPO) optimization doesn’t take place, the callee pushes the current frame pointer onto the stack. If the callee uses any exception-handling constructs that haven’t been optimized away, an exception-handling frame would next be placed onto the stack. That frame contains pointers to and other information about exception handlers defined in the callee. Non-static local variables that haven’t been optimized away and that can’t be held in registers or that spill from registers are allocated from the stack in a certain order. Next, any callee-saved registers used by the callee must be saved on the stack. Finally, dynamically sized buffers allocated using _alloca are placed at the bottom of the stack frame.

Any of the data items on the stack may have certain alignment requirements, so padding blocks may be allocated as required. The piece of code in the callee that sets up the stack frame (except for the arguments) is called the prolog. When the function is about to return to its caller, a piece of code called the epilog is responsible for deallocating the stack frame up to and including the return address.

The main difference between the x86/x64 and ARM calling conventions is that the return address and frame pointer are held in dedicated registers in ARM rather than on the stack. Nonetheless, stack buffer out-of-bounds accesses do constitute a serious security issue on ARM because other values on the stack may be pointers.

A stack buffer overflow (writing beyond the upper bound of a buffer) may overwrite any of the code or data pointers that are stored above the buffer. A stack buffer underflow (writing below the lower bound of a buffer) may overwrite the values of the callee-saved registers, which may also be code or data pointers. An arbitrary out-of-bounds write will either cause the application to crash or behave in an undefined way. However, a maliciously crafted attack enables the attacker to take control of the execution of the application or the whole system. This can be achieved by overwriting a code pointer (such as the return address) so that it points to a piece of code that executes the attacker’s intent.

GuardStack (GS)

To mitigate stack-based out-of-bounds accesses, you could manually add the necessary bounds checks (adding if statements to check that a given pointer is within the bounds) or use an API that performs these checks (for example, snprintf). However, the vulnerability may still persist for different reasons, such as incorrect integer arithmetic or type conversions used to determine the bounds of buffers or to perform bounds checking. Therefore, a dynamic mitigation mechanism to prevent or reduce the possibility of exploitation is required.
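
For example, the unsafe sprintf call shown earlier could be rewritten with snprintf (a sketch, keeping the hypothetical 32-byte buffer):

#include <stdio.h>

void format_pair_safe(int var1, int var2) {
  char buffer[32];
  // snprintf writes at most sizeof(buffer) bytes, including the
  // null terminator, truncating the output instead of overflowing
  int written = snprintf(buffer, sizeof(buffer),
                         "A long format string %d, %d", var1, var2);
  if (written < 0 || written >= (int)sizeof(buffer)) {
    // handle the error or truncation
  }
}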

General mitigation techniques include randomizing the address space and using non-executable stacks. Dedicated mitigation techniques can be classified according to whether the goal is to prevent out-of-bounds accesses from occurring by capturing them before they occur, or to detect out-of-bounds accesses at some point after they occur. Both are possible, but prevention adds substantial performance overhead.

The Visual C++ compiler offers two detection mechanisms that are somewhat similar, but have different purposes and different performance costs. The first mechanism is part of the runtime error checks, which can be enabled using the /RTCs switch. The second is GuardStack (called Buffer Security Check in the documentation and Security Check in Visual Studio), which can be enabled using the /GS switch.

With /RTCs, the compiler allocates additional small memory blocks from the stack in an interleaving manner such that every local variable on the stack is sandwiched between two such blocks. Each of these additional blocks is filled with a special value (currently, 0xCC). This is handled by the prolog of the callee. In the epilog, a runtime function is called to check whether any of these blocks were corrupted and report a potential buffer overflow or underflow. This detection mechanism adds some overhead in terms of performance and stack space, but it’s designed to be used for debugging and ensuring program correctness, not as a production mitigation.
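
As an illustration, here’s the kind of off-by-one defect /RTCs catches; the guard block adjacent to buffer is corrupted by the last iteration, and the check in the epilog reports it when the function returns (a minimal sketch):

void off_by_one() {
  int buffer[4];
  for (int i = 0; i <= 4; i++)  // bug: the last iteration writes buffer[4]
    buffer[i] = i;
}  // with /RTCs, the corrupted guard block is detected here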

GuardStack, on the other hand, was designed to have lower overhead and as a mitigation that can actually work in a production, potentially malicious, environment. So /RTCs should be used for debug builds and GuardStack should be used for both builds. In addition, the compiler doesn’t allow you to use /RTCs with compiler optimizations, while GuardStack is compatible and doesn’t interfere with compiler optimizations. By default, both are enabled in the Debug configuration while only GuardStack is enabled in the Release configuration of a Visual C++ project. In this article, I’ll only discuss GuardStack in detail.

When GuardStack is enabled, a typical x86 call stack would look like what’s shown in Figure 2.

Figure 2 A Typical x86 Stack Frame Protected Using GuardStack (/GS)

There are three differences compared to the stack layout shown in Figure 1. First, a special value, called a cookie or a canary, is allocated just above the local variables. Second, local variables that are more likely to exhibit overflows are allocated above all other local variables. Third, some of the arguments that are particularly sensitive to buffer overflows are copied to an area below local variables. Of course, to make these changes happen, a different prolog and epilog are used, as I’ll discuss now.

The prolog of a protected function would include roughly the following additional instructions on x64:

sub         rsp,8h
mov         rax,qword ptr [__security_cookie] 
xor         rax,rbp 
mov         qword ptr [rbp],rax

An additional 8 bytes is allocated from the stack and initialized to a copy of the value of the __security_cookie global variable XOR’d with the value held in the RBP register. When /GS is specified, the compiler automatically links the object file built from the gs_cookie.c source file. This file defines __security_cookie as a 64-bit or 32-bit global variable of type uintptr_t on x64 and x86, respectively. Therefore, each Portable Executable (PE) image compiled with /GS includes a single definition of that variable, used by the prologs and epilogs of the functions of that image. On x86, the code is the same except that 32-bit registers and cookies are used.

The basic idea behind using a security cookie is to detect, just before the function returns, whether the value of the cookie has become different from that of the reference cookie (the global variable). This indicates a potential buffer overflow caused by either an exploitation attempt or just an innocent bug. It’s crucial that the cookie has very high entropy to make it extremely difficult for an attacker to guess. If an attacker is able to determine the cookie used in a particular stack frame, GuardStack fails. I’ll discuss more about what GuardStack can and can’t do later in the article.

The reference cookie is given an arbitrary constant value when the image is emitted by the compiler. Therefore, it must be carefully initialized, essentially before any code is executed. Recent versions of Windows are aware of GuardStack and will initialize the cookie to a high-entropy value at load time. When /GS is enabled, the first thing the entry point of an EXE or DLL does is initialize the cookie by calling the __security_init_cookie function, defined in gs_support.c and declared in process.h. This function initializes the image’s cookie if it hasn’t been appropriately initialized by the Windows loader.
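
Conceptually, the definition in gs_cookie.c amounts to something like the following sketch; the default constants shown are the widely documented ones, and the actual source differs in its details:

#include <stdint.h>

#ifdef _WIN64
uintptr_t __security_cookie = 0x00002B992DDFA232;  // default, replaced at load time
#else
uintptr_t __security_cookie = 0xBB40E64E;          // default, replaced at load time
#endif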

Note that without the XOR with RBP, merely leaking the reference cookie at any point during execution (using an out-of-bounds read, for example) would be sufficient to subvert GuardStack. XOR’ing with RBP cheaply yields a different cookie for each frame, and an attacker would need to know both the reference cookie and the RBP value to figure out the cookie for a particular stack frame. RBP by itself isn’t guaranteed to have high entropy because its value depends on how the compiler optimized the code, the stack space consumed so far, and the randomization performed by address space layout randomization (ASLR), if enabled.

The epilog of a protected function would include roughly the following additional instructions on x64:

mov         rcx,qword ptr [rbp]
xor         rcx,rbp 
call        __security_check_cookie
add         rsp,8h

First, the cookie on the stack is XOR’d to produce a value that’s supposed to be the same as the reference cookie. The compiler emits instructions to ensure that the value of RBP used in the prolog and epilog is the same (unless it somehow got corrupted).

The __security_check_cookie function, declared in vcruntime.h, is linked in by the compiler; its purpose is to validate the cookie that’s on the stack, mainly by comparing it with the reference cookie. If the check fails, the code jumps to the __report_gsfailure function, which is defined in gs_report.c. On Windows 8 and later, the function terminates the process by calling __fastfail. On other systems, the function terminates the process by calling UnhandledExceptionFilter after removing any potential handler. Either way, the error is logged by Windows Error Reporting (WER), and the report identifies the stack frame in which the security cookie got corrupted.
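
Putting the pieces together, the epilog sequence plus __security_check_cookie conceptually amount to the following (a simplified sketch, not the actual runtime source):

#include <stdint.h>
#include <stdlib.h>

extern "C" uintptr_t __security_cookie;  // the reference cookie

void check_frame_cookie(uintptr_t cookie_slot, uintptr_t rbp) {
  uintptr_t cookie = cookie_slot ^ rbp;  // undo the XOR applied in the prolog
  if (cookie != __security_cookie)
    abort();  // stands in for __report_gsfailure / __fastfail
}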

When /GS was first introduced in Visual C++ 2002, you could override the behavior of a failed cookie check by specifying a callback function. However, because the stack is in an undefined state and some code has already executed by the time the overflow is detected, there’s almost nothing that can reliably be done at that point. Therefore, this feature was eliminated starting with Visual C++ 2005.

The Overhead of GuardStack

To minimize overhead, only those functions that the compiler considers vulnerable are protected. Different versions of the compiler may use different undocumented algorithms to determine whether a function is vulnerable, but in general, if a function defines an array or a large data structure and obtains pointers to such objects, it’s likely to be considered vulnerable. You can specify that a particular function not be protected by applying __declspec(safebuffers) to its declaration. However, this keyword is ignored when applied to a function that’s inlined in a protected function or when a protected function is inlined in it. You can also force the compiler to protect one or more functions using the strict_gs_check pragma. The security development lifecycle (SDL) checks, enabled using the /sdl switch, apply strict GuardStack to all source files and enable other dynamic security checks.
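
For example (the function names are hypothetical):

// Force protection for all functions that follow in this source file
#pragma strict_gs_check(on)

void always_protected() {
  char buffer[16];
  // ...
}

// Opt a specific function out of protection (disallowed by the Microsoft SDL)
__declspec(safebuffers)
void not_protected() {
  char buffer[16];
  // ...
}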

GuardStack copies vulnerable parameters to a safer location below the local variables so that if an overflow occurs, it’s more difficult to corrupt those parameters. A parameter that’s a pointer or a C++ reference may qualify as vulnerable. Refer to the documentation on /GS for more information.

I’ve conducted a number of experiments using C/C++ production applications to determine the overhead related to both performance and image size. I’ve applied strict_gs_check on all source files so the results are independent of what the compiler considers vulnerable functions (I refrained from using /sdl because it enables other security checks, which have their own overheads). The largest performance overhead I got was 1.4 percent and the largest image size overhead was 0.4 percent. The worst-case scenario would occur in a program that spends most of its time calling protected functions that do very little work. Well-designed real programs don’t exhibit such behavior. Keep in mind also that GuardStack incurs a potentially non-negligible stack space overhead.

On the Effectiveness of GuardStack

GuardStack is designed to mitigate only a specific type of vulnerability, namely stack buffer overflow. More important, using GuardStack by itself against this vulnerability may not provide a high degree of protection because there are ways for an attacker to go around it:

  • The detection of a corrupt cookie occurs only when the function returns. A lot of code might get executed between the time the cookie is corrupted and the time that corruption is detected. That code might be using other values from the stack, above or below the cookie, that have been overwritten. This creates an opportunity for an attacker to take (partial) control of the execution of the application. In that case, detection may not even take place at all.
  • A buffer overflow can still occur without overwriting the cookie. The most dangerous case would be overflowing a buffer allocated using _alloca. Even protected arguments and callee-saved registers can be overwritten in this case.
  • It may be possible to leak some of the cookies using out-of-bounds memory reads. Because different images use different reference cookies, and because cookies are XOR’d with the RBP, it can be more challenging for an attacker to make use of leaked cookies. However, the Windows Subsystem for Linux (WSL) might have introduced another way to leak cookies. WSL provides an emulation of the fork Linux system call by which a new process is created that duplicates the parent process. If the application being attacked forks a new process to handle incoming client requests, a malicious client can issue a fairly small number of requests to determine the values of the security cookies.
  • A number of techniques have been proposed to guess an image’s reference cookie in certain situations. I’m not aware of any successful real attacks in which the reference cookie was guessed, but the probability of success isn’t tiny enough to dismiss it. XOR’ing with RBP adds another very important layer of defense against such attacks.
  • GuardStack mitigates vulnerabilities by introducing different potential vulnerabilities, in particular, DoS and data loss. When the corruption of a cookie is detected, the application is abruptly terminated. For a server application, the attacker can cause the server to crash, potentially losing or corrupting valuable data.

Therefore, it’s important that you first strive to write correct, secure code with the help of static analysis tools. Then, following the defense-in-depth strategy, employ GuardStack and other dynamic mitigations offered by Visual C++ (many of which are enabled by default in the Release build) in the code you ship.

/GS with /ENTRY

The default entry point function (*CRTStartup) specified by the compiler when you compile to produce an EXE or a DLL file does four things in order: initializes the reference security cookie; initializes the C/C++ runtime; calls the main function of your application; and terminates the application. You can use the /ENTRY linker switch to specify a custom entry point. However, combining a custom entry point with the effects of /GS can lead to interesting scenarios.

The custom entry point and any functions it calls are candidates for protection. If the Windows loader appropriately initialized the cookie, then every protected function sees the same reference cookie in its prolog and epilog, and no problem occurs.

If Windows didn’t appropriately initialize the cookie and the first thing the custom entry point does is call __security_init_cookie, then all protected functions will use the correct reference cookie except for the entry point itself. Recall that the copy of the reference cookie is made in the prolog. Therefore, if the entry point returns normally, the cookie will be checked in its epilog against the updated reference cookie, and the check will fail, resulting in a false positive. To avoid this problem, you should call a function that terminates the program (such as exit) rather than returning normally.

If Windows didn’t appropriately initialize the cookie and the entry point didn’t call __security_init_cookie, then all protected functions will use the default reference cookie. Fortunately, because this cookie is XOR’d with RBP, the cookies actually used won’t have zero entropy, so you’ll still get some protection, especially with ASLR. However, it’s recommended that you properly initialize the reference cookie by calling __security_init_cookie.
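
Putting this together, a custom entry point might look like the following sketch. The entry point name is hypothetical and would be specified with /ENTRY:RawMain; ExitProcess is used instead of exit because the C/C++ runtime isn’t initialized in this sketch:

#include <process.h>   // declares __security_init_cookie
#include <windows.h>   // ExitProcess

void RawMain() {
  __security_init_cookie();  // initialize the reference cookie first
  // ... application code that avoids uninitialized CRT facilities ...
  ExitProcess(0);  // terminate rather than returning normally, so the entry
                   // point's own (possibly stale) cookie is never checked
}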

Using BinSkim to Verify GuardStack

BinSkim is a lightweight static binary analysis tool that verifies the correct use of some of the security features employed in a given PE binary. One feature BinSkim supports is GuardStack. BinSkim is open source (github.com/Microsoft/binskim) under the MIT license and written completely in C#. It supports x86, x64, and ARM Windows binaries compiled with recent versions of Visual C++ (2013+). You can either use it as a stand-alone tool or, more interestingly, include (part of) it in your own code. For example, if you have an application that supports PE plug-ins, you can use BinSkim to verify that a plug-in employs the recommended security features and refuse to load it otherwise. In this section, I’ll discuss how to use BinSkim as a stand-alone tool.

As far as GuardStack is concerned, the tool verifies that the specified binary adheres to the following four rules:

  • EnableStackProtection: Checks the corresponding flag that’s stored in the associated PDB file. If the flag isn’t found, the rule fails. Otherwise, it passes.
  • InitializeStackProtection: Iterates over the list of global functions defined in the associated PDB file to find the __security_init_cookie and __security_check_cookie functions. If neither is found, the tool considers that /GS wasn’t enabled; in this case, EnableStackProtection should fail. If __security_init_cookie isn’t defined, the rule fails. Otherwise, it passes.
  • DoNotModifyStackProtectionCookie: Looks up the location of the reference cookie using the load configuration data of the image. If the location isn’t found, the rule fails. If the load configuration data indicates that a cookie is defined, but its offset is invalid, the rule fails. Otherwise, the rule passes.
  • DoNotDisableStackProtectionForFunctions: Uses the associated PDB file to determine if there are any functions with the __declspec(safebuffers) attribute applied on them. The rule fails if any are found. Otherwise, it passes. Using __declspec(safebuffers) is disallowed by the Microsoft SDL.

To use BinSkim, first download the source code from the GitHub repository and build it. To run BinSkim, execute the following command in your favorite shell:

binskim.exe analyze target.exe --output results.sarif

To analyze more than one image, you can use the following command:

binskim.exe analyze myapp\*.dll --recurse --output results.sarif --verbose

Note that you can use wildcards in file paths. The --recurse switch specifies that BinSkim should analyze images in subdirectories, too. The --verbose switch tells BinSkim to include in the results file the rules that passed, not just the ones that failed.

The results file is in the Static Analysis Results Interchange Format (SARIF). If you open it in a text editor, you’ll find entries that look like what’s shown in Figure 3.

Figure 3 BinSkim Analysis Results File

{
  "ruleId": "BA2014",
  "level": "pass",
  "formattedRuleMessage": {
    "formatId": "Pass ",
    "arguments": [
      "myapp.exe",
    ]
  },
  "locations": [
    {
      "analysisTarget": {
        "uri": "D:/src/build/myapp.exe"
      }
    }
  ]
}

Every rule has a rule ID. The rule ID BA2014 is the ID of the DoNotDisableStackProtectionForFunctions rule. The Microsoft SARIF SDK (github.com/Microsoft/sarif-sdk) includes the source code of a Visual Studio extension that views SARIF files in Visual Studio.

Wrapping Up

The GuardStack dynamic mitigation technique is an extremely important detection-based mitigation against stack buffer overflow vulnerabilities. It’s enabled by default in both the Debug and Release builds in Visual Studio. It was designed to have a negligible overhead for most programs so that it can be widely used. However, it doesn’t provide a complete solution for such vulnerabilities. Buffer overflows are common for buffers allocated from the stack, but they can also occur in any allocated memory region. Most prominently, heap-based buffer overflows are just as perilous. For these reasons, it’s very important to use other mitigation techniques offered by Visual C++ and Windows such as Control Flow Guard (CFG), Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), Safe Structured Exception Handling (SAFESEH), and Structured Exception Handling Overwrite Protection (SEHOP). All of these techniques work synergistically to harden your application. For more information on these techniques and others, refer to bit.ly/2iLG9rq.


Hadi Brais is a doctorate scholar at the Indian Institute of Technology Delhi, researching compiler optimizations, computer architecture, and related tools and technologies. He blogs on hadibrais.wordpress.com and can be contacted at hadi.b@live.com.

Thanks to the following technical experts for reviewing this article:  Shayne Hiet-Block (Microsoft), Mateusz Jurczyk (Google), Preeti Ranjan Panda (IITD), Andrew Pardoe (Microsoft)

