From the August 2001 issue of MSDN Magazine

Optimize and Trim Your Code with New Switches in Visual C++ .NET
John Robbins

Sometimes the best things in life don't get talked about much. While Microsoft® .NET and managed code are getting all the coverage, I took a peek at Visual C++ .NET the other day. I figured there wouldn't be much new because all the work was going into C# and the new version of Visual Basic®. Boy, was I wrong! As an old-time C programmer who started with C 7.0 and the Programmer's WorkBench, I'm happy to report that this is one of the most impressive upgrades to a compiler/development environment I have seen in years. I have been using a prerelease version of Visual Studio .NET Beta 2 and I like it so much that I have been debugging and testing most of my code with it. Once I wring the bugs out of the code, I back port it to Visual C++ 6.0 for release because Visual C++ .NET is that much better! I predict that all of you will be doing the same as soon as Beta 2 ships.
      Many of you are thinking that I am going to talk about the new code editor and class browsers. However, I only use the IDE for debugging because for years my beloved Visual SlickEdit has had all of the new editing features found in Visual Studio .NET. The other thing you might think I will talk about is the new attributed programming, but that's already been covered by Richard Grimes in the April 2001 issue of MSDN® Magazine (see C++ Attributes: Make COM Programming a Breeze with New Feature in Visual Studio .NET). The important pieces of Visual C++®, in my opinion, are the debugger, the compiler, the linker, and the C runtime. You can't live without these particular pieces. You can always edit code in Notepad, but you need tools to generate the code and debug it.
      The new debugger is simply outstanding. Microsoft has fixed the major problems I had with the Visual Studio® 6.0 debugger and added many tweaks to help you debug even faster. The fact that the Breakpoints, Modules, and Threads dialogs are now dockable (which they should have been all along) is reason enough to start using the new debugger on your existing code. Throw in the multiple memory windows (finally!), multiple process debugging and cross-machine debugging and you have a great debugger! When Beta 2 is released, I'll write a column concentrating on just the debugger because some things are still changing as of this writing.
      While the new debugger is excellent, the surprising thing that got me all excited is a set of simple compiler and linker switches. Yes, I plead guilty to being a geek. But these aren't just your regular compiler and linker switches. These switches will, among other things, help you debug and solve the nastiest problems faster than ever as well as deliver the smallest, tightest code imaginable. This month I'll talk about those switches and cover a few other interesting new things offered by the compiler and linker. Once I get done, I can guarantee that you will be chomping at the bit to convert your existing projects to Visual C++ .NET!

The Compiler Runtime Error Checks Switch

      If the only feature added to the Visual C++ .NET compiler was the /RTCx (Runtime Check) switch, I would still tell everyone that it's a mandatory upgrade. As the name implies, the four RTC switches watch your code, and when you have certain errors at runtime the debugger pops up and tells you about them. Figure 1 shows an error where some code overwrote past the end of a local variable. As you can see in the message box, the particular local that was blown is shown as well. The part you can't see is that this message box pops up at the end of the function where the error occurred, so it's trivial to find and fix the problem! The RTC switches, by the way, are only allowed in non-optimized builds.

Figure 1 New Runtime Error Message

      The first switch, /RTCs, whose error report is shown in Figure 1, does all sorts of runtime stack checking. It starts by initializing all local variables to nonzero values. This helps you find those pesky release build problems that don't show up in debug builds: specifically the case where you have an uninitialized pointer variable on the stack. All local variables are filled with 0xCC, the INT 3 (breakpoint) opcode.
      The second helpful stack-checking feature is that /RTCs does stack pointer verification around every function call you make to guard against calling convention mismatches. For example, if you declare a function as __cdecl, but it is exported as __stdcall, the stack will be corrupted upon returning from the __stdcall function. If you have been paying attention to compiler switches over the years, you might have guessed that the first two stack-checking options for /RTCs are what the /GZ switch in Visual C++ 6.0 used to do.
      Fortunately for us, Microsoft extended the /RTCs switch to also do overrun and underrun checking of all multi-byte local variables such as arrays. It does this by adding four bytes to the front and end of those arrays and checking them at the end of the function to ensure those extra bytes are still set to 0xCC. The local checking works with all multi-byte locals, except those that require the compiler to add padding bytes. In almost all cases, padding is added only when you use the __declspec(align) directive, the /Zp structure member alignment switch, or the #pragma pack(n) directive.
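      To make the guard-byte idea concrete, here is a minimal sketch that emulates it by hand in portable C++. It is only an illustration under stated assumptions: GuardedBuffer, InitGuards, and GuardsIntact are hypothetical names, and the real /RTCs check is generated by the compiler around your locals rather than written like this.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical illustration of the /RTCs guard-byte idea. The real check is
// emitted by the compiler; nothing here is a Microsoft API.
const unsigned char kFill = 0xCC;   // same fill value /RTCs uses
const std::size_t kGuardSize = 4;   // four bytes on each side of the array

struct GuardedBuffer
{
    unsigned char front[kGuardSize];   // guard bytes before the array
    char data[16];                     // stands in for the local array
    unsigned char back[kGuardSize];    // guard bytes after the array
};

// Fill the guards and the array, roughly what happens at function entry.
void InitGuards(GuardedBuffer& b)
{
    std::memset(b.front, kFill, kGuardSize);
    std::memset(b.data, kFill, sizeof(b.data));
    std::memset(b.back, kFill, kGuardSize);
}

// Verify the guards, roughly what happens at function exit.
bool GuardsIntact(const GuardedBuffer& b)
{
    for (std::size_t i = 0; i < kGuardSize; ++i)
    {
        if (b.front[i] != kFill || b.back[i] != kFill)
            return false;   // something wrote outside the array
    }
    return true;
}
```

      A write that stays inside data leaves both guards untouched, while any write that lands in front or back makes GuardsIntact return false. That is the same signal /RTCs reports at the end of the offending function.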
      Some of you might be thinking that with the /RTCs switch, you can stop using your third-party error detection tools. While /RTCs is very cool, it's not going to replace tools like Rational's Purify or NuMega's BoundsChecker any time soon. /RTCs only catches one type of error, whereas both of those tools catch hundreds more.
      The second switch, /RTCu, checks your variable usage and pops up a warning if you use any without initializing them first. If you have been a loyal Bugslayer reader over the years, you might be wondering why this switch is important. Since all loyal readers are already compiling their code with warning level 4 (/W4) and treating all warnings as errors (/WX), you know that compiler warnings C4700 and C4701 will always tell you at compile time where you are definitely using and where you may be using variables without initialization, respectively. Unfortunately, while there are, I hope, legions of Bugslayer readers, not all of you are using /W4 and /WX. With the /RTCu switch, those who aren't will still be told they have bugs in their code. What's interesting about the /RTCu switch is that the code to check for uninitialized variables is inserted if the compiler detects a C4700 or C4701 condition.
      The third switch, /RTC1, is just shorthand for combining /RTCu and /RTCs.

Figure 2 Setting Basic Runtime Checks

      The final switch, /RTCc, checks for data truncation assignments—for example, if you try to assign 0x101 to a char variable. Like the /RTCu, if you are compiling with /W4 and /WX, data truncation will produce a C4244 error at compilation time. If you get an /RTCc error, you have to either mask off the bits you need or cast to the appropriate value. The project settings dialog, shown in Figure 2, only allows you to set /RTCu, /RTCs, or /RTC1 in the Basic Runtime Checks. In order to turn on /RTCc, you will need to select the Smaller Type Check option above the Basic Runtime Checks, as shown in Figure 3. At first, I couldn't see why the /RTCc switch was not turned on by the /RTC1. A little thinking showed that /RTCc can show errors on legal C code such as the following.
    char LoByte(int a)
    {
        return ((char)a) ;
    }

If /RTCc were included in /RTC1, people might think that the whole Runtime Check switches are reporting false positives. However, I vote for always turning on /RTCc because I want to know about any potential problems whenever I run my code.
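      The "mask off the bits you need" advice can be sketched as follows. Both helpers are hypothetical and exist only for illustration. LosesDataAsByte is a simplified stand-in for the kind of condition /RTCc reports, not the exact test the runtime performs.

```cpp
// Hypothetical helpers for illustration; neither comes from a Microsoft header.

// Returns true when storing 'value' in an unsigned char would drop bits.
// This is a simplified model of the situation /RTCc is meant to flag.
bool LosesDataAsByte(int value)
{
    return value < 0 || value > 0xFF;
}

// Masking first states the intent to keep only the low byte, so the
// conversion that follows no longer discards anything.
unsigned char LoByteMasked(int value)
{
    return static_cast<unsigned char>(value & 0xFF);
}
```

      With these definitions, LosesDataAsByte(0x101) is true, while LoByteMasked(0x101) yields 0x01 without any data being lost in the conversion itself.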

Figure 3 Setting Smaller Type Checks

      With the switch descriptions out of the way, I want to turn to the notification you get when you do have an error. When running your programs outside the debugger, the runtime checking code uses the standard C runtime assertion message box. For those of you writing services or code that can't have a user interface, you'll need to redirect the message box using the _CrtSetReportMode function with the _CRT_ASSERT report type parameter. While you might think there was a single, standard way for the /RTCx switches to notify the user, that's not the case. When running under the debugger, there's a completely different way to do the notifications.
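      For code without a user interface, the redirection might look like the sketch below. The article names only _CrtSetReportMode and _CRT_ASSERT. The use of _CrtSetReportFile, _CRTDBG_MODE_FILE, and _CRTDBG_FILE_STDERR from crtdbg.h is an assumption about how you would want to route the output. RedirectAssertReportsToStderr is a hypothetical wrapper name.

```cpp
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Hypothetical wrapper. Asks the C runtime to write assertion reports to
// stderr instead of showing a message box. Returns true when the request
// was made, false on toolchains that do not ship the Microsoft CRT.
bool RedirectAssertReportsToStderr()
{
#if defined(_MSC_VER)
    // In non-debug CRT builds these calls compile to no-ops.
    _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
    return true;
#else
    return false;   // nothing to redirect here
#endif
}
```

      Calling the wrapper once at startup, before any checked code runs, is the simplest arrangement for a service.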
      If you happen to look at the new Exceptions dialog in the Visual Studio .NET IDE, you might notice several new classes of exceptions added. The interesting new exception class, shown in Figure 4, is Native Run-Time Checks. As you look down the list, you'll recognize the four different exceptions as matching up with the /RTCx switches. That's your hint that when your program encounters a runtime check while running under the debugger, your program will throw a special exception so the debugger can handle it. The resulting dialog was shown in Figure 1.

Figure 4 Exception Classes in VisualStudio .NET

      Since most of you will be using the new Visual Studio .NET debugger, there's no problem with the runtime checks using RaiseException to communicate with the debugger. However, if you are running under WinDBG, the runtime check code is smart enough to see that and ends up firing a break instruction exception. To determine what the error was in WinDBG, you will need to look at the stack. If _RTC_CheckStackVars is in the stack, you are looking at an /RTCs error where you underwrote or overwrote a stack variable. There's no way of telling which local you corrupted. If _RTC_CheckEsp is in the stack, you are looking at an /RTCs stack corruption where you had a calling convention mismatch. If you see _RTC_Check_x_to_y, where x and y are numeric values showing the conversion, you are looking at an /RTCc error. Finally, if _RTC_UninitUse is in the stack, you are looking at an /RTCu error.

Controlling Runtime Check Output Yourself

      While the default method of output will suffice for many situations, you might want to handle the error output yourself. Figure 5 shows a sample custom error handler. The parameter list for runtime check error handlers is a little different in that it takes variable parameters. Evidently, Microsoft is planning to add quite a few different runtime error checks in the future to account for this flexibility. Since your custom handler gets the same parameters as the default version, you can show the errors with variable information and everything else. As you can see from Figure 5, it's up to you how you choose to inform the user. The code from Figure 5 is also included with this month's code distribution, so you can play with the error handling all you want.
      Setting your custom error handler function is trivial; just pass it as the parameter to _RTC_SetErrorFunc. There are a few other functions to assist you when handling runtime check errors. The first, _RTC_GetErrDesc, retrieves the description string for a particular error. _RTC_NumErrors returns the total number of errors supported by the current version of the compiler and runtime. One function, which I find a little dangerous, is _RTC_SetErrorType. You can use this function to turn off error handling for any or all of the specific runtime checks.
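      Since Figure 5 is not reproduced here, the following is a minimal stand-in rather than the column's actual handler. It assumes the handler's parameter list matches the _RTC_error_fn typedef in rtcapi.h, and it deliberately leaves the variable arguments untouched because their layout is not described above. MyRtcHandler, InstallRtcHandler, and g_lastRtcReport are hypothetical names, and the meaning of the return value is an assumption noted in the comments.

```cpp
#include <cstdio>
#include <string>

#if defined(_MSC_VER) && defined(__MSVC_RUNTIME_CHECKS)
#include <rtcapi.h>
#endif

#if defined(_MSC_VER)
#define RTC_CDECL __cdecl
#else
#define RTC_CDECL            // calling-convention keyword is Microsoft specific
#endif

// Last report seen by the handler. A real handler might log to a file.
static std::string g_lastRtcReport;

// Hypothetical handler. The signature is assumed to match _RTC_error_fn.
// The trailing variable arguments are intentionally not read.
int RTC_CDECL MyRtcHandler(int errType, const char* file, int line,
                           const char* module, const char* format, ...)
{
    (void)format;   // left uninterpreted in this sketch
    char message[512];
    std::snprintf(message, sizeof(message),
                  "%s(%d): runtime check failure, type %d, module %s",
                  file ? file : "unknown", line, errType,
                  module ? module : "unknown");
    g_lastRtcReport = message;
    return 0;   // assumption: zero means "do not break into the debugger"
}

// Registers the handler when the build actually has runtime checks enabled.
void InstallRtcHandler()
{
#if defined(_MSC_VER) && defined(__MSVC_RUNTIME_CHECKS)
    _RTC_SetErrorFunc(MyRtcHandler);
#endif
}
```

      Keeping the registration behind the __MSVC_RUNTIME_CHECKS test means the same source still builds when the /RTCx switches are off.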
      Since the runtime checks rely on the C runtime, if your program doesn't use the C runtime you might think you would completely lose the benefits of the RTC switches. If you are wondering why you would ever have a program without the C runtime, think about ATL and building with _ATL_MIN_CRT. Without the C runtime, you need to call _RTC_Initialize yourself whenever the code was built with one of the /RTCx switches; the compiler defines __MSVC_RUNTIME_CHECKS in that case, so you can test for it. You must also provide a function named _CRT_RTC_INIT, which returns your custom error handler.
      When I first started playing with a custom output handler, I immediately ran into a small problem. I couldn't debug my handler! If you think about it, you can see why it happens. As I've already discussed, the runtime checking code can determine if you are running under a debugger or not and display the output in either the debugger or through the normal C runtime assertion dialog. When you are running under a debugger, the runtime checking code sees that it's under a debugger and just generates the special exception code to talk to the debugger, completely bypassing your custom output handler. Hopefully, Microsoft will get this fixed before the release version of Visual Studio .NET.

The Buffer Security Check Switch

      The runtime checks are very cool, but another switch that you should always turn on is /GS, the Buffer Security Check switch. The purpose of /GS is to monitor the return address for a function to see if it is overwritten, which is a common technique used by viruses and Trojan horse programs to take over your application. /GS works by reserving space on the stack before the return address. At the function entry, the function prolog fills in that spot with a security cookie XOR'd with the return address. That security cookie is computed as part of the module load so it's unique to each module. When the function exits, a special function, _security_check_cookie, checks to see if the value stored at the special spot is the same as it was when entering the function. If they are different, the code pops up a message box and terminates the program. If you want to see the security code in action, read the source files SECCINIT.C, SECCOOK.C, and SECFAIL.C in the C runtime source code.
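      The cookie mechanism described above can be modeled in a few lines of portable C++. This is a simulation only: FakeFrame, FakeProlog, and FakeEpilogCheck are hypothetical names, the cookie value is arbitrary, and the real frame layout and check live in the compiler-generated code and the C runtime.

```cpp
#include <cstdint>

// Hypothetical model of a /GS-protected frame. Real frames are laid out by
// the compiler; this struct only mimics the two slots that matter.
struct FakeFrame
{
    std::uintptr_t guardSlot;       // space reserved before the return address
    std::uintptr_t returnAddress;   // where the function will return to
};

// Arbitrary stand-in for the per-module cookie computed at load time.
const std::uintptr_t kModuleCookie = 0x5A5A1234u;

// Models the prolog: store the cookie XOR'd with the return address.
void FakeProlog(FakeFrame& f, std::uintptr_t returnAddress)
{
    f.returnAddress = returnAddress;
    f.guardSlot = kModuleCookie ^ returnAddress;
}

// Models the exit check: the guard slot and return address must still
// combine back to the module cookie.
bool FakeEpilogCheck(const FakeFrame& f)
{
    return (f.guardSlot ^ f.returnAddress) == kModuleCookie;
}
```

      In this model, overwriting either the guard slot or the return address makes the check fail.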
      As if the security-checking capability of the /GS switch wasn't enough, the switch is also a wonderful debugging aid. While the /RTCx switches will track numerous errors, a random write to the return address will still sneak through. With the /GS, you get that checking in your debug builds as well. Of course, the Redmondtonians were thinking of us when they wrote the /GS switch, so you can replace the default message box function with your own handler by calling _set_security_error_handler. If you do whack the stack, your handler should call ExitProcess after logging the error.

Other Interesting New Compiler Things

      The /RTCx and /GS switches are the very big additions to the compiler. However, there are a few other switches and preprocessor items that I feel are very much worth mentioning. The first is the new /showIncludes switch, which, as the name implies, shows you the hierarchical list of include files for each module. If you are getting odd definitions or other include problems, /showIncludes can save your bacon. One interesting note about /showIncludes is that it will not show anything you include with your precompiled headers. It only works on active includes placed after your precompiled header file directive. You can turn on /showIncludes by going to Project | Properties | C/C++ | Advanced.
      The warning switches have been given a thorough burnishing to make them easier to use. The most interesting one is /Wall, which will completely turn on all warnings, including those that are off by default. This makes the new compiler a very credible LINT utility and is a switch that you want to turn on before you check in your code to ensure you've scoured it of all slightly ambiguous problems. The /w switch has been completely redefined. In Visual C++ 6.0, it turned off all warnings. Now you can selectively enable, disable, and show once each warning. The final new warning switch is /Wp64 to help you find 64-bit portability errors. I've found /Wp64 to be quite helpful, except that the SetWindowLongPtr/GetWindowLongPtr macros, which are supposed to aid 32- and 64-bit access, are defined wrong when doing 32-bit compiles. So with /Wp64, you will have extra warnings around them unless you turn them off with the #pragma warning directives.
      One option that finally showed up many years after it was needed is the new /GH switch. The compiler has had the /Gh switch, which inserts a call to a user-defined _penter function in each function's prolog. If you remember the Smooth Working Set tool I wrote for the October 2000 and December 2000 issues, you'll recall that I used _penter to record all functions as they were called. The new /GH switch puts in a _pexit call in the function's epilog so it's easier than ever to match up calls and returns. As you can imagine, you can now easily make a code profiler that automatically records function times with both the /Gh and /GH switches.
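      The bookkeeping side of such a profiler could be sketched as below. The _penter and _pexit stubs themselves, which must preserve registers and are compiler specific, are not shown. OnFunctionEnter, OnFunctionExit, and the two globals are hypothetical names that those stubs would forward to, and the sketch assumes a single thread.

```cpp
#include <map>
#include <vector>

// Hypothetical bookkeeping for a /Gh + /GH profiler. Single-threaded only.
struct CallRecord
{
    const void* function;      // address of the function that was entered
    unsigned long enterTick;   // timestamp captured on entry
};

static std::vector<CallRecord> g_callStack;                 // calls still open
static std::map<const void*, unsigned long> g_totalTicks;   // inclusive time

// Would be called from the _penter stub.
void OnFunctionEnter(const void* function, unsigned long now)
{
    CallRecord record = { function, now };
    g_callStack.push_back(record);
}

// Would be called from the _pexit stub. Pairs the exit with the most
// recent unmatched entry and accumulates the elapsed time.
void OnFunctionExit(unsigned long now)
{
    if (g_callStack.empty())
        return;   // exit without a recorded entry; ignore it
    CallRecord record = g_callStack.back();
    g_callStack.pop_back();
    g_totalTicks[record.function] += now - record.enterTick;
}
```

      Because every entry now has a matching exit, a simple stack is enough to pair them up. The times recorded are inclusive, so a caller's total contains the time spent in its callees.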
      The compiler's preprocessor has also been beefed up to offer more helpful functionality. The most impressive items are the __FUNCTION__, __FUNCSIG__, and __FUNCDNAME__ macros. These predefined macros are analogous to the original __FILE__ and __LINE__ macros. The __FUNCTION__ macro expands out to the undecorated name of the enclosing function. The __FUNCSIG__ is the complete signature of the enclosing function, and __FUNCDNAME__ is the decorated name of the enclosing function. Each of these will allow you to write better diagnostic code because retrieving the current function is now a piece of cake!
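      A small sketch of how these macros might be used in diagnostics follows. WhoAmI and DescribeSelf are hypothetical function names. The _MSC_VER guard is there because __FUNCSIG__ is Microsoft specific, and the fallback assumes the compiler at least accepts __FUNCTION__, as GCC and Clang do.

```cpp
#include <string>

// Returns the undecorated name of the function this line is compiled into.
std::string WhoAmI()
{
    return __FUNCTION__;
}

// Returns the most detailed description the compiler offers.
std::string DescribeSelf()
{
#if defined(_MSC_VER)
    return __FUNCSIG__;    // full signature on the Microsoft compiler
#else
    return __FUNCTION__;   // fallback for other compilers
#endif
}
```

      The exact text of __FUNCSIG__ depends on the compiler version, so treat it as something to log rather than something to parse.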
      Previous versions of the compilers required quite a bit of groveling to consistently retrieve the return address for a function. The new _ReturnAddress and _AddressOfReturnAddress intrinsics allow you to get the return address and the address of the return address very easily. Unlike most intrinsics, you must declare these as I did in the following code. Once you have declared them, you can use _ReturnAddress just like a function.
#ifdef __cplusplus
#define EXTERNC extern "C"
#else
#define EXTERNC
#endif

EXTERNC void * _AddressOfReturnAddress ( void ) ;
EXTERNC void * _ReturnAddress ( void ) ;

#pragma intrinsic ( _AddressOfReturnAddress )
#pragma intrinsic ( _ReturnAddress )
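      Once declared, the intrinsic can be wrapped in a helper like the one below. This is a sketch under stated assumptions: WhereWillIReturn, NOINLINE, and CALLER_ADDRESS are hypothetical names, and the non-Microsoft branch relies on the GCC/Clang builtin __builtin_return_address, which is not part of the article.

```cpp
#if defined(_MSC_VER)
extern "C" void* _ReturnAddress(void);
#pragma intrinsic(_ReturnAddress)
#define NOINLINE __declspec(noinline)
#define CALLER_ADDRESS() _ReturnAddress()
#else
// Assumption: GCC and Clang provide a comparable builtin.
#define NOINLINE __attribute__((noinline))
#define CALLER_ADDRESS() __builtin_return_address(0)
#endif

// Reports the address execution will return to when this function exits.
// Marked noinline so the function keeps a return address of its own.
NOINLINE void* WhereWillIReturn()
{
    return CALLER_ADDRESS();
}
```

      The value is an address inside the caller, which a diagnostic routine can log to show who called it.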

It's Super Optimizer!

      With each release of the compiler, the optimizations get better and better. This release of Visual C++ introduces a new twist to optimizations; the linker gets in the act. In almost all compilers, the only optimizations that take place are limited to the single file being compiled (also known as the compiland). With Visual C++ .NET, the linker assist means that for the first time on Microsoft platforms the optimizations can apply to the whole program at once! A number of new optimizations can now take place by specifying /GL on the compiler and /LTCG on the linker when doing your release builds.
      Figure 6 is a three-file program that shows all the optimizations. I built OPTIMIZATIONS.CPP and ANOTHERMODULE.CPP with the following optimizations in Visual C++ .NET:
  CL:  /GL /O1 /Ob1 /Oy /GF /FD /EHsc /MD /Gy 

      I built YETANOTHER.CPP with the same settings, except I substituted /Ob2 (inline function expansion, any suitable) for /Ob1 (inline function expansion, only those marked as inline or __inline). I also built the program with Visual C++ 6.0 with the same optimization switches, except for /GL to CL and /LTCG to LINK so I could give you before and after comparisons. Comparing the wmain disassembly for Visual C++ 6.0 (see Figure 7) and Visual C++ .NET (see Figure 8) shows a major improvement in tight code generation. As you should know, when it comes to fast code, smallness is next to goodness. I've included the Optimizations program with this month's code distribution so you can see the before and after scenarios yourself.
      The first new optimization that is pretty amazing is the custom calling convention. A calling convention is the protocol for how parameters are passed between functions and how the stack is cleaned up. Instead of passing items on the stack like almost all regular calling conventions, the optimizer will look at passing them in registers. This optimization takes place during the link, so it truly is custom. In Figure 6, the function InitializeVariables takes two pointer parameters. In the following commented snippet from a Visual C++ .NET compile, you can see the first parameter is passed in the EAX register and the second is passed on the stack.
00401054: mov    dword ptr [eax],0000000Ah  ; EAX holds the first
                                            ; parameter.
0040105A: mov    eax,dword ptr [esp+004h]   ; The second parameter
                                            ; is on the stack in
                                            ; the spot where the
                                            ; first parameter used
                                            ; to go!
0040105E: mov    dword ptr [eax],0000000Bh  ; Init param 2.
00401064: ret                               ; Return to caller.

      The second new type of optimization is cross-module inlining, which means that instead of making a function call, the compiler/linker will look to jam the function code directly inline with the caller. I compiled YETANOTHER.CPP with /Ob2 to tell the compiler to inline anything that it could. In Figure 6, the ReturnValue function simply returns the value 0xBB. The following code shows the call as traditionally compiled in Visual C++ 6.0:
  0040104B:   call      ReturnValue    ; Call the function.

      The code for the ReturnValue function in Visual C++ 6.0 would resemble the following:
0040108E:   mov   eax,000000BBh    ; Put the return value in EAX.
00401093:   ret                    ; Return to caller.

      In Visual C++ .NET, the linker sees that the code is nothing more than a hardcoded return value, so it generates the following code in wmain instead of making the call.
  00401040:   push      000000BBh

The code in wmain is getting the value returned by the ReturnValue function so it can print it out. Since the function is always returning a constant, the compiler or linker just turns the whole ReturnValue function into a hardcoded number. The push instruction is just getting it on the stack for the wprintf function.
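      For readers without Figure 6 at hand, the two functions discussed above plausibly look something like the sketch below. This is a hypothetical reconstruction from the disassembly, not the column's actual source, so the parameter names and types are guesses.

```cpp
// Hypothetical reconstruction based on the disassembly shown above.

// Stores through two pointers. With /GL and /LTCG the first pointer can be
// passed in EAX instead of on the stack.
void InitializeVariables(int* first, int* second)
{
    *first = 0x0A;    // corresponds to "mov dword ptr [eax],0000000Ah"
    *second = 0x0B;   // corresponds to "mov dword ptr [eax],0000000Bh"
}

// Returns a constant, which lets the linker fold the value into the caller
// even though the function lives in a different source file.
int ReturnValue()
{
    return 0xBB;
}
```

      Functions this small are ideal candidates for whole-program optimization because the optimizer can see both the callee and every call site at link time.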
      The final new optimization I want to mention is the small Thread Local Storage (TLS) displacement. The compiler and linker now pick the most frequently used TLS variables and allocate them first, instead of just assigning them in order. This keeps their offsets small, so the code generator can use byte and word instructions more easily, which saves three bytes per TLS reference.
      These are the three main optimizations you will see. There are a few others, such as stack double alignment, improved memory disambiguation for global variables, and interprocedural register allocations. All the optimization improvements add up rather quickly. The sample program I used to figure out the optimizations was 20,544 bytes when compiled by Visual C++ 6.0, but Visual C++ .NET compiled it down to 4,096 bytes!

Wrap Up

      The new compiler switches and code generation optimizations alone make Visual C++ .NET a mandatory upgrade for your existing projects. Throw in the new attributed programming and the excellent debugger and you have a package that will make life much better for everyone. Throw your projects at Visual C++ .NET; they'll love you for it.
      I just noticed that this marks the fourth year of the Bugslayer column! Thanks to all of you who've read the column and written me e-mails—I really appreciate everyone reading. Thank you!

The Tips!

      It's the middle of summer and the heat is killing you. Inclusion of your tips in the Bugslayer column will definitely cool you off. Send them to me at
Tip 45 One of the advantages of having the Windows 2000 symbols installed is getting perfect call stacks because the supplied symbols include just the public functions, and, most importantly, the Frame Pointer Omission (FPO) data. The FPO data is necessary to completely walk the stack. However, if your code is out in the field, you probably don't want to distribute your existing PDB files, since they have all your secrets in them. A new feature of the Visual C++ .NET linker is the /PDBSTRIPPED switch. It will automatically produce a PDB file that contains nothing but public symbols, object file names, and FPO data. With this new switch, you should never want for a call stack from the field again.
Tip 46 If you want to automate your development builds without starting the new Visual Studio .NET IDE, run "DEVENV /?" from the command line. The resulting dialog will show you all the command-line options for building and running your projects. You can easily automate your complete system build straight from the command line so you can start it late at night without any user intervention.

Send questions and comments for John to
John Robbins is a cofounder of Wintellect, a software consulting, education, and development firm that specializes in programming in Windows and COM. He is the author of Debugging Applications (Microsoft Press, 2000). You can contact John at