Reduce EXE and DLL Size with LIBCTINY.LIB
Matt Pietrek
Way back in my October 1996 column in MSJ, I addressed a question concerning the size of executable files. Back then, a simple Hello World program compiled to a 32KB executable. Two compiler versions later, the problem is only slightly better. The same program with the Visual C++® 6.0 compiler is now 28KB.

EXE and DLL Size

Before jumping into the code for my replacement runtime library, it's worth taking the time to review why simple EXEs and DLLs are bigger than you might expect. Consider the canonical Hello World program:
Let's compile this program for size, and generate a map file. Using the command-line Visual C++ compiler, the syntax would be:
First, look at the .MAP file; a trimmed down version is shown in Figure 1. From looking at the addresses of main (0001:00000000) and of printf (0001:0000000C), you can infer that function main's code is only 0xC bytes in length. Looking at the last line of the file, the __chkstk function at address 0001:00003B10, you can also infer that there's at least 0x3B10 bytes of code in the executable. That's over 14KB of code to send Hello World to the screen.

What About the C++ Runtime Library DLL?

Alert readers might say, "Hey Matt! Why don't you just use the DLL version of the runtime library?" In the past, I could make the argument that there was no consistently named version of the C++ runtime library DLL available on Windows® 95, Windows 98, Windows NT 3.51, Windows NT® 4.0, and so forth. Luckily, we've moved past those days, and in most cases you can rely on MSVCRT.DLL being available on your target machines.

Making this switch and recompiling Hello.CPP, the resulting executable is now only 16KB. Not bad, but you can do better. More importantly, you're just shifting all of this unneeded code to someplace else (that is, to MSVCRT.DLL). In addition, when your program starts up, another DLL will have to be loaded and initialized. This initialization includes items like locale support, which you may not care about. If MSVCRT.DLL suits your needs, then by all means use it.

However, I believe that using a stripped-down, statically linked runtime library still has merit. I may be tilting at windmills here, but my e-mail conversations with readers show that I'm not alone. There are people out there who want the leanest possible code. In this day of writeable CDs, DVDs, and fast Internet connections, it's easy not to worry about code size. However, the best Internet connection I can get at home is only 24Kbps. I hate wasting time downloading bloated controls for a Web page. As a matter of principle, I want my code to have as small a footprint as possible.
I don't want to load any extra DLLs that I don't really need. Even if I might need a DLL, I'll try to delayload it so that I don't incur the cost of loading it until I use the DLL. Delayloading is a topic I've described in previous columns, and I strongly encourage you to become familiar with it. See Under the Hood in the December 1998 issue of MSJ for starters.

Digging Deeper

Now that I've beaten up the unneeded code within the program, let's turn to the executable file itself. If you were to run DUMPBIN /HEADERS on my Hello.EXE, you'd see the following two lines in the output:
The second line is interesting. It says that every code and data section in the executable file is aligned on a 4KB (0x1000 byte) boundary. Because sections are stored contiguously in a file, it's not hard to see the potential for wasting up to 4KB between the end of one section and the start of the next.
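To put rough numbers on that waste, here is a minimal sketch of the rounding arithmetic. The section sizes are made up for illustration (they are not taken from Hello.EXE), and the helper names are hypothetical. The 0x200 alignment is included only as an example of a smaller file alignment.

```cpp
#include <cstdint>

// Round `size` up to the next multiple of `alignment`.
// Assumes `alignment` is a nonzero power of two, which PE alignments are.
constexpr std::uint32_t align_up(std::uint32_t size, std::uint32_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Padding bytes that follow a section holding `raw_size` bytes of real data.
constexpr std::uint32_t padding_for(std::uint32_t raw_size, std::uint32_t alignment) {
    return align_up(raw_size, alignment) - raw_size;
}

// Hypothetical raw sizes for one code section and two data sections.
constexpr std::uint32_t kExampleSections[] = { 0x3B2C, 0x0A10, 0x0120 };

// Total padding across the example sections for a given file alignment.
std::uint32_t total_padding(std::uint32_t alignment) {
    std::uint32_t total = 0;
    for (std::uint32_t size : kExampleSections) {
        total += padding_for(size, alignment);
    }
    return total;
}
```

With these invented sizes, total_padding(0x1000) comes to 6564 bytes of filler, while total_padding(0x200) comes to 932 bytes. Real executables will differ, but the shape of the result is the same: small sections pay the most for a large file alignment.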
The key difference is that the alignment between sections is only 512 bytes (0x200). There's much less space available to waste. In Visual C++ 6.0, the linker defaults were changed to make the file alignment of sections equal to the alignment in memory. This provides a slight load-time performance improvement on Windows 9x, but makes executables bigger.

LIBCTINY: A Minimal Runtime Library

Now that you understand the problem of why simple EXEs and DLLs are so large, it's time to introduce my new and improved replacement runtime library. In the October 1996 column (mentioned earlier), I created a small static .LIB file designed to replace or augment the Microsoft LIBC.LIB and LIBCMT.LIB libraries. I called this replacement runtime library LIBCTINY.LIB, since it was a very stripped-down version of Microsoft's own runtime library sources.

LIBCTINY.LIB is intended for simple applications that don't require a huge amount of runtime library support. Thus, it's not suitable for MFC applications or other complicated scenarios that make extensive use of the C++ runtime. LIBCTINY's ideal target is small programs or DLLs that call some Win32 APIs and perhaps display some simple output.

There are two guiding principles behind LIBCTINY.LIB. First, it replaces the standard Visual C++ startup routines with much simpler code. This simpler code doesn't refer to any of the more esoteric runtime library functions like __crtLCMapStringA. Because of this, much less extraneous code is linked into your binary. As I'll show shortly, the LIBCTINY routines perform a bare minimum of tasks before calling your WinMain, main, or DllMain routines.

The second guiding principle of LIBCTINY.LIB is to implement relatively large functions like malloc or printf with code that's already in the Win32 system DLLs.
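As a rough illustration of that second principle, the sketch below forwards a printf-style call to functions that already live in system DLLs. It is not LIBCTINY's source: the name tiny_printf is hypothetical, and the choice of wvsprintfA (USER32.DLL) for formatting and WriteFile (KERNEL32.DLL) for output is an assumption about how such a forwarder might look. The non-Windows branch exists only so the sketch builds elsewhere; it uses the regular C runtime and therefore saves nothing.

```cpp
#include <cstdarg>

#ifdef _WIN32
#include <windows.h>
#else
#include <cstdio>   // portable stand-ins so the sketch builds off Windows
#endif

// Hypothetical forwarder: format into a fixed stack buffer, then write the
// bytes to standard output. Returns the number of bytes written, or -1.
int tiny_printf(const char* format, ...) {
    char buffer[1024];   // wvsprintfA never stores more than 1024 bytes
    va_list args;
    va_start(args, format);
#ifdef _WIN32
    // The formatter lives in USER32.DLL, so no CRT formatting code is linked.
    // Note that wvsprintfA does not support floating-point conversions.
    int length = wvsprintfA(buffer, format, args);
    va_end(args);
    DWORD written = 0;
    if (!WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), buffer,
                   static_cast<DWORD>(length), &written, NULL)) {
        return -1;
    }
    return static_cast<int>(written);
#else
    int length = std::vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);
    if (length < 0) return -1;
    if (length >= static_cast<int>(sizeof buffer)) {
        length = static_cast<int>(sizeof buffer) - 1;   // output was truncated
    }
    return static_cast<int>(
        std::fwrite(buffer, 1, static_cast<std::size_t>(length), stdout));
#endif
}
```

A call such as tiny_printf("%d-%s\n", 7, "ok") behaves like printf for simple integer and string formats. The trade-off is the fixed buffer and the reduced set of format specifiers, which is usually acceptable for the small programs this kind of library targets.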
Beyond the minimal startup code, most of the other LIBCTINY source files are simple implementations of standard C++ runtime library functions such as malloc, free, new, delete, printf, strupr, strlwr, and so on. Take a look at the implementation of printf in printf.cpp (see Figure 2) to get an idea of what I'm talking about.

In my original version of LIBCTINY.LIB there were two restrictions that annoyed me. First, the original version did not support DLLs. You could make tiny console and GUI executable programs, but if you wanted to create a tiny DLL, you were out of luck. Second, the original LIBCTINY did not support static C++ constructors and destructors. By this, I mean constructors and destructors declared at global scope. In the new version, I've added the basic code that implements this support. Along the way, I learned quite a bit about how the compiler and runtime library play a complicated game to make static constructors and destructors work.

The Dark Underbelly of Constructors

When the compiler processes a source file that has a static constructor, it generates two things. The first is a small blob of code with a name like $E2 that calls the constructor. The second thing the compiler emits is a pointer to this blob of code. This pointer is written to a specially named section in the .OBJ called .CRT$XCU.

Why the funny section name? It's a bit complicated. Let me throw another piece of data at you to help explain. If you examine the Visual C++ runtime library sources (for instance, CINITEXE.C), you'll find the following:
The previous lines of code create two data segments, .CRT$XCA and .CRT$XCZ. In each segment it places a variable (__xc_a and __xc_z, respectively). Note that the segment names are very similar to the .CRT$XCU segment to which the compiler emits the constructor code pointer.

LIBCTINY's Minimal Startup Routines

Now let's take a look at LIBCTINY's new support for small DLLs. As with EXEs, the trick is to make the DLL's entry point code as small as possible and omit calls to unneeded routines that bring in lots of other code. Figure 4 shows the minimal DLL startup code. When your DLL is loaded, it is this code, not your DllMain routine, that executes first.

The _DllMainCRTStartup routine is the very first place execution begins in your DLL. In LIBCTINY's implementation, it first checks to see if the DLL is in its DLL_PROCESS_ATTACH call. If so, the code calls _atexit_init (described earlier), and _initterm to invoke any static constructors. The heart of the function is the call to DllMain, which is the routine you supply as part of your DLL's code. This DllMain call is made for all four notification types (process attach/detach, and thread attach/detach). The last thing _DllMainCRTStartup does is to check if the DLL is in its DLL_PROCESS_DETACH code. If so, the code calls _DoExit. As described earlier, this causes any static destructors to be called.

If you're curious about the startup code for console and GUI mode EXEs, be sure to check out CRT0TCON.CPP and CRT0TWIN.CPP, respectively. (These modules accompany the code download, found at the link at the top of this article.)

One other thing worth checking out in DLLCRT0.CPP (see Figure 4) is this line near the top:

```
#pragma comment(linker, "/OPT:NOWIN98")
```

This puts a linker directive into the DLLCRT0.OBJ file that tells the linker to use the /OPT:NOWIN98 switch. The benefit is that you don't have to manually add /OPT:NOWIN98 to your make files or project files.
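The DLL startup sequence described above can be modeled in portable C++ to make the ordering concrete. This is a simulation under stated assumptions, not the code from Figure 4. Every name is hypothetical; an ordinary array stands in for the pointers the linker gathers between .CRT$XCA and .CRT$XCZ; and the sketch assumes that a static constructor's stub registers the matching destructor in the exit table. The numeric reason codes match the Win32 DLL_PROCESS_DETACH, DLL_PROCESS_ATTACH, DLL_THREAD_ATTACH, and DLL_THREAD_DETACH values.

```cpp
#include <string>
#include <vector>

typedef void (*InitFn)();               // plays the role of the CRT's _PVFV

std::vector<std::string> call_trace;    // records call order for inspection
std::vector<InitFn> exit_table;         // stand-in for the atexit table

enum Reason { ProcessDetach = 0, ProcessAttach = 1,
              ThreadAttach = 2, ThreadDetach = 3 };

void destroy_global() { call_trace.push_back("dtor"); }

// Stand-in for a compiler-generated stub such as $E2 (assumed behavior:
// run the constructor, then register the destructor for exit time).
void construct_global() {
    call_trace.push_back("ctor");
    exit_table.push_back(destroy_global);
}

// Null entries model the __xc_a and __xc_z marker variables.
InitFn ctor_table[] = { nullptr, construct_global, nullptr };

// Call every non-null pointer in [begin, end), in the manner of _initterm.
void run_initializers(InitFn* begin, InitFn* end) {
    for (; begin < end; ++begin) {
        if (*begin) (*begin)();
    }
}

// Run registered exit functions, most recently registered first.
void do_exit() {
    while (!exit_table.empty()) {
        InitFn fn = exit_table.back();
        exit_table.pop_back();
        fn();
    }
}

// The routine a DLL author would supply.
bool user_dll_main(Reason reason) {
    call_trace.push_back("DllMain(" + std::to_string(static_cast<int>(reason)) + ")");
    return true;
}

// Model of the entry point: set up on attach, always call the user's
// routine, tear down on detach.
bool dll_startup(Reason reason) {
    if (reason == ProcessAttach) {
        exit_table.clear();                              // like _atexit_init
        run_initializers(ctor_table, ctor_table + 3);    // like _initterm
    }
    bool ok = user_dll_main(reason);
    if (reason == ProcessDetach) {
        do_exit();                                       // like _DoExit
    }
    return ok;
}
```

Calling dll_startup with ProcessAttach, then ThreadAttach, then ProcessDetach leaves call_trace showing the constructor running before the first DllMain call and the destructor running after the last one, which is the ordering the startup code has to guarantee.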
I figure if you're using LIBCTINY, you'd probably want to use /OPT:NOWIN98 as well.

Using LIBCTINY.LIB

Using LIBCTINY is very simple. All you have to do is add LIBCTINY.LIB to the linker's list of .LIB files to search. If you're using the Visual Studio® IDE, this would be in the Projects | Settings | Link tab. It doesn't matter what type of binary you're building (console EXE, GUI EXE, or DLL), since LIBCTINY.LIB contains appropriate entry point routines for each of them.

Take a look at TEST.CPP in Figure 5. This program simply exercises a few of the routines that LIBCTINY.LIB implements, and includes a static constructor and destructor invocation. When I compile it normally with Visual C++ 6.0,
the resulting executable is 32768 bytes. By simply adding LIBCTINY.LIB to the command line
the resulting executable shrinks to 3072 bytes.

Metadata Article Correction

In my October 2000 MSDN Magazine article "Avoiding DLL Hell: Introducing Application Metadata in the Microsoft .NET Framework," I said that using the Visual C++ 6.0 #import directive causes the compiler to read in a COM type library and generate ATL-ready header files for all the interfaces contained within. While header files are generated by #import, it turns out they don't use ATL.

Richard Grimes, author of Professional ATL COM Programming (Wrox Press, 1998), kindly pointed out to me that #import generates what Microsoft calls "compiler COM support classes," which are supported by the COMDEF.H header. Richard goes on to say, "There are many differences between the COM compiler support classes and the equivalent in ATL. The most important is that ATL does not use C++ exceptions. In fact, the ATL classes are more lightweight than the COM compiler support classes and so I would have preferred if Microsoft had decided to generate ATL code."

I have to confess that I should have studied this more before I wrote it. My experience with ATL is limited to the wizards in Visual C++, and tweaking the resulting code. I have used #import on a few occasions, but not enough to have made the connection that the resulting code wasn't ATL. Thanks to Richard for pointing this out to me, and for giving me even more incentive to verify everything before I write about it.
Matt Pietrek does advanced research for the NuMega Labs of Compuware Corporation, and is the author of several books. His Web site, at https://www.wheaty.net, has a FAQ page and information on previous columns and articles.
From the January 2001 issue of MSDN Magazine