Investigating a GSCookie Corruption
The GSCookie helps detect buffer overruns on the stack and terminates the application when one is detected. In the .NET environment this typically happens when managed code calls a native function that overruns a buffer on the stack and corrupts the GSCookie that was put in place. Here is a similar case.
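To make the idea concrete, here is a minimal, portable sketch of how a cookie placed next to a buffer reveals an overrun. It is only a model: GuardedFrame, kCookie and write_corrupts_cookie are names I made up for illustration, and the real cookie is placed and checked by the compiler and the runtime, not by code like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a protected stack frame: a small buffer
// followed directly by a cookie value.
struct GuardedFrame {
    static constexpr std::size_t kBufferSize = 8;
    static constexpr std::uint32_t kCookie = 0xBB40E64Eu;  // arbitrary demo value

    unsigned char bytes[kBufferSize + sizeof(std::uint32_t)];

    GuardedFrame() {
        std::memset(bytes, 0, sizeof(bytes));
        const std::uint32_t cookie = kCookie;
        std::memcpy(bytes + kBufferSize, &cookie, sizeof(cookie));
    }

    // Copies without checking kBufferSize, so a long write runs into the
    // cookie. The length is clamped to the frame to keep the demo well-defined.
    void unsafe_write(const void* src, std::size_t len) {
        if (len > sizeof(bytes)) len = sizeof(bytes);
        std::memcpy(bytes, src, len);
    }

    // The check that would run before the frame is trusted again.
    bool cookie_intact() const {
        std::uint32_t current;
        std::memcpy(&current, bytes + kBufferSize, sizeof(current));
        return current == kCookie;
    }
};

// Returns true when writing `len` bytes of 'A' damages the cookie.
bool write_corrupts_cookie(std::size_t len) {
    GuardedFrame frame;
    unsigned char payload[16];
    std::memset(payload, 'A', sizeof(payload));
    frame.unsafe_write(payload, len < sizeof(payload) ? len : sizeof(payload));
    return !frame.cookie_intact();
}
```

In this model, a write of up to 8 bytes leaves the cookie alone, while anything longer changes it and would be reported as a failure.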
I have a sample app which crashed with the following callstack:
0:000> kb
ChildEBP RetAddr Args to Child
0012e96c 7c90e89a 7c801e36 ffffffff c0000409 ntdll!KiFastSystemCallRet
0012e970 7c801e36 ffffffff c0000409 0012ecb8 ntdll!ZwTerminateProcess+0xc
0012e980 7a2888bf ffffffff c0000409 76faacd8 KERNEL32!TerminateProcess+0x20
0012ecb8 7a0ca47e 7a122f89 001a1d88 0012f348 mscorwks!__report_gsfailure+0x102
0012ecbc 7a122f89 001a1d88 0012f348 0012eea8 mscorwks!DoJITFailFast+0x5
0012eccc 79e8f09c 00000000 79f8d752 00193b58 mscorwks!CrawlFrame::SetCurGSCookie+0x1c
0012eeb4 79e8dc07 0012eedc 79f8d752 0012f208 mscorwks!Thread::StackWalkFramesEx+0x794
0012f1e4 79f8d650 79f8d752 0012f208 00000500 mscorwks!Thread::StackWalkFrames+0xb8
0012f214 79f9338b 00193b58 00000002 00000002 mscorwks!CNameSpace::GcScanRoots+0x119
0012f258 79f92cbf 00000001 00000000 7a3b8ae0 mscorwks!WKS::gc_heap::mark_phase+0x93
0012f27c 79f93245 00000000 7a3b8bc8 00080101 mscorwks!WKS::gc_heap::gc1+0x62
0012f290 79f92f5a 00000002 00000001 00193b58 mscorwks!WKS::gc_heap::garbage_collect+0x253
0012f2bc 79f3d229 00000002 00000000 0012f2ec mscorwks!WKS::GCHeap::GarbageCollectGeneration+0x1a9
0012f2cc 79f3d265 00000002 00000000 00193b58 mscorwks!WKS::GCHeap::GarbageCollectTry+0x33
0012f2ec 79f3d124 00000002 00000000 00000000 mscorwks!WKS::GCHeap::GarbageCollect+0x67
0012f398 79269944 792ed8b4 0000000a 00193b58 mscorwks!GCInterface::CollectGeneration+0xaa
0012f404 00401240 00000001 003b5070 003b5a50 mscorlib_ni+0x1a9944
0012f460 79e71b4c 00000000 00193b58 0012f4ac TestG!__tmainCRTStartup+0x10f [f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 583]
0012f490 79e821b1 0012f560 00000000 0012f530 mscorwks!CallDescrWorker+0x33
0012f510 79e96501 0012f560 00000000 0012f530 mscorwks!CallDescrWorkerWithHandler+0xa3
__report_gsfailure clearly indicates that the crash is due to the GS cookie being overwritten, and hence the process is terminated. Let's have a look at the code:
C++/CLI
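#include "windows.h"
using namespace System;

// Pointer type for the function exported by NativeDll.dll
typedef int (*NATIVE_FUNCTION_PTR) (TCHAR**, int );

#pragma managed
void Foo()
{
    // Load the native DLL and look up the exported function
    HINSTANCE hinstLib = LoadLibrary(L"NativeDll.dll");
    NATIVE_FUNCTION_PTR Proc = (NATIVE_FUNCTION_PTR) GetProcAddress(hinstLib, "NativeFunction");

    // Call into native code; the callee allocates the string
    TCHAR* str = NULL;
    (Proc)(&str, 100);
    Console::WriteLine("Done");

    // Force a collection so that the GC walks the stack
    GC::Collect(2);
}

#pragma managed
int main()
{
    Foo();
    return 0;
}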
Note: Please keep in mind that calling GC.Collect is generally not a good idea. I have called it explicitly here because when the GC runs, its stack walk checks whether the GSCookie has been overwritten.
So we are calling NativeFunction, which resides in NativeDll.dll, and the calling code is very simple. Convinced that we don't have a problem here, we would definitely want to look at NativeFunction itself. The following is its definition:
extern "C"
__declspec(dllexport) int NativeFunction(TCHAR** str, int i)
{
    *str = new TCHAR[i];
    _tcscpy(*str, _T("Hello World from Native function\n"));
    return 1;
}
Hmm. So we are passing the integer value 100 as the second argument, and the function allocates a buffer of that many characters, which is large enough to hold the string. So this can't cause any problems. Even if the buffer were too small, the above code should crash due to corruption of the heap and not the stack. So what's happening here?
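As a quick sanity check of that claim, the required size can be computed at compile time. This is a small standalone sketch; kMessage, kRequiredElements and fits are names I introduced for illustration, and I am assuming a Unicode build (the disassembly further down calls wcscpy), so the string is counted in wide characters.

```cpp
#include <cassert>
#include <cstddef>

// The literal that NativeFunction copies, as a wide string.
constexpr wchar_t kMessage[] = L"Hello World from Native function\n";

// Elements needed to hold it, including the terminating null.
constexpr std::size_t kRequiredElements = sizeof(kMessage) / sizeof(kMessage[0]);

// Element count passed by the caller in the sample.
constexpr std::size_t kRequestedElements = 100;

// True when a buffer of `capacity` elements can hold the message.
constexpr bool fits(std::size_t capacity) {
    return kRequiredElements <= capacity;
}

static_assert(fits(kRequestedElements),
              "a 100-element buffer should hold the message");
```

The message needs 34 elements (33 characters plus the terminator), so a buffer of 100 elements is more than enough and a buffer overrun can be ruled out.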
Apart from a buffer overrun, another possible cause of GSCookie corruption is a mismatch in the calling convention. By default, the C++/CLI caller here uses the cdecl convention for the call through the function pointer. Since I had the project file for the DLL, it was easy for me to look at Project Properties->Configuration Properties->C/C++->Advanced->Calling Convention (in Visual Studio) to see which calling convention it is compiled with. If I did not have the project file, I could disassemble the function and investigate:
0:000> uf nativedll!NativeFunction
NativeDll!NativeFunction [d:\blog\gscookie\testg_test\testg\nativedll\nativedll.cpp @ 12]:
12 100018c0 55 push ebp
12 100018c1 8bec mov ebp,esp
12 100018c3 51 push ecx
13 100018c4 33c9 xor ecx,ecx
13 100018c6 8b450c mov eax,dword ptr [ebp+0Ch]
13 100018c9 ba02000000 mov edx,2
13 100018ce f7e2 mul eax,edx
13 100018d0 0f90c1 seto cl
13 100018d3 f7d9 neg ecx
13 100018d5 0bc8 or ecx,eax
13 100018d7 51 push ecx
13 100018d8 e823f7ffff call NativeDll!operator new[] (10001000)
13 100018dd 83c404 add esp,4
13 100018e0 8945fc mov dword ptr [ebp-4],eax
13 100018e3 8b4508 mov eax,dword ptr [ebp+8]
13 100018e6 8b4dfc mov ecx,dword ptr [ebp-4]
13 100018e9 8908 mov dword ptr [eax],ecx
14 100018eb 6828220010 push offset NativeDll!GS_ExceptionPointers+0x148 (10002228)
14 100018f0 8b5508 mov edx,dword ptr [ebp+8]
14 100018f3 8b02 mov eax,dword ptr [edx]
14 100018f5 50 push eax
14 100018f6 ff1580200010 call dword ptr [NativeDll!_imp__wcscpy (10002080)]
14 100018fc 83c408 add esp,8
16 100018ff b80a000000 mov eax,0Ah
17 10001904 8be5 mov esp,ebp
17 10001906 5d pop ebp
17 10001907 c20800 ret 8 <-- So the native function is cleaning up the stack, which means it's stdcall
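The last instruction is the telltale sign. As a rough illustration of how to read it, the sketch below decodes the two near-return encodings on x86: C3 is a plain ret, and C2 followed by a 16-bit little-endian immediate is ret n. The helper name callee_popped_bytes is hypothetical. Note that ret n only says the callee removes its arguments; with two stack arguments and 8 bytes popped, that is consistent with stdcall here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns how many argument bytes the callee pops on return, based on the
// bytes of a near return instruction:
//   C3        ret        -> 0 (caller cleans up, as in cdecl)
//   C2 lo hi  ret imm16  -> imm16 (callee cleans up, as in stdcall)
// Returns -1 if the bytes are not one of these two encodings.
int callee_popped_bytes(const std::uint8_t* code, std::size_t len) {
    if (len >= 1 && code[0] == 0xC3) {
        return 0;
    }
    if (len >= 3 && code[0] == 0xC2) {
        return code[1] | (code[2] << 8);  // little-endian immediate
    }
    return -1;
}
```

Feeding it the bytes c2 08 00 from the listing above gives 8, matching the two 4-byte arguments (TCHAR** and int).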
Since the caller (C++/CLI) does not specify a calling convention, cdecl is used. So we are in a situation where both the caller and the callee clean up the arguments, which leaves the stack pointer off by 8 bytes and corrupts the stack. Simple ways to correct this would be:
Change the calling convention used by the caller:
typedef int (__stdcall *NATIVE_FUNCTION_PTR) (TCHAR**, int);
or
if possible, change NativeDll to compile with /Gd (the __cdecl calling convention).
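To see why either fix works, the bookkeeping of the stack pointer can be modelled with plain integers. This is only a sketch of the arithmetic, not of real machine state; Convention and stack_imbalance are hypothetical names, and 4-byte return addresses are assumed (32-bit x86).

```cpp
#include <cassert>

enum class Convention { Cdecl, Stdcall };

// Net change of the caller's stack pointer after one call that passes
// `arg_bytes` of arguments. 0 means the stack is balanced.
int stack_imbalance(Convention caller_assumes, Convention callee_uses, int arg_bytes) {
    int esp = 0;
    esp -= arg_bytes;  // caller pushes the arguments
    esp -= 4;          // call pushes the return address
    esp += 4;          // ret pops the return address
    if (callee_uses == Convention::Stdcall) {
        esp += arg_bytes;  // callee cleans up: ret N
    }
    if (caller_assumes == Convention::Cdecl) {
        esp += arg_bytes;  // caller cleans up: add esp, N
    }
    return esp;
}
```

With a cdecl caller and a stdcall callee the result for 8 bytes of arguments is 8, which is the double cleanup seen in this crash. When both sides agree (both stdcall or both cdecl) the result is 0.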
Thank you, GS cookie, for helping to catch this bug.
-Sanjay Kumar K