The hierarchy in flat memory: Heap and Stack
This section discusses the heap, the heap corruption and memory leak problems that come with it, and how to use pageheap to troubleshoot them.
The heap is designed for efficient use of the flat memory space
The Chinese version explains why we need a heap and how it is built on top of flat memory management, and then goes through different scenarios in detail. Here I only summarize some of the points and spend the time on real cases.
Because of the way the heap works, the following issues are painful to debug:
1. Heap use after free
2. Heap buffer underflow and overflow
3. Double free
4. Use from multiple threads
Pageheap is a facility built into the OS that enables debug tracing in the heap manager. Please refer to:
How to use Pageheap.exe in Windows XP and Windows 2000
https://support.microsoft.com/kb/286470/en-us
Pageheap.exe download is available at:
https://www.heijoy.com/debugdoc/pageheap.zip
https://blogs.msdn.com/lixiong/attachment/2792912.ashx
A good resource is:
Debug Tutorial Part 3: The Heap
https://www.codeproject.com/debug/cdbntsd3.asp
Look at the following code and compile it in release mode:
char *p=(char*)malloc(1024);
p[1024]=1;
It writes 1 byte past the end of the block. In release mode it does not crash. However, suppose we enable pageheap with the following command:
C:\Debuggers\pageheap>pageheap /enable mytest.exe /full
C:\Debuggers\pageheap>pageheap
mytest.exe: page heap enabled with flags (full traces )
Rerun it with pageheap enabled, and the application crashes. However, what if we change the code a little:
char *p=(char*)malloc(1023);
p[1023]=1;
Does it crash when pageheap is enabled?
It does not crash, even with pageheap enabled, under the default settings. To debug such an issue, we need the /unaligned switch.
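The difference between the two cases comes from allocation alignment. The sketch below is only arithmetic, not a Windows API: it assumes that full pageheap places each block at the end of a page and, by default, keeps the block 8-byte aligned (the usual 32-bit behavior), so a size that is not a multiple of 8 leaves a few slack bytes before the guard page. The function names and the constant are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Assumed default alignment of user blocks under full pageheap (32-bit Windows).
constexpr std::size_t kAssumedAlignment = 8;

// Slack bytes between the end of a block of `size` bytes and the guard page,
// when the block is padded up to a multiple of `alignment`.
constexpr std::size_t slack_before_guard_page(
    std::size_t size, std::size_t alignment = kAssumedAlignment)
{
    return (alignment - size % alignment) % alignment;
}

// True if writing p[index] on a block of `size` bytes would touch the guard page.
constexpr bool overrun_hits_guard_page(
    std::size_t size, std::size_t index,
    std::size_t alignment = kAssumedAlignment)
{
    return index >= size + slack_before_guard_page(size, alignment);
}

// malloc(1024); p[1024] = 1;  -> no slack, the write faults immediately.
static_assert(slack_before_guard_page(1024) == 0, "1024 is already aligned");
static_assert(overrun_hits_guard_page(1024, 1024), "overrun reaches the guard page");

// malloc(1023); p[1023] = 1;  -> one slack byte absorbs the write.
static_assert(slack_before_guard_page(1023) == 1, "one padding byte");
static_assert(!overrun_hits_guard_page(1023, 1023), "overrun lands in the padding");
```

Passing an alignment of 1 models what /unaligned is meant to achieve: no slack, so the same one-byte overrun reaches the guard page.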
A similar case is the following code:
char *p=new char[1023];
p[-1]='c';
To debug it, we need to use /backwards switch.
Let’s run some more tests on the code above. If we compile in debug mode, do the samples crash with pageheap enabled? In my tests they do not crash no matter which switch we use. Do you know why?
It is due to the CRT debug heap. The debug version of the CRT allocates extra memory next to the normal block for tracking. The 1-byte overwrite lands in that extra memory, so no crash happens. This is a case where the debug build does not actually help debugging.
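To illustrate the effect, here is a toy model of a guarded block. It assumes the layout commonly described for the MSVC debug heap, with 4 guard bytes filled with 0xFD on each side of the user data; the `GuardedBlock` class is hypothetical and only imitates the idea, it is not CRT code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed guard size and fill pattern, modeled on the MSVC debug CRT.
constexpr std::size_t kGuardSize = 4;
constexpr unsigned char kGuardFill = 0xFD;

// Toy model of a debug-heap block laid out as [guard][user data][guard].
class GuardedBlock {
public:
    // The whole buffer, guards and user data alike, starts out as kGuardFill.
    explicit GuardedBlock(std::size_t user_size)
        : user_size_(user_size),
          storage_(user_size + 2 * kGuardSize, kGuardFill) {}

    // Pointer to the part of the block the caller is supposed to use.
    unsigned char* user() { return storage_.data() + kGuardSize; }
    std::size_t user_size() const { return user_size_; }

    // True if both guard regions still hold the fill pattern.
    bool guards_intact() const {
        for (std::size_t i = 0; i < kGuardSize; ++i) {
            if (storage_[i] != kGuardFill) return false;
            if (storage_[kGuardSize + user_size_ + i] != kGuardFill) return false;
        }
        return true;
    }

private:
    std::size_t user_size_;
    std::vector<unsigned char> storage_;
};
```

In this model, writing `user()[user_size()]` or `user()[-1]` lands in a guard region that still belongs to the block, so nothing faults at the time of the write; the damage only shows up when `guards_intact()` is checked later.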
Another sample is double free. Let’s check the following code:
char *p=(char*)malloc(1023);
free(p);
free(p);
Then test it under the following conditions:
1. Disable pageheap, test debug build and release build.
2. Enable pageheap, test debug build and release build.
You should observe different behaviors. What’s the reason?
It is again due to the debug version of the CRT. When the debug CRT detects a double free, it reports it in its own way.
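A minimal sketch of how a checking layer can catch a double free before the underlying heap ever sees it. The `FreeTracker` class is hypothetical; it only records addresses and never touches the memory, and it is not how the CRT or pageheap are actually implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class FreeResult { Ok, DoubleFree, UnknownBlock };

// Remembers which addresses are live and which were already freed.
class FreeTracker {
public:
    void on_alloc(std::uintptr_t address) {
        freed_.erase(address);  // the heap may hand out the same address again
        live_.insert(address);
    }

    FreeResult on_free(std::uintptr_t address) {
        if (live_.erase(address) == 1) {
            freed_.insert(address);
            return FreeResult::Ok;
        }
        if (freed_.count(address) != 0) {
            return FreeResult::DoubleFree;  // freed before, not reallocated since
        }
        return FreeResult::UnknownBlock;    // never came from this tracker
    }

private:
    std::set<std::uintptr_t> live_;
    std::set<std::uintptr_t> freed_;
};
```

Whichever layer runs such a check first is the one that reports the error, which is one way to think about why the debug and release builds behave differently.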
Besides heap corruption, another issue is heap fragmentation.
Heap fragmentation is often caused by one of the following two reasons:
1. Small heap memory blocks that are leaked (allocated but never freed) over time
2. Mixing long-lived small allocations with short-lived large allocations
Both can prevent the NT heap manager from using free memory efficiently, because the free memory is spread across small fragments that cannot satisfy a single large allocation.
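The second pattern can be reproduced with a toy model. The sketch below is a simple first-fit arena measured in abstract units; it is an illustration only and makes no claim about how the NT heap manager works internally. All names and sizes are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Toy first-fit arena: one bool per unit, true means "in use".
class ToyArena {
public:
    explicit ToyArena(std::size_t units) : used_(units, false) {}

    // Returns the start offset of the first free run of `size` units,
    // or kNoSpace if no single run is large enough.
    std::size_t allocate(std::size_t size) {
        if (size == 0) return kNoSpace;
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            if (run == size) {
                const std::size_t start = i + 1 - size;
                for (std::size_t j = start; j <= i; ++j) used_[j] = true;
                return start;
            }
        }
        return kNoSpace;
    }

    void release(std::size_t start, std::size_t size) {
        assert(start + size <= used_.size());
        for (std::size_t j = start; j < start + size; ++j) used_[j] = false;
    }

    std::size_t total_free() const {
        return static_cast<std::size_t>(
            std::count(used_.begin(), used_.end(), false));
    }

    std::size_t largest_free_run() const {
        std::size_t best = 0, run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            best = std::max(best, run);
        }
        return best;
    }

private:
    std::vector<bool> used_;
};

// Interleave short-lived 9-unit blocks with long-lived 1-unit blocks,
// then release only the large ones.
ToyArena make_fragmented_arena() {
    ToyArena arena(100);
    std::vector<std::size_t> large_blocks;
    for (int i = 0; i < 10; ++i) {
        large_blocks.push_back(arena.allocate(9));
        arena.allocate(1);  // small and long-lived: never released
    }
    for (std::size_t start : large_blocks) arena.release(start, 9);
    return arena;
}
```

After `make_fragmented_arena()`, 90 of the 100 units are free, yet the largest contiguous run is 9 units, so a request for 10 units fails. This is the same shape of failure as running out of memory with plenty of free space left.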
For detailed info, please refer to:
The Windows XP Low Fragmentation Heap Algorithm Feature Is Available for Windows 2000
https://support.microsoft.com/?id=816542
For a vivid analysis, please refer to:
.NET Memory usage - A restaurant analogy
https://blogs.msdn.com/tess/archive/2006/09/06/742568.aspx
Another important use of pageheap is tracing memory allocations. When tracing is enabled, the heap manager records the callstack each time a heap operation occurs. This lets us find the callstacks of recent heap operations when debugging a heap issue. Look at the following sample:
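char * getmem()
{
    return new char[100];
}
void free1(char *p)
{
    delete p;       // deliberately wrong: scalar delete on memory from new[]
}
void free2(char *p)
{
    delete [] p;
}
int main(int, char**)
{
    char *c=getmem();
    free1(c);       // first free
    free2(c);       // second free of the same block: the double free under test
    return 0;
}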
Enable pageheap with tracing and run the application in WinDbg:
0:000> g
===========================================================
VERIFIER STOP 00000007: pid 0x1324: block already freed
015B1000 : Heap handle
003F5858 : Heap block
00000064 : Block size
00000000 :
===========================================================
(1324.538): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=015b1001 ecx=7c81b863 edx=0012fa7f esi=00000064 edi=00000000
eip=7c822583 esp=0012fbe8 ebp=0012fbf4 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!DbgBreakPoint:
7c822583 cc int 3
With pageheap enabled, when the heap manager detects a problem it raises a breakpoint exception to break into the debugger. It also prints detailed information in the debugger, such as "block already freed". With the kb command we can list the callstack at the time of the second free:
0:000> kb
ChildEBP RetAddr Args to Child
0012fbe4 7c85079b 015b1000 0012fc94 0012fc70 ntdll!DbgBreakPoint
0012fbf4 7c87204b 00000007 7c8722f8 015b1000 ntdll!RtlpPageHeapStop+0x72
0012fc70 7c873305 015b1000 00000004 003f5858 ntdll!RtlpDphReportCorruptedBlock+0x11e
0012fca0 7c8734c3 015b1000 003f0000 01001002 ntdll!RtlpDphNormalHeapFree+0x32
0012fcf8 7c8766b9 015b0000 01001002 003f5858 ntdll!RtlpDebugPageHeapFree+0x146
0012fd60 7c860386 015b0000 01001002 003f5858 ntdll!RtlDebugFreeHeap+0x1ed
0012fe38 7c81d77d 015b0000 01001002 003f5858 ntdll!RtlFreeHeapSlowly+0x37
0012ff1c 78134c3b 015b0000 01001002 003f5858 ntdll!RtlFreeHeap+0x11a
0012ff68 00401016 003f5858 003f5858 00000064 MSVCR80!free+0xcd
0012ff7c 00401198 00000001 003f57e8 003f3628 win32!main+0x16 [d:\xiongli\today\win32\win32\win32.cpp @ 77]
0012ffc0 77e523cd 00000000 00000000 7ffde000 win32!__tmainCRTStartup+0x10f
0012fff0 00000000 004012e1 00000000 78746341 kernel32!BaseProcessStart+0x23
The return address is 00401016, so the free call is the instruction just before 00401016. The problematic heap address is 0x3f5858. With the !heap command we can get the saved callstack of the most recent heap operation on it:
0:000> !heap -p -a 0x3f5858
address 003f5858 found in
_HEAP @ 3f0000
in HEAP_ENTRY: Size : Prev Flags - UserPtr UserSize - state
3f5830: 0014 : N/A [N/A] - 3f5858 (70) - (free DelayedFree)
Trace: 004f
7c860386 ntdll!RtlFreeHeapSlowly+0x00000037
7c81d77d ntdll!RtlFreeHeap+0x0000011a
78134c3b MSVCR80!free+0x000000cd
401010 win32!main+0x00000010
77e523cd kernel32!BaseProcessStart+0x00000023
According to the saved callstack, a free call had already occurred at the instruction just before 0x401010. 00401016 and 00401010 are close together, so let’s see what they are:
0:000> uf 00401010
win32!main [d:\xiongli\today\win32\win32\win32.cpp @ 74]:
74 00401000 56 push esi
75 00401001 6a64 push 0x64
75 00401003 e824000000 call win32!operator new[] (0040102c)
75 00401008 8bf0 mov esi,eax
76 0040100a 56 push esi
76 0040100b e828000000 call win32!operator delete (00401038)
77 00401010 56 push esi
77 00401011 e81c000000 call win32!operator delete[] (00401032)
77 00401016 83c40c add esp,0xc
78 00401019 33c0 xor eax,eax
78 0040101b 5e pop esi
79 0040101c c3 ret
From the disassembly, the double free is caused by a delete call followed by a delete[] call on the same pointer. The corresponding source lines are 76 and 77 of win32.cpp. We can also inspect the memory around address 0x3f5858:
0:000> dd 0x3f5848
003f5848 7c88c580 0025a5f0 00412920 dcbaaaa9
003f5858 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0
003f5868 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0
003f5878 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0
003f5888 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0
003f5898 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0
003f58a8 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0
003f58b8 f0f0f0f0 a0a0a0a0 a0a0a0a0 00000000
Here dcba is a flag. The DWORD just before the flag (00412920 here) is the address where the callstack is saved:
0:000> dds 00412920
00412920 00000000
00412924 00000001
00412928 0005004f
0041292c 7c860386 ntdll!RtlFreeHeapSlowly+0x37
00412930 7c81d77d ntdll!RtlFreeHeap+0x11a
00412934 78134c3b MSVCR80!free+0xcd
00412938 00401010 win32!main+0x10
0041293c 77e523cd kernel32!BaseProcessStart+0x23
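The relationship between the flag and the trace pointer can be checked with a little arithmetic. The sketch below copies the two DWORDs at 003f5850 and 003f5854 from the dump above into a byte buffer (little-endian, as on x86), looks for the 16-bit value 0xdcba in 2-byte steps, and reads the DWORD that starts 6 bytes before the hit. The helper names are illustrative only, and the 2-byte stepping is my assumption about how a word search walks memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Bytes at 003f5850..003f5857 from the dump: 00412920 dcbaaaa9 (little-endian).
std::vector<std::uint8_t> sample_dump() {
    return {0x20, 0x29, 0x41, 0x00, 0xa9, 0xaa, 0xba, 0xdc};
}

std::uint16_t read_u16(const std::vector<std::uint8_t>& m, std::size_t off) {
    return static_cast<std::uint16_t>(m[off] | (m[off + 1] << 8));
}

std::uint32_t read_u32(const std::vector<std::uint8_t>& m, std::size_t off) {
    return static_cast<std::uint32_t>(m[off]) |
           (static_cast<std::uint32_t>(m[off + 1]) << 8) |
           (static_cast<std::uint32_t>(m[off + 2]) << 16) |
           (static_cast<std::uint32_t>(m[off + 3]) << 24);
}

// Offset of the first 2-byte-aligned word equal to `value`, or kNotFound.
std::size_t find_word(const std::vector<std::uint8_t>& m, std::uint16_t value) {
    for (std::size_t off = 0; off + 1 < m.size(); off += 2) {
        if (read_u16(m, off) == value) return off;
    }
    return kNotFound;
}

// The flag is the high word of its DWORD, so that DWORD starts 2 bytes
// before the hit and the trace pointer DWORD starts 6 bytes before it.
std::uint32_t trace_pointer_for_flag(const std::vector<std::uint8_t>& m,
                                     std::size_t flag_off) {
    assert(flag_off >= 6);
    return read_u32(m, flag_off - 6);
}
```

With the sample bytes, the flag is found at offset 6 (address 003f5856) and the pointer read from offset 0 is 0x00412920, the address passed to dds.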
This flag is useful for troubleshooting memory leaks. Leaked memory is usually allocated from the same place. When a lot of memory has leaked, there are a lot of leaked heap blocks. Since every heap block carries the flag, we can search for the flag and get the corresponding callstacks. If one callstack shows up frequently, it is usually related to the leak. Here is a real case:
Why is the application out of memory when only 300 MB is allocated?
The malloc call fails in the customer's application when total memory usage is only 300 MB. The dump file shows that the issue is caused by heap fragmentation.
After enabling pageheap and capturing the dump again, I used the following command to search for the dcba flag:
0:044> s -w 0 L?60030000 0xdcba
00115e9e dcba 0000 0000 ef98 0012 893d 0047 efc8 ..........=.G...
…
19b90fe6 dcba cfe8 02d8 afe8 2ca3 cfe8 02d8 b22a .........,....*.
19b92fe6 dcba cfe8 1a52 8fe8 1dff cfe8 1af6 f44f ....R.........O.
19b9cfce dcba efd0 23d8 cfd0 1c58 8fd0 15ac c0c0 .....#..X.......
…
2b06efe6 dcba cfe8 02d8 8fe8 258b cfe8 02d8 a6d2 .........%......
2b074fce dcba 2fd0 1c0f afd0 1c4d dfd0 0e69 c0c0 .../....M...i...
…
2e860fe6 dcba afe8 02d8 2fe8 2ef3 afe8 02d8 0a0b ......./........
2e868fce dcba afd0 0881 2fd0 2e92 afd0 0881 c0c0 ......./........
Based on the search results, I picked a few hits at random and printed their saved callstacks with the following commands:
0:044> dds poi(19b92fe6 -6)
005bba0c 005cbe90
005bba10 00031c49
005bba14 00122ddb
005bba18 77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7
005bba1c 77faa27a ntdll!RtlDebugAllocateHeap+0x2d
005bba20 77f60e22 ntdll!RtlAllocateHeapSlowly+0x41
005bba24 77f46f5c ntdll!RtlAllocateHeap+0xe3a
005bba28 0046b404 Customer_App+0x6b404
005bba2c 0046b426 Customer_App+0x6b426
005bba30 00427612 Customer_App+0x27612
0:044> dds poi(19b9cfce -6)
005bba0c 005cbe90
005bba10 00031c49
005bba14 00122ddb
005bba18 77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7
005bba1c 77faa27a ntdll!RtlDebugAllocateHeap+0x2d
005bba20 77f60e22 ntdll!RtlAllocateHeapSlowly+0x41
005bba24 77f46f5c ntdll!RtlAllocateHeap+0xe3a
005b8024 0046b404 Customer_App+0x6b404
005b8028 0046b426 Customer_App+0x6b426
005b802c 00427a82 Customer_App+0x27a82
0:044> dds poi(2b06efe6 -6)
005bba0c 005cbe90
005bba10 00031c49
005bba14 00122ddb
005bba18 77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7
005bba1c 77faa27a ntdll!RtlDebugAllocateHeap+0x2d
005bba20 77f60e22 ntdll!RtlAllocateHeapSlowly+0x41
005bba24 77f46f5c ntdll!RtlAllocateHeap+0xe3a
005bd5d4 0046b404 Customer_App+0x6b404
005bd5d8 0046b426 Customer_App+0x6b426
005bd5dc 00427612 Customer_App+0x27612
Under normal conditions, the callstacks that allocate memory should be varied. However, the analysis above shows that most of the heap blocks were allocated from the same callstack, so that callstack is likely the root cause. By matching the addresses against the PDB I got the function name, and the customer confirmed the leak in that function.
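The same tallying can be automated once the callstacks have been extracted from the dump. A minimal sketch, assuming each saved callstack has already been reduced to a list of return addresses in the application module; the `most_common_stack` helper and its input format are hypothetical, not a debugger API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using CallStack = std::vector<std::uint32_t>;

// Counts how often each callstack occurs, comparing only the first
// `max_frames` frames, and returns the most frequent one with its count.
// The count is 0 when `stacks` is empty.
std::pair<CallStack, std::size_t> most_common_stack(
    const std::vector<CallStack>& stacks, std::size_t max_frames)
{
    std::map<CallStack, std::size_t> counts;
    for (const CallStack& s : stacks) {
        const std::size_t n = std::min(max_frames, s.size());
        ++counts[CallStack(s.begin(), s.begin() + n)];
    }

    std::pair<CallStack, std::size_t> best(CallStack(), 0);
    for (const auto& entry : counts) {
        if (entry.second > best.second) {
            best = std::make_pair(entry.first, entry.second);
        }
    }
    return best;
}
```

Fed with the Customer_App frames of the three stacks above, comparing only the top two frames (0x0046b404, 0x0046b426) groups all three samples together, while comparing three frames splits them two to one by caller.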