Manual Stack unwinding – just for the fun of it
There is rarely a need to unwind a healthy stack manually since the debugger can do this for you, but if you suspect some stack corruption or are investigating stack fault issues, this information may lead you to your problem.
It is important to keep in mind that we are ‘undoing’ what has already been done. We subtract where we have previously added and vice-versa. Remember that stacks start high and grow DOWN. I also find that color coding different frame sections helps in keeping your place and identifying potential issues.
I chose a small example that demonstrates the process and have posted the source so you can follow along on your own device. When compiled and run we end up with a call stack such as:
0x1602d844 COREDLL!_winput() line 230 + 16 bytes
0x1602fb8c COREDLL!swscanf(_FILEX {...}) line 82
0x1602fbe0 UNWINDER!WinMain(HINSTANCE__ * 0x62a3cdfa, HINSTANCE__ * 0x00000000, …
0x1602fe1c UNWINDER!WinMainCRTStartup() line 21 + 20 bytes
0x1602fe3c COREDLL!MainThreadBaseFunc(HINSTANCE__ * 0x83fdc930, unsigned long 0x00000000…
Note: if you do not see the stack frame addresses in your stack window, right-click and choose Frame Pointer so these values are displayed. It is also likely that the VMBase (the 0x16…) will be different on your device, but it will be consistent within the stack frame. The following example was cut and pasted from an actual Platform Builder session. You can get the stack frame memory information directly from the Memory window (View -> Debug Windows -> Memory), the Stack window and the Disassembly window.
The first step is always to find the current value of the stack pointer (SP). This value is shown in the Registers debug window and, depending on where you stopped, will likely be pointing into the frame of the last function entered. This is the low point of your stack and gives you a convenient place to start. Unwinding the stack begins in the function prolog, where you can witness the stack growth due to the needs of the current function.
Disasm PROLOG for _input()
223: int __cdecl _input (
224:     FILEX *stream,
225:     const unsigned char *format,
226:     va_list arglist
227:     )
228: #endif
229:
230: {
03F47B9C stmdb sp!, {r4 - r11, lr}
03F47BA0 ldr r12, [pc, #0x1EC]
03F47BA4 add sp, sp, r12
$M17570:
03F47BA8 mov lr, r1
03F47BAC mov r5, r0

Register map for first call:
R0 = 1602FB8C R1 = 0001106C R2 = 1602FBD8 R3 = 00000010
R4 = 0001106C R5 = 1602FED8 R6 = 00000000 R7 = 62A3CDFA
R8 = 01FF89E0 R9 = 1602FED8 R10 = 62A3CDFA R11 = 1602FE3C
R12 = FFFFDCDC Sp = 1602D844 Lr = 03F37748 Pc = 03F47BAC
Cpsr = 60000010
Stack frame memory
+--Addr--+--Value--+
1602D844 00000000 >> Bottom of stack, SP
1602D848 00000000
1602D84C 00000000
1602D850 00000000
…
1602FB54 00000000
1602FB58 00000000
1602FB5C 00000000
1602FB60 00000000
1602FB64 00000000
1602FB68 0001106C >> (SP – R12), Load R4
1602FB6C 1602FED8 >> R5
1602FB70 00000000 >> R6
1602FB74 62A3CDFA >> R7
1602FB78 01FF89E0 >> R8
1602FB7C 1602FED8 >> R9
1602FB80 62A3CDFA >> R10
1602FB84 1602FE3C >> R11
1602FB88 03F37748 >> LR
1602FB8C 1602FBF0 >> SP CoreDLL!swscanf()
1602FB90 00000010
…
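For our example, we use a contrived program that is currently executing in _input(). If you look at the last stack-affecting line in the prolog you will see the value stored in R12 being added to SP. R12 holds FFFFDCDC, which is -0x2324 in two's complement, so this add actually moves SP DOWN by 0x2324 bytes: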
03F47BA4 add sp, sp, r12
Since we are unwinding the stack we do the opposite and SUBTRACT R12 from SP, which is the same as adding 0x2324: 1602D844 + 00002324 = 1602FB68. If you do the math, you will see this is a fairly large value (8,996 bytes); it is needed to store the local variables used in _input().
03F47B9C stmdb sp!, {r4 - r11, lr}
The next instruction that matters to the stack stores several register values and decrements the stack pointer at the same time. Also important here is that it stores the value of the LR register, which contains the address we want to jump back to when this function returns. Nine registers were stored (R4 through R11 plus LR), so to undo it we add 9 x 4 = 0x24 bytes: 1602FB68 + 0x24 = 1602FB8C.
The SP is now pointing at the previous frame, COREDLL!swscanf(), and we dissect it using the same prolog method:
Disasm PROLOG for _swscanf()
58:
59: int __cdecl swscanf (
60:     REG2 const wchar_t *string,
61:     const wchar_t *format,
62:     ...
63:     )
64: {
03F37700 mov r12, sp
03F37704 stmdb sp!, {r0 - r3}
03F37708 stmdb sp!, {r4, r12, lr}
03F3770C sub sp, sp, #0x38
$M16727:
03F37710 mov r4, r1

Register map for end of second call:
R0 = 1602FBF0 R1 = 0001106C R2 = 1602FBE8 R3 = 00000000
R4 = 00000005 R5 = 1602FED8 R6 = 00000000 R7 = 62A3CDFA
R8 = 01FF89E0 R9 = 1602FED8 R10 = 62A3CDFA R11 = 1602FE3C
R12 = 1602FBE0 Sp = 1602D844 Lr = 00011114 Pc = 03F47BAC
Cpsr = 60000010
Stack frame memory
+--Addr--+--Value--+
…
1602FB84 1602FE3C >> R11
1602FB88 03F37748 >> LR
1602FB8C 1602FBF0 >> SP CoreDLL!swscanf()
1602FB90 00000010
…
1602FBAC 80130728
1602FBB0 04F02001
1602FBB4 0C0D4570    Storage for stack vars in
1602FBB8 0A01CFF0    swscanf() -> 0x038
1602FBBC 00000009
1602FBC0 000002B8
1602FBC4 00000005 >> new SP, Store R4
1602FBC8 1602FBE0 >> R12
1602FBCC 00011114 >> LR (note: no VMBase)
1602FBD0 1602FBF0 >> R0
1602FBD4 0001106C >> R1
1602FBD8 1602FBE8 >> R2
1602FBDC 00000000 >> R3
1602FBE0 00000004 >> SP UnWinder!WinMain()
1602FBE4 00000001
1602FBE8 00000000
…
Always working from the bottom up, we start with the last stack-affecting instruction in this function's prolog:
03F3770C sub sp, sp, #0x38
Take the current stack pointer and add 0x38 bytes (14 DWORDs), the storage space used by the local variables: 1602FB8C + 0x38 = 1602FBC4. Take note of how much smaller the stack requirement of swscanf() (0x38 bytes) is than that of _input() (0x2324 bytes). This is an important lesson when writing efficient code.
03F37708 stmdb sp!, {r4, r12, lr}
This instruction stored the values of R4, R12 and LR while decrementing SP. To undo it we increment SP by 3 x 4 = 0xC bytes: 1602FBC4 + 0xC = 1602FBD0.
03F37704 stmdb sp!, {r0 - r3}
The last instruction to undo for this function stored the contents of R0 – R3 on the stack, so we increment SP by another 4 x 4 = 0x10 bytes: 1602FBD0 + 0x10 = 1602FBE0, which is the frame for UNWINDER!WinMain().
Disasm PROLOG for WinMain()
32: int WINAPI
33: WinMain(
34:     HINSTANCE hInstance,
35:     HINSTANCE hPrevInstance,
36:     LPWSTR lpCmdLine,
37:     int iCmdShow
38:     )
39: {
160110B0 mov r12, sp
160110B4 stmdb sp!, {r0 - r3}
160110B8 stmdb sp!, {r12, lr}
160110BC sub sp, sp, #0x89, 30
$M28490:

Register map for end of third call:
R0 = 62A3CDFA R1 = 00000000 R2 = 1602FED8 R3 = 00000005
R4 = 00000005 R5 = 1602FED8 R6 = 00000000 R7 = 62A3CDFA
R8 = 01FF89E0 R9 = 1602FED8 R10 = 62A3CDFA R11 = 1602FE3C
R12 = 1602FE1C Sp = 1602D844 Lr = 000111D0 Pc = 03F47BAC
Cpsr = 60000010
Stack frame memory
+--Addr--+--Value--+
…
1602FBD4 0001106C >> R1
1602FBD8 1602FBE8 >> R2
1602FBDC 00000000 >> R3
1602FBE0 00000004 >> SP UnWinder!WinMain()
1602FBE4 00000001
1602FBE8 00000000
…
1602FDF0 00000000    SP + (0x89 * 4)
1602FDF4 00000000
1602FDF8 00000000
1602FDFC 00005F79
1602FE00 00011414
1602FE04 1602FE1C >> New SP, store R12
1602FE08 000111D0 >> LR (note: no VM base)
1602FE0C 62A3CDFA >> R0
1602FE10 00000000 >> R1
1602FE14 1602FED8 >> R2
1602FE18 00000005 >> R3
1602FE1C 00000000 >> SP UNWINDER!WinMainCRTStartup()
1602FE20 83FDC930
1602FE24 00000000
…
Again working from the bottom up, we start with the last stack-affecting instruction in this function's prolog:
160110BC sub sp, sp, #0x89, 30
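This stack operation is a little tricky since the immediate uses a shifter operand: the 8-bit value 0x89 is rotated right by 30 bits, which in a 32-bit register is the same as shifting it left by 2. Sometimes this requires us to pull out our ARM reference manual to see exactly how the rotation affects the final number. Here the math works out to 0x89 * 4 = 0x224, so undoing it gives 1602FBE0 + 0x224 = 1602FE04.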
160110B8 stmdb sp!, {r12, lr}
Next we undo the store of R12 and LR, which took 2 x 4 = 8 bytes: 1602FE04 + 8 = 1602FE0C.
160110B4 stmdb sp!, {r0 - r3}
Finally we undo the store of the R0 – R3 registers (0x10 bytes), which backs us up to UNWINDER!WinMainCRTStartup() at 1602FE1C. That frame is dissected the same way, and you continue like this until you run out of stack entries.
Note 1: If you look closely at the ASM lines you will see the “!” parameter directly after the SP register name. This is the “base register writeback” flag and is required with this type of operation. As always, your ARM reference manual will have the official documentation regarding syntax.
Note 2: Most stacks are 64 KB in size and are aligned on a 64 KB boundary. This is important since it lets us quickly check whether the entire stack is being unwound by the debugger. Using our example:
… … …
0x1602fe1c UNWINDER!WinMainCRTStartup() line 21 + 20 bytes
0x1602fe3c COREDLL!MainThreadBaseFunc(HINSTANCE__ * 0x83fdc930, unsigned long 0x00000000…
You will see that the frame for COREDLL!MainThreadBaseFunc() begins very near a 64 KB boundary (0x16030000) and is likely OK. It also helps that the function name sounds like the beginning of a stack. But if we have a stack that looks like:
… … …
0x24025480 GWES!MsgQueue::SendMessageW_I() line 4720 + 28 bytes
0x240254c0 COREDLL!DoSendMessageWInGwe() line 2641 + 32 bytes
0x240254d8 COREDLL!SendMessageW() line 2926
0x240254ec COREDLL!ImmGenerateMessage() line 5559 + 20 bytes
(Careful - Likely not telling you the whole story!)
Note how we are not near a 64 KB boundary and the function name doesn’t look like the beginning of a call stack. We are most definitely not seeing the entire stack for this call and need to employ the techniques above to help understand what happened.
Note 3: Windows CE also employs two separate 4 KB guard pages that detect when your stack has run out of room. Whenever the Program Counter (or any memory access) wanders into the first guard page, the kernel fires an exception and your application has the opportunity to handle it. Venturing into the bottom guard page causes the kernel to terminate your thread, and no recovery is possible. You can read more about stacks and memory architecture in the Core OS design section of Platform Builder help.
Comments
- Anonymous
July 17, 2009
What if stack corruption has already happened? Any suggestions for finding out why it occurred? Here is my error log:
11:30:55.789 Prefetch Abort: Thread=994c2a40 Proc=8090b3c0 'shell32.exe'
11:30:55.789 AKY=00000041 PC=01542300(???+0x01542300) RA=01542300(???+0x01542300) BVA=01542300 FSR=00000005
And the registers from the dump file:
R0 = 00000102 R1 = 00000002 R2 = 00000010 R3 = 00000000
R4 = 99944846 R5 = 0154583C R6 = 00000000 R7 = 7C089BC0
R8 = 00000102 R9 = 0F9DFE6C R10 = 00000000 R11 = 00000400
R12 = 03F61BA8 Sp = 0F9DFE5C Lr = 01542300 Pc = 01542300
Cpsr = 60000010
How can I tell which DLL / thread / function Pc = 01542300 points to? Thanks