Getting the Crashing Stack From a Bugcheck

Sorry for the long delay in posting - I have been slammed lately. I decided to take a short break from the bare metal stuff we have been discussing of late and write a post about debugging. :)

When a bugcheck occurs in Windows, the following basic sequence of events occurs:

1. An exception ("exception" is used in the loosest sense of the term here) occurs that causes the system to call KeBugCheck()

2. The KeBugCheck routine runs on the thread that called KeBugCheck(), which means it runs on that thread's stack

3. The BSOD shows up on the screen

This seems like totally useless information so far - and it is. However, for anyone who has ever had to debug a system crash, this sequence can lead to frustration. Why? Because sometimes you need to see what happened as the bugcheck occurred, and you need to examine the stack to find clues. Unfortunately you can't see what the stack held, because KeBugCheck() ran on that same stack and overwrote all its juicy details. This is a real pain!

After encountering this problem for the 11th time, I added some code to the bugcheck path for Vista and later that tries to rectify it. The additional code copies the current stack to a separate area in non-paged pool before the bugcheck code runs. Since the bugcheck code re-uses pretty much the whole stack, the saved copy is especially useful for finding out what calls were active at the time of the bugcheck or for digging out stack values.
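To make the idea concrete, here is a toy model of "copy the stack aside before the handler tramples it". This is only a sketch of the concept in Python, not the kernel code: the class name, method name, and the use of a byte buffer to stand in for a kernel stack are all made up for illustration, and it ignores the synchronization a real multiprocessor system would need.

```python
class BugcheckStackSaver:
    """Toy model of a pre-bugcheck stack save (hypothetical, not kernel code)."""

    def __init__(self):
        # Stands in for the separate save area in non-paged pool.
        self.saved_copy = None

    def save(self, stack):
        """Copy the stack aside. Only the first caller's stack is kept."""
        if self.saved_copy is not None:
            return False  # a bugcheck was already recorded; leave it alone
        self.saved_copy = bytes(stack)  # snapshot before the handler runs
        return True


saver = BugcheckStackSaver()
stack = bytearray(b"crash frames")

saver.save(stack)            # snapshot taken at the start of the bugcheck path
stack[:] = b"handler junk"   # the bugcheck code then re-uses the live stack

# The live stack is trashed, but the snapshot still holds the original contents.
print(saver.saved_copy)
```

The point of the model is just the ordering: the copy has to happen before any significant bugcheck work, because after that the live stack no longer holds anything interesting.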

So how do you use it? It is actually very easy. There is a global public symbol in ntoskrnl.exe (et al) called KiPreBugcheckStackSaveArea.

From the debugger you can just do this:

kd> dds KiPreBugcheckStackSaveArea KiPreBugcheckStackSaveArea+3000

This dumps the save area as dwords and resolves symbols for any dword that falls within the range of a loaded module. On 64-bit, use "dqs" instead to dump qwords. The "+3000" comes from the public header value KERNEL_STACK_SIZE (the debugger reads the number as hex by default), so the command dumps the entire saved stack. That value is the one for 32-bit x86; it differs on other architectures, so check the header for your target. Here is a snippet from a dump from my machine:

...
8192bc64 807c70cb storport!RaidpAdapterContinueScatterGather
8192bc68 ad5df8b0
8192bc6c 818b9c58 nt!KeQueryCurrentStackInformation+0xb7
8192bc70 87087030
8192bc74 00000008
8192bc78 ad5e0000
8192bc7c ad5dd000
8192bc80 ad5dfdb8
8192bc84 ad5e0000
8192bc88 00000000
8192bc8c 00000000
8192bc90 ad5df8d0
8192bc94 818d98e1 nt!KiSavePreBugcheckStack+0x66
8192bc98 819293e0 nt!KiPreBugcheckStackSaveArea
8192bc9c ad5dd000
8192bca0 00003000
8192bca4 ad5e0000
8192bca8 00000003
8192bcac ad5dd000
8192bcb0 ad5dfc78
8192bcb4 818d8c2d nt!KeBugCheck2+0x7a
8192bcb8 00000000
...

Notice that you can see the return address into KeBugCheck2 and, just below it, the frame where KiSavePreBugcheckStack pushed the address of KiPreBugcheckStackSaveArea. Look from the KeBugCheck2 entry toward higher addresses to see what was running when the bugcheck occurred (remember that the stack grows down toward lower addresses).

There are a few caveats (aren't there always!). One is that if parallel bugchecks occur, the stack save area only records the first one. I am hoping that the probability of parallel bugchecks is low, although they can occur, especially in the face of a machine check. The other caveat is that the addresses of the save area are not the same as the addresses of the original stack. The left-hand column of the dump shows save area addresses (8192bc64 and so on), while pointers stored in the saved data, such as ad5df8d0, still refer to the original stack. This may seem really obvious, but it is an important point to call out as it has bitten me several times! :)
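Since the address mismatch is easy to trip over, here is a small helper for translating between the two address spaces. It is a sketch under one assumption that I have not confirmed against the kernel source: that the save is a straight linear copy from the stack limit to the start of the save area. The dump above suggests this (the values 819293e0, ad5dd000, and 00003000 sitting together look like destination, source, and length), and the saved frame pointers line up when translated this way, but treat it as an inference. The constants are from my 32-bit machine and the function names are made up.

```python
# Values taken from the dump above (32-bit x86). Assumption: the stack was
# copied linearly, so offset N in the save area is offset N from the stack limit.
SAVE_AREA_BASE = 0x819293E0  # nt!KiPreBugcheckStackSaveArea
STACK_LIMIT = 0xAD5DD000     # lowest address of the original kernel stack
STACK_SIZE = 0x3000          # KERNEL_STACK_SIZE on this machine


def save_area_to_stack(addr, save_base=SAVE_AREA_BASE,
                       stack_limit=STACK_LIMIT, size=STACK_SIZE):
    """Translate a save area address to the original stack address."""
    offset = addr - save_base
    if not 0 <= offset < size:
        raise ValueError(f"{addr:#x} is not inside the save area")
    return stack_limit + offset


def stack_to_save_area(addr, save_base=SAVE_AREA_BASE,
                       stack_limit=STACK_LIMIT, size=STACK_SIZE):
    """Translate an original stack address to where its copy lives."""
    offset = addr - stack_limit
    if not 0 <= offset < size:
        raise ValueError(f"{addr:#x} is not inside the original stack")
    return save_base + offset


# The KeBugCheck2 return address is shown at 8192bcb4 in the save area.
print(hex(save_area_to_stack(0x8192BCB4)))   # 0xad5df8d4

# A saved frame pointer of ad5df8d0 points at this save area location.
print(hex(stack_to_save_area(0xAD5DF8D0)))   # 0x8192bcb0
```

The second call is the one I find most useful in practice: when you spot a saved frame pointer in the dump and want to follow it, translate it first and then dump from the resulting save area address.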

Well - I hope that this is useful information for at least one person. Thanks for reading!