Intro to kernel debugging 1
Topic: KD Setup
I am a user-mode developer, but part of the job of working on the Windows team (HoloLens runs on Windows!) is knowing how to work with a kernel debugger on that OS. Some problems are difficult to debug through user-mode debuggers alone and can be simpler in a kernel debugger. Examples include:
- Failure to launch process
- Early-stage OS boot or similar
- Inter-process communication
But how does one learn how to use the kernel debugger on Windows if the code you write only runs in user mode? Many tutorials are intended for driver authors. I intend to author a brief intro to kernel debugging from the perspective of someone who doesn't write code there. However, my perspective also includes being a Microsoft employee. As such, I have access to source code and symbols that the general public does not have. Kernel debugging is likely more applicable to someone in my position.
There are some topics that you should learn outside of this tutorial that will make you more effective as a kernel debugger:
- Familiarity with debugging, particularly with any one of: {windbg, cdb, kd}
- Difference between kernel mode and user mode execution
- High-level understanding of interrupts and IRQLs
I learned about these topics while on the job or through reading the "Windows Internals" book by Russinovich / Solomon / Ionescu. If you don't want to read that book, find some other way to get familiar with these concepts, as it really helps.
Let's get started with kd!
Note: This tutorial is part of a series. See other parts of the series here:
- Intro to kernel debugging 2
  - Debugger Context
- Intro to kernel debugging 3
  - Probing, altering user mode memory
Terminology
- Debug target - The machine being interrogated by the debugger
- Debug host - The machine doing the interrogating through the debugger. Likely your dev machine.
Set up apparatus
In the past, setting up a kd was a cumbersome activity. Today, we can whip up a virtual machine and hook up a kernel debugger with a few commands. The following is how a kd is set up through a Hyper-V based machine:
- Install your Windows OS on the VM
- In the Hyper-V settings for the VM, set COM 1 to use a Named Pipe. I will name mine "test1", which ends up creating a named pipe at \\.\pipe\test1
- In an elevated command prompt from within the OS running in the VM (the "debug target"), execute the following commands:
  - bcdedit /set {default} debug on
  - bcdedit /dbgsettings serial debugport:1 baudrate:115200
- Launch your favorite kernel debugger from your debug host (your dev machine). My favorite is windbg.exe.
  - (In windbg, I press control+K to open up the kd window and specify settings through there)
- Set the debugger to connect with the following settings:
  - Type = COM
  - Baud Rate = 115200
  - Connect through Pipe
  - Port = \\.\pipe\test1
  - Set it to reconnect automatically with Resets = 0
- Reboot the VM OS (the "debug target")
- As soon as the debug target gets far enough along in the boot process, the kernel debugger will automatically attach
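The same settings can be passed on the debugger command line instead of through the control+K dialog. The following is a minimal Python sketch that assembles the documented "-k com:pipe" transport string for windbg or kd, plus the two bcdedit commands from the steps above. The helper names (kd_connect_args, target_setup_commands) are hypothetical and exist only for illustration; the pipe name "test1" is the one chosen earlier.

```python
def kd_connect_args(pipe_name, resets=0, reconnect=True):
    """Build the '-k' arguments for windbg.exe or kd.exe on the debug host.

    Hypothetical helper: it only formats the transport string, it does not
    launch anything.
    """
    # \\.\pipe\<name> is the named pipe that Hyper-V exposes for COM 1.
    transport = f"com:pipe,port=\\\\.\\pipe\\{pipe_name},resets={resets}"
    if reconnect:
        transport += ",reconnect"
    return ["-k", transport]


def target_setup_commands(debug_port=1, baud_rate=115200):
    """Return the commands to run in an elevated prompt on the debug target."""
    return [
        "bcdedit /set {default} debug on",
        f"bcdedit /dbgsettings serial debugport:{debug_port} baudrate:{baud_rate}",
    ]


if __name__ == "__main__":
    print("On the debug target:")
    for command in target_setup_commands():
        print("  " + command)
    print("On the debug host:")
    print("  windbg.exe " + " ".join(kd_connect_args("test1")))
```

To actually launch the debugger, the list could be handed to subprocess, for example subprocess.Popen(["windbg.exe", *kd_connect_args("test1")]), assuming windbg.exe is on the PATH of the debug host.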
You will see something similar to the following when the debugger officially attaches:
Opened \\.\pipe\test1
Waiting to reconnect...
Connected to Windows 10 xxxxx x86 compatible target at (Fri Jun 10 14:26:44.374 2016 (UTC - 7:00)), ptr64 FALSE
Kernel Debugger connection established.
First kernel debugging commands
After you have connected, you can break in at any moment in order to see what's going on. Press control+break (windbg) or control+c (kd, cdb) to break in:
Break instruction exception - code 80000003 (first chance)
*******************************************************************************
*                                                                             *
*   You are seeing this message because you pressed either                    *
*       CTRL+C (if you run console kernel debugger) or,                       *
*       CTRL+BREAK (if you run GUI kernel debugger),                          *
*   on your debugger machine's keyboard.                                      *
*                                                                             *
*                     THIS IS NOT A BUG OR A SYSTEM CRASH                     *
*                                                                             *
* If you did not intend to break into the debugger, press the "g" key, then   *
* press the "Enter" key now. This message might immediately reappear. If it   *
* does, press "g" and "Enter" again.                                          *
*                                                                             *
*******************************************************************************
nt!RtlpBreakWithStatusInstruction:
819a4bc4 cc              int     3
The first thing I always do when connecting to a debugger is make sure the symbols are resolved, loaded, and cached. I normally have my sympath set in the _NT_SYMBOL_PATH environment variable, but you can also set it explicitly with the ".sympath" command. Non-Microsoft employees should include the public symbol server at Microsoft as follows:
2: kd> .sympath cache*c:\sym;c:\MySymbolPath1;\\MySymbolServer\SomePath2\foo\bar;srv*https://msdl.microsoft.com/download/symbols
Symbol search path is: cache*c:\sym;c:\MySymbolPath1;\\MySymbolServer\SomePath2\foo\bar;srv*https://msdl.microsoft.com/download/symbols
Expanded Symbol search path is: cache*c:\sym;c:\mysymbolpath1;\\mysymbolserver\somepath2\foo\bar;srv*https://msdl.microsoft.com/download/symbols

************* Symbol Path validation summary **************
Response    Time (ms)    Location
Deferred                 cache*c:\sym
Deferred                 c:\MySymbolPath1
Deferred                 \\MySymbolServer\SomePath2\foo\bar
Deferred                 srv*https://msdl.microsoft.com/download/symbols
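The sympath string is just a semicolon-separated list: a cache entry first, then any local or network stores, then the symbol server. As a rough sketch, it could be assembled in Python like this. The function name build_sympath is a made-up helper, and the example store paths are the same placeholder paths used in the command above, not real locations.

```python
PUBLIC_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_sympath(cache_dir, local_stores=(), server_url=PUBLIC_SYMBOL_SERVER):
    """Join symbol path elements in the order the debugger searches them.

    Hypothetical helper: cache first, then local/network stores, then the
    symbol server.
    """
    parts = [f"cache*{cache_dir}"]
    parts.extend(local_stores)
    parts.append(f"srv*{server_url}")
    return ";".join(parts)


if __name__ == "__main__":
    sympath = build_sympath(
        r"c:\sym",
        [r"c:\MySymbolPath1", r"\\MySymbolServer\SomePath2\foo\bar"],
    )
    # The result can be pasted after .sympath, or stored in the
    # _NT_SYMBOL_PATH environment variable before launching the debugger.
    print(sympath)
```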
Once the sympath is set, try loading all symbols in order to get the major symbols cached locally. If this is your first time doing this, it can take a long time, on the order of 3-5 minutes. Symbol files can be large.
2: kd> .reload /f *.*
Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.
If a particular file has trouble loading and you feel you need it, turn on verbose symbol resolving (!sym noisy) and force that one module to reload (.reload /f foo.dll).
Once symbols have loaded, you can use the 'lm' command to see which symbols (if any) are loaded for each module:
2: kd> lm
start    end        module name
81341000 8134c000   kdcom      (private pdb symbols)  c:\sym\kdcom.pdb\8F834143CF1A42A3BC536396DA9853A91\kdcom.pdb
8181d000 8187e000   hal        (private pdb symbols)  c:\sym\halmacpi.pdb\EE1E389525C24DBEBF46DDDD10F6F6DF1\halmacpi.pdb
8187e000 81e95000   nt         (private pdb symbols)  c:\sym\ntkrpamp.pdb\73A615FD399C45A7A669ACFBAEBC82201\ntkrpamp.pdb
82000000 82072000   storport   (private pdb symbols)  c:\sym\storport.pdb73EE2F1FFD246009F9E36C45650BBF91\storport.pdb
82080000 8208a000   Fs_Rec     (private pdb symbols)  c:\sym\fs_rec.pdb\765EE27940FF458D95602B9CC799F83D1\fs_rec.pdb
82090000 820b7000   ksecpkg    (private pdb symbols)  c:\sym\ksecpkg.pdb\E5C81256D7F84A03B25B69E203B2A8261\ksecpkg.pdb
820c0000 820d1000   mpsdrv     (private pdb symbols)  c:\sym\mpsdrv.pdb\37DB180E8ABE4217B848B65D56E0B8481\mpsdrv.pdb
820f0000 82106000   mountmgr   (private pdb symbols)  c:\sym\mountmgr.pdb\9F665C8CB77344BE8D230633D672E6C71\mountmgr.pdb
82110000 82117000   intelide   (private pdb symbols)  c:\sym\intelide.pdb\A0F94CA95D964E47A325BA72DF9CCFF91\intelide.pdb
82120000 8212e000   PCIIDEX    (private pdb symbols)  c:\sym\pciidex.pdb\27855013369A402FA201B2DAFD92DE1B1\pciidex.pdb
82130000 82139000   atapi      (private pdb symbols)  c:\sym\atapi.pdb\CC79A012F9464F3BAA9D1B3926F6275D1\atapi.pdb
82140000 82169000   ataport    (private pdb symbols)  c:\sym\ataport.pdb\2274ED5AC2B344A8807C0CDD155B154A1\ataport.pdb
82170000 82182000   fileinfo   (private pdb symbols)  c:\sym\fileinfo.pdb\81692D1241464AE898EF26401F6FD0D11\fileinfo.pdb
82190000 821a2000   WimFsf     (private pdb symbols)  c:\sym\wimfsf.pdb\85AD06671DFF4CCABC15341F12D5571C1\wimfsf.pdb
821b0000 82200000   ks         (private pdb symbols)  c:\sym\ks.pdb\432B6F981F0441EDBED8C24DA0C4C2151\ks.pdb
82400000 825da000   Ntfs       (private pdb symbols)  c:\sym\ntfs.pdb\E9EF50EFE34D41FEA4703C0E78C9B0921\ntfs.pdb
825e0000 825ea000   storvsc    (private pdb symbols)  c:\sym\storvsc.pdb\BDCC5C0D12B44942AFB8CC7339081E401\st
(truncated for brevity)
Having symbols loaded for some critical modules is vital to making any sense of anything. If you aren't getting symbols to load for the 'nt' module, stop what you are doing and figure it out. Most debugger commands will likely NOT work properly unless you have the 'nt' symbols loaded. Most other modules in the kernel space are safe to not have symbols loaded for ordinary debugging, but 'nt' is crucial.
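If you save the 'lm' output to a log, the "is nt loaded?" check can be automated. Here is a minimal Python sketch that parses lines in the shape shown above and reports whether a module has pdb symbols. The names parse_lm and has_pdb_symbols are hypothetical, and the parser assumes the simple "start end module (status) path" layout from this tutorial; other lm variants may need a different pattern.

```python
import re

# start, end, module name, "(symbol status)", optional pdb path.
# The backtick is allowed because 64-bit addresses print as ffffffff`ffffffff.
LM_LINE = re.compile(
    r"^([0-9a-f`]+)\s+([0-9a-f`]+)\s+(\S+)\s+\(([^)]*)\)\s*(.*)$",
    re.IGNORECASE,
)


def parse_lm(output):
    """Map module name -> dict with start, end, status, and pdb path."""
    modules = {}
    for line in output.splitlines():
        match = LM_LINE.match(line.strip())
        if match:  # the header line and blank lines do not match
            start, end, name, status, pdb = match.groups()
            modules[name] = {"start": start, "end": end, "status": status, "pdb": pdb}
    return modules


def has_pdb_symbols(modules, name):
    """True if the module is listed and its status mentions pdb symbols."""
    info = modules.get(name)
    return info is not None and "pdb symbols" in info["status"]


if __name__ == "__main__":
    sample = (
        "start    end        module name\n"
        "8187e000 81e95000   nt         (private pdb symbols)  "
        "c:\\sym\\ntkrpamp.pdb\\73A615FD399C45A7A669ACFBAEBC82201\\ntkrpamp.pdb\n"
    )
    print(has_pdb_symbols(parse_lm(sample), "nt"))
```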
Exploring kernel land
My key to exploring the kernel space is the !process command. I recommend looking at your debugger docs for the command; the debugger docs are well-maintained and very informative. The !process command allows you to enumerate and query data for all processes in the system. Here are some sample uses for !process:
Enumerate all processes
2: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 9fc41040  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 001a9000  ObjectTable: 82804000  HandleCount: 510.
    Image: System

PROCESS a514c8c0  SessionId: none  Cid: 0174    Peb: 02471000  ParentCid: 0004
    DirBase: 7ffe0020  ObjectTable: 829a8bc0  HandleCount: 46.
    Image: smss.exe
(truncated for brevity)
Locate a specific process
This example is looking for all instances of meason_test.exe, which is a test app I created that does nothing but Sleep(INFINITE).
1: kd> !process 0 0 meason_test.exe
PROCESS ab870bc0  SessionId: 0  Cid: 01dc    Peb: 00458000  ParentCid: 0af8
    DirBase: 7ffe04c0  ObjectTable: b0533040  HandleCount: 28.
    Image: meason_test.exe
Query all threads for a specific process
This example references the PROCESS address found in the previous example, in order to restrict output to one specific process (rather than all processes that match the string specified).
1: kd> !process ab870bc0 2
PROCESS ab870bc0  SessionId: 0  Cid: 01dc    Peb: 00458000  ParentCid: 0af8
    DirBase: 7ffe04c0  ObjectTable: b0533040  HandleCount: 28.
    Image: meason_test.exe

        THREAD ac5435c0  Cid 01dc.0fe8  Teb: 00459000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Non-Alertable
            ffffffff  NotificationEvent

        THREAD ae63ba00  Cid 01dc.095c  Teb: 0045a000 Win32Thread: 00000000 WAIT: (WrQueue) UserMode Alertable
            ae6147c0  QueueObject

        THREAD 9fcac040  Cid 01dc.07cc  Teb: 0045b000 Win32Thread: 00000000 WAIT: (WrQueue) UserMode Alertable
            ae6147c0  QueueObject
Query call stacks for all threads
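This command lets you see the call stack for each thread, including the kernel mode portion that a user-mode debugger cannot show. The flags value 17 is hexadecimal: per the !process documentation, the low three bits (7) request full process and thread details with stack traces, and bit 0x10 sets the process context to the specified process for the duration of the command so that its thread stacks display more accurately. Note, I had to obscure the nt and ntdll symbol names and source code paths, as I am not sure what the private symbol server exposes relative to the public symbol server.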
1: kd> !process ab870bc0 17
PROCESS ab870bc0  SessionId: 0  Cid: 01dc    Peb: 00458000  ParentCid: 0af8
    DirBase: 7ffe04c0  ObjectTable: b0533040  HandleCount: 28.
    Image: meason_test.exe
    VadRoot ab89f730 Vads Clone 0 Private . Modified . Locked .
    DeviceMap 82807ba0
    Token                             ad57c438
    ElapsedTime                       00:00:04.735
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         16472
    QuotaPoolUsage[NonPagedPool]      1472
    Working Set Sizes (now,min,max)  (, , ) (KB, KB, KB)
    PeakWorkingSetSize
    VirtualSize                       Mb
    PeakVirtualSize                   Mb
    PageFaultCount
    MemoryPriority                    BACKGROUND
    BasePriority
    CommitCharge

        THREAD ac5435c0  Cid 01dc.0fe8  Teb: 00459000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Non-Alertable
            ffffffff  NotificationEvent
        Not impersonating
        DeviceMap                 82807ba0
        Owning Process            Image: meason_test.exe
        Attached Process          N/A            Image: N/A
        Wait Start TickCount      1386973        Ticks: (0:00:00:04.015)
        Context Switch Count      39             IdealProcessor: 3
        UserTime                  00:00:00.000
        KernelTime                00:00:00.000
        Win32 Start Address meason_test!mainCRTStartup (0x00ad5bc0)
        Stack Init Current Base Limit Call
        Priority BasePriority PriorityDecrement IoPriority 2 PagePriority 5
        ChildEBP RetAddr  Args to Child
        a9b7cc04 818be3a5 00000000 ac543678 ac5435c0 nt!(omitted)+0x19 (FPO: [Uses EBP] [1,0,4]) [(omitted)]
        a9b7cc78 818bde89 ac5435c0 a9b7cd1c 81af9101 nt!(omitted)+0x195 (FPO: [Non-Fpo]) (CONV: fastcall) [(omitted)]
        a9b7cccc 818b495d 00000002 7531db3b 80000032 nt!(omitted)+0x159 (FPO: [Non-Fpo]) (CONV: stdcall) [(omitted)]
        a9b7ccf8 81af9189 ffffff00 00000000 00000000 nt!(omitted)+0xad (FPO: [Non-Fpo]) (CONV: stdcall) [(omitted)]
        a9b7cd44 819b2097 00000000 003af8b8 003af8dc nt!(omitted)+0x89 (FPO: [Non-Fpo]) (CONV: stdcall) [(omitted)]
        a9b7cd44 77302090 00000000 003af8b8 003af8dc nt!(omitted) (FPO: [0,3] TrapFrame @ a9b7cd54) [(omitted)]
        003af870 77300bea 76fae258 00000000 003af8b8 ntdll!(omitted) (FPO: [0,0,0]) [(omitted)]
        003af874 76fae258 00000000 003af8b8 0ebd6750 ntdll!(omitted) +0xa (FPO: [2,0,0]) [(omitted)]
        003af8dc 76fae1af ffffffff 00000000 003af8f8 KERNELBASE!SleepEx+0x98 (FPO: [SEH]) (CONV: stdcall) [(omitted)]
        003af8ec 00ad2c38 ffffffff 003af93c 00ad5aff KERNELBASE!Sleep+0xf (FPO: [Non-Fpo]) (CONV: stdcall) [(omitted)]
        003af8f8 00ad5aff 00000001 009e1ed8 009e1230 meason_test!main+0x38 (FPO: [Non-Fpo]) (CONV: cdecl) [(omitted)]
        003af93c 77291154 00458000 744ffd4f 00000000 meason_test!__mainCRTStartup+0x107 (FPO: [Non-Fpo]) (CONV: cdecl) [(omitted)]
        003af984 77291114 ffffffff 7731448a 00000000 ntdll(omitted)+0x3a (FPO: [SEH]) (CONV: stdcall) [(omitted)]
        003af994 00000000 00ad5bc0 00458000 00000000 ntdll(omitted)+0x1b (FPO: [Non-Fpo]) (CONV: stdcall) [(omitted)]
(Truncated for brevity)
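One thing worth noticing in the stack above is where it crosses from kernel mode to user mode: the ChildEBP values jump from a9b7xxxx down to 003afxxx at the TrapFrame line. As a rough Python sketch of that observation, the frames can be labeled by address. This assumes a 32-bit target with the default address space layout, where kernel addresses start at 0x80000000; that assumption does not hold for 64-bit targets or for systems configured with a larger user address space. classify_frames is a hypothetical helper name.

```python
# Assumption: default 2 GB user / 2 GB kernel split on 32-bit Windows.
KERNEL_BASE_X86 = 0x80000000


def classify_frames(stack_lines):
    """Label each stack frame line as 'kernel' or 'user' by its ChildEBP.

    Expects lines shaped like the stack above:
    ChildEBP RetAddr Arg1 Arg2 Arg3 Symbol ...
    Lines that do not fit that shape (such as the header) are skipped.
    """
    frames = []
    for line in stack_lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            child_ebp = int(fields[0], 16)
        except ValueError:
            continue
        mode = "kernel" if child_ebp >= KERNEL_BASE_X86 else "user"
        frames.append((mode, fields[5]))
    return frames


if __name__ == "__main__":
    sample = [
        "ChildEBP RetAddr  Args to Child",
        "a9b7cc04 818be3a5 00000000 ac543678 ac5435c0 nt!(omitted)+0x19 (FPO: [Uses EBP] [1,0,4]) [(omitted)]",
        "003af8dc 76fae1af ffffffff 00000000 003af8f8 KERNELBASE!SleepEx+0x98 (FPO: [SEH]) (CONV: stdcall) [(omitted)]",
    ]
    for mode, symbol in classify_frames(sample):
        print(mode, symbol)
```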
Summary
In this tutorial, we didn't get very deep into the bowels of the OS. However, we cracked open the door and took a peek. In the next tutorial, we will get a peek at what the kernel debugger looks like when it first breaks in.