PGES-Windows NT Debugging Blog Live Chat (August 13, 2008)
Chat Topic: PGES-Windows NT Debugging Blog Live Chat
Date: Wednesday, August 13, 2008
Please note: Portions of this transcript have been edited for clarity.
Daniel (Moderator):
Hello everyone-- thanks for coming to our chat on Platforms Global Escalation Services. Let's get started with our chat!
Before we begin, I'd like to have our Experts introduce themselves and then they'll get started answering your questions.
Introductions:
Smoke [Windows Core] (Expert):
Hi everyone, I'm an Escalation Engineer with the Windows Core team. I fix bugs for a living.
Matthew [MSFT EE] (Expert):
Hello, I am an Escalation Engineer with the Platforms Global Escalation Services (Windows Core) team.
East - MSFT EE (Expert):
I am East, an Escalation Engineer with the Microsoft Platforms Global Escalation Services. (Windows Core)
Todd Webb - Msft (Expert):
I am an Escalation Engineer with the Microsoft Platforms Global Escalation Services OEM hardware team...
David (Expert):
Hi, I'm an Escalation Engineer with Windows Core - reading code & debugging is my day-to-day.
stheller (Expert):
Hi, I'm a new Escalation Engineer with Platforms GES.
Mr Ninja [MSFT EE] (Expert):
Hi, I am an Escalation Engineer with Microsoft PGES. I debug Windows for a living.
Tate [MSFT EE] (Expert):
Hi, I'm one of the EE's on the Windows team.
Jeff Dailey MSFT EE (Expert):
Hi, my name is Jeff Dailey, I'm a Senior Escalation Engineer on the Microsoft Platforms Global Escalation Services team.
a-hstein (Expert):
Greetings and sorry for the late message. I am an intern in the GES group.
Start of chat:
Smoke [Windows Core] (Expert):
Q: How can I track memory allocations through MmAllocateContiguousMemory?
A: You could try PoolHitTag on MmCm or a breakpoint on MmAllocateContiguousMemory. If you go with the breakpoint, you can make it conditional, dump the stack and anything else you need, then 'go' the system. There will be a perf hit each time you break in.
Tate [MSFT EE] (Expert):
Q: For MmAllocateContiguousMemory, will !poolused show the total amount used?
A: !poolused 2 will show MmCm
Matthew [MSFT EE] (Expert):
Q: What's the best way to go about troubleshooting pool corruption dumps?
A: Special Pool can be used to track down pool corruption problems. https://msdn.microsoft.com/en-us/library/cc265889.aspx
Mr Ninja [MSFT EE] (Expert):
Q: Could you explain the reasons why a memory dump analysis shows an "illegal instruction" exception raised from a valid instruction?
A: There are many reasons this could happen. The instruction that was executed may not be what you see, due to hardware problems such as a bit flip in the instruction when it was executed. It is also possible for a hardware problem to cause an exception to be raised on a valid instruction. Sometimes software, or hardware, may trigger a jump to the middle of an instruction, so that the instruction being executed is not what you think it is. I described a problem where we executed from the middle of an instruction in the blog https://blogs.msdn.com/ntdebugging/archive/2008/04/28/ntdebugging-puzzler-0x00000004-this-didn-t-puzzle-the-debug-ninja-how-about-you.aspx.
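To illustrate the "jump to the middle of an instruction" case, here is a minimal sketch. It is a toy decoder that knows only two real x86 encodings (B8 imm32 is mov eax, imm32; 0F 0B is ud2, which always raises an invalid-opcode exception). The byte sequence and the decode helper are hypothetical and chosen for illustration; they are not taken from the blog post above.

```python
# Toy decoder for two real x86 opcodes; every other byte is reported as unhandled.
#   B8 imm32 -> mov eax, imm32 (5 bytes)
#   0F 0B    -> ud2            (2 bytes, always raises an invalid-opcode exception)

def decode(code: bytes, offset: int):
    """Return (mnemonic, length) for the instruction starting at offset."""
    op = code[offset]
    if op == 0xB8 and offset + 5 <= len(code):
        imm = int.from_bytes(code[offset + 1:offset + 5], "little")
        return f"mov eax, 0x{imm:08x}", 5
    if op == 0x0F and code[offset + 1:offset + 2] == b"\x0b":
        return "ud2", 2
    return f"(not handled by this sketch: 0x{op:02x})", 1

# Hypothetical bytes: one valid 5-byte instruction whose immediate
# happens to contain the ud2 encoding.
CODE = bytes([0xB8, 0x0F, 0x0B, 0x00, 0x00])

if __name__ == "__main__":
    print(decode(CODE, 0))  # decoding from the real instruction boundary
    print(decode(CODE, 1))  # decoding one byte in, as after a bad jump
```

A disassembly that starts at the instruction boundary shows only the valid mov, which is why the dump can appear to blame a perfectly good instruction.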
Smoke [Windows Core] (Expert):
Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
A: This sounds like a bad idea. I would expect this to break in a number of ways (just as you have observed).
David (Expert):
Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
A: Part of the problem is that if ExitThread is called, any pending APCs on that thread's queue are lost.
Matthew [MSFT EE] (Expert):
Q: This question is in reference to special pool mentioned already. Is this article essentially the same as the MSDN reference? https://support.microsoft.com/kb/188831/en-us
A: The KB article documents enabling special pool via the registry, rather than verifier. These are two different ways to accomplish the same thing. Enabling it via the registry is sometimes preferred, since verifier enables additional checks beyond special pool.
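Both approaches can target allocations by pool tag, and it helps to remember that a tag such as MmCm is four ASCII characters stored as one 32-bit little-endian value. The sketch below shows only that byte-order conversion. The helper names are hypothetical, and the KB article remains the reference for the exact value format that special pool expects in the registry.

```python
def pool_tag_to_dword(tag: str) -> int:
    """Pack a four-character ASCII pool tag into a little-endian 32-bit value."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("this sketch only handles four-character tags")
    return int.from_bytes(raw, "little")

def dword_to_pool_tag(value: int) -> str:
    """Unpack a 32-bit value back into its four-character tag."""
    return value.to_bytes(4, "little").decode("ascii")

if __name__ == "__main__":
    value = pool_tag_to_dword("MmCm")
    print(f"MmCm -> 0x{value:08X}")
    print(f"0x{value:08X} -> {dword_to_pool_tag(value)}")
```

Read as a hexadecimal number, the characters appear in reverse order, which is an easy detail to get wrong when typing a tag as a DWORD.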
East - MSFT EE (Expert):
Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
A: Would KB254956 help?
- If not, we would need to follow up with you for more information.
East - MSFT EE (Expert):
Q: Is there anything additional you want on the blog that we have not done?
Jeff Dailey MSFT EE (Expert):
Q: When is the next Windows Internals exam scheduled? I would like to plan ahead.
A: The final version of the Windows Internals Exam should be available before December 2008. I'd like to thank all the community members that participated in the Beta. Your feedback was very valuable.
Matthew [MSFT EE] (Expert):
Q: Will we get more puzzlers on the blog?
A: We'd like to do more puzzlers, but unfortunately they tend to take a lot of time, so I cannot say for sure when/if we'll have more.
Matthew [MSFT EE] (Expert):
Q: How many of you in the audience are interested in more puzzlers on the ntdebugging blog?
Smoke [Windows Core] (Expert):
Q: Are you planning to write a book?
A: Windows Internals is a great reference book that we all rely upon. Additionally, you can check out: <https://www.amazon.com/Advanced-Debugging-Addison-Wesley-Microsoft-Technology/dp/0321374460>
Tate [MSFT EE] (Expert):
Q: As far as the blog is concerned, I'm more a fan of the case-study type posts where you go through how you troubleshot issues that you have encountered.
A: So are we!!!
Smoke [Windows Core] (Expert):
Q: I'm very interested in puzzlers...
A: Thanks for the feedback. We will try to create some more in the future.
Smoke [Windows Core] (Expert):
Q: Debugging MPI apps - sometimes a crash happens on remote and the local smpd daemon will terminate the process being debugged. Using the debugger, is there a way to guard from TerminateProcess from the child? I guess that would break some security models.
A: I'm not sure what MPI is, but this scenario sounds just like a service. The service control manager will kill the service if it doesn't respond in a timely fashion. With a service, there is a registry key to extend the timeout. If such a mechanism isn't available for you, you should consider instrumentation/logging.
East - MSFT EE (Expert):
Q: I just skimmed over KB254956; we found APCs to work. The issue here is that there are alertable waits in library modules, such as LSA/NDR/I_RPC calls, where our APC fires, which raises a user exception that gets handled and exits; the thread exits holding the heap lock.
A: We would need to discuss this further offline. How can I contact you?
Matthew [MSFT EE] (Expert):
Q: A puzzler prize like the next edition of Windows Internals would definitely have my full attention. :)
A: We'll consider it... thanks for the feedback!
Jeff Dailey MSFT EE (Expert):
Q: Have you ever found yourselves with an "unsolvable" case? :P
A: No case is unsolvable, and nothing is truly random. Some cases may take a very long time to resolve through multiple debugging passes, detailed code review, reverse engineering and multiple iterations of instrumentation. In the end we find the problem.
Mr Ninja [MSFT EE] (Expert):
Q: Tri-boot machine - XP, Server 2003 and Server 2000 with 2000 being the last one installed. After awhile, I got an error: "Windows 2000 could not start because the following file is missing or corrupt: \WINDOWS\SYSTEM32\CONFIG\SYSTEMd startup options for"..
A: That is usually a known issue in Windows 2000 caused by the system hive becoming too large. Several KB articles describe this issue: KB269075, KB306038, KB323148, and KB277222 contain various resolutions you can try. I have found that most often the steps in KB277222, using scrubber in a shutdown script, resolve this problem. Starting with Windows 2003 we changed the boot architecture to prevent this problem; KB302594 describes this improvement.
Tate [MSFT EE] (Expert):
Q: Do you guys use USB debugging in Vista/2008? Why is it that there is still one vendor that sells the debug dongle?
A: Serial debugging works well enough most of the time. We usually try these alternatives only when the hardware has no serial connection for some reason and offers only USB or FireWire.
East - MSFT EE (Expert):
Q: I just skimmed over KB254956; we found APCs to work. The issue here is that there are alertable waits in library modules, such as LSA/NDR/I_RPC calls, where our APC fires, which raises a user exception that gets handled and exits; the thread exits holding the heap lock.
A: Better yet, it would be best to open a case with Microsoft Support: <https://support.microsoft.com/> -> Need more help? -> Select a Product to start
Jeff Dailey MSFT EE (Expert):
Q: What companies are in attendance today?
Graham (Expert):
Q: There are lots of post-mortem debuggers available: Dr Watson, NTSD, windbg, userdump, WER. Which ones do you usually recommend your customers use if you need to be sure to capture a dump from a crash?
A: Userdump.exe is quite reliable for obtaining post-mortem dumps, and is easy to use. It and ADPlus (which uses CDB) are good because they attach to the process and monitor exceptions, and can create dumps for times when a JIT debugger would not be able to create a thread in the process to obtain the dump. Normally, I will set up drwtsn32 first, and if it cannot generate the dump, then I will go to userdump.
Smoke [Windows Core] (Expert):
Q: How can I debug cases in which I have just the minidump for a CPU hog? I tried !runaway and it does not work.
A: The minidump alone may not contain enough information. You could try to look at the stacks and guess at what is using the CPU, but that requires familiarity with the application. You should capture a circular perfmon log with thread data. Then get 3-5 dumps of the app. From the perfmon log, you'll see what threads are active (and their activity profile). From the dumps, you'll have a few snapshots of the process in motion. Alternatively, you could try a profiler like xperf.
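As a rough sketch of why several dumps help: if each dump yields per-thread CPU times, the difference between two snapshots points at the threads that are busy now, rather than the ones that were busy at some point in the past. The thread IDs and times below are hypothetical, and the busiest_threads helper is an assumption made for illustration, not part of any debugger.

```python
from typing import Dict, List, Tuple

def busiest_threads(before: Dict[int, float],
                    after: Dict[int, float],
                    top: int = 3) -> List[Tuple[int, float]]:
    """Rank threads by CPU seconds consumed between two snapshots."""
    # Threads that only appear in the later snapshot are treated as starting at zero.
    deltas = {tid: after[tid] - before.get(tid, 0.0) for tid in after}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]

# Hypothetical cumulative CPU seconds per thread ID from two dumps of the same process.
FIRST_DUMP = {0x1A4: 12.50, 0x2C8: 0.75, 0x3F0: 3.00}
SECOND_DUMP = {0x1A4: 12.50, 0x2C8: 9.25, 0x3F0: 3.50, 0x410: 0.25}

if __name__ == "__main__":
    for tid, seconds in busiest_threads(FIRST_DUMP, SECOND_DUMP, top=2):
        print(f"thread 0x{tid:x} used {seconds:.2f}s between dumps")
```

In this made-up data, thread 0x1a4 has the largest total but did no work between the dumps, so a single snapshot would point at the wrong thread.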
David (Expert):
Q: Are there any free code coverage tools on Windows?
A: This article describes how to obtain code coverage data: https://msdn.microsoft.com/en-us/library/ms182496.aspx
stheller (Expert):
https://www.microsoft.com/whdc/devtools/tools/prefast.mspx discusses the PREfast static source code analysis tool.
East - MSFT EE (Expert):
Q: Are there any free code coverage tools on Windows?
A: Please keep watching our blog site for the next chat - <https://blogs.msdn.com/ntdebugging> - or you can submit the question to our blog site.
Daniel (Moderator):
Well we're out of time for today's chat. Thank you very much to all of our guests who joined us today as well as to our Experts for answering so many great questions. Have a great day!