The Case of the Random IE Crash
While I long for the day when I no longer experience the effects of buggy software, there’s something rewarding about solving my own troubleshooting cases. In the process, I often come up with new techniques to add to my bag of tricks and to share with you in my “Case of the Unexplained…” presentations and blog posts. The other day I successfully closed an especially interesting case that opened when Internet Explorer (IE) crashed as I was reading a web page:
Whenever I experience a crash, whether it’s the system or an application, I always take a look at it. There’s no guarantee, but many times after spending just a few minutes I find clues that point to an add-on as the cause and, ultimately, to a fix or workaround. In most cases when it’s an application crash, the faulty process is obvious and I simply launch Windbg (from the free Debugging Tools for Windows package that comes with the Windows SDK and Windows DDK), attach it to the process, and start investigating.
Sometimes, however, the faulting process isn’t obvious, as was the case when I saw the IE crash dialog. That’s because I was running IE8, which has a multi-process model where different tabs are hosted in different processes:
I had multiple tabs open as usual, so I had to figure out which of the four running IE processes (in addition to the parent broker instance) was the one that had crashed. I could have taken the brute-force approach of attaching to each process in turn and searching for the faulting thread, but fortunately there’s a simpler and more direct way to identify the target process.
When a process crashes, the Windows Error Reporting (WER) service launches its own process, called WerFault, in the session of the crashed process to display the error dialog to the user running the session and to generate a crash dump file. So that WerFault knows which process is the one that crashed, the WER service passes the process ID (PID) of the target on WerFault’s command line. You can easily view the command line with Process Explorer. Because I always have Process Explorer running with its icon visible in the tray area of the taskbar, I clicked on the icon to open it and found the WER process in the process tree:
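If you’d rather extract the PID programmatically than read it in Process Explorer, the parsing step is trivial. The sketch below assumes the command line takes the commonly seen form WerFault.exe -u -p <pid> -s <n>; that switch layout is an assumption for illustration rather than a documented contract, and crashed_pid is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: WER passes the crashed process's PID after a "-p" switch,
# e.g. "WerFault.exe -u -p 4440 -s 1120". Other switches are ignored.
_PID_SWITCH = re.compile(r"(?:^|\s)-p\s+(\d+)(?=\s|$)")


def crashed_pid(command_line: str) -> Optional[int]:
    """Return the PID that follows "-p" on a WerFault command line, if any."""
    match = _PID_SWITCH.search(command_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical command line modeled on the one Process Explorer displays.
    example = r"C:\Windows\system32\WerFault.exe -u -p 4440 -s 1120"
    print(crashed_pid(example))  # 4440
```

A regular expression anchored on whitespace keeps a switch such as -pid or a path containing "-p" from being mistaken for the PID argument.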
I double-clicked on the WerFault process to open its properties dialog, and the command line revealed the process ID of the problematic IE process:
Now that I knew process 4440 was the one I was interested in, I started Windbg, pressed F6 to open the process selection dialog, and double-clicked on Iexplore.exe process 4440. With Windbg attached, my next step was to locate the thread that had faulted so that I could examine its stack for signs of a buggy add-on. In some cases Windbg’s built-in crash analysis heuristics, which you can trigger with the !analyze command, will do the job for you, but they didn’t this time. Finding the faulting thread is fairly straightforward, though.
First, go to Windbg’s View menu and open both the Processes and Threads and the Call Stack dialogs, arranging them side by side. The goal is to find the thread that has functions with the words fault, exception, or unhandled in their names. You can quickly do this by selecting each thread in the Processes and Threads window, pressing Enter, and then scanning the stack that appears in the Call Stack window. After doing this for the first few threads, I came across the thread I was looking for, revealed by functions all over its stack containing the telltale strings:
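The manual scan lends itself to a short sketch. It assumes you have already captured each thread’s call stack as a list of symbol strings (for example by copying Windbg’s output); the thread IDs and frames in the demo are made up for illustration. Plain substring matching also flags names that merely contain “Default”, so treat hits as candidates to inspect, not verdicts.

```python
from typing import Dict, List

# Words the post suggests looking for in the faulting thread's stack.
TELLTALE_WORDS = ("fault", "exception", "unhandled")


def candidate_threads(stacks: Dict[int, List[str]]) -> List[int]:
    """Return thread IDs whose stacks contain a frame naming a telltale word.

    Matching is a case-insensitive substring search, so "Default..." also
    matches "fault"; the result is a shortlist, not a diagnosis.
    """
    hits = []
    for tid, frames in stacks.items():
        lowered = [frame.lower() for frame in frames]
        if any(word in frame for frame in lowered for word in TELLTALE_WORDS):
            hits.append(tid)
    return hits


if __name__ == "__main__":
    # Hypothetical stacks, innermost frame first.
    stacks = {
        0x1A0: ["ntdll!NtWaitForMultipleObjects", "kernel32!WaitForMultipleObjectsEx"],
        0x2F4: ["ntdll!KiUserExceptionDispatcher", "kernel32!UnhandledExceptionFilter"],
        0x310: ["user32!GetMessageW", "mshtml!GlobalWndProc"],
    }
    print([hex(tid) for tid in candidate_threads(stacks)])  # ['0x2f4']
```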
Unfortunately, I was at an apparent dead end as far as fingering an add-on: all the DLLs shown in the call stack were Microsoft’s. There was one indicator that an add-on might be hidden from view, though: Windbg reported that it couldn’t find symbols for at least some of the stack’s frames, so it had been forced to guess at the stack’s layout and was showing an address that didn’t lie within any DLL:
This happens when a DLL is compiled with the frame pointer omission (FPO) optimization, which, in the absence of symbolic information for the DLL, prevents the debugger from finding stack frames just by following the frame-pointer chain. The return addresses for the functions the thread invoked must still be on the stack (unless they were overwritten by the bug that caused the crash), but Windbg’s heuristics couldn’t locate them.
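To see why FPO trips up a naive stack walk, here is a toy model of the x86 frame-pointer chain. It assumes the classic layout in which a function that sets up a frame stores its caller’s EBP at [EBP] and its return address at [EBP+4]; memory is a dictionary of invented addresses and values. When an FPO function has reused EBP as a general-purpose register, the “link” is really scratch data, the walk stops early, and return addresses higher up the stack are never visited even though they are still there.

```python
from typing import Dict, List


def walk_frame_chain(memory: Dict[int, int], ebp: int, max_frames: int = 64) -> List[int]:
    """Collect return addresses by following saved-EBP links (x86 layout).

    Assumes [ebp] holds the caller's EBP and [ebp + 4] the return address.
    Stops at a missing or zero link, or at one that doesn't point higher up
    the stack (on x86, callers' frames live at higher addresses).
    """
    return_addresses = []
    while ebp and len(return_addresses) < max_frames:
        if ebp not in memory or ebp + 4 not in memory:
            break
        return_addresses.append(memory[ebp + 4])
        next_ebp = memory[ebp]
        if next_ebp <= ebp:
            break
        ebp = next_ebp
    return return_addresses


if __name__ == "__main__":
    # Intact chain: two frames linked through saved EBP values.
    intact = {0x1000: 0x1020, 0x1004: 0x77001111, 0x1020: 0x0, 0x1024: 0x77002222}
    print([hex(a) for a in walk_frame_chain(intact, 0x1000)])

    # FPO-style break: the saved "link" is a scratch value (0x42), so the walk
    # stops even though another return address still sits at 0x1010.
    broken = {0x1000: 0x42, 0x1004: 0x77001111, 0x1010: 0x10012345}
    print([hex(a) for a in walk_frame_chain(broken, 0x1000)])
```

The intact chain yields both return addresses; the broken one yields only the first, which is exactly the situation a brute-force scan of the raw stack is meant to rescue.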
There’s a Windbg command that you can use in these cases to hunt for the missing frame function addresses: the Display Words and Symbols command. If you’re debugging a 32-bit process, use the dds version of the command; if it’s a 64-bit process, use dqs. You can also use dps (Display Pointer Symbols), which interprets the function addresses as the appropriate size for a 32-bit or 64-bit process. The address to give the command as its starting point should be the address of the stack frame immediately above the one where Windbg got lost. To see the address, click on the Addrs button in the Call Stack dialog:
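Conceptually, dds, dqs, and dps do something simple: read the stack as an array of pointer-sized values and print a symbol next to any value that falls inside a loaded module. The Python sketch below models that with raw bytes and a hypothetical module table of (name, base, size) entries. It reports module+offset rather than function names, since it has no symbol files, and the addresses and module bases in the demo are invented for illustration.

```python
import struct
from typing import List, Optional, Tuple

# (module name, base address, image size); demo values are hypothetical.
Module = Tuple[str, int, int]


def symbolize(value: int, modules: List[Module]) -> Optional[str]:
    """Return 'module+0xoffset' if value lies inside a module's image."""
    for name, base, size in modules:
        if base <= value < base + size:
            return f"{name}+0x{value - base:x}"
    return None


def dump_pointers(stack: bytes, start: int, modules: List[Module],
                  pointer_size: int = 4) -> List[str]:
    """Model dds (4-byte) or dqs (8-byte): address, value, optional symbol."""
    if pointer_size not in (4, 8):
        raise ValueError("pointer_size must be 4 or 8")
    fmt = "<I" if pointer_size == 4 else "<Q"  # x86 and x64 are little-endian
    width = pointer_size * 2
    lines = []
    for offset in range(0, len(stack) - pointer_size + 1, pointer_size):
        (value,) = struct.unpack_from(fmt, stack, offset)
        line = f"{start + offset:0{width}x}  {value:0{width}x}"
        symbol = symbolize(value, modules)
        lines.append(f"{line}  {symbol}" if symbol else line)
    return lines


if __name__ == "__main__":
    modules = [("ntdll", 0x77000000, 0x13C000), ("Yt", 0x10000000, 0x50000)]
    stack = struct.pack("<4I", 0x1, 0x770A1234, 0x02CBC5F0, 0x10012345)
    for line in dump_pointers(stack, 0x2CBC5C8, modules):
        print(line)
    # 02cbc5c8  00000001
    # 02cbc5cc  770a1234  ntdll+0xa1234
    # 02cbc5d0  02cbc5f0
    # 02cbc5d4  10012345  Yt+0x12345
```

Values that don’t land in any module (small integers, stack addresses) print bare, which is why a scan like this surfaces code addresses, such as a third-party DLL’s, that a frame-pointer walk missed.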
The address on the frame in question was 2cbc5c8:
I passed it to dds as the argument and pressed Enter:
The first page of results didn’t list any functions besides the expected one, KiUserExceptionDispatcher. I pressed Enter again without typing another command, because for address-based commands like dds, that tells Windbg to repeat the last command at the address where it left off. The second page of results yielded something more interesting, the name of a DLL I wasn’t familiar with:
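The repeat-on-Enter behavior amounts to the debugger remembering where the last dump ended. This toy model (a hypothetical DumpSession class over a dictionary of made-up memory contents, with an arbitrary page length rather than Windbg’s actual default) shows the idea: calling dds with no address continues from the end of the previous page.

```python
from typing import Dict, List, Optional


class DumpSession:
    """Toy model of an address-based dump command that resumes where it stopped."""

    def __init__(self, memory: Dict[int, int], words_per_page: int = 4) -> None:
        self.memory = memory            # address -> 32-bit value (made-up data)
        self.words_per_page = words_per_page
        self.next_address: Optional[int] = None

    def dds(self, address: Optional[int] = None) -> List[str]:
        """Dump one page of DWORDs; with no address, continue from the last page."""
        if address is None:
            if self.next_address is None:
                raise ValueError("no previous dump to continue")
            address = self.next_address
        lines = []
        for i in range(self.words_per_page):
            addr = address + 4 * i
            lines.append(f"{addr:08x}  {self.memory.get(addr, 0):08x}")
        self.next_address = address + 4 * self.words_per_page
        return lines


if __name__ == "__main__":
    session = DumpSession({0x2CBC5D8: 0x10012345})
    print(session.dds(0x2CBC5C8))  # first page: 02cbc5c8 .. 02cbc5d4
    print(session.dds())           # like pressing Enter: 02cbc5d8 .. 02cbc5e4
```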
An easy way to see version information for a module without leaving Windbg is to use the lm (List Modules) command with the v (verbose) option. The output of that command told me that Yt.dll (the name of the DLL is the text to the left of the “!”) was part of the Yahoo Toolbar:
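Because Windbg displays code addresses as module!function+offset (or just module+offset when it has no symbols), pulling out the module name is a string operation. The parser below is a hypothetical helper for post-processing saved debugger output; it assumes hexadecimal offsets, with or without a 0x prefix, as they appear in Windbg stack listings.

```python
import re
from typing import NamedTuple, Optional


class CodeSymbol(NamedTuple):
    module: str
    function: Optional[str]  # None when the debugger had no symbols
    offset: int


# module, optional "!function", optional "+offset" (hex, "0x" prefix optional)
_SYMBOL = re.compile(
    r"^(?P<module>[^!+\s]+)"
    r"(?:!(?P<function>[^+\s]+))?"
    r"(?:\+(?:0x)?(?P<offset>[0-9a-fA-F]+))?$"
)


def parse_symbol(text: str) -> CodeSymbol:
    """Split 'module!function+0xoffset' (or 'module+0xoffset') into parts."""
    match = _SYMBOL.match(text.strip())
    if match is None:
        raise ValueError(f"not a module!function+offset string: {text!r}")
    offset = int(match.group("offset"), 16) if match.group("offset") else 0
    return CodeSymbol(match.group("module"), match.group("function"), offset)


if __name__ == "__main__":
    print(parse_symbol("ntdll!KiUserExceptionDispatcher+0xf"))
    print(parse_symbol("Yt+0x1a2b").module)  # Yt
```

With the module name isolated, it can be handed to lm v m to look up the file’s version and company information.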
This came as a surprise because the system on which the crash occurred was my home gaming system, a computer that I’d only had for a few weeks. The only software I generally install on my gaming systems is Microsoft Office and games. I don’t use browser toolbars, and if I did, I would obviously use the one from Bing, not Yahoo’s. Further, the date on the DLL showed that it was almost two years old. I’m pretty diligent about looking for opt-out checkboxes on software installers, so the likely explanation was that the toolbar had come onto my system piggybacking on the installation of one of the several video-card stress testing and temperature profiling tools I used while overclocking the system. I find the practice of forcing users to opt-out annoying and not giving them a choice even more so, so I was pretty annoyed at this point. A quick trip to the Control Panel and a few minutes later, my system was free from the undesired and out-of-date toolbar.
Using a couple of handy troubleshooting techniques, in less than five minutes I had identified the probable cause of the crash I experienced, made my system more reliable, and probably even improved its performance. Case closed.
Comments
Anonymous
January 01, 2003
Jeff Dowling: Try contacting product support, you may have to pay, but they will help debug the problem and if it is indeed really MS's fault, they may create a hotfix.
Anonymous
January 01, 2003
@Remus Good point. I've updated the text to note that 'dps' will do the right thing.
Anonymous
June 01, 2010
This was very interesting. But to me it just highlights the much bigger problem - what would my 70 year-old mother have done? "Brian, I got this 'toast' that says something about the program not responding." She calls anything that pops up 'toast' but she would have NO idea what to do here. How can we, as an industry, help her and the growing number of folks like her who just want to use the 'dang puter' to buy something online or get email from her grandkids?
Anonymous
June 01, 2010
Mark, thanks for sharing! Next step: interview Carol Bartz.
Anonymous
June 01, 2010
Good info Mark! Have you looked over the tools you installed to see if there was an obvious culprit? If you still have the install packages, you could probably just rip them open and look for the Yahoo cab files. I'm sure even the Yahoo folks would like to know who's distributing a 2 year old version of their toolbar with these kinds of bugs in it...
Anonymous
June 01, 2010
Once a again a great read with some great tips! I just don't how the home user is expected to get remotely close to solving a problem like that. The vast majority of people would probably just re-install windows wasting a valuable amount of time. Its a shame that even though it is a third party dll IE just crashes with no useful information whatsoever. I long for the day when Windows apps crash they give me some useful information to work with.
Anonymous
June 01, 2010
Your technique is very interesting, but it's very hard to reach this level of knowledge! Congrats!
Anonymous
June 01, 2010
You could have just looked in Add/Remove programs for anything browser related :) But thanks for this tip. I wasn't familiar with the FPO issue and the dds/dqs commands.
Anonymous
June 01, 2010
Fascinating article. But you have to ask why after all these years doesn't microsoft write code that identifies the cause of the problem to a typical computer user in language that they could understand, and offer to uninstall the Yahoo Toolbar for them? This would be a valuable addition to the operating system, and is the type of thing that Microsoft should be doing if they want to see Windows retain its relevance for the typical home user. I'm sure Mark could whip off something like this in a week or two. We need to implement the intelligence and programming skills of Mark in the operating system, so the typical user doesn't require Mark's skills to keep their home installation of Windows running.
Anonymous
June 01, 2010
Another awesome entry! Thank you for sharing your knowledge.
Anonymous
June 01, 2010
Thanks for this fantastic article, Mark! I could keep reading and reading and reading. And you always make it sound so simple. :-)
Anonymous
June 01, 2010
@Samuel Svarc > Now, besides reading this blog, where can I find this type of troubleshooting information? Check out the videos that Mark and David Solomon did a few years back, known as the SysInternals Video Library. They are presented by Mark and Dave and cover Server 2003 and Windows XP analysis and troubleshooting. In my opinion, these videos are the next best thing to taking an in-person class from Mark and/or Dave: www.solsem.com/videolibrary.html
Anonymous
June 01, 2010
Another great post. Thanks, Mark! I have to chime in with some of the other commenters here though. How is an "ordinary user" supposed to resolve this problem? While I have all of the Sysinternals tools and Windbg installed on my boxes, I'm quite certain that my neighbor doesn't. He also wouldn't have the patience or knowledge to step through a troubleshooting procedure like this. He'd probably just curse Microsoft for making shoddy software (undeserved in this case) and then install Firefox or Chrome. In today's world of stealth installers and add-ons, how can we get Windows to report errors more clearly? That "IE has stopped working" message should say: "Not my fault! Might be the fault of add-on Yahoo! Toolbar." (insert your own joke about NotMyFault.sys here :) ) At least that would point someone in a general troubleshooting direction.
Anonymous
June 01, 2010
You can use 'dps' which dumps 'pointer-size'. msdn.microsoft.com/.../ff540455%28v=VS.85%29.aspx
Anonymous
June 02, 2010
To all the people who say "why doesn't Microsoft just...": first of all, correctly diagnosing these problems without prior knowledge is very hard to automate. The general problem of figuring out who exactly is to blame for a crash is practically unsolvable -- modules can cause problems that won't show up for a long time and then in somebody else's code. It takes a good deal of sleuthing to identify the real culprit in such cases -- Mark's example here was fairly trivial, the only hurdle being the FPO. Misblaming someone on an automated basis would be a costly mistake. The best you could hope for is an application compatibility database that would say "this version of the Yahoo Toolbar is known to cause instability in IE8, warn the user about that". There already is such a compatibility database, actually (you may have seen it in action in the early days of Vista and Win7), but obviously keeping it up to date is lots of work. I have no idea what Microsoft's policies on it are; if I were Microsoft I'd have a team assigned to dissecting crash reports sent by WER not just for problems in my code but especially for problems in third-party code -- they probably have such a team. Even then you'd want to get permission from Yahoo to warn about incompatibilities with Yahoo's stuff -- otherwise they're opening themselves up to defamation and monopoly lawsuits. In other words -- this stuff costs a lot of money, money Microsoft is probably already investing if they're smart, but a system like that will never be perfect.
Anonymous
June 02, 2010
Nice Job! - Reading your articles makes so much fun!
Anonymous
June 02, 2010
I have to agree with the others, even advanced users would struggle to figure this one out. Surely a lot of what Mark did could be automated: (1) find which thread caused the exception (2) get its callstack (3) resolve symbols, as much as is possible, automatically via Microsoft's symbol server (4) If an address doesn't lie within any DLL, find the missing frame functions by doing what dds/dqs does automatically (5) show any non-MS modules as suspicious, and as much of a callstack as possible. So why can't an advanced dialog in WER do this?
Anonymous
June 02, 2010
@Jeroen, that's simply not true. Everything Mark described in this post could have been automated. Don't let perfect be the enemy of good.
Anonymous
June 02, 2010
Mark, you take our attention to one of major tradeoff on current complex OS. More, maybe overlapped, functions loosly controllable by average users versus more rigid but efficient approach. I guise we need to Add profile capabilities to OS (instead make Windows more closed) that CUT off all unneeded
Anonymous
June 02, 2010
"...I find the practice of forcing users to opt-out annoying and not giving them a choice even more so." Please have a word or two with your colleagues on the Windows Live team. Unless the user opts-out, the Windows Live Essentials "all-in-one installer" will install WL Messenger, WLMail, WLToolbar, WLWriter, WLPhoto Gallery, WLMovie Maker, Silverlight, (and if Outlook is installed) Outlook Connector, and Office Live Add-in by default. -- ~Robear Dyer MS MVP-IE, Mail, Security, Windows Client - since 2002
Anonymous
June 02, 2010
@Tilly: Windows is only targeted because it's a large well known corporation and platform. I wouldn't want to be a bank robber and rob a convenient store with only $50 in the register when i can rob the bank right next door and not be caught either way (in the best case scenario!) that same thought goes for hackers and programmers. We don't go about doing our job as them because we want to annoy users. We do it because we enjoy it. We do NOT however always intend the software to be "buggy" and users not reporting it in detail DOES NOT HELP US to fix the problem for the new update on it. Hackers and malicious programmers write their code to do what they need to and get what information they need from the target. They don't worry if it doesnt work on a few computers due to some minor glitch they didn't see about. They most likely only tested their application on their own virtual network or something under one, maybe two, operating systems and was done with it and started attacking people. Some cases not even the test takes place.
Anonymous
June 02, 2010
This is an amazing article!! As always, many thanks Mark!! @ Brian -> For sure you will find interesting to read one of the gems of MSPress: Windows Internals 5th edition, Mark Russinovich & David Solomon with Alex Ionescu
Anonymous
June 03, 2010
Robear: This is a completely different case. Mark is referring to an installer for App X that installs, along the way, unrelated App Y, usually for commercial gain. The Windows Live Installer is an installer for the Windows Live tools - it's very upfront about what it's doing - it's doing what you expect, installing Windows Live tools. Would I prefer it if all checkboxes were unchecked by default? Probably. But there's a big difference between an unsatisfactory default configuration, like your Windows Live example or Microsoft Office, which automatically opts into installing Word and Excel and many other tools, to an overclocking app that sneaks in a completely unrelated tool.
Anonymous
June 07, 2010
Post by post your'e cases are being more interesting.
Anonymous
June 07, 2010
@Jeroen Mostert , " if I were Microsoft I'd have a team assigned to dissecting crash reports sent by WER not just for problems in my code but especially for problems in third-party code -- they probably have such a team" You're right, they do have such a team. It's possible that this issue hasn't occurred before, or if it had, it was still under investigation (as you can imagine, there's a lot more bad code than hours to investigate them). And @Jon, a lot of this is likely automated, but not presented to the user. For example, the Watson that is sent to MS obviously has the callstack for the relevant process. No one has to figure out which process to look at. The FPO issue is trickier. Mark knew he was looking for an addin, but an automated tool wouldn't necessarily know to look for an addin. And even if it used a heurstic to favor non MS DLL locations, there could be multiple of them on the stack. I agree that dumping callstacks for the faulting process is useful, and even running things like dds for the user would be a plus. But I'd stop short of making a diagnosis unless the callstack matched a known issue.
Anonymous
June 07, 2010
The official SUN Java JRE install has an opt out checkbox for Yahoo toolbar that is very easy to miss. It is also forced upon the user during Java updates, so you have to constantly be looking out for it. It's possible a third party automated this install procedure which accepted defaults because it needed the jvm to do it's thing.
Anonymous
June 07, 2010
If their were any justice you would be able to bill yahoo for your time. These drive-by toolbars are no less malware.
Anonymous
June 07, 2010
So let me get this straight, IE called a function and didn't trap the error? This is the yahoo toolbar fautl? More like sloppy code from MS. Competent coders handle errors.
- burnt out on years of MS
Anonymous
June 08, 2010
Great post, Mark, and very well documented with good use of screenshots. I don't have your level of knowledge, but maybe I will try to emulate your case to learn something. Of course this is very hard to automate for the OS. Mark's comment about having to opt out of installing unwanted software makes much more sense. To be honest, I like apt-get very much, although de decentralized way of installing programs for Windows has its benefits, like freedom :-)
Anonymous
June 08, 2010
Yahoo! Toolbar is one of the most persistent apps that seem to want to take over a users browser even when you uninstall it, it still seems to find a way back onto the system. Yahoo! seriously needs to stop!
Anonymous
June 08, 2010
MSFT owns this IE8 trap. 100% reproducible for me on dictionary.com.
BUGCHECK_STR: APPLICATION_FAULT_STATUS_BREAKPOINT_NULL_POINTER_READ
FAULTING_IP:
mshtml!CMarkup::DetachElemCtxStream+64
712481f6 8b07 mov eax,dword ptr [edi]
ntdll!KiUserExceptionDispatcher+0xf
mshtml!CMarkup::DetachElemCtxStream+0x64
mshtml!CAPProcessor::Evaluate+0x21d
mshtml!CDoc::SubmitForAntiPhishProcessing+0x1c4
mshtml!CMarkup::OnLoadStatusDone+0x1da
mshtml!CMarkup::OnLoadStatus+0x47
mshtml!CProgSink::DoUpdate+0x549
mshtml!CProgSink::OnMethodCall+0x12
mshtml!GlobalWndOnMethodCall+0xff
mshtml!GlobalWndProc+0x10c
Image name: mshtml.dll
Timestamp: Tue Feb 23 02:49:51 2010 (4B83889F)
CompanyName: Microsoft Corporation
FileVersion: 8.00.7600.16535 (win7_gdr.100222-1515)
Anonymous
June 09, 2010
In the case of a grandmother having this problem clicking the reset button in the advanced tabof Internet Options would have removed all toolbars and extensions and reset any other thing that might have cause this.
Anonymous
June 09, 2010
@Brian: "This was very interesting. But to me it just highlights the much bigger problem - what would my 70 year-old mother have done? " She would have done what every other 70-year old mother does: Drop down into the kernel debugger, and start trolling through callstacks... ;-p
Anonymous
June 10, 2010
It might be nice for an average person who is not a computer expert could understand, and implement this.
Anonymous
June 10, 2010
Now if only werfault.exe was passed the faulting thread id too...
Anonymous
June 10, 2010
@Mark What games do you play? (just curious)
Anonymous
June 11, 2010
Good informational article - love your software (even though GTA IV wont load with it running shakes fist at sky) i would have looked at ie's 'manage add-ons' window etc before doing this though ;)
Anonymous
June 17, 2010
@Use a real OS Read Raymond Chen's blog. You can't always just "trap the error" and carry on. Before the error actually caused some protection fault, it could have trashed stack, program state, etc. This is part of the reason that IE has the multi-process model, it can limit the damage to a single process rather than the entire browser. I could just as easily write a FireFox extension with a similar bug that corrupted stack, etc. To prevent this you either rely on process boundaries or have a browser running plugins in a verifiable way akin to the way that SQL Server can host .NET CLR-based DLLs with great safety. As always, there's history with these things and if Microsoft just switched models overnight there'd be other people claiming how Microsoft is throwing its weight around, etc.
Anonymous
July 03, 2010
Mark, Very interesting to see under the hood when things crash. I'm going to try some of these techniques when i have something bite the bit dust on me. @Aunt Tilly. An Apple won't solve the problem. Apple's famous error message is an error occurred of a type unkown. Click on OK to restart.
Anonymous
July 05, 2010
Mark, would it be possible to add a print mode to your blog? I really like to print out your posts for offline reading. Before the Blogs platform update, I was able to get a printable version through RSS, but since the update, RSS only gives me a small summary. It would be great if the print mode would show the comments in one long list again as well. There are always some interesting discussions in the comments, but how they are split over multiple pages now is really printer-unfriendly.
Anonymous
July 17, 2010
Great work !! Really interesting approach.
Anonymous
July 25, 2010
ie8 crashes when installing yahootoolbar WITH its plugin of yahoomail being default mail for IE. install yahoo toolbar without that plugin !
Anonymous
July 27, 2010
When in doubt> reset IE or try another browser.
Anonymous
January 08, 2011
>> Hey, I'm one of those people who can't understand a word your (sic) saying Lori, maybe you shouldn't be coming to a site called 'technet' if you have a problem understanding technical issues. Get a mac or something. >> Mark, would it be possible to add a print mode to your blog. I really like to print out your posts for offline reading HAHA! That's a good one. Do you print out spreadsheets and work on them too? LULZ
Anonymous
January 14, 2011
Awesome! Seems Holmes explaining to Watson. Hats off to your train of thoughts
Anonymous
March 10, 2011
@Karl v. : Take a look at RevoUninstaller Pro