The Case of the Crashed Phone Call
David Solomon, my coauthor for the Windows Internals books, was recently in the middle of an important VOIP call on Skype when the audio suddenly garbled. A second later the system blue screened. He called back after the reboot, but a half hour later the person on the other end seemed to stop talking mid-word and the system crashed again. The conversation was essentially over anyway, and since he’d explained the first drop, Dave decided not to call back and formally end the call, but to investigate the cause of the crashes. He launched Windbg from the Debugging Tools for Windows package, selected Open Crash Dump from the File menu, and chose %Systemroot%\Memory.dmp.
He’d previously configured Windbg to use the Microsoft public symbol server by entering “srv*c:\symbols*https://msdl.microsoft.com/download/symbols” in the Windbg symbols configuration dialog, so Windbg knew how to interpret the crash dump file. When Windbg loads a crash dump file, it automatically executes a heuristics-based analysis engine that identifies the driver or system component most likely responsible for the crash. The analysis output pointed at the NETw4v64.sys device driver:
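As an aside on the symbol path mentioned above: it follows the form srv*<local cache>*<server URL>. Here is a minimal Python sketch, assuming you would rather set the path through the _NT_SYMBOL_PATH environment variable than through the dialog. The helper name build_symbol_path is made up for illustration and is not part of any tool.

```python
import os


def build_symbol_path(cache_dir: str, server_url: str) -> str:
    """Return a symbol-server path in the srv*<cache>*<server> form.

    build_symbol_path is a hypothetical helper, not a debugger API.
    """
    return f"srv*{cache_dir}*{server_url}"


# Values taken from the article: a local cache directory and the
# Microsoft public symbol server.
symbol_path = build_symbol_path(
    r"c:\symbols", "https://msdl.microsoft.com/download/symbols"
)

# Setting the variable here only affects this process and any child
# processes it starts, for example a debugger launched from this script.
os.environ["_NT_SYMBOL_PATH"] = symbol_path
print(symbol_path)
```

The local cache directory matters because downloaded symbol files are stored there and reused the next time a dump is opened.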
When you click on the “!analyze –v” hyperlink in the output, Windbg prints out some of the data it uses in its analysis. The analysis heuristics aren’t perfect, so Dave always clicks the link to look at the additional data, specifically the stack trace at the time of the crash and possibly memory locations associated with the crash. The stack trace records the nesting of function calls on the processor from which the kernel’s crash function, KeBugCheckEx, was called. In this case the stack looked like this:
You read the stack from bottom to top to follow the chronology of function calls. The trace shows that some code in NETw4v64 called the kernel’s (“nt”) KeAcquireSpinLockRaiseToDpc function. NETw4v64’s stack frame doesn’t have a text function name, which is expected for drivers that aren’t part of Windows and therefore don’t have symbols on the Microsoft symbol server. The next higher frame indicates that KeAcquireSpinLockRaiseToDpc called KiPageFault, most likely not directly, but as the result of a reference to a virtual memory address that wasn’t currently resident in physical memory. KiPageFault then called KeBugCheckEx with stop code A, which the extended analysis output describes as IRQL_NOT_LESS_OR_EQUAL:
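To make the bottom-to-top reading concrete, here is a small Python sketch. The frame list is reconstructed from the description above, the offset on the driver frame is a made-up placeholder, and the "first module that is not nt" rule is only a crude stand-in for illustration. It is not how Windbg's analysis engine actually works.

```python
from typing import List, Optional, Tuple

# Frames in the order a debugger prints them: most recent call first.
# The names come from the prose; the 0x1234 offset is a placeholder.
STACK = [
    "nt!KeBugCheckEx",
    "nt!KiPageFault",
    "nt!KeAcquireSpinLockRaiseToDpc",
    "NETw4v64+0x1234",
]


def split_frame(frame: str) -> Tuple[str, Optional[str]]:
    """Split 'module!function' or 'module+offset' into (module, function).

    The function part is None when the frame has no symbol name.
    """
    if "!" in frame:
        module, function = frame.split("!", 1)
        return module, function
    return frame.split("+", 1)[0], None


def chronological(frames: List[str]) -> List[str]:
    """Return the frames oldest call first, i.e. read bottom to top."""
    return list(reversed(frames))


def first_non_kernel_module(frames: List[str], kernel: str = "nt") -> Optional[str]:
    """Return the first module below the top of the stack that is not the kernel."""
    for frame in frames:
        module, _ = split_frame(frame)
        if module != kernel:
            return module
    return None


for frame in chronological(STACK):
    print(frame)
print("suspect:", first_non_kernel_module(STACK))
```

The driver frame has no function name because no symbols are available for it, which is why split_frame returns None for that part.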
Dave hypothesized that the NETw4v64 driver had called the kernel with a corrupted pointer that triggered the invalid memory reference. This particular crash might have been the result of random corruption, even by another driver, so he looked in the %Systemroot%\Minidump directory for the dump file for the first crash. On Windows Vista, the operating system he was running, the system always saves a kernel-memory dump to %Systemroot%\Memory.dmp, overwriting the previous dump, and archives an abbreviated form of the dump, called a minidump, to %Systemroot%\Minidump. He followed the same steps for the second dump and the analysis engine reported the exact same cause for the crash, down to the same corrupted memory pointer value.
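If you want to see which earlier crashes have minidumps available, a short Python sketch like the following lists them oldest first. It assumes the default %Systemroot%\Minidump location described above and falls back to C:\Windows if the SystemRoot variable is not set; list_minidumps is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import List


def list_minidumps(minidump_dir: Path) -> List[Path]:
    """Return the .dmp files in minidump_dir, oldest first by modification time.

    Returns an empty list if the directory does not exist.
    """
    if not minidump_dir.is_dir():
        return []
    dumps = [p for p in minidump_dir.glob("*.dmp") if p.is_file()]
    return sorted(dumps, key=lambda p: p.stat().st_mtime)


# Assumed default location; reading it may require administrative rights.
system_root = os.environ.get("SystemRoot", r"C:\Windows")
for dump in list_minidumps(Path(system_root) / "Minidump"):
    print(dump)
```

Each file in the list can then be opened in Windbg the same way as Memory.dmp.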
Without performing a meticulous manual analysis of a dump, you can’t be certain that the driver the heuristics point at is the culprit, but the first rule of crash mitigation is to make sure you have the latest versions of any implicated drivers. Sometimes Windows Update has optional updates that don’t apply automatically, so Dave went to the %Systemroot%\System32\drivers directory to investigate the NETw4v64.sys file for clues as to what device it was for. The file properties dialog showed that it was version 11.5 of the “Intel Wireless WiFi Link Driver”:
Armed with the knowledge that it was an Intel wireless network driver, he opened Device Manager, expanded the Network Adapters node and found a device with a similar name:
He right-clicked and chose “Update Driver Software…” from the context menu to launch the driver update wizard, and told it to check Windows Update for a newer version. Unfortunately, it reported that he had the most current version installed:
Sometimes OEMs have drivers posted on their Web sites that they haven’t yet made available to Windows Update, so Dave next went to Dell, the brand of his laptop, to check the version there. Again, the version he found posted was actually older than the one he had:
OEMs often get hardware vendors to create custom versions of hardware tuned for specific cost, power, capability or size requirements. The original hardware vendor will therefore not post drivers for an OEM-only device or post drivers that are generic and might not take advantage of OEM-specific features. It’s always worth checking, though, so Dave went to Intel’s site. To his chagrin, not only was there a newer version that installed and worked as expected, but the Intel driver was version 12.1, a major release number higher than the one Dell was hosting:
Intel also conveniently offered the driver in a “Drivers-Only” download that was a mere 7MB, one tenth the size of the package on Dell’s site that also includes value-add management software.
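Comparing driver versions such as 11.5 and 12.1 by eye is easy, but if you ever script the check, compare the numeric fields rather than the raw strings. This is a minimal Python sketch under that assumption; parse_version and is_newer are hypothetical helpers, and real driver packages may use version formats this does not handle.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '11.5.0.32' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_newer(candidate: str, installed: str) -> bool:
    """Return True if candidate is a strictly higher version than installed.

    Shorter versions are padded with zeros, so '11.5' equals '11.5.0.0'.
    """
    a, b = parse_version(candidate), parse_version(installed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


print(is_newer("12.1", "11.5"))  # True
print(is_newer("9.0", "12.1"))   # False, although "9.0" > "12.1" as plain strings
```

Comparing tuples of integers avoids the trap in the last line, where a plain string comparison would rank 9.0 above 12.1.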
Dave couldn’t conclusively close the case because he couldn’t be sure that the Intel driver was the actual cause of the crashes, but the crashes haven’t reoccurred. Even if the Intel driver wasn’t the root cause, Dave was happy that he picked up a newer version that most likely had performance, reliability and maybe even power-management improvements. The case is a great example of simple dump analysis and the lesson that Windows Update and even an OEM’s site might not have the most up-to-date drivers. Hopefully, Dell will start leveraging Windows Update to provide its customers the latest drivers.
Comments
Anonymous
January 01, 2003
@barrkel In fact, this device was customized for Dell and has a Dell-specific hardware ID.
Anonymous
January 01, 2003
Imagine my glee when I began reading this and quickly found that not only had I seen the same problem, but I had already used exactly the same method to solve it (as learned following previous posts on your blog). Thanks! We have a large number of machines in our organisation using this Intel wireless card and that v11.5.0.32 driver caused this (and other) problems quite frequently. We've been upgrading problematic machines to the v12 driver since early December and so far have not experienced any problems with the newer version.
Anonymous
January 01, 2003
@bw I have the same laptop and went through the same upgrade process. You get directed to the same driver for both devices.
Anonymous
January 01, 2003
Nitpick: you said that "OEMs often get hardware vendors to create custom versions of hardware". This is almost never true; not only would the economies of scale weigh against this, but versions of the hardware different enough to require custom drivers means that the OEM needs to fork the driver source, and take on the task of maintaining it, and merging upstream fixes from the hardware vendor. All the usual problems of forking apply (e.g. it's almost always the wrong idea to fork an open source project, you can find all the good reasons out there), but they apply triply to an OEM: not only is the OEM not an expert in the specifics of the hardware, they are usually not experts in creating software either. They're odds-on to do a third-rate job. Moral of the story: wherever possible, go back to the chipset manufacturer to get reference drivers for all your hardware, even in a new PC / laptop.
Anonymous
January 05, 2009
And what issues does the latest driver cause? Give it a week or two and you will find out!
Anonymous
January 05, 2009
Nice work on Dave's part...Dell's never really been all that open to providing driver updates, sadly.
Anonymous
January 05, 2009
Nice writeup Mark. I always enjoy your posts and this one was informative as ever.
Anonymous
January 05, 2009
They seem to be very behind in updating their drivers for some systems. The Intel integrated graphics drivers for their OptiPlex 745 series are several revisions behind. I've sat on the phone with their tech support for many hours (over several calls) talking about the easily-reproducible problems with the version listed on their website (from February 2007--almost two years old!), but it's like talking to a wall--no one wants to escalate the issue or even register it in their system because "no one else has reported this issue." Nice conundrum there, huh--they didn't register my problem in their tracking system because "it's the first time anyone reported this issue." So how on earth will it ever get registered? Someone has to be the first. It would be one thing if the situation were like in this article--where you could simply go to Intel's website and download the updated driver, then install it. I downloaded the Intel driver, but it performs a check on install and fails when it detects that it's a Dell system. The Windows Update driver (also updated several revisions beyond what Dell lists on their website) applies without a problem--but this can only be done manually. Security patches and other items can be applied automatically, but driver updates require manual intervention through Windows Update. It would be nice if you could go to the Windows Update catalog and download the driver and apply it manually or script the install, but that's not possible with the Intel driver... or at least not any way that I can tell (and I've tried many, many different ways).
Anonymous
January 05, 2009
I avoid using Windows Update or OEM drivers for this reason. Drivers direct from the manufacturer are always more stable, contain the latest features, and are minimalistic. INF and binaries only, please.
Anonymous
January 05, 2009
So, uh, what's the difference between a "driver" and a "drivers-only driver"?!? According to Wikipedia, there doesn't appear to be any special meaning of the word "driver" for Windows that would explain this. (unlike, e.g. "virtual memory") And how can even a "drivers-only driver", presumably compressed for downloading, still be 7MB in size?!? What might be in it to make it so big?
Anonymous
January 05, 2009
I have a similar problem, in that my Dell-supplied ATI graphics driver is somehow causing the Sysinternals tool procmon.exe to hang the machine. No resolution as yet. http://forum.sysinternals.com/forum_posts.asp?TID=15782
Anonymous
January 05, 2009
To answer Karellen, most consumer-focused downloads are what are more accurately called driver packages. They include the driver, an installer, and usually supplemental software. As an aside, Windows Update will even want to downgrade drivers sometimes. This is the current situation with the nVidia graphics drivers. The latest ones on nVidia's site are bigger by version number and are WHQL signed. To my knowledge, this means it should show up as newer according to the Windows Driver selection rules as I understand them. There is a similar issue with the JMicron SATA controller driver. The Device IDs in its INF file aren't specific enough so left to the driver installation rules, Windows will install it to handle the JMicron PATA controller because the Device ID is more specific than the generic Windows one. I can't seem to convince JMicron of this issue though.
Anonymous
January 05, 2009
This is another excellent example of what Mark and David have taught us before. I used the same technique yesterday to narrow down a problem and updated the Nvidia SATA performance driver on my home computer. When I was browsing photos from our holiday vacation using "Windows Live Photo Gallery", the machine froze. After waiting for a few minutes, I used the "Right Ctrl-Scroll Lock-Scroll Lock" keyboard combination to manually cause the system to crash (a technique also learned from Mark). After reboot, I loaded up the dump file in WinDbg. Obviously, the call stack shows that the keyboard driver caused the crash. So I used the "!thread" command to see what kernel threads were running at the time. Nothing looked suspicious. I typed in "~0" to switch to CPU 0 and "!thread" again to see what was running there. "nvstor32" was the one among other "NT" threads. I looked up the information of nvstor32.sys; it is the Nvidia nForce SATA performance driver. The version number is 5.10.2600.0995. Similarly, driver update from Device Manager and Windows Update doesn't find new drivers. The Asus website doesn't have an updated driver either for the Nvidia chipsets on its motherboards. Actually, this is not the first time this problem occurred to me. But this time I went further by looking for the nForce driver on the Nvidia website. I found a newly released set of Vista drivers for the nForce/GeForce chipset. The new nvstor32.sys version is 10.3.0.42, a super major release version number change. I installed the new driver pack and it works smoothly so far. Now I keep my fingers crossed to see if the problem will ever happen again.
Anonymous
January 05, 2009
@Phileosophos In the vast majority of the cases, having the drivers regularly get updated would solve the problem. Thus, users have Windows Update in Vista which does exactly that. Add that with Secunia PSI's software monitoring and it's not too hard to make sure one always has the latest version of all software and drivers.
Anonymous
January 05, 2009
"one tenth the size of the package on Dell’s site that also includes value-add management software" By 'value-add management software', I would have said "buggy bloatware that is better off being uninstalled".
Anonymous
January 05, 2009
I suspect it was the Intel driver after all. I have a Lenovo T61 with the same wireless hardware, and during the first few months I had 3 bug checks (mine happened during resume from sleep). For all 3 crash dumps, !analyze pointed to the same Intel driver, and it's been more than a year since I updated the suspected driver and haven't had a crash. Coincidence? Methinks not.
Anonymous
January 06, 2009
Glad this was a readily apparent driver update issue.. When I wrote drivers, and a customer reported a problem, we'd usually get the hardware they were running on as well.. I learned the hard way to always update the other drivers on the machine before trying to debug our driver.. Learned the hard way meant I spent 20 days tracking down a corrupted IRP in our code with no luck. Another driver was re-using the IRP after releasing it, causing the corruption. Hard to track that stuff down..
Anonymous
January 06, 2009
The method works well as described, but I save myself time by visiting the device manufacturer's website first. On occasion I've found that the value-added stuff will cause issues with the driver/system. Going directly to the folks who designed the device is usually the best option. There are exceptions to this, but they are rare in my experience (and mostly involve the desire to retain one of the value-added features).
Anonymous
January 06, 2009
If there are any hardware vendors out there, please please PLEASE listen to my suggestions, okay? Doing so would make me a customer for life and they're really simple things that wouldn't take much effort:
- A Microsoft-heavy one: get your drivers WHQL certified, publish driver updates on Windows Update and take part in the crash reporting process. Crash reporting is free bug testing, why not take advantage of it? I used to only buy hardware that was WHQL-certified out of principle; now that all of my systems are 64-bit it's required.
- Offer every driver as an installer for newbies and as a zipped up archive containing only the .inf and the drivers. Worse yet are companies that make funky installers that prevent you from doing even that. I'll never buy another Ralink-based product[] because their wireless drivers are in an installer that doesn't support being decompressed. If I can't grab the driver and integrate/slipstream it, your hardware is worthless to me. Cut that out. []: I recently found that a newer version of their RA61 allows you to do either a 'full' or 'driver only' install, which is an improvement, but they still don't support decompressing the executable, plus you need to download a 30MB installer just to get a 200KB .inf and .sys file. You hurt me once, Ralink, never again.
Anonymous
January 06, 2009
@Phileosophos: Nothing you say is actually wrong, but what exactly do you propose? An OS is always going to be at the mercy of the driver software, and even the relatively modest driver signing requirements in vista64 have met with widespread moans. Microsoft can't force manufacturers to write watertight code (though driver verification helps), they can't force manufacturers to put the latest drivers onto windows update, and they can't force users to only use a short list of ms-approved hardware. And nor should they. The more flexibility people have in terms of software, hardware, and drivers, the more chance there is of something going wrong.
Anonymous
January 06, 2009
...yeah, but are you going to give Dave Solomon a signed copy of your new book for submitting this crash dump analysis?!!! Sorry, I couldn't resist! Many thanks again for a very useful blog. P.S.: Where do I send my own successful troubleshooting analysis files to?!
Anonymous
January 06, 2009
The "value-add management software" term struck me. Wow. Must have thought long and hard to coin a term which will not offend anyone. :) GL
Anonymous
January 06, 2009
I use SysPrep and imaging software to create hardware-independent hard drive images for the lab computers at the college where I work, so I've had a lot of experience with drivers. I echo the commenters who've written that OEM drivers are usually well out of date, and Windows Update tends to be even further behind. Whenever a new model comes in, I do the following to ensure the current image will work with it:
- Go to the OEM (Dell or Lenovo for my college) and use the serial or part number to see and write down all the posted OEM drivers (e.g., ATI Radeon X400 graphics driver A02; Broadcom Ethernet 5711 driver A04; etc., etc.)
- Download only the BIOS and audio driver from the OEM (because the integrated audio usually comes from a company that doesn't post their own drivers).
- Go to the individual web sites and download the latest .zip version of each driver. If only an .exe is provided, usually the manufacturers are nice enough to unpack the entire driver into a folder you can specify, or at least temporarily unpack it all in the Temp directory, where you can grab a copy of it. Within the uncompressed folder is usually a folder with only the minimum files needed to install the driver (the folder always has at least one .inf and a .cat file).
- Throw all these minimal drivers into a single folder, copy that into the sysprep folder on my generic image, reimage the computer and see if it installs on the new computer with all drivers correct in the Device Manager (which it does 95% of the time). If they're OK, I then throw the contents of this single folder into the general drivers folder that ultimately goes on the hardware-independent image.
This method guarantees a) the newest, most bug-free drivers and b) only the drivers, and not the accompanying "value-added" management software that usually simply slows down the computer and distracts or gets in the way of users. I think the manufacturers have the best driver because they have the incentive; you can really see this with ATI and nVidia. They have to keep tweaking their drivers to fix small bugs and get maximum performance from their products. Dell and Lenovo only care if the products crash or there are major complaints, so they'll stop updating with the first really stable release. I'm sure someone has to pay Microsoft to post drivers on Windows Update, and if they aren't automatically installed, the vast majority of users will never know they're there. So what's the point?
As to the issue of manufacturers creating OEM-branded versions of hardware (like this wireless card), there doesn't seem to be any functional difference in the actual hardware or software, only in the PnP number and in advertising. The only exception seems to be discrete laptop graphics cards; ATI and nVidia specifically don't post these drivers and point you to the OEM, which may indicate some differences (perhaps because of the architecture of each laptop?). However, even then I've found that the manufacturer's desktop graphics driver will work with a little haranguing. Anyway, the moral is: if you want a working computer, always get the drivers from the manufacturer.
Anonymous
January 07, 2009
According to the article the Wireless card was an Intel 4965AGN but Dave downloaded a driver for 3945AGN which is a different card! I hope it worked for him.
Anonymous
January 07, 2009
Another diagnosis technique to use here is to turn on the driver verifier (windows+r, %systemroot%\system32\verifier.exe). Turn on verifier first for any devices where you suspect that the updates are not available via Windows Update, reboot. Make sure that crash dumps are enabled. This won't fix your problem but can help find and attribute problems (e.g. the improperly reused IRPs mentioned above) more quickly. Nothing's going to help find a random memory corrupter but at least this helps with the majority of the common driver defects authored.
Anonymous
January 08, 2009
Excellent! i very much enjoy reading your blog. Also i watched one of your videos the other day and i was enthralled. (i didn't fall asleep) amazing. Very clear,logical and well presented and easy to understand. Hopefully you'll maintain your clear style and not get too bogged down in MicroSpeak (TM) now you have joined the Big Baddys. I've got so interested in troubleshooting now i downloaded MS debugging tools and now just to get XP to crash! MS gets a lot of bad press, but i am very very pleased with my windows XP (sp3), i throw a lot of (questionable) software at it and it handles most everything! brilliant! And what a gaming system! roll on windows 7
Anonymous
January 09, 2009
Nice to see Dave's skills in VMS crash analysis haven't gone to waste. :) mike (who worked in the VMS group with Dave many years ago and took one of his first NT Internals courses!)
Anonymous
January 11, 2009
Great post Mark+Dave! To Dave who's a Windows internals subject matter expert, the problem is clearly caused by Intel WiFi drivers. To the average user experiencing the same problem, the cause is "darn Windows Vista". Microsoft really need to bring the hammer down on OEM's and hardware manufacturers who don't put their latest stable drivers on Windows Update.
Anonymous
January 11, 2009
Been building custom machines for a long, long time. On a clean machine build (includes new builds, rebuilds of existing hardware and resurrection of older hardware) I always go to several places to get the most current hardware drivers:
- OEMs of motherboard, graphics card, and peripheral cards.
- OEMs of chipsets used on the motherboard, graphics and peripherals.
I don't use "brand" name desktop machines, just laptops and do the same with them, plus obtain laptop OEM drivers if I wipe the laptop for a clean installation. I've noticed in the past that the issue of outdated hardware drivers isn't just Dell, all the major brands including HP and Gateway also suffer this problem. In addition, once a motherboard, graphics card or peripheral card is out of production, many motherboard (and graphics and peripheral) OEMs stop updating chipset drivers for them very shortly thereafter. The chipset OEM's drivers are usually the newest and the best, but not always. It's why I go after them all and sometimes evaluate board OEM versus chipset OEM.
Anonymous
January 12, 2009
Nothing to do with the crashed phone call, but i just read some free chapters from your excellent windows internals book. On an idle system there should be no registry polling...so i ran procmon..guess what? That systray network connection icon is STILL polling the DHCP stuff in the registry (windows xp sp3). -- you complained about this back in APR 2005! i guess nobody was listening (or maybe there just is no other way?) I know i can stop it being displayed, but its handy.
Anonymous
January 13, 2009
"The case is a great example of simple dump analysis" -- Too bad the average user would be forever frustrated in the dark about such a case. LOL
Anonymous
January 13, 2009
"Not many Windows users have the technical know how to perform the diagnostic steps that Mark and David have been performing using their excellent sysinternals tools." And Debugging Tools for Windows as well. Yep, that is why there is Online Crash Analysis (OCA). It works by mapping an OCA bucket to a resolution, which can be a driver update, which the vendor must provide.
Anonymous
January 14, 2009
So in this case I advise turning off the WiFi adapter when it is not in use. On clients' computers I always follow 5 steps:
- Run HelpCtr.exe (system info)
- Find the PNP ID of the device that needs drivers
- Use Google to search for something like: 'VEN_1012&DEV_1982'
- Go to website of vendor.
- Download latest driver.
Anonymous
January 22, 2009
Just so happens my otherwise stable XP Dell X1 system (which hasn't BSOD for as long as I can remember) crashed last night using Skype in just the same way. The only problems I've ever had with it have been tracked down to the wireless drivers in the past. It's a very good bet that is the cause, and I shall also be updating the drivers today. Thanks, excellent article.
Anonymous
January 22, 2009
I don't think "chagrin" means what the article's author thinks it means.
Anonymous
January 28, 2009
mrnathan, That sentence makes perfect sense. He's annoyed that Dell didn't have the newest Intel drivers by a whole major revision. Maybe he should have been relieved or happy he found a solution, but he was annoyed he found Dell was so far out of date.
Anonymous
February 16, 2009
"Hopefully, Dell will start leveraging Windows Update to provide its customers the latest drivers." What is the point of this last sentence? The previous sentence stated that even Windows Update did not have the latest driver from Intel. What does it matter if Dell uses WU to deliver drivers, if the device manufacturer doesn't use WU themselves?
Anonymous
June 02, 2009
I don't think Aaron knows what "chagrin" means either. Perhaps he should try using a dictionary. (You killed my language; prepare to die)