Watson's revenge

My dear readers, I have a terrible admission to make.  But
it's time to come clean with you.  The fact is, of our hundreds of thousands
of users, a small number encounter crashing bugs in Visual Studio.  They
are working happily along, and *boom* some terrible crash will occur.  Then,
a much-dreaded dialog will come up, saying:

“Microsoft Development Environment has encountered
a problem and needs to close. We
are sorry for the inconvenience.”

And then a little farther down it says…

“Please tell Microsoft about this problem.

We have created an error report that you can
send to help us improve Microsoft Development Environment. We
will treat this report as confidential and anonymous.

To see what data this error report contains, click here.”

Then you see three buttons: Debug, Send Error Report, and Don’t Send.

[It occurs to me that the lack of the word “The” in front of “Microsoft Development
Environment” above makes it look like English is our second language.  I’ll
have to bug somebody about that…]

Obviously, in a perfect world users would never see this dialog.  It
means that our users had VS crash, possibly losing data.  During
product development we treat any bug that could make VS crash or otherwise lose data
as extremely serious, and few such bugs make it into the shipping product.  But
it does, alas, happen.

So, what does this dialog do, and what does it all mean?  We
call this the “Watson” dialog (not to be confused with the Dr. Watson
tool, which is related but different).  The
dialog exists so that when a user hits a crash, if they choose to hit the “Send Error
Report” button then a condensed stack dump, which we call a minidump,
gets sent to a server at Microsoft.  If
you’re curious or paranoid, click where it says “click here” and you can see exactly
what’s going to be sent to our server.

Once that happens, you can hopefully go on with your work.  On
our side, however, the work is just beginning.  We
have people who go through the reported crash dumps and then open bugs in the main
product bug databases so that our developers can look at the minidumps and try to
figure out why the product crashed.  This
is an extremely unpleasant job, because the information in the minidump is so scant,
and sometimes we can’t piece together the cause of the crash.  Often,
however, we can figure out what went wrong from the callstack and make a fix.  If
a lot of people are hitting a crash in an already-shipped product then we consider
rolling the fix into a service pack.  If
the report is against a version under development (e.g. an alpha or beta release),
or if a very tiny number of people are hitting a crash in an already-shipped version,
we generally roll the fix into the version under development.

There is, of course, a certain amount of “cosmic justice” in making VS developers
suffer through the investigation of these Watson reports, since the pain of having
the product crash from under you is much worse.  And
it’s a great motivation to code carefully so as to avoid code paths that could lead
to crashes.  (Which we do already through
many mechanisms, but I figure every extra drop of motivation is a good thing.)

If you are ever unfortunate enough to see this dialog, please accept my apologies
in advance.  But if it does ever happen,
I hope you now understand what it means and I hope you will hit the “Send Error Report”
button so that we at Microsoft can get your crash report and investigate it.

That’s all for now! -Chris

Comments

  • Anonymous
    July 24, 2003
    “sometimes we can’t piece together the cause of the crash” What about if you provided the option for the end user to be able to track the problem? Something similar to what happens when Windows crashes. That way, if the cause can't be determined, the end user could supply this information (either in a comment added to the problem, or by phone).
  • Anonymous
    July 24, 2003
    Good point -- actually we already do this, I just failed to mention it. Thanks. -Chris
  • Anonymous
    August 30, 2003
    Are you kidding me? I see this dialog 5-10 times a day! Every time I close the environment (VS2003) I get this, and the next time I launch all my toolbars and prefs are trashed. On another machine, it works fine. Yes, I've sent the error report multiple times.
  • Anonymous
    August 31, 2003
    pUnk - can you post the bucket number for the crash you keep getting? It's in the event log, the application one - for each crash there will be two Error entries, the second one should have an 8(?) digit bucket number. It's possible I can track down whether this has been fixed yet if I have the bucket. I can also see if many people are hitting it, or just you :-) (not that that would be any consolation, I know) -- Dan [ms]
  • Anonymous
    September 04, 2003
    Thanks Dan! I haven't sent the error for a while, but the most recent bucket number is 49178620.
  • Anonymous
    November 17, 2003
    Artificial intelligence has not yet reached the point where crashes can be reliably diagnosed by a computer. The information above is insufficient to determine exactly what went wrong but I can tell that the proximate cause was a stack overflow. I suspect some plug-in has a recursion bug and is blowing its stack.