Problems with CLR Windows Error Reporting (WER) Integration
As I mentioned in my previous article, the CLR integrates with WER so that it can add managed-specific information to the reports generated on crashes (if you are not familiar with Windows Error Reporting, read more in the article Windows Error Reporting (WER) for developers).
The CLR WER integration brings some problems, most of them due to the large matrix of OSs that must be supported. WER was introduced with Windows XP, so on older OSs the CLR must use Dr. Watson. WER support was also enhanced in Windows Vista and Windows Server 2008, so the way the CLR leverages WER on these platforms differs from Windows XP and Windows Server 2003. Because the matrix is so large, some corner cases are not handled as well as they should be.
Another problem with WER integration stems from the report parameters. As I mentioned in the Windows Error Reporting and CLR integration article, the CLR uses the method desc and the IL offset inside the method to identify the faulting instruction (for native applications, one of the parameters is the hash of the current stack). This causes issues if we wrap the code that deals with the error in one function and call that function from multiple places. For example, suppose we have a function DealWithUnexpectedConditions that calls Environment.FailFast to terminate the process immediately. Any function that meets an unexpected condition then simply calls DealWithUnexpectedConditions.
class StaticHelper
{
    public static void DealWithUnexpectedConditions(string message)
    {
        Environment.FailFast(message);
    }
    …
}
class Runner
{
    internal void CheckCurrentProcessHealth()
    {
        if (!CurrentProcess.IsHealthy) // user defined methods
        {
            StaticHelper.DealWithUnexpectedConditions("The current process is not healthy");
        }
    }
    internal void CheckEnvironmentHealth()
    {
        if (!CurrentEnvironment.IsHealthy) // user defined methods
        {
            StaticHelper.DealWithUnexpectedConditions("The environment has some unexpected properties");
        }
    }
    …
}
Because the CLR doesn’t look at the entire call stack, it treats a failure caused by an unhealthy process the same way as a failure caused by an unhealthy environment: in both cases the faulting method and IL offset point at the Environment.FailFast call inside DealWithUnexpectedConditions. The two failures go into the same bucket, so the developer won’t see two different issues. What we would like to see is one more parameter that bases the categorization on the hash of the current stack, which takes into account the callers as well as the instruction that generated the failure.
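To make the idea of a stack hash concrete, here is a minimal sketch of how one could be computed in managed code. It is only an illustration: StackBucket, ComputeStackHash and StackBucketDemo are hypothetical names of mine, this is not how WER or the CLR computes its parameters, and calling it does not change the bucket a report lands in. The most it could do for you is produce a value to include in the message passed to Environment.FailFast, so the two failures can be told apart during triage.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not part of the CLR or WER): hashes the names of the
// methods on the current call stack, so different callers get different values.
static class StackBucket
{
    [MethodImpl(MethodImplOptions.NoInlining)] // keep this frame stable so skipFrames is accurate
    public static string ComputeStackHash()
    {
        // Skip this method's own frame; file and line info is not needed.
        StackTrace trace = new StackTrace(1, false);
        StackFrame[] frames = trace.GetFrames() ?? new StackFrame[0];

        StringBuilder sb = new StringBuilder();
        foreach (StackFrame frame in frames)
        {
            var method = frame.GetMethod();
            if (method == null) continue;
            string typeName = method.DeclaringType != null ? method.DeclaringType.FullName : "<global>";
            sb.Append(typeName).Append('.').Append(method.Name).Append(';');
        }

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            // The first 8 bytes (16 hex characters) are enough for an illustration.
            return BitConverter.ToString(hash, 0, 8).Replace("-", "");
        }
    }
}

// Two hypothetical callers standing in for the two health checks above.
static class StackBucketDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string FromProcessCheck() { return StackBucket.ComputeStackHash(); }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string FromEnvironmentCheck() { return StackBucket.ComputeStackHash(); }
}
```

The two demo methods return different hashes because their own names are part of the hashed stack, which is exactly the information the method desc plus IL offset pair loses. The NoInlining attributes matter: if the JIT inlines a caller, its frame disappears from the stack trace and the hash changes.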
Another issue with the CLR WER integration is that the CLR must update its code for each change in WER. A very useful WER feature is collecting user-mode dumps. Starting with Windows Server 2008 and Windows Vista with Service Pack 1 (SP1), Windows Error Reporting can be configured so that full user-mode dumps are collected and stored locally after a user-mode application crashes. This is done by changing some registry values under the HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps key. If you configure these values and a native application crashes, a dump is generated whether or not Windows Error Reporting is enabled. However, no dump is generated in the specified folder when a managed application crashes. Why? The OS changes came too late for the CLR to update the functionality in its current versions, and given the team's priorities, the feature didn't meet the bar to be introduced as a DCR (design change request). .NET Framework 4.0 will not have this functionality on OSs earlier than Windows 7. The behavior will probably be added in the version after .NET Framework 4.0.
In future versions, the CLR will rely more on the OS to generate WER reports. This will ensure more consistency between managed and native behavior and will make the CLR WER integration less error-prone.
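For reference, the LocalDumps values can be set by hand in the registry editor or from code. The following is a minimal sketch, assuming the value names described in the WER documentation for collecting user-mode dumps (DumpFolder, DumpCount, DumpType, where a DumpType of 2 requests a full dump). LocalDumpsConfig and EnableFullDumps are hypothetical names, the folder and count are up to the caller, and the code must run elevated because it writes under HKEY_LOCAL_MACHINE.

```csharp
using Microsoft.Win32;

// Hypothetical helper: writes the WER LocalDumps settings described above.
static class LocalDumpsConfig
{
    const string KeyPath =
        @"SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps";

    public static void EnableFullDumps(string dumpFolder, int dumpCount)
    {
        // CreateSubKey opens the key if it already exists, or creates it otherwise.
        using (RegistryKey key = Registry.LocalMachine.CreateSubKey(KeyPath))
        {
            // Folder where dump files are written (documented as REG_EXPAND_SZ).
            key.SetValue("DumpFolder", dumpFolder, RegistryValueKind.ExpandString);
            // Maximum number of dump files kept in the folder.
            key.SetValue("DumpCount", dumpCount, RegistryValueKind.DWord);
            // 2 = full dump (1 would request a mini dump).
            key.SetValue("DumpType", 2, RegistryValueKind.DWord);
        }
    }
}
```

A call such as LocalDumpsConfig.EnableFullDumps(@"C:\CrashDumps", 10) would then ask WER to keep up to ten full dumps in that (illustrative) folder. As described above, on the CLR versions discussed here this only takes effect for native crashes.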