Reducing Support Costs with Windows Vista
Published: May 23, 2006
Windows Vista helps reduce support costs in the enterprise by providing built-in diagnostics that can detect, diagnose, and resolve common problem scenarios. Windows Vista detects potential error conditions, such as failing hard drives or excessive memory use, before the problem occurs. It diagnoses the root cause of these problems, and either resolves the problem automatically or walks the user through a resolution. The built-in diagnostics are designed for use in the enterprise: IT departments can configure them using Group Policy settings and retrieve detailed information through improved, standardized event logging. Remote Assistance is also improved in Windows Vista, which will help reduce desk-side visits from support personnel when problems require expert help. Windows Error Reporting and Service Quality Management ensure continuous data collection and improvement of the operating system.
This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.
Support costs consume a significant portion of IT budgets—a Microsoft investigation recently identified troubleshooting as the single task that consumes the greatest percentage of IT administration resources, reporting the following:
While IT support incurs concrete costs, the total cost of failures to your business can be harder to measure. For example, to minimize support costs, organizations are increasingly moving to a “wipe-and-load” approach, in which the IT department simply re-images the user’s machine if a problem cannot be solved within a specified period of time. While the wipe-and-load approach is cost-effective from an IT perspective, it can be costly to other parts of the organization. Users experience downtime while their machines are being re-imaged, and then they can spend hours reconfiguring their machines: re-establishing custom settings and connections, and reloading applications and tools.
The loss of data that occurs with failures is also expensive, although exactly how expensive is difficult to compute. A 2003 Pepperdine University study concluded that, taking into account data recovery time combined with worker downtime, annual data losses to PCs cost US businesses $18.2 billion. The average cost of each lost-data incident was $3,957, and the study’s author notes that, with data growing at 80% per year and businesses relying more and more on data, this cost is likely to increase. The Pepperdine study does not address hidden costs, but they exist: for example, lost data might require businesses to re-gather information from customers or vendors, compromising their perceived credibility. Finally, the initial problem may occur again, since the root cause was never addressed.
Windows Vista is designed to reduce the number and severity of these kinds of problems. Built-in diagnostics can automatically detect and diagnose the most common types of issues, such as hard drive failures, networking problems, and resource exhaustion. Windows Vista detects some problems early, before they can cause data loss or other failures, and fixes these programmatically and silently. Users never have to know anything happened: their work is uninterrupted and no support expertise is required.
When Windows Vista can’t resolve problems silently, it makes it painless and easy for users to help themselves without having to turn to support professionals for assistance. A range of wizards, dialog boxes, and other friendly and highly accessible interface elements appear automatically to guide the users through the necessary steps. And when a support professional does have to get involved, the built-in diagnostics provide detailed information to help them solve problems more quickly, so they don’t have to resort to re-imaging the machine and losing configuration settings and data.
The new diagnostic framework in Windows Vista, Windows Diagnostic Infrastructure (WDI), provides built-in diagnostic capabilities that IT professionals can manage through Group Policy settings. IT professionals can enable or disable each diagnostic scenario according to what makes most sense in their organization. To minimize interruption and distraction enterprise-wide, they can also enable detection and diagnosis only, while hiding resolution from end users.
WDI enables intelligent event tracing: verbose traces are collected only while a diagnostic is active, which reduces the performance overhead of diagnosis. All diagnostic scenarios raise events to the event log, where they can be consumed by enterprise management solutions such as Microsoft Operations Manager and by the newly rewritten Event Viewer in Windows Vista. IT professionals can use this information to get an overall sense of the health of the systems in the organization; for example, they can monitor how often specific types of failures occur and adjust networking strategies or hardware deployment accordingly.
Finally, Windows Error Reporting (WER) ensures that Microsoft can provide solutions to customers as soon as they experience a failure or problem that has already been identified. WER, which IT professionals can choose to use to provide feedback to Microsoft, comes with an improved Problem Reports and Solutions control panel in Windows Vista. WER ensures continuous data collection and improvement of the operating system.
Improved Remote Assistance
Originally included with Microsoft Windows XP, Remote Assistance reduces demands on support professionals’ time by allowing users to turn to experts on their own teams for support, and by allowing support professionals, when they do have to get involved, to resolve problems by viewing and controlling a remote computer's desktop across the network, rather than making an expensive desk-side visit. Windows Vista Remote Assistance is faster and uses less bandwidth than the previous version, and can function through Network Address Translation (NAT). Remote Assistance in Windows Vista is also scriptable and supports logging, so that it can be used in environments where IT access must be recorded for regulatory reasons.
How Windows Vista diagnoses problems
Performance tuning, boot-time issues, connectivity debugging, and hardware- or driver-related problems are all particularly difficult to diagnose and repair. Finding the root cause of these problems can present a real challenge for IT professionals. Sometimes the root cause shows up in the event log; in other cases, simply updating a binary or changing a setting fixes the problem for a particular system at a particular time. But the only way to definitively identify the root cause of a specific problem is to observe exactly what happened, and that takes time. IT professionals have to repeatedly reproduce the problem, attach debuggers to systems, get expert support, install checked binaries, and so on.
Windows Vista contains diagnostic scenarios—collections of instrumentation, troubleshooting, and resolution logic—that make it possible to solve these problems more simply through intelligent tracing and programmatic problem detection, analysis, and resolution. For example, one scenario might measure the performance of a common shell operation, and another might monitor a hard disk for a failure status.
WDI uses several types of instrumentation, performing event-based diagnosis, on-demand diagnosis, or counter-based diagnosis, depending on the type of problem it’s addressing.
Event-based diagnosis adds diagnostic instrumentation to existing components without noticeably changing their core behavior or design. Diagnostics-enabled components are “fire-and-forget”: once the diagnostic event is raised, the component with the problem carries on with its normal functioning. The events are logged to WDI and diagnosed outside of the mainline code path. The two types of event-based diagnosis are start/stop scenarios and simple scenarios. In a start/stop scenario, a failure-prone code path is bracketed by start and stop events, and detailed context event tracing is enabled only while the bracketed code path is executing. During normal operation, the detailed events are disabled, which minimizes the performance impact of tracing. A simple scenario addresses a failure that can be detected and characterized at a single point in code; a single scenario event is raised to WDI to start diagnosis and resolution.
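The two event-based patterns can be sketched in Python. This is a minimal, hypothetical model, not the WDI programming interface: `DiagnosticSink`, `start_stop_scenario`, and the scenario names are invented for illustration, and real instrumentation lives inside Windows components. The point it shows is that detailed traces are only kept inside the start/stop bracket, while a simple scenario raises one event and returns immediately.

```python
import time
from contextlib import contextmanager


class DiagnosticSink:
    """Hypothetical stand-in for WDI: collects scenario events and traces."""

    def __init__(self):
        self.events = []              # (kind, name, context) tuples
        self.verbose_enabled = False  # detailed tracing is off by default

    def raise_scenario_event(self, name, **context):
        # Simple scenario: one event at the single point where the failure is seen.
        # The caller continues immediately ("fire-and-forget").
        self.events.append(("scenario", name, context))

    def trace(self, message):
        # Detailed context trace; discarded unless a start/stop bracket is active.
        if self.verbose_enabled:
            self.events.append(("trace", message, {}))


@contextmanager
def start_stop_scenario(sink, name):
    """Start/stop scenario: verbose tracing is enabled only inside the bracket."""
    sink.verbose_enabled = True
    sink.events.append(("start", name, {}))
    began = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - began
        sink.events.append(("stop", name, {"elapsed_s": elapsed}))
        sink.verbose_enabled = False  # back to low-overhead normal operation


sink = DiagnosticSink()
sink.trace("before the bracket")            # dropped: tracing is off
with start_stop_scenario(sink, "ShellOperation"):
    sink.trace("enumerating folder")        # recorded
sink.trace("after the bracket")             # dropped again
sink.raise_scenario_event("DiskReadFailure", sector=1234)
print([kind for kind, _, _ in sink.events])  # ['start', 'trace', 'stop', 'scenario']
```

Nested brackets are not handled here; a fuller model would keep a depth counter instead of a single flag.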
With on-demand diagnosis, an application can request diagnosis, receive a notification when diagnosis is complete, and adjust its behavior as necessary based on the results. For example, if an application is running into network errors, it can ask WDI to start diagnosis and wait until the problem is resolved before it continues its activity. This avoids asking the user to resolve networking problems that don’t affect the user’s applications. Microsoft Office Outlook 2003, for instance, switches from RPC over TCP/IP to RPC over HTTP without ever bothering the user about it.
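The request, wait, and adjust flow of on-demand diagnosis can be sketched with a thread pool, where the `Future` plays the role of the completion notification. Everything here is hypothetical: `diagnose_connectivity`, `MailClient`, and the transport labels are illustrative names, and the diagnostic module is reduced to a single boolean input rather than any real WDI call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class DiagnosisResult:
    root_cause: str
    resolved: bool
    suggestion: str = ""


def diagnose_connectivity(tcp_path_open: bool) -> DiagnosisResult:
    """Hypothetical diagnostic module; a real one would probe adapters, DNS, proxies, and so on."""
    if tcp_path_open:
        return DiagnosisResult("none", resolved=True)
    return DiagnosisResult("tcp_path_blocked", resolved=False, suggestion="use_http_transport")


class MailClient:
    """Hypothetical application that requests diagnosis instead of prompting the user."""

    def __init__(self, executor):
        self.executor = executor
        self.transport = "rpc_over_tcp"

    def on_network_error(self, tcp_path_open: bool) -> str:
        # Request diagnosis, then wait for the completion notification (the Future).
        future = self.executor.submit(diagnose_connectivity, tcp_path_open)
        result = future.result(timeout=30)
        # Adjust behavior based on the result, without bothering the user.
        if not result.resolved and result.suggestion == "use_http_transport":
            self.transport = "rpc_over_http"
        return self.transport


with ThreadPoolExecutor(max_workers=1) as pool:
    client = MailClient(pool)
    print(client.on_network_error(tcp_path_open=False))  # rpc_over_http
```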
For event-driven scenarios, the diagnostic sequence goes like this: when a potential problem is detected, an event is raised to the event logging infrastructure. The event logging infrastructure delivers the event to the diagnostic infrastructure, which then determines whether to launch a full-fledged diagnostic scenario instance or whether the problem can be silently ignored. For example, if the system resumes from standby in less time than the slow-resume performance threshold, no further diagnosis is required. For on-demand scenarios, the application requesting diagnosis performs problem detection itself, so the infrastructure’s problem-detection step is bypassed when the application requests diagnosis directly.
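The problem-detection step for the slow-resume example, and the way on-demand requests skip it, can be sketched as follows. The 3000 ms threshold and all function names are hypothetical; the paper does not give the real threshold value.

```python
from dataclasses import dataclass

# Hypothetical value; the paper does not state the real slow-resume threshold.
SLOW_RESUME_THRESHOLD_MS = 3000


@dataclass
class ResumeEvent:
    elapsed_ms: int


def problem_detected(event: ResumeEvent) -> bool:
    # Problem detection: a resume faster than the threshold needs no diagnosis.
    return event.elapsed_ms >= SLOW_RESUME_THRESHOLD_MS


def on_event(event: ResumeEvent, launch) -> str:
    """Event-driven path: decide whether to launch a full scenario instance."""
    if problem_detected(event):
        launch("SlowResume", event)
        return "launched"
    return "ignored"


def on_demand(context, launch) -> str:
    """On-demand path: the caller already detected the problem, so detection is skipped."""
    launch("OnDemand", context)
    return "launched"


launched = []


def record(name, data):
    launched.append(name)  # a real launcher would start a scenario instance


print(on_event(ResumeEvent(1200), record))  # ignored
print(on_event(ResumeEvent(4500), record))  # launched
print(on_demand({"app": "mail"}, record))   # launched
print(launched)                             # ['SlowResume', 'OnDemand']
```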
If it’s necessary to launch a full-fledged diagnostic scenario through problem detection or the on-demand mechanism, the Diagnostic Policy Service (DPS) comes into play, handing the scenario in sequence to various diagnostic modules for problem detection, troubleshooting, and resolution. These diagnostic modules encode domain-specific knowledge, such as how to find the root cause of a performance problem. The DPS may invoke a troubleshooter, which is an executable that uses the event data and system state to try to identify the root cause of the problem.
After the troubleshooter identifies the root cause, it invokes a resolver, which is an executable that contains one or more actions to address a particular root cause. If it can, the resolver takes care of the problem invisibly, but in some cases it might interact with the user, generating user-interface elements to resolve a particular problem. For example, a resolver might ask the user, “Can I back up the contents of a failing hard disk?”, and then do so when the user clicks “Yes.” In some cases, the resolver might present a wizard that walks the user through a solution step by step and explains the actions being taken. Or, if Windows diagnoses a network adapter problem, the user may be directed to a specific area within Windows, such as the Networking control panel, for more information. In on-demand scenarios, the application requesting diagnosis can act as an interactive resolver, so that the results of diagnosis are presented within the context of the application.
Troubleshooters can also indicate whether the resolution should be launched immediately or queued to the Problem Reports and Solutions control panel. This control panel collects resolutions for a wide range of issues, from crashes and hangs to performance problems and other non-critical diagnostic results. If the problem does come up, users can find its resolution in the Problem Reports and Solutions control panel and decide whether they want to apply it.
Finally, as a diagnostic scenario instance progresses, troubleshooters and resolvers log information to the event log, leaving a record of the steps they take. This information is invaluable for support engineers investigating problems on a system or assessing and understanding diagnostic activity over time.
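The hand-off described above (troubleshooter finds a root cause, then either a resolver runs immediately or the resolution is queued, with each step logged) can be modeled in a few lines. This is a hypothetical sketch, not the Diagnostic Policy Service interface: `PolicyServiceModel`, the troubleshooter functions, and the root-cause labels are invented, and real troubleshooters and resolvers are separate executables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Diagnosis:
    root_cause: Optional[str]  # None when the troubleshooter finds nothing
    launch_now: bool           # resolve immediately, or queue for later


@dataclass
class PolicyServiceModel:
    """Hypothetical model of the DPS hand-off from troubleshooter to resolver."""
    troubleshooters: Dict[str, Callable[[dict], Diagnosis]]
    resolvers: Dict[str, Callable[[dict], str]]
    event_log: List[str] = field(default_factory=list)
    # Stands in for the Problem Reports and Solutions control panel.
    queued_solutions: List[str] = field(default_factory=list)

    def run(self, scenario: str, data: dict) -> str:
        self.event_log.append(f"{scenario}: troubleshooter started")
        diagnosis = self.troubleshooters[scenario](data)
        if diagnosis.root_cause is None:
            self.event_log.append(f"{scenario}: no root cause identified")
            return "undiagnosed"
        self.event_log.append(f"{scenario}: root cause {diagnosis.root_cause}")
        if not diagnosis.launch_now:
            self.queued_solutions.append(diagnosis.root_cause)
            self.event_log.append(f"{scenario}: resolution queued")
            return "queued"
        outcome = self.resolvers[diagnosis.root_cause](data)
        self.event_log.append(f"{scenario}: resolver finished with {outcome}")
        return outcome


def disk_troubleshooter(data: dict) -> Diagnosis:
    return Diagnosis("smart_failure" if data.get("smart_failing") else None, launch_now=True)


def perf_troubleshooter(data: dict) -> Diagnosis:
    return Diagnosis("slow_driver", launch_now=False)


dps = PolicyServiceModel(
    troubleshooters={"Disk": disk_troubleshooter, "Performance": perf_troubleshooter},
    resolvers={"smart_failure": lambda data: "backup_offered"},
)
print(dps.run("Disk", {"smart_failing": True}))  # backup_offered
print(dps.run("Performance", {}))                # queued
print(dps.queued_solutions)                      # ['slow_driver']
```

Keeping the event log on the service object mirrors the point of the paragraph above: every step leaves a record that a support engineer can read later.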
For example, if instrumentation detected a potential disk failure and launched a diagnostic scenario instance, the following steps would occur.
The disk diagnostic resolver provides an example of the kind of experience that users might have when the system detects a problem that can’t be solved silently. When a problem is detected, the resolution launches immediately for all users logged on to the system, and subsequently at every user logon until there are no failing disks on the system.
When the resolution launches, it follows Group Policy settings, and adjusts its behavior according to the type of user who is logged on. If the domain administrator has disabled resolution for the disk diagnostic scenario in Group Policy settings, the resolver exits silently without presenting users any user interface elements. If the scenario is enabled, the resolver behaves differently depending on whether the user is an administrator or not. It warns non-administrative users of the possible disk failure, and displays a customizable message advising them to seek help from an administrator.
The resolver provides a richer experience to administrative users, who have the necessary privileges to run a backup. First, it helps the user print or save repair and recovery instructions, including text that can be customized in Group Policy to include helpdesk contact information. Then, it helps the user run backup, using either Windows built-in backup or a third-party backup solution. Finally, it helps the user shut down the system so that the drive can be repaired or replaced.
If resolution is disabled by Group Policy, administrators can still monitor the event log for events raised when disks fail so that they can take action (such as scheduling a technician visit after-hours, when it won’t interrupt the user).
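The branching behavior of the disk resolver (Group Policy first, then user type, relaunching until no failing disks remain) can be captured as a small decision function. The policy field names and step labels are hypothetical stand-ins for the actual Group Policy settings and user interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskScenarioPolicy:
    # Hypothetical names standing in for the Group Policy settings described above.
    resolution_enabled: bool = True
    custom_message: str = "Please contact the help desk."


def disk_resolver_steps(policy: DiskScenarioPolicy, user_is_admin: bool,
                        failing_disks: int) -> List[str]:
    """Return the steps the disk resolver would take at this logon."""
    if failing_disks == 0:
        return []  # nothing left to resolve, so the resolver stops relaunching
    if not policy.resolution_enabled:
        return []  # exit silently; admins rely on the logged disk-failure events
    if not user_is_admin:
        return ["warn_of_possible_disk_failure", "show: " + policy.custom_message]
    return [
        "print_or_save_recovery_instructions",  # includes the customized help-desk text
        "run_backup",                           # built-in or third-party backup
        "shut_down_for_repair",
    ]


policy = DiskScenarioPolicy(custom_message="Call the help desk at extension 1234.")
print(disk_resolver_steps(policy, user_is_admin=False, failing_disks=1))
print(disk_resolver_steps(policy, user_is_admin=True, failing_disks=1))
print(disk_resolver_steps(DiskScenarioPolicy(resolution_enabled=False), True, 1))  # []
```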
Diagnostic Scenarios supported in Windows Vista
Windows Vista supports a number of these diagnostic scenarios out of the box, including the most common: networking problems, impending hard drive failures, and potential resource exhaustion conditions. Additional diagnostic modules will be added over time.
Network diagnostics
In the past, when users experienced network-related problems, they had to rely on many different tools to troubleshoot them. Each tool took different parameters and generated different types of output, so users typically had to develop some expertise with the tools, networking technologies, and troubleshooting methodologies to succeed, or else call technical support frequently. This complexity also increased support costs when users called their help desks, Microsoft Support Services, or other vendors’ support services. As a result, IT departments were forced to develop and maintain tools for their customers to use.
The Network Diagnostic Framework (NDF) is the network troubleshooter in the Windows Vista diagnostics infrastructure. It is designed to address the most common network-related issues, such as file-sharing problems, Web site access problems, trouble connecting to wireless networks, problems caused by third-party firewalls, and networks that stop working after new hardware is installed. Using NDF, users can perform basic repair operations on the network interfaces, configuration, and components of their Windows Vista PC. Repair tasks, such as correcting wireless settings or repairing the Winsock catalog, are presented to users in the context of the problem they are having.
NDF organizes and coordinates smaller network troubleshooting units called helper classes, which can be plugged in as necessary. Each network component within the operating system can have a helper class to determine its health and to troubleshoot and resolve problems. NDF interacts with the component helper classes to analyze the root cause of the problem that the user is experiencing and to help resolve the problem or provide actionable guidance to the user to resolve the issue. The logged events allow IT pros to see the output of each of these diagnostic helper classes in one place. The results from each of the individual helper classes are presented to the user in a unified way to provide a simple list of possible resolutions to the problem.
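The helper-class arrangement can be sketched as a small plug-in pattern: each helper checks one component, every finding goes to a shared log, and only the actionable repairs reach the user as a single list. The class names, state keys, and repair strings are hypothetical; they are not the NDF helper-class interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    component: str
    healthy: bool
    repair: Optional[str] = None


class HelperClass:
    """Hypothetical per-component helper: reports health and proposes a repair."""
    name = "base"

    def diagnose(self, state: dict) -> Finding:
        raise NotImplementedError


class WirelessHelper(HelperClass):
    name = "wireless"

    def diagnose(self, state: dict) -> Finding:
        ok = state.get("wireless_settings_valid", True)
        return Finding(self.name, ok, None if ok else "Correct the wireless settings")


class WinsockHelper(HelperClass):
    name = "winsock"

    def diagnose(self, state: dict) -> Finding:
        ok = not state.get("winsock_catalog_corrupt", False)
        return Finding(self.name, ok, None if ok else "Repair the Winsock catalog")


def run_network_diagnosis(helpers: List[HelperClass], state: dict,
                          event_log: List[Finding]) -> List[str]:
    """Coordinate the helpers and merge their output into one list of resolutions."""
    repairs = []
    for helper in helpers:
        finding = helper.diagnose(state)
        event_log.append(finding)  # every helper's output lands in one place
        if not finding.healthy:
            repairs.append(finding.repair)
    return repairs


log: List[Finding] = []
state = {"winsock_catalog_corrupt": True}
repairs = run_network_diagnosis([WirelessHelper(), WinsockHelper()], state, log)
print(repairs)  # ['Repair the Winsock catalog']
```

New components plug in by adding another subclass to the helper list, without changing the coordinator.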
Resource exhaustion prevention
Modern operating systems run many applications, services, and other processes simultaneously, and each process shares limited resources such as memory, processor time, disk I/O, and network bandwidth. One application that consumes too much of a shared resource can slow down the entire operating system or even cause it to fail.
Windows Vista diagnostics detect when the system is approaching its commit limit and alert users to the situation. If one or more applications are causing the resource limitation, Windows Vista warns users that an application might experience problems, lists the top consumers of commit charge to help users identify the source of the problem, and instructs users to save their data and close the largest applications. This proactive approach will reduce user frustration and data loss and improve user productivity. It will reduce support calls, too, because users will be able to identify and resolve the problem without needing an IT professional to show them how to analyze resource usage with Task Manager or the Windows Reliability and Performance Monitor.
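The detect-then-rank logic can be sketched as a pure function over a snapshot of commit usage. The 90% warning point, the megabyte units, and the process names are all hypothetical; the paper does not say how close to the limit Windows Vista raises its warning, and this does not read real system counters.

```python
from typing import Dict, List, Tuple

# Hypothetical warning point; the paper does not say how close "approaching" is.
WARN_FRACTION = 0.90


def check_commit(commit_charge_mb: int, commit_limit_mb: int,
                 per_process_mb: Dict[str, int], top_n: int = 3) -> Tuple[bool, List[str]]:
    """Return (should_warn, largest consumers of commit charge, biggest first)."""
    if commit_charge_mb < WARN_FRACTION * commit_limit_mb:
        return False, []
    ranked = sorted(per_process_mb.items(), key=lambda item: item[1], reverse=True)
    return True, [name for name, _ in ranked[:top_n]]


usage = {"editor.exe": 3000, "browser.exe": 2500, "mail.exe": 800, "clock.exe": 100}
print(check_commit(7400, 8000, usage))  # (True, ['editor.exe', 'browser.exe', 'mail.exe'])
print(check_commit(4000, 8000, usage))  # (False, [])
```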
Windows memory diagnostics
Application failures, operating system faults, and Stop errors are often caused by failing memory. Failing memory chips return different data than the data originally stored by the operating system. Failing memory can be difficult to identify, because problems can be intermittent—for example, a memory chip might function perfectly when it is tested in a controlled environment but begin to fail when it is used in a hot computer. Failing memory can also cause secondary problems, such as corrupted files. Often, administrators take drastic steps to repair the problem, such as reinstalling applications or the operating system, only to have the failures persist.
Windows Vista includes Windows Memory Diagnostics to help administrators track down problems with unreliable memory. Previously, this technology was only available as a download and required installing the tool on a bootable floppy disk. In Windows Vista, if Microsoft Online Crash Analysis (MOCA) determines that an error may be caused by failing memory, the software can prompt the user to perform memory diagnostics without requiring an additional download or separate boot disk. Results are presented at the next reboot and explain the problem clearly so that the user can fix the root cause of the problem.
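Failing memory, as described above, returns different data than was stored. The classic way to catch that is a write/read-back pattern test. The sketch below runs such a test against simulated memory with one hypothetical stuck bit; it illustrates the general technique only and is not the actual test suite in Windows Memory Diagnostics, which runs outside Windows at boot against physical RAM.

```python
from typing import Callable, Iterable, List


def pattern_test(write: Callable[[int, int], None], read: Callable[[int], int],
                 size: int, patterns: Iterable[int] = (0x00, 0xFF, 0x55, 0xAA)) -> List[int]:
    """Write each pattern to every cell, read it back, and return mismatching addresses."""
    bad = set()
    for pattern in patterns:
        for addr in range(size):
            write(addr, pattern)
        for addr in range(size):
            if read(addr) != pattern:  # the cell returned different data than was stored
                bad.add(addr)
    return sorted(bad)


# Simulated 16-byte memory with a hypothetical stuck-at-zero bit 0 at address 5.
memory = bytearray(16)


def write(addr: int, value: int) -> None:
    memory[addr] = value


def read(addr: int) -> int:
    value = memory[addr]
    return value & 0xFE if addr == 5 else value


print(pattern_test(write, read, len(memory)))  # [5]
```

Using several patterns matters: the all-zeros pattern alone would never expose a bit that is stuck at zero.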
Windows disk diagnostics
Disk reliability problems can vary in severity. Minor problems can cause seemingly random application failures. For example, if a user connects a new camera and the operating system fails to load the driver, it’s possible that disk corruption caused the problem. More severe problems can result in the total loss of data stored on the hard disk.
Windows Vista can eliminate much of the impact of a disk failure by detecting disk problems in advance, before total failure occurs. Hard disks often show warning signs before failure, but earlier Windows operating systems did not act on these warning signs. Windows Vista listens for evidence that a hard disk is beginning to fail and warns the user or the support center of the problem so that IT can back up the data and replace the hard disk before the problem becomes an emergency.
Most new hard disks include Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) to monitor the health of the disk. If Windows Vista detects impending disk failure through S.M.A.R.T., Windows Vista can launch disk diagnostics to guide the user or IT professionals through the process of backing up the data and replacing the disk before total failure occurs. Windows Vista can also detect application crashes caused by a dirty or scratched CD or DVD, and instruct the user to clean the media.
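At the core of S.M.A.R.T. failure prediction is a comparison of each attribute’s normalized value against a vendor-set threshold. The sketch below applies that comparison to hypothetical readings and decides whether disk diagnostics should launch. It is a simplified illustration, not how Windows Vista queries the drive; the attribute values are made up, and a real drive also reports an overall status of its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # normalized current value; higher is better
    threshold: int  # vendor-defined failure threshold


def failing_attributes(attrs: List[SmartAttribute]) -> List[str]:
    """Attributes whose normalized value has dropped to or below the threshold."""
    return [a.name for a in attrs if a.value <= a.threshold]


def should_launch_disk_diagnostics(attrs: List[SmartAttribute]) -> bool:
    return bool(failing_attributes(attrs))


# Hypothetical readings for one disk.
readings = [
    SmartAttribute(5, "Reallocated_Sector_Ct", value=30, threshold=36),
    SmartAttribute(194, "Temperature_Celsius", value=65, threshold=0),
]
print(failing_attributes(readings))              # ['Reallocated_Sector_Ct']
print(should_launch_disk_diagnostics(readings))  # True
```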
Windows Vista also includes reliability improvements to the NTFS file system. Specifically, if Windows Vista detects corrupted metadata on the file system, it invokes NTFS’s self-healing capabilities to rebuild the metadata. Some data may still be lost, but Windows Vista can limit the damage and repair the problem without taking the entire system offline for a lengthy check and repair cycle.
Windows also includes diagnostics to detect application crashes caused by damaged system files on disks. If an application attempts to access a system file that is irretrievable because of a bad block – an uncorrectable read error on the disk – the application may crash. Windows detects these crashes, and silently repairs the damaged system file from a backup copy. This diagnostic turns repeat crashes into one-time crashes with silent recovery.
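The repair-from-backup idea can be sketched with content hashes: a file that is unreadable or no longer matches its known-good hash is replaced by a backup copy, but only after the copy itself has been verified. The manifest, paths, and in-memory dictionaries are hypothetical; this is not the mechanism Windows uses internally, only an illustration of why a second crash does not recur once the file is restored.

```python
import hashlib
from typing import Dict, List, Optional


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repair_system_files(disk: Dict[str, Optional[bytes]], backup: Dict[str, bytes],
                        manifest: Dict[str, str]) -> List[str]:
    """Restore every system file that is unreadable (None) or fails its known-good hash."""
    repaired = []
    for path, expected in manifest.items():
        current = disk.get(path)
        if current is not None and digest(current) == expected:
            continue  # file is intact
        copy = backup.get(path)
        if copy is not None and digest(copy) == expected:  # restore only a verified copy
            disk[path] = copy
            repaired.append(path)
    return repaired


good = b"known-good system file contents"
manifest = {"system32/example.dll": digest(good)}
disk = {"system32/example.dll": None}   # unreadable because of a bad block
backup = {"system32/example.dll": good}
print(repair_system_files(disk, backup, manifest))  # ['system32/example.dll']
print(repair_system_files(disk, backup, manifest))  # [] (already repaired)
```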
Performance diagnostics
Windows Vista can automatically detect problems related to slow performance and add an event to the event log that describes the condition and provides information about the root cause. Administrators can use this information to troubleshoot long boot times, unresponsive user interfaces, slow standby and resume times, and slow shutdowns on a case-by-case basis, or aggregate the event log data by using a tool such as Microsoft Operations Manager (MOM) to analyze performance for the entire enterprise. Windows Vista provides end users with specific information about potential causes of the performance problems it diagnoses in clear, user-oriented language, and provides wizard-style UI that the user can use to fix problems with the click of a button whenever possible.
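A simplified version of “detect slow performance and point at the root cause” is to compare a total duration against a threshold and, when it is exceeded, report the largest contributing phase. The threshold, phase names, and timings are hypothetical, and real boot diagnosis works from much richer trace data than a single breakdown.

```python
from typing import Dict, Optional, Tuple

# Hypothetical threshold; the real boot-time thresholds are not given in this paper.
SLOW_BOOT_THRESHOLD_MS = 30_000


def diagnose_boot(phase_ms: Dict[str, int]) -> Optional[Tuple[int, str]]:
    """Return (total boot time, largest contributor) for a slow boot, else None."""
    total = sum(phase_ms.values())
    if total < SLOW_BOOT_THRESHOLD_MS:
        return None
    culprit = max(phase_ms, key=phase_ms.get)
    return total, culprit


timings = {"kernel_init": 4_000, "driver:example.sys": 21_000, "services": 9_000}
print(diagnose_boot(timings))                                    # (34000, 'driver:example.sys')
print(diagnose_boot({"kernel_init": 4_000, "services": 9_000}))  # None
```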
For seasoned professionals and administrators, Windows also includes the Windows Reliability and Performance Monitor Snap-In, for analyzing real-time performance data. It improves on the Windows XP Performance Monitor, or Perfmon, in many ways:
Startup Repair
Disk failure diagnostics reduce the number of startup problems related to failed disks, but startup problems can still occur because of corrupt system files or faulty driver installations. These are some of the most difficult problems to troubleshoot, because an administrator cannot start the operating system and use the built-in troubleshooting tools. Often, administrators choose to reinstall the operating system rather than attempt to solve the problem—even though the solution might be as simple as replacing a single file.
Windows Vista includes the Startup Repair tool to automatically fix many common problems and to help end-users and IT professionals quickly diagnose and repair more complex startup problems. When a boot failure is detected, the system fails over into Startup Repair. Once started, Startup Repair performs diagnostics, including analyzing startup log files, to determine the cause of the startup failure. After Startup Repair determines the cause of the failure, it attempts to fix the problem automatically. The entire process requires little to no user input.
Problems Startup Repair can automatically repair include:
After the operating system has been repaired, Windows Vista notifies the user of the repairs and provides logging so that IT professionals can determine exactly which steps Startup Repair performed. Startup Repair also includes tools to assist IT professionals with manually troubleshooting startup problems.
Consider the common scenario of a traveling user with a mobile computer that fails during the startup process. With Windows Vista, the user would have this experience:
Skilled IT professionals can also launch the Windows Recovery Environment (RE) to manually solve problems without additional tools. The recovery menu in Windows RE provides direct access to Windows Memory Diagnostics, the file system repair tool (Chkdsk), a basic command prompt, tools for restoring files from a backup, and additional recovery tools that might be provided by the computer hardware manufacturer. For example, a support engineer troubleshooting a computer that won’t start could have this experience:
For both end users and IT professionals, Windows RE provides a pleasant experience with efficient troubleshooting. Not all startup problems can be automatically repaired, however. In these circumstances, Windows RE launches additional diagnostic tools; for example, it uses disk diagnostics to troubleshoot the problem further.
Implementing Diagnostics in the Enterprise
In Windows Vista, Group Policy settings give IT professionals full control over built-in diagnostics, and all built-in diagnostics take advantage of improved event logging in Windows Vista to provide full information to help support professionals respond to and track failure conditions. IT professionals can disable scenarios, add enterprise-specific contact information for follow-up, or even provide a completely different resolution from the in-box version by triggering enterprise-specific resolutions from an event logged by a built-in troubleshooter.
The improved event logging service in Windows Vista is more comprehensive and requires events to meet a high quality bar, ensuring that they are meaningful, actionable, and well-documented. All diagnostic scenarios generate events to the event log that provide information about what symptoms were found and which actions, if any, were taken. For example, if a resource exhaustion condition was detected, the event indicates which applications were using the most resources.
If the problem can’t be solved through built-in diagnostics, the new Windows Vista Eventing 6.0 system makes it easier to find and use events to diagnose issues. The system provides APIs for event logging and tracing, as well as event consuming APIs that support event filtering, subscriptions and notifications, consistent event formatting and rendering, log maintenance and archiving, and collecting events from other computers. The event and trace logs provide a record of diagnostics and symptoms to help IT pros resolve problems more quickly using command line utilities and the new Event Viewer.
In addition, Windows Vista features a new Task Scheduler service that tightly integrates with the new Event Viewer and greatly expands on the Scheduled Tasks tool in previous versions of Windows. Windows Vista Task Scheduler provides controlled, unattended management of task execution, launched in response to events or system state changes, or on a schedule.
IT professionals can now configure machines to automatically react to potential system problems, including intermittent, hard-to-reproduce failures. They can also set up more complex and demanding tasks to run in sequence or in response to multiple triggers and condition changes. A task can notify an IT professional of a problem on a desktop by e-mail, and it can launch a diagnostic program or even an automated resolution.
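The event-triggered task described above can be modeled as a trigger (event source plus ID) bound to an ordered list of actions. This is a hypothetical in-process sketch, not the Task Scheduler API: the source name, event ID 1001, and the e-mail and diagnostic actions are placeholders rather than documented Windows values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    source: str
    event_id: int
    machine: str


@dataclass
class EventTriggeredTask:
    """Hypothetical model of a scheduled task with an event trigger."""
    source: str
    event_id: int
    actions: List[Callable[[Event], str]]

    def on_event(self, event: Event) -> List[str]:
        if (event.source, event.event_id) != (self.source, self.event_id):
            return []  # trigger does not match; task stays idle
        return [action(event) for action in self.actions]  # run actions in order


def email_it(event: Event) -> str:
    return f"mail sent: problem on {event.machine}"  # stand-in for an e-mail action


def run_diagnostic(event: Event) -> str:
    return f"diagnostic started on {event.machine}"  # stand-in for launching a program


# Source name and event ID are placeholders, not documented Windows values.
task = EventTriggeredTask("DiskDiagnostic", 1001, [email_it, run_diagnostic])
print(task.on_event(Event("DiskDiagnostic", 1001, "PC-042")))
print(task.on_event(Event("Other", 7, "PC-042")))  # []
```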
Finally, enterprise management solutions like Microsoft Operations Manager can consume the events triggered by Windows diagnostics to provide an aggregate view of problems occurring in the enterprise. For example, IT pros can use resource exhaustion detection and resolution diagnostics to identify those applications that may be using system resources inefficiently.
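The kind of aggregate view a management console builds from diagnostic events can be sketched with `collections.Counter`: count events per scenario across machines, and count how often each application appears among the top consumers in resource-exhaustion events. The event dictionaries and application names are hypothetical; a real deployment would read these fields from collected event-log entries.

```python
from collections import Counter
from typing import Dict, List, Tuple


def aggregate(events: List[Dict]) -> Tuple[Counter, Counter]:
    """Count events per scenario, and how often each app tops resource-exhaustion reports."""
    per_scenario = Counter(e["scenario"] for e in events)
    heavy_apps = Counter(
        app
        for e in events
        if e["scenario"] == "ResourceExhaustion"
        for app in e.get("top_consumers", [])
    )
    return per_scenario, heavy_apps


# Hypothetical events collected from several desktops.
events = [
    {"machine": "PC-001", "scenario": "ResourceExhaustion", "top_consumers": ["report.exe", "mail.exe"]},
    {"machine": "PC-002", "scenario": "ResourceExhaustion", "top_consumers": ["report.exe"]},
    {"machine": "PC-003", "scenario": "DiskFailure"},
]
per_scenario, heavy_apps = aggregate(events)
print(per_scenario.most_common())  # [('ResourceExhaustion', 2), ('DiskFailure', 1)]
print(heavy_apps.most_common(1))   # [('report.exe', 2)]
```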
Remote Assistance for enterprise environments
Using Remote Assistance, support engineers can help users with common problems, such as configuring Outlook to work with a corporate server, upgrading drivers to go with new hardware, or changing network settings to support file sharing over the corporate network, without having to physically touch a user’s computer. Although Remote Assistance was introduced with Windows XP, it has not been widely adopted for use in enterprise environments, largely because of its uneven performance and the fact that in Windows XP it does not log sessions.
In Windows Vista, Remote Assistance is optimized for use in the enterprise environment, with an eye toward driving down the number of desk-side visits. Instead of running inside Help and Support Center, as it did in Windows XP, Remote Assistance in Windows Vista is a stand-alone program that provides markedly faster startup and connectivity as well as a command-line scripting interface. It uses less network bandwidth and can traverse Network Address Translators (NATs).
Remote Assistance in Windows Vista also includes session logging (on both the helper’s and the user’s computers). The time-stamped log is XML-formatted to be easy to integrate into other data sets, and IT pros can choose to disable it using Group Policy settings. A separate log file is generated for each session.
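Because the session log is time-stamped XML, it can be processed with standard XML tools. The sketch below parses a log with Python’s `xml.etree.ElementTree` and summarizes one session. The element and attribute names (`session`, `entry`, `time`, `actor`) and the sample entries are a hypothetical layout invented for illustration; the paper does not document the actual Remote Assistance log schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical log layout; this paper does not document the real schema.
SAMPLE_LOG = """<session id="42">
  <entry time="2006-05-23T10:00:00" actor="user">Invitation sent</entry>
  <entry time="2006-05-23T10:01:30" actor="helper">Connected</entry>
  <entry time="2006-05-23T10:20:00" actor="helper">Requested control</entry>
  <entry time="2006-05-23T10:35:00" actor="helper">Disconnected</entry>
</session>"""


def summarize_session(xml_text: str):
    """Return (session id, duration in minutes, times the helper requested control)."""
    root = ET.fromstring(xml_text)
    entries = [
        (datetime.fromisoformat(e.get("time")), e.get("actor"), e.text)
        for e in root.iter("entry")
    ]
    duration = entries[-1][0] - entries[0][0]
    control = [t for t, actor, text in entries
               if actor == "helper" and text == "Requested control"]
    return root.get("id"), duration.total_seconds() / 60, control


session_id, minutes, control_times = summarize_session(SAMPLE_LOG)
print(session_id, minutes, len(control_times))  # 42 35.0 1
```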
Data collection for continual improvement
Windows Vista is engineered to support continual improvement beyond the in-box diagnostics. With Windows Error Reporting (WER), which shipped in Windows XP, users can report errors directly to Microsoft over the Internet, providing technical information that programming groups at Microsoft use to enhance future versions of the product. IT professionals can configure Group Policy settings to use Corporate Error Reporting to collect and report to Microsoft only important errors.
The new Problem Reports and Solutions control panel in Windows Vista takes WER to the next level, allowing users to view events that have occurred on their computers, track reports to resolution, manage responses from Microsoft, and act on these responses to prevent future issues. The console allows users to drill down on individual reports, export reports to file for analysis by an expert, and check for updated resolutions for reports sent up in the past.
IT professionals can also configure Group Policy settings to enable the Windows Customer Experience Improvement Program (CEIP). With this new Windows Vista program, you can choose to help Microsoft learn about how you use Windows programs and about some of the problems you encounter. Microsoft uses this information to improve the products and features you use most often and to help solve problems. Participation in the program is strictly voluntary, and the end results are software improvements in future releases.
With Windows Vista, IT departments will face fewer support issues, and will experience easier diagnosis and repair when problems do occur. Users they support will benefit from less downtime, fewer IT hassles, and greater productivity.
Windows Vista can self-diagnose a number of common problems, including failing hard disks, memory problems, and networking issues. Windows Disk Diagnostics detect impending disk failures and guide users through data backup, disk replacement, and data restoration. Windows Memory Diagnostics work with Microsoft Online Crash Analysis to detect crashes possibly caused by failing memory, prompting the user to schedule a memory test the next time the computer is restarted, and providing guided support. In Active Directory domains, administrators can configure Built-in Diagnostics using Group Policy settings.
Even in closely managed enterprise environments, it's common for mobile users to go weeks or months without a backup. Data loss caused by unexpected disk failure can be disastrous, and a user might spend weeks recreating work. Because Windows Vista can proactively detect impending failure, IT departments can perform a full backup, replace the hard disk, and restore every byte of the user's data before the failure occurs. This proactive repair can potentially take place overnight or over a weekend, virtually eliminating end-user impact.
Built-in Diagnostics provide information to IT professionals to solve those problems that can't be resolved automatically. Whenever Windows Vista detects a potential problem, it raises an event. IT professionals can use these events to monitor the behavior of desktops in their organization, or to troubleshoot problems that cannot be resolved automatically.
The Remote Assistance tool provides better performance, connectivity, and reliability than the Windows XP version. It is enhanced to provide IT professionals with key diagnostic tools at their fingertips. Through Remote Assistance, support professionals can use information provided by built-in diagnostics, referring to scenario data that was automatically collected when the problem first occurred and during subsequent problem detection on the user’s computer, to help diagnose and troubleshoot computer problems.
Finally, Windows Error Reporting and the Windows Customer Experience Improvement Program make it possible for Microsoft to provide solutions to customers as soon as they experience a failure or problem that’s already been identified. These tools also ensure that Microsoft can improve future service packs and releases of Windows Vista in response to customer experience and feedback.