Azure Diagnostics troubleshooting

This article provides troubleshooting information for common problems with Azure Diagnostics. For more information about Diagnostics, see Azure Diagnostics overview.

Logical components

The components are:

  • Diagnostics Plug-in Launcher (DiagnosticsPluginLauncher.exe): Launches the Diagnostics extension. It serves as the entry-point process.
  • Diagnostics Plug-in (DiagnosticsPlugin.exe): Configures, launches, and manages the lifetime of the monitoring agent. It's the main process that's launched by the launcher.
  • Monitoring Agent (MonAgent*.exe processes): Monitors, collects, and transfers the diagnostics data.

Log/artifact paths

The following paths lead to some important logs and artifacts. We refer to this information throughout this article.

Azure Cloud Services

| Artifact | Path |
| --- | --- |
| Azure Diagnostics configuration file | %SystemDrive%\Packages\Plugins\Microsoft.Azure.Diagnostics.PaaSDiagnostics\<version>\Config.txt |
| Log files | C:\Logs\Plugins\Microsoft.Azure.Diagnostics.PaaSDiagnostics\<version>\ |
| Local store for diagnostics data | C:\Resources\Directory\<CloudServiceDeploymentID>.<RoleName>.DiagnosticStore\WAD0107\Tables |
| Monitoring agent configuration file | C:\Resources\Directory\<CloudServiceDeploymentID>.<RoleName>.DiagnosticStore\WAD0107\Configuration\MaConfig.xml |
| Azure Diagnostics extension package | %SystemDrive%\Packages\Plugins\Microsoft.Azure.Diagnostics.PaaSDiagnostics\<version> |
| Log collection utility path | %SystemDrive%\Packages\GuestAgent\ |
| MonAgentHost log file | C:\Resources\Directory\<CloudServiceDeploymentID>.<RoleName>.DiagnosticStore\WAD0107\Configuration\MonAgentHost.<seq_num>.log |

Virtual machines

| Artifact | Path |
| --- | --- |
| Azure Diagnostics configuration file | C:\Packages\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<version>\RuntimeSettings |
| Log files | C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<DiagnosticsVersion>\ |
| Local store for diagnostics data | C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<DiagnosticsVersion>\WAD0107\Tables |
| Monitoring agent configuration file | C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<DiagnosticsVersion>\WAD0107\Configuration\MaConfig.xml |
| Status file | C:\Packages\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<version>\Status |
| Azure Diagnostics extension package | C:\Packages\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<DiagnosticsVersion> |
| Log collection utility path | C:\WindowsAzure\Logs\WaAppAgent.log |
| MonAgentHost log file | C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\<DiagnosticsVersion>\WAD0107\Configuration\MonAgentHost.<seq_num>.log |

Metric data doesn't appear in the Azure portal

Diagnostics provides metric data that can be displayed in the Azure portal. If you have problems seeing the data in the portal, check the WADMetrics* tables in the Diagnostics storage account to see if the corresponding metric records are there, and ensure that the Microsoft.Insights resource provider is registered.

The PartitionKey of each table is the resource ID of the virtual machine or virtual machine scale set. The RowKey is the metric name, also known as the performance counter name.
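
To spot-check the table contents, here's a minimal C# sketch. It assumes the Azure.Data.Tables NuGet package, and the connection string is a placeholder you need to fill in. It lists the WADMetrics* tables and prints a sample PartitionKey and RowKey from each:

        // Minimal sketch: list WADMetrics* tables and show one entity from each.
        // Assumes the Azure.Data.Tables package; replace the connection string
        // placeholder with your Diagnostics storage account connection string.
        using System;
        using Azure.Data.Tables;

        class VerifyWadMetrics
        {
            static void Main()
            {
                var service = new TableServiceClient("<diagnostics-storage-connection-string>");

                // Table names start with "WADMetrics", so filter on that prefix.
                foreach (var table in service.Query(
                    filter: "TableName ge 'WADMetrics' and TableName lt 'WADMetrict'"))
                {
                    Console.WriteLine($"Table: {table.Name}");

                    // PartitionKey is the resource ID; RowKey is the metric name.
                    foreach (var entity in service.GetTableClient(table.Name)
                                                  .Query<TableEntity>(maxPerPage: 1))
                    {
                        Console.WriteLine($"  {entity.PartitionKey} | {entity.RowKey}");
                        break; // one sample row per table is enough to verify
                    }
                }
            }
        }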

If the resource ID in the PartitionKey is incorrect, check Diagnostics Configuration > Metrics > ResourceId to verify that the resource ID is set correctly.

If there's no data for the specific metric, check Diagnostics Configuration > PerformanceCounter to see if the metric (performance counter) is included. We enable the following counters by default:

  • \Processor(_Total)\% Processor Time
  • \Memory\Available Bytes
  • \ASP.NET Applications(__Total__)\Requests/Sec
  • \ASP.NET Applications(__Total__)\Errors Total/Sec
  • \ASP.NET\Requests Queued
  • \ASP.NET\Requests Rejected
  • \Process(w3wp)\% Processor Time
  • \Process(w3wp)\Private Bytes
  • \Process(WaIISHost)\% Processor Time
  • \Process(WaIISHost)\Private Bytes
  • \Process(WaWorkerHost)\% Processor Time
  • \Process(WaWorkerHost)\Private Bytes
  • \Memory\Page Faults/sec
  • \.NET CLR Memory(_Global_)\% Time in GC
  • \LogicalDisk(C:)\Disk Write Bytes/sec
  • \LogicalDisk(C:)\Disk Read Bytes/sec
  • \LogicalDisk(D:)\Disk Write Bytes/sec
  • \LogicalDisk(D:)\Disk Read Bytes/sec
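
If a counter you need isn't in the list, you can declare it under the PerformanceCounters node of the configuration. Here's a minimal XML sketch; the counter choices, transfer period, and sample rate are example values, not requirements:

        <PerformanceCounters scheduledTransferPeriod="PT1M">
          <!-- Each counterSpecifier must exactly match a real counter path. -->
          <PerformanceCounterConfiguration
              counterSpecifier="\Processor(_Total)\% Processor Time"
              sampleRate="PT60S" />
          <PerformanceCounterConfiguration
              counterSpecifier="\Memory\Available Bytes"
              sampleRate="PT60S" />
        </PerformanceCounters>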

If the configuration is set correctly but you still can't see the metric data, use the following guidelines to help you troubleshoot.

Azure Diagnostics doesn't start

For information about why Diagnostics failed to start, see the DiagnosticsPluginLauncher.log and DiagnosticsPlugin.log files in the log files location that was provided earlier.

If these logs indicate "Monitoring Agent not reporting success after launch," there was a failure launching MonAgentHost.exe. Look at the logs in the location that's indicated for the MonAgentHost log file in the preceding "Virtual machines" section.

The last line of the log files contains the exit code.

        DiagnosticsPluginLauncher.exe Information: 0 : [4/16/2016 6:24:15 AM] DiagnosticPlugin exited with code 0

If you find a negative exit code, see the exit code table in the References section.

Diagnostics data isn't logged to Azure Storage

First, determine whether none of the data is appearing or only some of it is missing.

Diagnostics infrastructure logs

Diagnostics logs all errors in the Diagnostics infrastructure logs. Make sure you've enabled the capture of Diagnostics infrastructure logs in your configuration. Then you can quickly look for any relevant errors that appear in the DiagnosticInfrastructureLogsTable table in your configured storage account.
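
If you haven't enabled them yet, here's a minimal XML sketch of the relevant WadCfg fragment; the quota, level filter, and transfer period are example values:

        <DiagnosticMonitorConfiguration overallQuotaInMB="4096">
          <!-- Capture the extension's own errors and transfer them every minute. -->
          <DiagnosticInfrastructureLogs scheduledTransferLogLevelFilter="Error"
                                        scheduledTransferPeriod="PT1M" />
          <!-- ...other data sources... -->
        </DiagnosticMonitorConfiguration>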

No data appears

The most common reason that event data doesn't appear at all is that the storage account information is defined incorrectly.

Solution: Correct your Diagnostics configuration and reinstall Diagnostics.

If the storage account is configured correctly, remote access into the machine and verify that DiagnosticsPlugin.exe and MonAgentCore.exe are running. If they aren't running, follow the steps in Azure Diagnostics doesn't start.

If the processes are running, go to Is data getting captured locally? and follow the instructions there.

If there's still a problem, try the following:

  1. Uninstall the agent.
  2. Remove the directory C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics.
  3. Install the agent again.

Part of the data is missing

If you're getting some data but not all of it, the data collection/transfer pipeline is set up correctly, and the problem is specific to the missing data. Follow the subsections here to narrow down the issue.

Is the collection configured?

The Diagnostics configuration contains instructions for a particular type of data to be collected. Review your configuration to verify that you're only looking for data that you've configured for the collection.

Is the host generating data?

  • Performance counters: Open perfmon and check the counter.
  • Trace logs: Remote access into the VM and add a TextWriterTraceListener to the app's config file (a sketch follows this list). To set up the text listener, see Create and initialize trace listeners. Make sure the <trace> element has <trace autoflush="true">. If you don't see trace logs being generated, see the section "More about missing trace logs."
  • Event Tracing for Windows (ETW) traces: Remote access into the VM and install the PerfView tool. In PerfView, run File > User Command > Listen etwprovider1,etwprovider2, and so on. The Listen command is case sensitive, and there can't be spaces in the comma-separated list of ETW providers. If the command fails to run, select Log at the bottom right of the PerfView tool to see what attempted to run and what the result was. Assuming the input is correct, a new window opens. In a few seconds, ETW traces begin to appear.
  • Event logs: Remote access into the VM. Open Event Viewer and make sure that the events exist.
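
Here's the trace listener configuration sketch mentioned in the trace logs bullet. The listener name and log file path are placeholders; any writable location works:

        <configuration>
          <system.diagnostics>
            <!-- autoflush="true" ensures trace output is written immediately. -->
            <trace autoflush="true">
              <listeners>
                <add name="textListener"
                     type="System.Diagnostics.TextWriterTraceListener"
                     initializeData="C:\logs\AppTrace.log" />
              </listeners>
            </trace>
          </system.diagnostics>
        </configuration>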

Is data getting captured locally?

Next, make sure the data is getting captured locally. The data is locally stored in *.tsf files in the local store for diagnostics data. Different kinds of logs get collected in different .tsf files. The names are similar to the table names in Azure Storage.

For example, performance counters get collected in PerformanceCountersTable.tsf. Event logs get collected in WindowsEventLogsTable.tsf. Use the instructions in the Local log extraction section to open the local collection files and verify that you see them getting collected on disk.

If you don't see logs getting collected locally, and have already verified that the host is generating data, you likely have a configuration issue. Review your configuration carefully.

Also, review the configuration that was generated for the monitoring agent (MaConfig.xml). Verify that there's a section that describes the relevant log source, and that it wasn't lost in translation between the Diagnostics configuration and the monitoring agent configuration.

Is data getting transferred?

If you've verified that the data is getting captured locally but you still don't see it in your storage account, follow these steps:

  • Verify that you've provided a correct storage account and that you haven't rolled over keys for that storage account. For Azure Cloud Services, a common mistake is leaving useDevelopmentStorage=true in place instead of updating it to a real storage connection string.
  • Make sure you don't have network restrictions that prevent the components from reaching public storage endpoints. One way to check is to remote access into the machine and try to write something to the same storage account yourself (see the sketch after this list).
  • Finally, look at what failures the monitoring agent is reporting. The monitoring agent writes its own logs to maeventtable.tsf, which is located in the local store for diagnostics data. Follow the instructions in the Local log extraction section to open this file. Then try to determine whether there are errors that indicate failures reading the local files or writing to storage.
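
Here's a minimal C# sketch of such a write test. It assumes the Azure.Data.Tables package; the connection string, table name, and keys are arbitrary placeholders:

        // Minimal connectivity probe: write one entity to the Diagnostics
        // storage account from inside the VM.
        using System;
        using Azure.Data.Tables;

        class StorageProbe
        {
            static void Main()
            {
                var client = new TableClient(
                    "<diagnostics-storage-connection-string>", "wadprobe");
                client.CreateIfNotExists();
                client.AddEntity(new TableEntity("probe", DateTime.UtcNow.Ticks.ToString()));
                Console.WriteLine("Write succeeded; storage is reachable from this machine.");
            }
        }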

Capture and archive logs

If you're thinking about contacting support, the first thing they might ask you is to collect logs from your machine. You can save time by doing that yourself. Run the CollectGuestLogs.exe utility at the Log collection utility path. It generates a .zip file with all relevant Azure logs in the same folder.

Diagnostics data tables not found

The tables in Azure Storage that hold ETW events are named by using the following code:

        // Pseudocode: how the extension derives the storage table name for an
        // ETW event. Here, e is the configuration element being processed, and
        // MD5(provider) stands for a hash of the provider name.
        if (String.IsNullOrEmpty(eventDestination)) {
            if (e == "DefaultEvents")
                tableName = "WADDefault" + MD5(provider);
            else
                tableName = "WADEvent" + MD5(provider) + eventId;
        }
        else
            tableName = "WAD" + eventDestination;

Here's an example in XML format:

        <EtwEventSourceProviderConfiguration provider="prov1">
          <Event id="1" />
          <Event id="2" eventDestination="dest1" />
          <DefaultEvents />
        </EtwEventSourceProviderConfiguration>
        <EtwEventSourceProviderConfiguration provider="prov2">
          <DefaultEvents eventDestination="dest2" />
        </EtwEventSourceProviderConfiguration>
"EtwEventSourceProviderConfiguration": [
    {
        "provider": "prov1",
        "Event": [
            {
                "id": 1
            },
            {
                "id": 2,
                "eventDestination": "dest1"
            }
        ],
        "DefaultEvents": {
            "eventDestination": "DefaultEventDestination",
            "sinks": ""
        }
    },
    {
        "provider": "prov2",
        "DefaultEvents": {
            "eventDestination": "dest2"
        }
    }
]

This configuration generates four tables:

| Event | Table name |
| --- | --- |
| provider="prov1" <Event id="1" /> | WADEvent+MD5("prov1")+"1" |
| provider="prov1" <Event id="2" eventDestination="dest1" /> | WADdest1 |
| provider="prov1" <DefaultEvents /> | WADDefault+MD5("prov1") |
| provider="prov2" <DefaultEvents eventDestination="dest2" /> | WADdest2 |

References

The sections that follow provide reference information that supports the preceding troubleshooting steps.

Check Diagnostics extension configuration

The easiest way to check your extension configuration is to go to Azure Resource Explorer and then to the virtual machine or cloud service where the Diagnostics extension (IaaSDiagnostics or PaaSDiagnostics) is installed.

Alternatively, remote desktop into the machine and look at the Diagnostics configuration file that's described in the Log/artifact paths section.

In either case, search for Microsoft.Azure.Diagnostics and the xmlCfg or WadCfg field.

If you're searching on a virtual machine and the WadCfg field is present, it means the config is in JSON format. If the xmlCfg field is present, it means the config is in XML and is base64 encoded. You need to decode it to see the XML that was loaded by Diagnostics.

For the cloud service role, if you pick the configuration from disk, the data is base64 encoded. You'll need to decode it to see the XML that was loaded by Diagnostics.
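
Either way, decoding is straightforward. Here's a minimal C# sketch that reads the base64 string as its first command-line argument and prints the decoded XML:

        // Minimal sketch: decode a base64-encoded xmlCfg value so you can read
        // the XML configuration that Diagnostics loaded.
        using System;
        using System.Text;

        class DecodeXmlCfg
        {
            static void Main(string[] args)
            {
                string xml = Encoding.UTF8.GetString(Convert.FromBase64String(args[0]));
                Console.WriteLine(xml);
            }
        }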

Azure Diagnostics plug-in exit codes

The plug-in returns the following exit codes:

| Exit code | Description |
| --- | --- |
| 0 | Success. |
| -1 | Generic error. |
| -2 | Unable to load the rcf file. This internal error should happen only if the guest agent plug-in launcher is manually invoked incorrectly on the VM. |
| -3 | Can't load the Diagnostics configuration file. This error occurs when a configuration file doesn't pass schema validation. Solution: Provide a configuration file that complies with the schema. |
| -4 | Another instance of the monitoring agent Diagnostics is already using the local resource directory. Solution: Specify a different value for LocalResourceDirectory. |
| -6 | The guest agent plug-in launcher attempted to launch Diagnostics with an invalid command line. This internal error should happen only if the guest agent plug-in launcher is manually invoked incorrectly on the VM. |
| -10 | The Diagnostics plug-in exited with an unhandled exception. |
| -11 | The guest agent was unable to create the process responsible for launching and monitoring the monitoring agent. Solution: Verify that sufficient system resources are available to launch new processes. |
| -101 | Invalid arguments when calling the Diagnostics plug-in. This internal error should happen only if the guest agent plug-in launcher is manually invoked incorrectly on the VM. |
| -102 | The plug-in process is unable to initialize itself. Solution: Verify that sufficient system resources are available to launch new processes. |
| -103 | The plug-in process is unable to initialize itself. Specifically, it's unable to create the logger object. Solution: Verify that sufficient system resources are available to launch new processes. |
| -104 | Unable to load the rcf file provided by the guest agent. This internal error should happen only if the guest agent plug-in launcher is manually invoked incorrectly on the VM. |
| -105 | The Diagnostics plug-in can't open the Diagnostics configuration file. This internal error should happen only if the Diagnostics plug-in is manually invoked incorrectly on the VM. |
| -106 | Can't read the Diagnostics configuration file. This error occurs when a configuration file doesn't pass schema validation. Solution: Provide a configuration file that complies with the schema. For more information, see Check Diagnostics extension configuration. |
| -107 | The resource directory passed to the monitoring agent is invalid. This internal error should happen only if the monitoring agent is manually invoked incorrectly on the VM. |
| -108 | Unable to convert the Diagnostics configuration file into the monitoring agent configuration file. This internal error should happen only if the Diagnostics plug-in is manually invoked with an invalid configuration file. |
| -110 | General Diagnostics configuration error. This internal error should happen only if the Diagnostics plug-in is manually invoked with an invalid configuration file. |
| -111 | Unable to start the monitoring agent. Solution: Verify that sufficient system resources are available. |
| -112 | General error. |

Local log extraction

The monitoring agent collects logs and artifacts as .tsf files. The .tsf format isn't human-readable, but you can convert it into a .csv file as follows:

<Azure diagnostics extension package>\Monitor\x64\table2csv.exe <relevantLogFile>.tsf

A new file called <relevantLogFile>.csv is created in the same path as the corresponding .tsf file.

Note

You only need to run this utility against the main .tsf file (for example, PerformanceCountersTable.tsf). The accompanying files (for example, PerformanceCountersTables_**001.tsf and PerformanceCountersTables_**002.tsf) are automatically processed.

More about missing trace logs

Note

The following information applies mostly to Azure Cloud Services unless you've configured the DiagnosticMonitorTraceListener on an application that's running on your infrastructure as a service (IaaS) VM.

  • Make sure the DiagnosticMonitorTraceListener is configured in the web.config or app.config. It's configured by default in cloud service projects. However, some customers comment it out, which causes the trace statements to not be collected by Diagnostics.
  • If logs aren't getting written from the OnStart or Run method, make sure the DiagnosticMonitorTraceListener is in the app.config. By default, it's in the web.config, but that applies only to code running within w3wp.exe. You need it in app.config to capture traces from code running in WaIISHost.exe.
  • Make sure you're using Diagnostics.Trace.TraceXXX instead of Diagnostics.Debug.WriteXXX. The Debug statements are removed from a release build.
  • Make sure the compiled code actually has the Diagnostics.Trace lines. Use Reflector, ildasm, or ILSpy to verify. Diagnostics.Trace commands are removed from the compiled binary unless you use the TRACE conditional compilation symbol (see the sketch after this list). This is a common problem when you use MSBuild to build a project.
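
Here's the sketch mentioned in the last bullet: a minimal MSBuild fragment that defines the TRACE constant in the project file so that Trace calls survive compilation. Where you place the property group depends on your project layout:

        <!-- Append TRACE to the defined constants so System.Diagnostics.Trace
             calls are kept in all build configurations. -->
        <PropertyGroup>
          <DefineConstants>$(DefineConstants);TRACE</DefineConstants>
        </PropertyGroup>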

Known issues and mitigations

The following known issues have mitigations.

.NET 4.5 dependency

The Azure Diagnostics extension for Windows has a runtime dependency on .NET Framework 4.5 or later. At the time of writing, all machines that are provisioned for Azure Cloud Services, and all official images that are based on Azure VMs, have .NET 4.5 or later installed.

It's still possible to encounter a situation where you try to run the Azure Diagnostics extension for Windows on a machine that doesn't have .NET 4.5 or later. This situation happens when you create your machine from an old image or snapshot, or when you bring your own custom disk.

This issue generally manifests as an exit code 255 when you run DiagnosticsPluginLauncher.exe. Failure happens because of the following unhandled exception:

        System.IO.FileLoadException: Could not load file or assembly 'System.Threading.Tasks, Version=1.5.11.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies

Mitigation: Install .NET 4.5 or later on your machine.

Performance counters data is available in storage but doesn't show in the portal

The portal experience in the VMs shows certain performance counters by default. If you don't see the performance counters, and you know that the data is getting generated because it's available in storage, be sure to check:

  • Whether the data in storage has counter names in English. If the counter names aren't in English, the portal metric chart won't recognize them.

    • Mitigation: Change the machine's language to English for system accounts. To do this, select Control Panel > Region > Administrative > Copy Settings. Next, clear Welcome screen and system accounts so that the custom language isn't applied to the system account.
  • If you're using wildcards (*) in your performance counter names, the portal can't correlate the configured and collected counter when the performance counters are sent to the Azure Storage sink.

    • Mitigation: To make sure you can use wildcards and have the portal expand the (*), route your performance counters to the Azure Monitor sink. A configuration sketch follows this list.
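
Here's the configuration sketch mentioned in the preceding mitigation: a WadCfg fragment that declares an Azure Monitor sink and routes a wildcard performance counter to it. The sink name and counter are examples, and you should confirm that your extension version supports the Azure Monitor sink:

        <SinksConfig>
          <Sink name="AzMonSink">
            <AzureMonitor />
          </Sink>
        </SinksConfig>

        <PerformanceCounters scheduledTransferPeriod="PT1M" sinks="AzMonSink">
          <!-- The portal can expand the (*) wildcard when counters flow
               through the Azure Monitor sink. -->
          <PerformanceCounterConfiguration
              counterSpecifier="\LogicalDisk(*)\Disk Read Bytes/sec"
              sampleRate="PT60S" />
        </PerformanceCounters>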