Enabling tracing for HPC SOA applications

When helping our customers troubleshoot HPC SOA applications and cluster infrastructure, we realized that SOA tracing is a very powerful tool when physical or remote logon to the cluster is not available (which happens a lot, since an HPC cluster is usually a core asset of an institution and often hosts sensitive or private data).

However, we found that there is no clear, central set of instructions on how to enable HPC traces! So here is a wrap-up of (hopefully) everything.

First of all, there are three levels of traces. Let's take a look at them one by one.

Closest to the application level is the user code trace. It is generated by the user's service code, and the service host also adds information about the messages it receives and sends. This is helpful for troubleshooting errors in user code and for monitoring message traffic on compute nodes. The easiest way to enable this trace is through HPC Cluster Manager: launch HPC Cluster Manager, switch to the Configuration pane, select the Service node, select the service you want to trace, click "Configure Trace" in the action pane on the right (or in the right-click menu), and select the desired trace level. I suggest using ‘Verbose’ directly, because there is no such thing as information overload.

NOTE: The configuration takes effect only for processes that start AFTER the change. So make sure you modify the setting before you run the session.

Viewing this trace is also easy: launch HPC Cluster Manager, switch to the Job pane, select the job whose trace you want to view, click "Collect trace" in the action pane on the right (or in the right-click menu), and replace the directory in the dialog if necessary. Note that this path must be accessible from all compute nodes, so a file share on the head node is a good choice.

After the collection finishes, all the .svclog files will be in the specified directory. To view them, use the Service Trace Viewer, available with the Windows SDK or Visual Studio 2008 (or later).

The second type of trace is the SOA infrastructure trace. It provides more insight into what is happening under the hood of the HPC SOA Session service, Broker service, broker worker, and service host processes. This is the main source of information when you hit a problem that you can't solve and have to turn to Microsoft for help. To enable the trace, open a command prompt with administrator privileges on the head node/broker node and run:

logman start trace SOATRACE -p Microsoft-HPC-Runtime -max 2000 -o "%CCP_DATA%\LogFiles\SOATrace.etl" -ets

Run the session again, and SOATrace.etl will be generated under %CCP_DATA%\LogFiles. To view an .etl file, open Event Viewer and load the file via the menu item “Action -> Open Saved Log…”.

NOTE: Remember to stop the trace once you have finished troubleshooting (logman stop SOATRACE -ets), since tracing has a negative impact on broker performance.
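Because it is easy to forget to stop the trace, here is a minimal Python sketch that wraps the start command above in a context manager, so the session is stopped even if something fails in between. The helper names (start_command, stop_command, soa_trace) are made up for illustration. The stop command uses the standard logman form for stopping an event trace session, which is an assumption on top of the original instructions. Treat this as a starting point, not a supported tool.

```python
import subprocess
from contextlib import contextmanager

SESSION = "SOATRACE"                # session name used in the logman command above
PROVIDER = "Microsoft-HPC-Runtime"  # provider name used in the logman command above


def start_command(output_path, max_mb=2000):
    """Build the logman start command shown above as an argument list."""
    return ["logman", "start", "trace", SESSION, "-p", PROVIDER,
            "-max", str(max_mb), "-o", output_path, "-ets"]


def stop_command():
    """Build the matching stop command (assumed: standard logman stop syntax)."""
    return ["logman", "stop", SESSION, "-ets"]


@contextmanager
def soa_trace(output_path, run=subprocess.check_call):
    """Start the trace, yield control, and always stop the trace afterwards.

    `run` is injectable so the command lists can be inspected without
    actually calling logman (useful on a machine that is not a broker node).
    """
    run(start_command(output_path))
    try:
        yield output_path
    finally:
        run(stop_command())


# Hypothetical usage, from an elevated prompt on the head node/broker node:
#   import os
#   etl = os.path.expandvars(r"%CCP_DATA%\LogFiles\SOATrace.etl")
#   with soa_trace(etl):
#       input("Trace is running. Reproduce the problem, then press Enter...")
```

The try/finally is the point of the sketch: the stop command runs whether the body finishes normally or raises, so the broker is not left with tracing switched on.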

The last type of trace is WCF tracing. When the other traces do not explain the problem, the WCF library can log detailed information at the message level, which might help. To enable WCF traces, replace the <system.diagnostics> section of %CCP_HOME%\bin\HpcBrokerWorker.exe.config on all broker nodes with the following:

<system.diagnostics>
  <sources>
    <source name="System.ServiceModel" switchValue="Verbose,ActivityTracing"
            propagateActivity="true">
      <listeners>
        <add type="System.Diagnostics.DefaultTraceListener" name="Default">
          <filter type="" />
        </add>
        <add name="WcfTraceListener">
          <filter type="" />
        </add>
      </listeners>
    </source>
    <source name="System.ServiceModel.MessageLogging" switchValue="Verbose,ActivityTracing">
      <listeners>
        <add type="System.Diagnostics.DefaultTraceListener" name="Default">
          <filter type="" />
        </add>
        <add name="WcfTraceListener">
          <filter type="" />
        </add>
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add initializeData="c:\temp\hpcbrokerworker.trace.svclog"
         type="System.Diagnostics.XmlWriterTraceListener, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
         name="WcfTraceListener"
         traceOutputOptions="LogicalOperationStack, DateTime, Timestamp, ProcessId, ThreadId, Callstack">
      <filter type="" />
    </add>
  </sharedListeners>
</system.diagnostics>

NOTE: Just like the user code trace, a change to the app.config file takes effect only after the executable restarts, so make sure you modify the setting before you run the session. You also need to restart the HPC Broker service on the broker nodes after making this change.

NOTE: Make sure the user has ‘write’ permission on the specified directory, in this case “c:\temp”.

NOTE: This is for the SOA broker worker. To understand the end-to-end workflow, make sure you also modify %CCP_HOME%\bin\HpcServiceHost.exe.config on all compute nodes and the app.config of your client application on the client computer.
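Since the same <system.diagnostics> section has to go into several config files, editing each one by hand gets tedious. Below is a minimal Python sketch of one way to script the replacement using only the standard library. The function name replace_diagnostics and the file names in the usage comment are hypothetical. Note that ElementTree drops XML comments and reformats the file when it rewrites it, which is why the sketch keeps a .bak copy. Try it on a copy of the config file first.

```python
import shutil
import xml.etree.ElementTree as ET


def replace_diagnostics(config_path, diagnostics_xml, backup=True):
    """Replace (or add) the <system.diagnostics> section of a .NET config file.

    config_path     -- path to an *.exe.config / app.config file
    diagnostics_xml -- string holding a complete <system.diagnostics> element
    backup          -- if True, copy the original file to <config_path>.bak first
    """
    new_section = ET.fromstring(diagnostics_xml)
    if new_section.tag != "system.diagnostics":
        raise ValueError("expected a <system.diagnostics> root element")

    if backup:
        shutil.copy2(config_path, config_path + ".bak")

    tree = ET.parse(config_path)
    root = tree.getroot()  # the <configuration> element

    # Remove any existing section, then append the new one at the end.
    for child in list(root):
        if child.tag == "system.diagnostics":
            root.remove(child)
    root.append(new_section)

    tree.write(config_path, encoding="utf-8", xml_declaration=True)


# Hypothetical usage (both paths are placeholders, not real defaults):
#   with open("wcf_diagnostics.xml") as f:
#       section = f.read()
#   replace_diagnostics(r"C:\path\to\HpcServiceHost.exe.config", section)
```

The new section is appended as the last child of <configuration>, which keeps any <configSections> element first, as .NET requires. The processes still need to be restarted afterwards, as described in the notes above.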