Verification Steps: Test the Application Network
Applies To: Windows HPC Server 2008 R2, Windows HPC Server 2008
After the compute nodes have finished deploying, you can run a built-in diagnostic test to verify that your application network is running properly and that the NetworkDirect device drivers are working as expected. Depending on the version of your Windows HPC Server cluster, the test is named as follows:
Cluster version | Diagnostic test |
---|---|
Windows HPC Server 2008 R2 | MPI Ping-Pong: Latency |
Windows HPC Server 2008 | MPI Ping-Pong: Quick Check |
This diagnostic test measures network latency between pairs of compute nodes in your cluster over the application network.
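This article does not describe how the MPI Ping-Pong diagnostic is implemented. As a generic illustration of the ping-pong technique only, the sketch below bounces a small message back and forth between the two ends of a local socket pair. The two ends stand in, as an assumption, for a pair of compute nodes. The sketch reports one-way latency as half of the average round-trip time. It uses plain Python sockets rather than MPI or NetworkDirect, and the function names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def pong(sock: socket.socket, rounds: int, size: int) -> None:
    """The 'pong' side: echo every message straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def ping_pong_latency_us(rounds: int = 1000, size: int = 8) -> float:
    """Return the average one-way latency, in microseconds.

    A local socket pair stands in for two nodes (hypothetical setup);
    this is NOT the HPC diagnostic and does not use the application network.
    """
    ping_sock, pong_sock = socket.socketpair()
    peer = threading.Thread(target=pong, args=(pong_sock, rounds, size))
    peer.start()
    payload = b"x" * size
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            ping_sock.sendall(payload)   # ping
            recv_exact(ping_sock, size)  # wait for the matching pong
        elapsed = time.perf_counter() - start
    finally:
        ping_sock.close()  # unblocks the peer if the loop failed early
        peer.join()
        pong_sock.close()
    # Each round trip crosses the link twice, so the one-way latency is
    # half of the mean round-trip time.
    return elapsed / rounds / 2 * 1e6
```

Calling `ping_pong_latency_us()` returns a figure for the local machine only. It reflects Python and the operating system's socket stack, not an InfiniBand adapter, so do not compare it with the thresholds used for the diagnostic test.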
Important
If you have a Windows HPC Server 2008 R2 cluster, you can also install and run the Network Troubleshooting Report diagnostic test (https://go.microsoft.com/fwlink/?LinkId=207765). When installed along with the vstat.exe tool that is available from many InfiniBand hardware vendors, this test collects and analyzes network information in your InfiniBand network. For more information about the Network Troubleshooting Report diagnostic test and the vstat.exe tool, see Install and Use the Network Troubleshooting Report Diagnostic Test (https://go.microsoft.com/fwlink/?LinkId=202483).
To test the application network with a network latency diagnostic test
If HPC Cluster Manager is not already open on the head node, open it. Click Start, point to All Programs, click Microsoft HPC Pack, and then click HPC Cluster Manager.
In Diagnostics, in the Navigation Pane, click Tests.
In the views pane, double-click the name of the diagnostic test (for example, MPI Ping-Pong: Latency). The Run Diagnostics dialog box appears.
Click All nodes, and then click Run to start the diagnostic test.
To view the results of the test, in Diagnostics, in the Navigation Pane, click Test Results, and then in the views pane, click the name of the test. After the test has finished running, you can view the results in the Detail Pane.
Your application network is running properly and the NetworkDirect device drivers are working as expected if the diagnostic test is successful and the following average latency results are reported under Latency Summary:
For InfiniHost adapter cards, the average latency is less than 7 microseconds
For ConnectX adapter cards, the average latency is less than 3 microseconds
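The pass criteria above amount to a simple threshold comparison. The hypothetical helper below is not part of HPC Pack. It encodes the two documented limits and covers only the latency criterion, so the diagnostic test itself must also report success.

```python
# Average-latency limits from the criteria above, in microseconds.
# The dictionary keys are hypothetical labels for the adapter families.
LATENCY_THRESHOLDS_US = {
    "InfiniHost": 7.0,
    "ConnectX": 3.0,
}


def latency_ok(adapter_family: str, average_latency_us: float) -> bool:
    """Return True if the reported average latency is below the limit.

    Only the latency criterion is checked; the diagnostic test must also
    have completed successfully.
    """
    try:
        limit = LATENCY_THRESHOLDS_US[adapter_family]
    except KeyError:
        raise ValueError(
            f"no documented threshold for adapter family {adapter_family!r}"
        ) from None
    # The criterion is strictly "less than" the limit.
    return average_latency_us < limit
```

For example, `latency_ok("ConnectX", 2.5)` returns `True`, while `latency_ok("ConnectX", 3.0)` returns `False` because the limit is strict.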
Note
You can also test the application network by running the mpipingpong command-line tool. This tool can provide detailed latency and throughput measurements and statistics for packet sizes of up to 4 megabytes. For more information about this tool, see mpipingpong (https://go.microsoft.com/fwlink/?LinkID=137352).
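mpipingpong computes its own statistics, and its exact formulas are not given here. As a hedged illustration of how ping-pong timings are commonly turned into latency and throughput figures, the hypothetical helper below takes the round-trip times measured for one message size. It treats half of each round trip as the one-way time and divides the message size by the mean one-way time to get throughput.

```python
import statistics


def summarize_ping_pong(size_bytes: int, round_trip_s: list[float]) -> dict[str, float]:
    """Summarize ping-pong round trips for one message size (hypothetical helper).

    One-way time is taken as half of each round-trip time; throughput is
    the message size divided by the mean one-way time.
    """
    one_way_us = [rt / 2 * 1e6 for rt in round_trip_s]
    mean_us = statistics.mean(one_way_us)  # raises StatisticsError if empty
    return {
        "size_bytes": size_bytes,
        "min_us": min(one_way_us),
        "mean_us": mean_us,
        "median_us": statistics.median(one_way_us),
        # Bytes per microsecond equals megabytes (10**6 bytes) per second.
        "throughput_MBps": size_bytes / mean_us,
    }
```

With a 1,000,000-byte message and round trips of 2, 4, and 6 milliseconds, the mean one-way time is 2,000 microseconds, which gives a throughput of 500 MB/s.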
Additional considerations
The CCP_MPI_NETMASK environment variable can be temporarily changed to move application traffic from the application network to a different HPC network (for example, the private network). Performing this change might be useful when troubleshooting your application network.
To see the current value of the CCP_MPI_NETMASK environment variable, on the head node, in an elevated Command Prompt window, type the following command:
cluscfg listenvs
You can change the value of the CCP_MPI_NETMASK environment variable with the cluscfg command-line tool. For example, to move application traffic to the private network when the private network is configured with IP addresses in the 10.0.x.x range and a subnet mask of 255.255.0.0, type the following command:
cluscfg setenvs CCP_MPI_NETMASK=10.0.0.0/255.255.0.0
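The CCP_MPI_NETMASK value in this example is the network address followed by the subnet mask. The sketch below uses Python's standard ipaddress module to derive that value from any address on the target network. It can also check whether a given node address falls inside the network and build the cluscfg command line shown above as a string. The helper names are hypothetical, and nothing here runs cluscfg.

```python
import ipaddress


def mpi_netmask_value(address_with_mask: str) -> str:
    """Turn e.g. "10.0.5.7/255.255.0.0" into "10.0.0.0/255.255.0.0"."""
    # strict=False lets the input contain host bits (any address on the network).
    net = ipaddress.IPv4Network(address_with_mask, strict=False)
    return f"{net.network_address}/{net.netmask}"


def on_network(host_ip: str, netmask_value: str) -> bool:
    """Return True if host_ip lies inside the network/mask value."""
    return ipaddress.IPv4Address(host_ip) in ipaddress.IPv4Network(netmask_value)


def setenvs_command(netmask_value: str) -> str:
    """Build (but do not run) the cluscfg command from this procedure."""
    return f"cluscfg setenvs CCP_MPI_NETMASK={netmask_value}"
```

For a node whose private-network address is 10.0.5.7 with subnet mask 255.255.0.0, `mpi_netmask_value("10.0.5.7/255.255.0.0")` returns `"10.0.0.0/255.255.0.0"`, which matches the value in the command above.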
Important
After you have finished troubleshooting, and before you return your HPC cluster to production, remember to restore the CCP_MPI_NETMASK environment variable to its original value. Otherwise, application traffic will continue to be routed to the network that you selected for troubleshooting instead of the application network.
Note
You can review the current HPC network configuration in HPC Cluster Manager, under Configuration. For more information about the cluscfg command-line tool, see cluscfg (https://go.microsoft.com/fwlink/?LinkID=137353).