Install and Use the Network Troubleshooting Report Diagnostic Test

Applies To: Microsoft HPC Pack 2008 R2, Microsoft HPC Pack 2012, Microsoft HPC Pack 2012 R2, Windows HPC Server 2008 R2

This topic contains information about how to install and use the Network Troubleshooting Report diagnostic test for Microsoft® HPC Pack. The Network Troubleshooting Report diagnostic test collects and analyzes network information to help you troubleshoot networking issues on your Windows HPC cluster.

In this topic:

  • About the Network Troubleshooting Report diagnostic test

  • Install the diagnostic test

  • Install the diagnostic test files on workstation nodes

  • Install vstat.exe (optional)

  • Run the Network Troubleshooting Report diagnostic test

  • Copy test data to Microsoft Excel for custom analysis

  • Redistribute the diagnostic test files to cluster nodes

  • Edit a node template to automatically deploy test files to new nodes

  • Uninstall the diagnostic test

About the Network Troubleshooting Report diagnostic test

  • The Network Troubleshooting Report diagnostic test is installed using an installation (MSI) file, which automates the tasks of adding the test to HPC Cluster Manager and copying the necessary binary files and scripts to the nodes. An MSI file is not required to install a new or custom diagnostic test, but it automates tasks that you would otherwise perform manually. (For more information about adding custom diagnostic tests to a cluster, see Add New and Custom Diagnostic Tests.)

  • During the installation, the following binary files are copied to each node:

    | Node | Files | Path | Function |
    | --- | --- | --- | --- |
    | Head node | NetTroubleshoot.exe, NetWrench.exe, NetTSAdapter.dll | %REMINST%\NetTroubleshoot | Analyze the data relayed by the NetWrench.exe and NetTSAdapter.dll files that run on the compute nodes, and generate the HTML report for the test |
    | Compute nodes | NetWrench.exe, NetTSAdapter.dll | %CCP_HOME%bin | Run when the test runs, to collect network information and relay it to the head node for analysis |

    Important

    If a compute node does not have these files (for example, they failed to be distributed to that node during the installation of the test or the node was redeployed recently), the test will fail to run on that node. No results for that node will appear in the report for the test.

  • If you have an InfiniBand network, you can choose to install vstat.exe on the nodes in that network to collect additional information for the report. The NetWrench.exe and NetTSAdapter.dll files on the compute node use vstat.exe to collect information about the status and the capabilities of the host channel adapter (HCA) cards for that network. This information is then analyzed on the head node and is included in the HTML report. For more information, see Install vstat.exe (optional), later in this topic.

  • A script (RedistributeCN.cmd) is provided to redistribute the binary files to the nodes. You can run this script at any time. The script should be run after a new node is added to the cluster, or if an existing node is redeployed. For more information, see Redistribute the diagnostic test files to cluster nodes, later in this topic.

  • If you redeploy nodes often, you can avoid repeatedly redistributing the binary files by adding a task to your node templates that runs a script (CopyCN.cmd). This script copies the binary files for the test during node deployment. For more information, see Edit a node template to automatically deploy test files to new nodes, later in this topic.

  • Unlike the basic HTML reports generated by the default tests installed with HPC Pack, the Network Troubleshooting Report test report includes custom tables that contain all of the test data. These tables can be copied to Microsoft® Excel® or a similar application for analysis. For more information, see Copy test data to Microsoft Excel for custom analysis, later in this topic.

Install the diagnostic test

Important

  • To install the Network Troubleshooting Report diagnostic test, you must use the domain credentials of an HPC cluster administrator. If you use local administrator credentials, the installation will fail. It is recommended that you log on to the head node as an HPC cluster administrator.

  • To copy the necessary binary files for the test to the compute nodes, the installation wizard starts a script that runs a clusrun command. If a compute node cannot be reached or the files are not copied successfully, the installation of the test can still complete. However, if a compute node does not have the files to collect and relay information to the head node, information about that node will not appear in the test report.

  • If the binary files that are required for the test are not successfully copied to all compute nodes during the installation, you can manually redistribute the files to the nodes at a later date by running the RedistributeCN.cmd script. For more information, see Redistribute the diagnostic test files to cluster nodes, later in this topic.

  • If you have workstation nodes in your cluster, you or an administrator of workstation computers may need to perform additional, manual steps to deploy the necessary binary files to the workstation nodes. For more information, see Install the diagnostic test files on workstation nodes.

  • If you have an InfiniBand network, you can choose to install vstat.exe on the nodes to provide additional network information, or vstat.exe may already be installed. If you need to install vstat.exe on the nodes, see Install vstat.exe (optional), later in this topic.

To install the diagnostic test

  1. Download the Network Troubleshooting Report diagnostic test installation program from the Microsoft Download Center to the head node of your Windows HPC cluster or to a network location. The Network Troubleshooting Report diagnostic test is included in the HPC Pack Tool Pack. For more information about the Tool Pack, see Microsoft HPC Pack: Tools and Extensibility.

  2. Make sure that as many nodes as possible in your cluster are started and can be reached from the head node. One way to check this is to view the Node Health monitoring chart:

    • If HPC Cluster Manager is not already open on the head node, open it.

    • In Charts and Reports, view the Node Health monitoring chart.

  3. On the head node computer, run NetTroubleshoot.msi. Follow the steps in the installation wizard.

  4. To copy the NetWrench.exe and NetTSAdapter.dll files to the compute nodes in the cluster, the wizard automatically opens a Command Prompt window and runs a script that copies the files. When the script completes, review the Summary and then press a key to continue.

    Important

    • If an error occurs during the copying of the files to the compute nodes or there is a problem with your credentials, an error message appears in the command output that guides you to correct the problem.

    • If you are prompted, type the password for your account.

  5. On the final page of the installation wizard, click Finish.

Additional considerations

  • The Network Troubleshooting Report diagnostic test is installed in HPC Cluster Manager using the following metadata:

    | Item | Description |
    | --- | --- |
    | Suite | Network Troubleshooting |
    | Name | Network Troubleshooting Report |
    | Alias | netTroubleshoot |

Install the diagnostic test files on workstation nodes

If the HPC cluster administrator credentials that you use to install the test also provide administrative credentials on workstation nodes (and unmanaged server nodes, if supported in your version of HPC Pack), the installation program automatically copies the necessary test files to the workstation computers.

In many organizations, however, HPC cluster administrators do not have administrative credentials on the workstation nodes. If you do not have administrative credentials on the workstations, the installation program cannot copy the binary files for the test to those nodes. It is also not possible to redistribute the test files to the workstation nodes by running the RedistributeCN.cmd script.

If you are not a workstation administrator for your organization, you will need to coordinate the installation of the diagnostic test files with the workstation administrator. The administrator will need to follow the deployment practices in the organization to copy the following files from the head node computer (or a network location) to the %CCP_HOME%bin folder on the workstation node computers: NetWrench.exe and NetTSAdapter.dll. For more information about these files and their locations, see About the Network Troubleshooting Report diagnostic test, in this topic.

Install vstat.exe (optional)

If you have an InfiniBand network and you want the Network Troubleshooting Report to display the status and the capabilities of the host channel adapter (HCA) cards in that network, the vstat.exe tool must be installed on each node that has an HCA card. The Network Troubleshooting Report diagnostic test does not install vstat.exe.

Important

  • Vstat.exe may be installed automatically as part of the driver package from your HCA card vendor or system builder, or you may need to download and install vstat.exe separately. Consult the documentation from your vendor or system builder.

  • If you installed the driver package or vstat.exe only on the head node computer, you will need to deploy vstat.exe to the other cluster nodes. The method you use depends on how vstat.exe is installed on the head node.

  • If vstat.exe is not available from your vendor or system builder, you can also download the latest device drivers that are published by the OpenFabrics Alliance (OFA), which work with most commercially available HCA cards. The vstat.exe tool is installed automatically with the device drivers that are published by the OFA. For more information and to download the latest device drivers that are published by the OFA, see The OpenFabrics Alliance (https://go.microsoft.com/fwlink/?LinkID=137347).

To install vstat.exe on the nodes in your cluster

  1. Download the InfiniBand driver or tools installation program from the appropriate hardware vendor or system builder to the head node of your Windows HPC cluster or to a network location.

  2. On the head node computer, run the installation program.

  3. (Important) When prompted, choose to install the program files in a folder in the C:\Program Files folder on the head node computer. In most cases, this is a default installation option.

  4. Depending on how vstat.exe is installed on the head node, do one of the following:

    • If vstat.exe is installed as a stand-alone application, you can run a clusrun command on the head node to copy vstat.exe to the nodes.

      Important

      vstat.exe must be copied to a folder in the C:\Program Files folder on the nodes. This ensures that the diagnostic test can use vstat.exe to collect information on the nodes.

    • If vstat.exe is installed with the driver package, you can add the drivers to the operating system images that are deployed to the nodes. In HPC Cluster Manager, in Configuration, in the Deployment To-do List, click Manage drivers.
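
For the stand-alone case in step 4, the copy could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a tested deployment script: the share path `\\HEADNODE\Tools\vstat.exe` and the target folder `C:\Program Files\vstat` are hypothetical placeholders, and using the `mkdir` and `copy` commands under `clusrun /all` is one possible approach that assumes every node can read the share with your credentials. The function only builds the command lines so that you can review them before you run them at an elevated Command Prompt window on the head node.

```python
def build_vstat_copy_commands(source_file: str, target_dir: str) -> list[str]:
    """Build clusrun command lines that copy vstat.exe to all nodes."""
    # Paths are quoted because "C:\Program Files" contains a space.
    return [
        # Create the target folder first (reports an error if it already exists).
        f'clusrun /all mkdir "{target_dir}"',
        # Copy the file, overwriting any existing copy without prompting.
        f'clusrun /all copy /y "{source_file}" "{target_dir}"',
    ]


# Hypothetical locations; replace them with the paths used in your cluster.
commands = build_vstat_copy_commands(
    r"\\HEADNODE\Tools\vstat.exe", r"C:\Program Files\vstat"
)
for command in commands:
    print(command)
```

The target stays under C:\Program Files, as the procedure requires, so that the diagnostic test can find vstat.exe on each node.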

Run the Network Troubleshooting Report diagnostic test

You can use the following procedure to run the Network Troubleshooting Report diagnostic test on all nodes in the cluster and to view the report.

Important

  • A different suite of tests, also named “Network Troubleshooting”, is installed by default in HPC Cluster Manager. This suite includes the DNS Test, the Domain Connectivity Test, and the Ping Test. To run the Network Troubleshooting Report diagnostic test that you installed, ensure that you select the test with the alias netTroubleshoot.

  • Running the Network Troubleshooting Report diagnostic test on a large cluster can take a long time.

To run the Network Troubleshooting Report diagnostic test

  1. If HPC Cluster Manager is not already open on the head node, open it.

  2. In Diagnostics, in the Navigation Pane, expand Tests, expand Network, and then click Network Troubleshooting.

  3. In the view pane, right-click Network Troubleshooting Report, and then click Run.

  4. In the Run Diagnostic Tests dialog box, in Nodes to test, select All nodes, and then click Run.

To view the report

  1. In Diagnostics, in the Navigation Pane, click Test Results.

  2. In the view pane, verify that the status of the Network Troubleshooting Report test is not Running.

  3. To view the report, in the view pane, double-click Network Troubleshooting Report. The report will open in your default web browser.

  4. To export the report, right-click Network Troubleshooting Report and then click Export Results. You can then open the report using a browser or Microsoft Excel.

Note

If you see an error message in the Nodes Excluded from this Report section of the report similar to “Netwrench.exe is not recognized as an internal or external command, operable program or batch file”, the binary files for the diagnostic test are not found on the indicated nodes. This can occur if there is a problem distributing the files to the nodes, or if a node was redeployed. To redistribute the binary files for the test to the nodes, see Redistribute the diagnostic test files to cluster nodes, later in this topic.
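
If many nodes are excluded, it can help to separate the nodes that lack the binary files from nodes that failed for other reasons. The following is a minimal sketch that assumes you have already transcribed the Nodes Excluded from this Report section into a mapping of node name to error text. The mapping, the node names, and the function name are hypothetical; the sketch does not read the report itself.

```python
# Fragment of the error text that indicates a missing executable.
MISSING_BINARY_HINT = "is not recognized as an internal or external command"


def nodes_missing_test_files(excluded: dict[str, str]) -> list[str]:
    """Return the sorted names of nodes whose error suggests a missing binary."""
    return sorted(
        node
        for node, message in excluded.items()
        if MISSING_BINARY_HINT in message.lower()
    )


# Hypothetical sample data transcribed from a report.
excluded_nodes = {
    "NODE02": "The node could not be reached.",
    "NODE01": "Netwrench.exe is not recognized as an internal or external "
              "command, operable program or batch file",
}
print(nodes_missing_test_files(excluded_nodes))
```

The nodes that this returns are the ones that need the binary files redistributed; other excluded nodes need a different fix.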

Copy test data to Microsoft Excel for custom analysis

You can copy the data in any report table to Microsoft Excel, and perform your custom analysis of the data. The All Data by Network tab in the report is specifically created for that purpose. It contains summary tables of the data in the different categories in the Analysis by Category tab.

To copy the data to Excel

  1. Open a new Excel workbook.

  2. On the test report, click All Data by Network.

  3. Select or highlight the table that you want to copy.

  4. In Excel, click a cell and paste the data.

  5. You can use the tools in Excel to sort, filter, and analyze the data.
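
As an alternative to copying and pasting, the tables in an exported report could be converted to CSV with a short script. The following is a minimal sketch that assumes the exported report is HTML with well-formed, non-nested `<table>`, `<tr>`, `<th>`, and `<td>` markup. The actual layout of the report is not documented here, and the sample markup and column names below are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Parse an HTML string and return its tables as lists of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def table_to_csv(rows):
    """Render one table (a list of rows) as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample markup standing in for an exported report.
sample = (
    "<table><tr><th>Node</th><th>Network</th></tr>"
    "<tr><td>NODE01</td><td>Enterprise</td></tr></table>"
)
tables = extract_tables(sample)
print(table_to_csv(tables[0]))
```

Excel can open the resulting CSV text directly once it is saved to a file, which avoids selecting each table by hand when a report contains many of them.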

Redistribute the diagnostic test files to cluster nodes

At any time, you can redistribute the files necessary to collect information for the diagnostic test to all nodes that can be reached in the cluster.

To redistribute the diagnostic test files

  1. Make sure that as many nodes as possible in your cluster are started and can be reached from the head node.

  2. Open an elevated Command Prompt window. Click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator.

  3. At the elevated Command Prompt window, type the following command:

    %ccp_home%bin\RedistributeCN.cmd
    

Edit a node template to automatically deploy test files to new nodes

If you are deploying nodes from bare metal, you can edit an existing node template to automatically deploy the Network Troubleshooting Report diagnostic test files to new nodes.

To edit the node template

  1. In HPC Cluster Manager, in Configuration, in the Navigation Pane, click Node Templates.

  2. In the view pane, select a node template.

  3. In the Actions pane, click Edit. The Node Template Editor dialog box appears.

  4. To add a task that will copy the files for the Network Troubleshooting Report diagnostic test to each node, click Add Task, point to Deployment, and then click Run OS command.

  5. Ensure that the new task that you created is selected in the Node template tasks list, and then click Move Down until that task is listed as the last task under Deployment. This will make the new task run after all the other deployment tasks have finished running.

  6. Specify the following properties for the new task:

    • Set the ContinueOnFailure property to True.

    • Optionally, in the text box for the Description property, type a description for the task. For example: Copy test report files command.

    • In the text box for the Command property, type the following command:

      \\%ccp_scheduler%\REMINST\NetTroubleshootSetup\CopyCN.cmd
      
  7. To save the node template with the new task, click Save.

Uninstall the diagnostic test

To uninstall the Network Troubleshooting Report diagnostic test, do the following:

  1. Uninstall the diagnostic test on the head node

  2. Delete the diagnostic test files from the nodes (optional)

  3. Delete vstat.exe from the nodes (optional)

To uninstall the diagnostic test on the head node

  1. On the head node, close HPC Cluster Manager, if it is currently open.

  2. Open Control Panel. Click Start, and then click Control Panel.

  3. In Control Panel, under Programs, click Uninstall a program.

  4. On the list of installed programs, right-click Microsoft HPC Pack 2008 R2 Network Troubleshooting Report, and then click Uninstall. Follow the steps of the wizard.

Important

To uninstall the Network Troubleshooting Report diagnostic test, you must use the domain credentials of an HPC cluster administrator. If you use local administrator credentials, the uninstallation will fail.

To delete the diagnostic test files from the nodes (optional)

  1. Open an elevated Command Prompt window. Click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator.

  2. At the elevated Command Prompt window, type the following two commands:

    clusrun /all del "%CCP_HOME%bin\NetWrench.exe"
    clusrun /all del "%CCP_HOME%bin\NetTSAdapter.dll"
    

To delete vstat.exe from the nodes (optional)

  • Follow the instructions from your HCA card vendor or system builder. If an unattended uninstallation program is provided, you can run a clusrun command on the head node to uninstall vstat.exe on the nodes.
