Walkthrough: Launching the MPI Cluster Debugger in Visual Studio 2010

In this walkthrough, you will learn how to set up and launch an MPI Cluster Debugger session on your local computer and on a Microsoft Windows HPC Server 2008 cluster. This walkthrough includes the steps and the sample code that you need to create an application that uses Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) application programming interfaces (APIs).

In this guide:

  • Requirements for using the MPI Cluster Debugger

  • Create a C++ MPI sample project in Visual Studio 2010

  • Configure and launch the MPI Cluster Debugger

  • Appendix: Files deployed by Visual Studio in addition to the application binaries (and CRT if requested)

Requirements for using the MPI Cluster Debugger

  • You must have an edition of Visual Studio 2010 that includes the remote debugger installed on your development computer.

  • You must have administrative permissions on the cluster.

  • Visual Studio must be able to access the compute nodes on which you want to run the debugging session. The following scenarios provide the necessary access:

    • You are developing your application on the cluster head node or on a dedicated login node.

    • You are using a cluster in which the compute nodes are connected to the enterprise network (Topology 2, 4, or 5), and your development computer is joined to the same domain or to a domain that has a trust relationship with the cluster domain.

  • To debug applications that use managed code, you must install the corresponding version of the .NET Framework on the cluster compute nodes.

  • To submit your application to an HPC cluster from a client computer, you must have Microsoft HPC Pack 2008 installed.

  • To build MPI programs by using Microsoft Message Passing Interface, you need the Windows HPC Server 2008 SDK installed on your development computer.

Create a C++ MPI sample project in Visual Studio 2010

The sample code in this section is for a parallel application that approximates the value of pi by using a Monte Carlo simulation.

The sample code runs through 50,000,000 iterations on each MPI process. In each iteration, the sample code generates two random numbers in the interval [0,1] that serve as the x and y coordinates of a point. The point is then tested to determine whether it falls inside the quarter circle bounded by the curve x² + y² = 1. If it does, the variable count is increased by one. The value of count from each MPI process is summed into the variable result. Because the fraction of points that fall inside the quarter circle approaches pi/4, result is multiplied by four and then divided by the total number of iterations (across all processes) to approximate the value of pi.
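As a rough, serial illustration of the arithmetic described above (no MPI or OpenMP involved), the following sketch lets each simulated process count its own hits and then combines the counts. The names CountHits and EstimatePi, the per-rank seeds, and the use of the standard C++11 std::mt19937 engine are assumptions made for illustration; they are not part of the walkthrough's sample.

```cpp
#include <random>

// Count how many of `iterations` random points in the unit square
// fall inside the quarter circle x*x + y*y < 1.
long long CountHits(int iterations, unsigned int seed)
{
    std::mt19937 engine(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    long long count = 0;
    for (int i = 0; i < iterations; ++i)
    {
        double x = unit(engine);
        double y = unit(engine);
        if (x * x + y * y < 1.0)
            ++count;
    }
    return count;
}

// Sum the per-process counts, then scale by four over the total number of points.
double EstimatePi(int iterations, int processes)
{
    long long result = 0;
    for (int rank = 0; rank < processes; ++rank)
        result += CountHits(iterations, 1000u + rank);  // hypothetical per-rank seed

    // Convert to double before multiplying so that iterations * processes
    // cannot overflow an int.
    return 4.0 * (double)result / ((double)iterations * processes);
}
```

Calling EstimatePi(250000, 4) evaluates one million points in total. Giving each simulated rank a different seed matters: ranks that draw identical sequences contribute identical counts and add no new information to the estimate.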

The following procedure includes two implementations of the Monte Carlo simulation.

To create the sample project

  1. Run Visual Studio 2010.

  2. Create a new C++ Win32 Console application named ParallelPI. Use a project without precompiled headers.

    1. On the File menu, point to New, and then click Project.

    2. In the New Project dialog box, click Installed Templates, then select Visual C++. (Depending on how you set up Visual Studio, Visual C++ may be under the Other Languages node.)

    3. In the list of templates, click Win32 Console Application.

    4. For the project name, type: ParallelPI.

    5. Click OK. This opens the Win32 Console Application Wizard.

    6. Click Next.

    7. In Application Settings, under Additional options, clear the Precompiled header check box.

    8. Click Finish to close the wizard and create the project.

  3. Specify additional properties for the project.

    1. In Solution Explorer, right-click ParallelPI, then click Properties. This opens the Property Pages dialog box.

    2. Expand Configuration Properties, then select VC++ Directories.

      In Include Directories, place the cursor at the beginning of the list that appears in the text box, then specify the location of the MS MPI C header files, followed by a semicolon (;). For example:

      C:\Program Files\Microsoft HPC Pack 2008 SDK\Include;
      
    3. In Library Directories, place the cursor at the beginning of the list that appears in the text box, then specify the location of the Microsoft HPC Pack 2008 SDK library file, followed by a semicolon (;).

      For example, if you want to build and debug a 32-bit application:

      C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386;
      

      If you want to build and debug a 64-bit application:

      C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64;
      
    4. Under Linker, select Input.

      In Additional Dependencies, place the cursor at the beginning of the list that appears in the text box, and then type the following:

      msmpi.lib;

    5. If you are using the code sample with OpenMP:

      In Configuration Properties, expand C/C++, and then select Language.

      In Open MP Support, select Yes (/openmp) to enable compiler support for OpenMP.

    6. Click OK to close the property pages.

  4. In the main source file, select all the code and then delete it.

  5. Paste one of the following code samples into the empty source file. The first sample uses MPI and OpenMP, and the second sample uses MPI and Parallel Patterns Library (PPL).

    The following code sample uses MPI and OpenMP. The function ThrowDarts uses an OpenMP parallel for loop to utilize the multicore hardware if available.

    // ParallelPI.cpp : Defines the entry point for the MPI application.
    //
    #include "mpi.h"
    #include "stdio.h"
    #include "stdlib.h"
    #include "limits.h"
    #include "omp.h"
    #include <random>
    
    int ThrowDarts(int iterations, int rank)
    {
        int count = 0;
        omp_lock_t MyOmpLock;
    
        omp_init_lock(&MyOmpLock);
        //Compute approximation of pi on each node
        #pragma omp parallel
        {
            //Each thread owns its distribution and engine, so no random-number
            //state is shared between threads. The seed differs for every
            //(rank, thread) pair when all ranks use the same maximum thread count.
            std::tr1::uniform_real<double> MyRandom; //uniform on [0,1)
            std::tr1::minstd_rand0 MyEngine((unsigned long)
                (rank * omp_get_max_threads() + omp_get_thread_num() + 1));
    
            #pragma omp for
            for(int i = 0; i < iterations; ++i)
            {
                double x, y;
                x = MyRandom(MyEngine);
                y = MyRandom(MyEngine);
    
                if(x*x + y*y < 1.0)
                {
                    //The lock serializes updates to the shared counter.
                    omp_set_lock(&MyOmpLock);
                    count++;
                    omp_unset_lock(&MyOmpLock);
                }
            }
        }
    
        omp_destroy_lock(&MyOmpLock);
    
        return count;
    }
    
    int main(int argc, char* argv[])
    {
        int rank;
        int size;
        int iterations;
        int count;
        int result;
        MPI_Status s;
    
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);
    
        if(rank == 0)
        {
            //Rank 0 reads the number of iterations from the command line.
            //50M iterations is the default.
            iterations = 50000000;
            if(argc > 1)
            {
                iterations = atoi(argv[1]);
            }
            printf("Executing %d iterations.\n", iterations);
            fflush(stdout);
        }
        //Broadcast the number of iterations to execute.
        if(rank == 0)
        {
            for(int i = 1; i < size; ++i)
            {
                MPI_Ssend(&iterations, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
        else
        {
            MPI_Recv(&iterations, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &s);
        }
    
        //Equivalent collective call:
        //MPI_Bcast(&iterations, 1, MPI_INT, 0, MPI_COMM_WORLD);
    
        count = ThrowDarts(iterations, rank);
    
        //Gather and sum results
        if(rank != 0)
        {
            MPI_Ssend(&count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        else
        {
            for(int i = 1; i < size; ++i)
            {
                int TempCount = 0;
                MPI_Recv(&TempCount, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &s);
                count += TempCount;
            }
        }
        result = count;
    
        //Equivalent collective call:
        //MPI_Reduce(&count, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    
        if(rank == 0)
        {
            //Convert to double before multiplying so that iterations*size cannot overflow.
            printf("The value of PI is approximated to be: %16f\n", 4.0*((double)result/((double)iterations*size)));
        }
    
        MPI_Barrier(MPI_COMM_WORLD);
    
        MPI_Finalize();
        return 0;
    }
    

     

    The following code sample uses Parallel Patterns Library (PPL) instead of OpenMP, and it uses the MPI collective operations instead of point-to-point operations.

     

    // ParallelPI.cpp : Defines the entry point for the MPI application.
    //
    #include "mpi.h"
    #include "stdio.h"
    #include "stdlib.h"
    #include "limits.h"
    #include <ppl.h>
    #include <random>
    #include <time.h>
    
    using namespace Concurrency;
    
    int ThrowDarts(int iterations, int rank)
    {
        combinable<int> count;
    
        //Seeds are handed out one per worker thread. The rank offset keeps
        //processes that start in the same second from sharing seeds.
        critical_section seedLock;
        unsigned long nextSeed = (unsigned long)time(NULL) + 1000UL * (unsigned long)rank;
    
        //combinable creates one engine per worker thread, the first time
        //that thread calls local(). The engine is seeded once, not per iteration.
        combinable<std::tr1::minstd_rand0> engines([&]() -> std::tr1::minstd_rand0 {
            critical_section::scoped_lock guard(seedLock);
            return std::tr1::minstd_rand0(nextSeed++);
        });
    
        parallel_for(0, iterations, [&](int i){
    
            std::tr1::uniform_real<double> MyRandom; //uniform on [0,1)
            std::tr1::minstd_rand0& MyEngine = engines.local();
            double x, y;
    
            x = MyRandom(MyEngine);
            y = MyRandom(MyEngine);
    
            if(x*x + y*y < 1.0)
            {
                count.local() += 1;
            }
        });
    
        //Sum the per-thread counts.
        return count.combine([](int left, int right) { return left + right; });
    }
    
    int main(int argc, char* argv[])
    {
        int rank;
        int size;
        int iterations;
        int count;
        int result;
    
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);
    
        if(rank == 0)
        {
            //Rank 0 reads the number of iterations from the command line.
            //50M iterations is the default.
            iterations = 50000000;
            if(argc > 1)
            {
                iterations = atoi(argv[argc-1]);
            }
            printf("Executing %d iterations on %d processes.\n", iterations, size);
            fflush(stdout);
        }
        //Broadcast the number of iterations to execute.
        MPI_Bcast(&iterations, 1, MPI_INT, 0, MPI_COMM_WORLD);
    
        count = ThrowDarts(iterations, rank);
    
        //Gather and sum results
        MPI_Reduce(&count, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    
        if(rank == 0)
        {
            //Convert to double before multiplying so that iterations*size cannot overflow.
            printf("The value of PI is approximated to be: %16f\n", 4.0*((double)result/((double)iterations*size)));
        }
    
        MPI_Barrier(MPI_COMM_WORLD);
    
        MPI_Finalize();
        return 0;
    }
    
  6. On the File menu, click Save All.

  7. On the Build menu, click Rebuild Solution.
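The OpenMP sample guards its shared counter with a lock. As an alternative sketch (not the walkthrough's code), the counting can be written with an OpenMP reduction clause and a stateless, index-based random source, so that the threads share no mutable state at all. Mix64, UnitValue, and ThrowDartsReduction are hypothetical names, and the SplitMix64-style mixing function is chosen only for illustration. If OpenMP support is not enabled, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cstdint>

// SplitMix64-style mixing step: maps an index to a scrambled 64-bit value.
static std::uint64_t Mix64(std::uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Convert the top 53 bits of a 64-bit value to a double in [0,1).
static double UnitValue(std::uint64_t bits)
{
    return (double)(bits >> 11) * (1.0 / 9007199254740992.0);  // divide by 2^53
}

int ThrowDartsReduction(int iterations, std::uint64_t seed)
{
    int count = 0;

    // Each thread accumulates a private copy of count; OpenMP adds the
    // copies together when the loop ends, so no lock is needed.
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < iterations; ++i)
    {
        // Both coordinates depend only on (seed, i), never on shared state.
        std::uint64_t base = seed + 2ULL * (std::uint64_t)i;
        double x = UnitValue(Mix64(base));
        double y = UnitValue(Mix64(base + 1));
        if (x * x + y * y < 1.0)
            ++count;
    }
    return count;
}
```

Because every point is derived from the seed and the loop index alone, the returned count does not depend on how many threads run the loop, which makes results easier to compare between debugging sessions.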

Configure and launch the MPI Cluster Debugger

After building your application, you are ready to configure and launch the debugger. This section describes three options for debugging:

  • Debug one MPI process on the local computer

  • Debug multiple MPI processes on the local computer

  • Debug one or more MPI processes on a cluster

Note

In the MPI Cluster Debugger, you cannot start without debugging. Pressing Ctrl+F5 (or Start without debugging on the Debug menu) also starts debugging.

Debug one MPI process on the local computer

To debug on the local computer using only one MPI process, use the same process that you would use to debug any other application. Set a breakpoint at the desired location in your program and press F5 to start the debugger.

MPI processes communicate with each other over IP ports. The first time you launch an MPI program, the firewall may display a security warning that indicates a port is being opened. Read the warning message and make sure that you understand the changes that you are making to your system. You must unblock the program in the firewall to continue debugging on the local computer.

Debug multiple MPI processes on the local computer

The following procedure describes how to start a local debugging session for ParallelPI.

To start the MPI Cluster Debugger with four MPI processes running on your local computer

  1. In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.

  2. Expand Configuration Properties, and then select Debugging.

  3. Under Debugger to launch, select MPI Cluster Debugger.

  4. In Run Environment, select Edit Hpc Node from the drop-down list. This opens the Node Selector dialog box.

  5. In the Head Node drop-down list, select localhost.

  6. In Number of processes, select 4.

  7. Click OK to save changes and close the Node Selector dialog box.

  8. ParallelPI accepts one argument that determines the number of iterations to run. The default is set to 50,000,000. For the local debugging session, reduce the iterations to 5,000 as follows:

    In Application Arguments, type 5000.

  9. Click OK to save the changes and close the Property Pages.

  10. Set a breakpoint within the body of the parallel for loop.

  11. Press F5 to launch the debugger.

  12. Five console windows should appear: one cmd.exe window, and four ParallelPI.exe windows (one for each process that you launched). The console window that corresponds to the rank 0 process indicates the number of iterations and the calculated approximation of pi.

  13. On the Debug menu, click Windows, and then click Processes.

  14. Set the active process for debugging by double-clicking a process in the Processes window.

Note

When you are debugging multiple processes, by default, a breakpoint affects all processes that are being debugged. To avoid breaking processes in unintended places, clear the Break all processes when one process breaks check box. (On the Tools menu, click Options, and then select Debugging.) For more information about how to change break behavior, see How to: Break Execution.
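Both samples convert the Application Arguments value with atoi, which returns 0 when the text is not a number, so a typing mistake silently runs zero iterations. A more defensive conversion could look like the following sketch. ParseIterations is a hypothetical helper that is not part of the sample, and its fallback of 50,000,000 simply mirrors the sample's default.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Return the iteration count parsed from one argument string, or the
// default when the text is missing, malformed, non-positive, or too large.
int ParseIterations(const char* text, int defaultIterations = 50000000)
{
    if (text == NULL || *text == '\0')
        return defaultIterations;

    char* end = NULL;
    errno = 0;
    long value = std::strtol(text, &end, 10);

    // Reject overflow, trailing characters, and values outside (0, INT_MAX].
    if (errno == ERANGE || *end != '\0' || value <= 0 || value > INT_MAX)
        return defaultIterations;

    return (int)value;
}
```

In the samples, such a helper would take the place of the atoi call on rank 0, before the iteration count is sent to the other ranks.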

Debug one or more MPI processes on a cluster

When you launch the MPI Cluster Debugger on a cluster, the debugger submits your application to the cluster as a job. The Visual C++ runtimes that match your project (x86 or x64, debug or release) must be present in the working directory on the compute nodes. By default, the C Runtime (CRT) library is deployed when you launch the MPI Cluster Debugger. Any other runtimes that are not already on the compute nodes must be added to the debugger deployment by using the Additional Files to Deploy property; the following procedure includes a step to deploy the OpenMP debug runtime DLL. If the correct runtimes are not present, you will see side-by-side errors when you try to run your application. If the OpenMP runtime is not included, the breakpoints will not be hit.
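To check which runtime flavor a particular build needs, a small helper can report the platform and configuration from predefined compiler macros. This is only a sketch: DescribeBuild is a hypothetical name, and the macros it tests (_WIN64, _WIN32, _DEBUG, _OPENMP) are the ones that Visual C++ defines for 64-bit targets, Windows targets, debug runtime builds, and builds compiled with /openmp, respectively.

```cpp
#include <string>

// Build a short description such as "x64 debug openmp" from predefined macros.
std::string DescribeBuild()
{
    std::string text;

#if defined(_WIN64)
    text += "x64";            // 64-bit Windows target
#elif defined(_WIN32)
    text += "x86";            // 32-bit Windows target
#else
    text += "non-Windows";    // compiled with a different toolchain
#endif

#if defined(_DEBUG)
    text += " debug";         // links against the debug runtime libraries
#else
    text += " release";
#endif

#if defined(_OPENMP)
    text += " openmp";        // the OpenMP runtime DLL is also required
#endif

    return text;
}
```

Printing this string from rank 0 at startup gives a quick way to confirm that the binaries on the compute nodes match the runtimes that were deployed.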

To launch the MPI Cluster Debugger on a cluster

  1. In Solution Explorer, right-click ParallelPI, then click Properties. This opens the Property Pages dialog box.

  2. Expand Configuration Properties, then select Debugging.

  3. Under Debugger to launch, select MPI Cluster Debugger.

  4. In Run Environment, select Edit Hpc Node from the drop-down list. This opens the Node Selector dialog box.

  5. In the Head Node drop-down list, select the name of the head node for the cluster that you want to use.

    The list of head nodes is populated from the Active Directory domain controller. Only clusters in your domain will appear in the list. If you do not see your head node, type the name or the IPv4 address of the head node in the property field.

  6. In Number of processes, select 4.

  7. In Schedule one process per, select how to allocate your processes. You can allocate one process per Core, Socket, or Node.

  8. Click OK to save changes and close the Node Selector dialog box.

  9. In Deployment Directory, specify a shared directory on the head node. If the deployment directory does not exist and you have write permissions in the root directory that is specified, the deployment directory is created automatically.

    The CcpSpoolDir shared directory is created when HPC Pack 2008 is installed on the head node. For example, type the following, where <myHeadNode> is the name of the head node of the cluster that you are using:

    \\<myHeadNode>\CcpSpoolDir\

  10. In Working Directory, specify a local working directory on each compute node. For example, type the following, where <myUserName> is your user name:

    C:\Users\<myUserName>\ParallelPI

  11. If you are using the sample code with OpenMP, add the OpenMP debug runtime DLL file (Microsoft.VC100.DebugOpenMP\vcomp100d.dll):

    1. In Additional Files to Deploy, select <Edit File…>. This opens the File and Folder Selector dialog box.

    2. Click Add File, navigate to Microsoft.VC100.DebugOpenMP\vcomp100d.dll, select the file, and then click Open.

      For example, for an x86 (32-bit) build, the default location on a 64-bit edition of the Windows Server 2008 operating system is:

      C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\redist\Debug_NonRedist\x86\Microsoft.VC100.DebugOpenMP\vcomp100d.dll

    3. Click OK to add the file and close the File and Folder Selector dialog box.

  12. Click OK to save changes and close Property Pages.

  13. Set a breakpoint within the body of the parallel for loop.

  14. Press F5 to launch the debugger.

  15. Because you are submitting a job to the cluster, you are prompted to enter your password to connect to the cluster. Type your password, and then press ENTER.

  16. After the debugger launches, look at the Processes window to verify the placement of the processes. For each process, look at the Transport Qualifier column to view the compute node on which the process is running.

Appendix: Files deployed by Visual Studio in addition to the application binaries (and CRT if requested)

  • DebuggerProxy.dll

  • DebuggerProxy.dll.manifest

  • Delete_from_workdir.bat: A script to delete the files that are deployed

  • Deploy_to_workdir.bat: A script to copy files from the Deployment Directory to the work directory

  • dbghelp.dll

  • mcee.dll

  • Mpishim.bat: A script to launch the remote debugger

  • Mpishim.exe: A program that orchestrates communication between the IDE and Msvsmon.exe

  • Msvsmon.exe: The remote debugger

  • Msvsmon.exe.config

  • PfxTaskProvider.dll

  • symsrv.dll

  • symsrv.yes

  • vbdebug.dll

  • 1033\msdbgui.dll

  • 1033\vbdebugui.dll

See Also

Concepts

Configuration Properties for the MPI Cluster Debugger
Debugging MPI Applications on an HPC Cluster

Other Resources

Debugger Roadmap
mpiexec Command Reference