Walkthrough: Launching the MPI Cluster Debugger in Visual Studio 2008
This walkthrough describes how to configure and launch an MPI Cluster Debugger session on your local computer and on a Microsoft Windows HPC Server 2008 cluster. This walkthrough includes the steps and the sample code that you need to create an application that uses Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) application programming interfaces (APIs).
Before following the steps in this guide, ensure that you have the required software, software updates, and system configurations as listed in Requirements for Using the MPI Cluster Debugger.
This guide includes the following sections:
Create a C++ MPI sample project in Visual Studio 2008
Configure and launch the MPI Cluster Debugger
Debug one MPI process on the local computer
Debug multiple MPI processes on the local computer
Debug one or more MPI processes on a cluster
Appendix: Files deployed by Visual Studio in addition to the application binaries (and CRT if requested)
Create a C++ MPI sample project in Visual Studio 2008
The sample code in this section is for a parallel application that approximates the value of pi by using a Monte Carlo simulation.
The sample code runs through 50,000,000 iterations on each MPI process. In each iteration, the sample code generates two random numbers in the interval [0,1] and uses them as the x and y coordinates of a point. The point is then tested to determine whether it falls inside the quarter circle bounded by the curve x² + y² = 1. If it does, the variable count is increased by one. The value of count from each MPI process is summed into the variable result. The total number of points that fell inside the quarter circle (result) is multiplied by four and then divided by the total number of iterations across all processes to approximate the value of pi.
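The reason for the factor of four can be written out explicitly. The following is the standard derivation rather than anything taken from the sample code; N stands for the total number of independent points and N_in for the number of those points that satisfy x² + y² < 1.

```latex
\[
P\bigl(x^2 + y^2 < 1\bigr)
  = \frac{\text{area of quarter disk}}{\text{area of unit square}}
  = \frac{\pi/4}{1}
  = \frac{\pi}{4}
\qquad\Longrightarrow\qquad
\pi \approx 4\,\frac{N_{\mathrm{in}}}{N}
\]
\[
\sigma_{\hat{\pi}}
  = 4\sqrt{\frac{\tfrac{\pi}{4}\bigl(1 - \tfrac{\pi}{4}\bigr)}{N}}
  \approx \frac{1.64}{\sqrt{N}}
\]
```

With N = 50,000,000 independent points, the second expression gives a standard error of roughly 0.00023, so the estimate is typically correct to about three decimal places.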
The following procedure includes two implementations of the Monte Carlo simulation.
The first sample uses MPI and OpenMP. For more information about OpenMP, see OpenMP in Visual C++.
The second sample uses MPI and Parallel Patterns Library (PPL). For more information about PPL, see Parallel Patterns Library (PPL).
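Both implementations parallelize the same inner loop. The sketch below strips away MPI and threading to show that loop on its own. It is an illustration only and is not part of the walkthrough's project: ThrowDartsSerial and EstimatePi are hypothetical names, and the sketch assumes a compiler with the C++11 <random> header (std::mt19937, std::uniform_real_distribution) instead of the TR1 types that the samples use.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper (not in the walkthrough's samples): counts how many of
// `iterations` random points in the unit square fall inside the quarter circle.
long long ThrowDartsSerial(long long iterations, unsigned int seed)
{
    assert(iterations >= 0);

    std::mt19937 engine(seed);                              // deterministic for a given seed
    std::uniform_real_distribution<double> unit(0.0, 1.0);  // values in [0, 1)

    long long count = 0;
    for (long long i = 0; i < iterations; ++i)
    {
        const double x = unit(engine);
        const double y = unit(engine);
        if (x * x + y * y < 1.0)
        {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: turns a hit count into the estimate 4 * hits / total.
double EstimatePi(long long hits, long long total)
{
    assert(total > 0 && hits >= 0 && hits <= total);
    return 4.0 * static_cast<double>(hits) / static_cast<double>(total);
}
```

For example, EstimatePi(ThrowDartsSerial(200000, 42u), 200000) should land within a few hundredths of 3.14. The exact digits depend on the standard library's distribution implementation.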
To create the sample project
Run Visual Studio 2008.
Create a new C++ Win32 Console application named ParallelPI. Use a project without precompiled headers.
On the File menu, point to New, and then click Project.
In the New Project dialog box, in Project types, select Visual C++. (Depending on how you set up Visual Studio, Visual C++ may be under Other Project Types.)
In Templates, click Win32 Console Application.
For the project name, type: ParallelPI.
Click OK. This opens the Win32 Console Application Wizard.
Click Next.
In Application Settings, under Additional options, clear the Precompiled header check box.
Click Finish to close the wizard and create the project.
Specify additional properties for the project.
In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.
Expand Configuration Properties, expand C/C++, and then select General.
In Additional Include Directories, specify the location of the MS MPI C header files. For example:
C:\Program Files\Microsoft HPC Pack 2008 SDK\Include;
In Configuration Properties, expand Linker, and then select General.
In Additional Library Directories, specify the location of the Microsoft HPC Pack 2008 SDK library file.
For example, if you want to build and debug a 32-bit application:
C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386;
If you want to build and debug a 64-bit application:
C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64;
Under Linker, select Input.
In Additional Dependencies, place the cursor at the beginning of the list that appears in the text box, and then type the following:
msmpi.lib
If you are using the code sample with OpenMP:
In Configuration Properties, expand C/C++, and then select Language.
In OpenMP Support, select Yes (/openmp) to enable compiler support for OpenMP.
Click OK to save your settings and close the property pages.
In the main source file, select all the code and then delete it.
Paste one of the following code samples into the empty source file. The first sample uses MPI and OpenMP, and the second sample uses MPI and Parallel Patterns Library (PPL).
The following code sample uses MPI and OpenMP. The function ThrowDarts uses an OpenMP parallel for loop to utilize the multicore hardware if available.

    // ParallelPI.cpp : Defines the entry point for the MPI application.
    //
    #include "mpi.h"
    #include "stdio.h"
    #include "stdlib.h"
    #include "limits.h"
    #include "omp.h"
    #include <random>

    int ThrowDarts(int iterations)
    {
        int count = 0;

        //Compute approximation of pi on each node.
        //Each OpenMP thread uses its own engine and distribution, because a
        //single engine shared by all threads would be updated concurrently.
        //The reduction clause sums the per-thread counts.
        #pragma omp parallel reduction(+:count)
        {
            std::tr1::uniform_real<double> MyRandom;
            std::tr1::minstd_rand0 MyEngine;
            double RandMax = MyRandom.max();

            //Seed by thread number so that threads do not repeat one sequence.
            MyEngine.seed((unsigned long)(omp_get_thread_num() + 1));

            #pragma omp for
            for(int i = 0; i < iterations; ++i)
            {
                double x, y;
                x = MyRandom(MyEngine)/RandMax;
                y = MyRandom(MyEngine)/RandMax;

                if(x*x + y*y < 1.0)
                {
                    count++;
                }
            }
        }
        return count;
    }

    int main(int argc, char* argv[])
    {
        int rank;
        int size;
        int iterations;
        int count;
        int result;
        MPI_Status s;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);

        if(rank == 0)
        {
            //Rank 0 reads the number of iterations from the command line.
            //50M iterations is the default.
            iterations = 50000000;
            if(argc > 1)
            {
                iterations = atoi(argv[1]);
            }
            printf("Executing %d iterations.\n", iterations);
            fflush(stdout);
        }

        //Broadcast the number of iterations to execute.
        if(rank == 0)
        {
            for(int i = 1; i < size; ++i)
            {
                MPI_Ssend(&iterations, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
        else
        {
            MPI_Recv(&iterations, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &s);
        }
        //MPI_Bcast(&iterations, 1, MPI_INT, 0, MPI_COMM_WORLD);

        count = ThrowDarts(iterations);

        //Gather and sum results
        if(rank != 0)
        {
            MPI_Ssend(&count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        else
        {
            for(int i = 1; i < size; ++i)
            {
                int TempCount = 0;
                MPI_Recv(&TempCount, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &s);
                count += TempCount;
            }
        }
        result = count;
        //MPI_Reduce(&count, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if(rank == 0)
        {
            //Convert to double before multiplying so that iterations*size
            //cannot overflow an int.
            printf("The value of PI is approximated to be: %16f\n",
                   4*((double)result/((double)iterations*(double)size)));
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }
The following code sample uses Parallel Patterns Library (PPL) instead of OpenMP, and it uses the MPI collective operations instead of point-to-point operations.
    // ParallelPI.cpp : Defines the entry point for the MPI application.
    //
    #include "mpi.h"
    #include "stdio.h"
    #include "stdlib.h"
    #include "limits.h"
    #include <ppl.h>
    #include <random>
    #include <time.h>

    using namespace Concurrency;

    int ThrowDarts(int iterations)
    {
        combinable<int> count;
        int result = 0;

        //Read the clock once. Seeding every iteration from time(NULL) alone
        //would give all iterations in the same second the same point.
        unsigned int BaseSeed = (unsigned int)time(NULL);

        parallel_for(0, iterations, [&](int i){
            std::tr1::uniform_real<double> MyRandom;
            double RandMax = MyRandom.max();
            std::tr1::minstd_rand0 MyEngine;
            double x, y;

            //Offset the seed by the iteration index so that each iteration
            //draws a different point.
            MyEngine.seed((unsigned int)(BaseSeed + i));

            x = MyRandom(MyEngine)/RandMax;
            y = MyRandom(MyEngine)/RandMax;

            if(x*x + y*y < 1.0)
            {
                count.local() += 1;
            }
        });

        //Sum the per-thread counts.
        result = count.combine([](int left, int right) { return left + right; });
        return result;
    }

    int main(int argc, char* argv[])
    {
        int rank;
        int size;
        int iterations;
        int count;
        int result;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);

        if(rank == 0)
        {
            //Rank 0 reads the number of iterations from the command line.
            //50M iterations is the default.
            iterations = 50000000;
            if(argc > 1)
            {
                iterations = atoi(argv[argc-1]);
            }
            printf("Executing %d iterations on %d nodes.\n", iterations, size);
            fflush(stdout);
        }

        //Broadcast the number of iterations to execute.
        MPI_Bcast(&iterations, 1, MPI_INT, 0, MPI_COMM_WORLD);

        count = ThrowDarts(iterations);

        //Gather and sum results
        MPI_Reduce(&count, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if(rank == 0)
        {
            //Convert to double before multiplying so that iterations*size
            //cannot overflow an int.
            printf("The value of PI is approximated to be: %16f\n",
                   4*((double)result/((double)iterations*(double)size)));
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }
On the File menu, click Save All.
On the Build menu, click Build ParallelPI.
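The final step of both samples, summing the per-process counts and dividing by the total number of points, can be tried without an MPI installation. The following sketch is a stand-in for that step, not a replacement for the MPI calls: CombineCounts is a hypothetical helper, and counts[r] plays the role of the count value that rank r would contribute. It keeps the totals in 64-bit integers, because the product of the iteration count and the process count no longer fits in a 32-bit int once 50,000,000 iterations run on 43 or more processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the gather-and-sum step: sums one hit count per
// process and converts the total into an estimate of pi.
double CombineCounts(const std::vector<int>& counts, int iterationsPerProcess)
{
    assert(!counts.empty());
    assert(iterationsPerProcess > 0);

    long long hits = 0;
    for (std::size_t r = 0; r < counts.size(); ++r)
    {
        assert(counts[r] >= 0 && counts[r] <= iterationsPerProcess);
        hits += counts[r];
    }

    // Widen before multiplying so the total number of points cannot overflow.
    const long long total = static_cast<long long>(iterationsPerProcess) *
                            static_cast<long long>(counts.size());

    return 4.0 * static_cast<double>(hits) / static_cast<double>(total);
}
```

For example, four processes that each report 750 hits out of 1,000 iterations give 4 * 3000 / 4000 = 3.0.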
Configure and launch the MPI Cluster Debugger
After building your application, you are ready to configure and launch the debugger. This section describes three options for debugging:
Debug one MPI process on the local computer
Debug multiple MPI processes on the local computer
Debug one or more MPI processes on a cluster
Important
MPI programs communicate through IP over ports. The first time you launch an MPI program, you may see a security warning from the firewall that indicates a port is being opened. Read the warning message and ensure that you understand the changes that you are making to your system. You must unblock the firewall to continue debugging on the local computer.
Note
In the MPI Cluster Debugger, you cannot start without debugging. Pressing Ctrl+F5 (or Start without debugging on the Debug menu) also starts debugging.
Debug one MPI process on the local computer
To debug on the local computer by using only one MPI process, use the same process that you would use to debug any other application. Set a breakpoint at the desired location in your program and press F5 to start the debugger.
Debug multiple MPI processes on the local computer
The following procedure describes how to start a local debugging session for ParallelPI. ParallelPI accepts one argument that determines the number of iterations to run. The default is set to 50,000,000. The following procedure includes a step to reduce the iterations to 5,000.
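The samples read the iteration count with atoi, which returns 0 when the argument is not a number. A slightly more defensive way to read the argument might look like the following sketch. ParseIterations is a hypothetical helper that is not part of the samples; it assumes that the count is the first argument and falls back to a default when the argument is missing, malformed, non-positive, or too large for an int.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: returns the iteration count given in argv[1], or
// `fallback` when the argument is absent or is not a positive int.
int ParseIterations(int argc, const char* const argv[], int fallback)
{
    assert(fallback > 0);

    if (argc < 2 || argv[1] == nullptr)
    {
        return fallback;
    }

    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(argv[1], &end, 10);

    // The whole argument must be consumed, and at least one digit must be read.
    const bool parsedAll = (end != argv[1]) && (*end == '\0');
    if (!parsedAll || errno == ERANGE || value <= 0 || value > INT_MAX)
    {
        return fallback;
    }
    return static_cast<int>(value);
}
```

In main, the call would be ParseIterations(argc, argv, 50000000), so that typing 5000 in Application Arguments yields 5000 and an empty or invalid argument yields the default.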
To start the MPI Cluster Debugger with four MPI processes running on your local computer
In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.
Expand Configuration Properties, and then select Debugging.
Under Debugger to launch, select MPI Cluster Debugger.
To reduce the iterations to 5,000: In Application Arguments, type 5000.
Click OK to save the changes and close the Property Pages.
On the Tools menu, click Cluster Debugger Configuration. This opens the Cluster Debugger Configuration pane.
In Cluster Debugger Configuration, specify the following properties:
In Cluster head node, select localhost.
In Number of processes, type 4.
Set a breakpoint within the body of the parallel for loop.
Press F5 to launch the debugger.
Five console windows appear: one cmd.exe window, and four ParallelPI.exe windows (one for each process that you launched). The console window that corresponds to the rank 0 process indicates the number of iterations and the calculated approximation of pi.
On the Debug menu, click Windows, and then click Processes.
Set the active process for debugging by double-clicking a process in the Processes window.
Note
When you are debugging multiple processes, by default, a breakpoint affects all processes that are being debugged. To avoid breaking processes in unintended places, clear the Break all processes when one process breaks option. (In the Tools menu, click Options, then select Debugging). For more information about how to change break behavior, see Execution Control.
Debug one or more MPI processes on a cluster
When you launch the MPI Debugger on a cluster, the debugger submits your application to the cluster as a job. The Visual C runtimes that match your project (x86 or x64, and debug or release) must be present in the working directory on the compute nodes. If the correct runtimes are not already on the compute nodes, you need to include these in the debugger deployment by specifying the Additional Files to Deploy property.
The following procedure includes a step to deploy the OpenMP debug runtime DLL. By default, the C run-time (CRT) library is deployed when you launch the MPI Cluster Debugger. If the correct runtimes are not present, you will see side-by-side errors when you try to run your application. If the OpenMP runtime is not included, the breakpoints will not be hit.
To launch the MPI Debugger on a cluster
In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.
Expand Configuration Properties, and then select Debugging.
Under Debugger to launch, select MPI Cluster Debugger.
Click OK to save changes and close Property Pages.
On the Tools menu, click Cluster Debugger Configuration. This opens the Cluster Debugger Configuration pane.
In Cluster Debugger Configuration, specify the following properties:
In the Cluster head node drop-down list, select the name of the head node for the cluster that you want to use.
The list of head nodes is populated from the Active Directory domain controller. Only clusters in your domain appear in the list. If you do not see your head node, type the name or the IPv4 address of the head node in the property field.
In Number of processes, type 4.
Expand Advanced Configurations.
In Execution\work directory, specify a local working directory on each compute node. For example, type the following, where <myUserName> is your user name:
C:\Users\<myUserName>\ParallelPI
If you are using the sample code with OpenMP, add the OpenMP debug runtime DLL file in the Cluster Debugger Configuration properties as follows:
In Advanced Configuration, in Additional Files to Deploy, select <Edit…>. This opens the File and Folder Selector dialog box.
Click Add File, navigate to Microsoft.VC90.DebugOpenMP\vcomp90d.dll, select the file, and then click Open.
For example, for an x86 (32-bit) build, the default location on a 64-bit edition of the Windows Server 2008 operating system is:
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\redist\Debug_NonRedist\x86\Microsoft.VC90.DebugOpenMP\vcomp90d.dll
Click OK to add the file and close the File and Folder Selector dialog box.
Set a breakpoint within the body of the parallel for loop.
Press F5 to launch the debugger.
Because you are submitting a job to the cluster, you are prompted to enter your password to connect to the cluster. Type your password, and then press ENTER.
After the debugger launches, look at the Processes window to verify the placement of the processes. For each process, look at the Transport Qualifier column to view the compute node on which the process is running.
Appendix: Files deployed by Visual Studio in addition to the application binaries (and CRT if requested)
DebuggerProxy.dll
DebuggerProxy.dll.manifest
Delete_from_workdir.bat: A script to delete the files that are deployed
Deploy_to_workdir.bat: A script to copy files from the Deployment Directory to the work directory
dbghelp.dll
mcee.dll
Mpishim.bat: A script to launch the remote debugger
Mpishim.exe: A program that orchestrates communication between the IDE and Msvsmon.exe
Msvsmon.exe: The remote debugger
Msvsmon.exe.config
PfxTaskProvider.dll
symsrv.dll
symsrv.yes
vbdebug.dll
1033\msdbgui.dll
1033\vbdebugui.dll
See Also
Using the MPI Cluster Debugger Add-In for Visual Studio 2008
Configuration Properties for the MPI Cluster Debugger
How to: Configure and Launch the MPI Cluster Debugger in Visual Studio 2008
Debugger Roadmap
mpiexec Command Reference