Hello Ben!
Thank you for posting in Microsoft Learn.
In your case, I think you need a mechanism to detect available cores and spawn additional processes accordingly. This is a challenging problem because standard MPI job schedulers (for example the Microsoft HPC Job Scheduler, SLURM, or OpenPBS) allocate resources at job submission, and dynamically adding cores to a running job is not natively supported.
Your MPI master process needs to periodically check how many cores are available in the system. You can do this by:
- Querying the HPC Job Scheduler (`job list /all /format:list` on Windows)
- Using system commands like `wmic cpu get NumberOfCores`
- Running a Python script inside the job to check available resources (the `psutil` module)
For example, in Python:

```python
import psutil  # third-party package: pip install psutil

def get_available_cores():
    return psutil.cpu_count(logical=False)  # number of physical cores
```
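Building on that helper, the master-side polling loop could look like the sketch below. This is illustrative only: the `in_use` parameter and the spawn hook are hypothetical placeholders, and it uses the standard library (`os.cpu_count`) so it runs without `psutil`. In the real job, the hook is where the `MPI_Comm_spawn` call would happen.

```python
import os
import time

def get_available_cores(in_use: int) -> int:
    """Return how many cores are currently unclaimed by this job.

    os.cpu_count() reports logical cores; swap in
    psutil.cpu_count(logical=False) if you need physical cores.
    """
    total = os.cpu_count() or 1
    return max(total - in_use, 0)

def poll_and_spawn(in_use: int, interval: float = 30.0, once: bool = True) -> int:
    """Periodically check for free cores; spawning itself is left as a stub."""
    spawned = 0
    while True:
        free = get_available_cores(in_use)
        if free > 0:
            # Hypothetical hook: this is where the master would trigger
            # MPI_Comm_spawn (or MPI.COMM_SELF.Spawn via mpi4py) for
            # 'free' additional workers.
            spawned = free
        if once:
            break
        time.sleep(interval)
    return spawned
```

The `once` flag makes the sketch testable; a real master would loop with a sleep interval matched to how quickly the scheduler frees cores.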
or in PowerShell:

```powershell
Get-WmiObject Win32_Processor | Select-Object -ExpandProperty NumberOfCores
```
Once the master detects additional available cores, it can spawn new child processes dynamically using `MPI_Comm_spawn`. Example in C:
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    if (world_rank == 0) {
        // Master process: check available cores and spawn new workers
        int additional_cores = 10;  // in practice, fetch this dynamically
        MPI_Comm intercomm;
        char worker_program[] = "worker.exe";
        MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, additional_cores,
                       MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm,
                       MPI_ERRCODES_IGNORE);
    }

    printf("Process %d out of %d\n", world_rank, world_size);

    MPI_Finalize();
    return 0;
}
```
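The spawn call above launches `worker.exe`, which isn't shown. A minimal worker might look like the following sketch (an assumption, not code from the original): the spawned process retrieves the inter-communicator back to its parent with `MPI_Comm_get_parent` and can then communicate with (or merge into) the parent group.

```c
// worker.c -- minimal sketch of the spawned worker (compile to worker.exe).
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    // The spawned side gets its link to the parent via MPI_Comm_get_parent.
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL) {
        fprintf(stderr, "This program must be started via MPI_Comm_spawn\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // rank within the spawned group
    printf("Spawned worker %d is up\n", rank);

    // ... do work, exchanging messages with the master over 'parent' ...

    MPI_Comm_disconnect(&parent);
    MPI_Finalize();
    return 0;
}
```

Note that inside the worker, `MPI_COMM_WORLD` contains only the spawned processes, not the original job.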
One issue with `MPI_Comm_spawn` is that the newly spawned processes live in a separate communicator from the parent's `MPI_COMM_WORLD`. To solve this:

- Use `MPI_Intercomm_merge` to merge the parent and the spawned processes into a single intra-communicator (`MPI_COMM_WORLD` itself cannot be enlarged).
- Use the MPI port mechanism (`MPI_Open_port`, `MPI_Comm_accept`, `MPI_Comm_connect`) to connect independently started jobs.
Example:

```c
MPI_Comm merged_comm;
// The second argument ("high") controls rank ordering: the group that
// passes 1 receives the higher ranks in the merged intra-communicator.
MPI_Intercomm_merge(intercomm, 1, &merged_comm);
```
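For the port-based route, here is a hedged sketch of how the two sides could pair up; the `server`/`client` command-line convention is an assumption for illustration only. The server opens a port and prints its name, which must be passed out-of-band (a shared file, the scheduler, or `MPI_Publish_name`) to the client.

```c
// Sketch: connecting two independently started MPI jobs via a port.
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm newcomm = MPI_COMM_NULL;

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        // Server side: open a port and wait for a client to connect.
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("Server listening on: %s\n", port_name);  // share this string
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &newcomm);
        MPI_Close_port(port_name);
    } else if (argc > 2) {
        // Client side: connect using the port name obtained out-of-band.
        strncpy(port_name, argv[2], MPI_MAX_PORT_NAME - 1);
        port_name[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &newcomm);
    }

    // 'newcomm' is an inter-communicator linking the two jobs; it can be
    // merged with MPI_Intercomm_merge exactly as in the spawn case.

    if (newcomm != MPI_COMM_NULL) MPI_Comm_disconnect(&newcomm);
    MPI_Finalize();
    return 0;
}
```

Be aware that some MPI implementations and schedulers restrict dynamic process features, so it is worth verifying that your HPC Pack / MS-MPI setup supports them before building on this.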