MPI fails when mixing Intel and AMD

Jeff Faust
2025-01-09T16:50:10.44+00:00

The following code works fine when the machines are all Intel or all AMD. It fails when I launch from an Intel machine with two worker nodes, one Intel (localhost) and one AMD.

#include <iostream>
#include <mpi.h>

int main()
{
    // Passing two null pointers is the standard-conforming way to
    // initialize MPI when the command-line arguments are not needed.
    MPI_Init(nullptr, nullptr);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        std::cout << " Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);  // block until every rank reaches this call
        std::cout << " Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}

mpiexec -hosts 2 localhost amd_machine -wdir "\\network\path" \\path-to-exe
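
Note that the program itself prints no rank prefix; the [0] / [1] labels in the output below are presumably added by mpiexec's output labeling rather than by the code. A minimal variant that tags each line with its rank explicitly, sketched with the same standard MPI calls, would be:

#include <iostream>
#include <mpi.h>

int main()
{
    MPI_Init(nullptr, nullptr);

    // Ask MPI for this process's rank so each output line
    // identifies the process that wrote it.
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        std::cout << "[" << rank << "] Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);
        std::cout << "[" << rank << "] Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}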

This fails consistently after loop 3 (both ranks attempt Barrier 4 but never complete it), with the output:

[0] Attempting Barrier 1
[1] Attempting Barrier 1
[0] Completed Barrier 1
[0] Attempting Barrier 2
[1] Completed Barrier 1
[0] Completed Barrier 2
[1] Attempting Barrier 2
[0] Attempting Barrier 3
[0] Completed Barrier 3
[0] Attempting Barrier 4
[1] Completed Barrier 2
[1] Attempting Barrier 3
[1] Completed Barrier 3
[1] Attempting Barrier 4

job aborted:
[ranks] message

[0] terminated

[1] fatal error
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(MPI_COMM_WORLD) failed
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (errno 10060)

