MPI and compiler problems

I have an MPI program that runs on one node but not on multiple nodes. I have set that program aside and am now testing with this very simple example I found here:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
 
/**
 * @brief Illustrates how to use an MPI barrier.
 **/
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
 
    // Get my rank
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
 
    printf("[MPI process %d] I start waiting on the barrier.\n", my_rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("[MPI process %d] I know all MPI processes have waited on the barrier.\n", my_rank);
 
    MPI_Finalize();
 
    return EXIT_SUCCESS;
}

I compile the code using mpicc main.c -o run_me and this produces no errors. I run this code on the head node using mpirun -n 2 ./run_me and it produces the following:

[MPI process 1] I start waiting on the barrier.
[MPI process 0] I start waiting on the barrier.
[MPI process 0] I know all MPI processes have waited on the barrier.
[MPI process 1] I know all MPI processes have waited on the barrier.

Next, I submit the same binary through SLURM with the following batch script:

#!/bin/bash

#SBATCH --job-name=hellompi
#SBATCH --output=hellompi.out
#SBATCH --ntasks=6
#SBATCH --partition=XXXXXXX

mpirun ./run_me

It produces the following:

[MPI process 3] I start waiting on the barrier.
[MPI process 4] I start waiting on the barrier.
[MPI process 5] I start waiting on the barrier.
[MPI process 0] I start waiting on the barrier.
[MPI process 1] I start waiting on the barrier.
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(289).....................: MPI_Barrier(comm=MPI_COMM_WORLD) failed
PMPI_Barrier(275).....................:
MPIR_Barrier_impl(175)................:
MPIR_Barrier_intra_auto(110)..........:
MPIR_Barrier_intra_smp(43)............:
MPIR_Barrier_impl(175)................:
MPIR_Barrier_intra_auto(110)..........:
MPIR_Barrier_intra_dissemination(49)..:
MPIDU_Complete_posted_with_error(1137): Process failed
MPIR_Barrier_intra_smp(59)............:
MPIR_Bcast_impl(310)..................:
MPIR_Bcast_intra_auto(223)............:
MPIR_Bcast_intra_binomial(182)........: Failure during collective
[MPI process 2] I start waiting on the barrier.
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(289).............: MPI_Barrier(comm=MPI_COMM_WORLD) failed
PMPI_Barrier(275).............:
MPIR_Barrier_impl(175)........:
MPIR_Barrier_intra_auto(110)..:
MPIR_Barrier_intra_smp(59)....:
MPIR_Bcast_impl(310)..........:
MPIR_Bcast_intra_auto(223)....:
MPIR_Bcast_intra_binomial(182): Failure during collective
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(289).....................: MPI_Barrier(comm=MPI_COMM_WORLD) failed
PMPI_Barrier(275).....................:
MPIR_Barrier_impl(175)................:
MPIR_Barrier_intra_auto(110)..........:
MPIR_Barrier_intra_smp(43)............:
MPIR_Barrier_impl(175)................:
MPIR_Barrier_intra_auto(110)..........:
MPIR_Barrier_intra_dissemination(49)..:
MPIDU_Complete_posted_with_error(1137): Process failed
MPIR_Barrier_intra_smp(59)............:
MPIR_Bcast_impl(310)..................:
MPIR_Bcast_intra_auto(223)............:
MPIR_Bcast_intra_binomial(182)........: Failure during collective
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(289).............: MPI_Barrier(comm=MPI_COMM_WORLD) failed
PMPI_Barrier(275).............:
MPIR_Barrier_impl(175)........:
MPIR_Barrier_intra_auto(110)..:
MPIR_Barrier_intra_smp(59)....:
MPIR_Bcast_impl(310)..........:
MPIR_Bcast_intra_auto(223)....:
MPIR_Bcast_intra_binomial(182): Failure during collective
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(289).............: MPI_Barrier(comm=MPI_COMM_WORLD) failed
PMPI_Barrier(275).............:
MPIR_Barrier_impl(175)........:
MPIR_Barrier_intra_auto(110)..:
MPIR_Barrier_intra_smp(59)....:
MPIR_Bcast_impl(310)..........:
MPIR_Bcast_intra_auto(223)....:
MPIR_Bcast_intra_binomial(182): Failure during collective
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(289).............: MPI_Barrier(comm=MPI_COMM_WORLD) failed
PMPI_Barrier(275).............:
MPIR_Barrier_impl(175)........:
MPIR_Barrier_intra_auto(110)..:
MPIR_Barrier_intra_smp(59)....:
MPIR_Bcast_impl(310)..........:
MPIR_Bcast_intra_auto(223)....:
MPIR_Bcast_intra_binomial(182): Failure during collective

The version details for mpirun:

HYDRA build details:
    Version:                                 3.3.2
    Release Date:                            Tue Nov 12 21:23:16 CST 2019
    CC:                              gcc   -Wl,-Bsymbolic-functions -Wl,-z,relro
    CXX:                             g++   -Wl,-Bsymbolic-functions -Wl,-z,relro
    F77:                             f77  -Wl,-Bsymbolic-functions -Wl,-z,relro
    F90:                             f95  -Wl,-Bsymbolic-functions -Wl,-z,relro
    Configure options:                       '--disable-option-checking' '--prefix=/usr' '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-libfabric' '--enable-shared' '--enable-fortran=all' '--disable-rpath' '--disable-wrapper-rpath' '--sysconfdir=/etc/mpich' '--libdir=/usr/lib/x86_64-linux-gnu' '--includedir=/usr/include/x86_64-linux-gnu/mpich' '--docdir=/usr/share/doc/mpich' 'CPPFLAGS= -Wdate-time -D_FORTIFY_SOURCE=2 -I/build/mpich-VeuB8Z/mpich-3.3.2/src/mpl/include -I/build/mpich-VeuB8Z/mpich-3.3.2/src/mpl/include -I/build/mpich-VeuB8Z/mpich-3.3.2/src/openpa/src -I/build/mpich-VeuB8Z/mpich-3.3.2/src/openpa/src -D_REENTRANT -I/build/mpich-VeuB8Z/mpich-3.3.2/src/mpi/romio/include' 'CFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'CXXFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'FFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -O2' 'FCFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -cpp -O2' 'BASH_SHELL=/bin/bash' 'build_alias=x86_64-linux-gnu' 'MPICHLIB_CFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'MPICHLIB_CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_FFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong' 'MPICHLIB_FCFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. 
-fstack-protector-strong -cpp' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'FC=f95' 'F77=f77' 'MPILIBNAME=mpich' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'LIBS=' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:
    Demux engines available:                 poll select

Version details for GCC/G++:

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

I’m honestly about to pull my hair out 🙁 Does anyone have advice on why the code will not run on multiple nodes?
