slurmstepd: error: mpi/pmix_v5: _dmdx_req: owl1 [0]: pmixp_dmdx.c:319: Bad request from owl2: nspace "slurm.pmix.5244.0" has only 2 ranks, asked for -1 #44

Closed
opened 2024-03-15 11:12:51 +01:00 by rarias · 5 comments
rarias commented 2024-03-15 11:12:51 +01:00 (Migrated from pm.bsc.es)

With the new MPICH 4.2.0, PMIx 5.0.1 and Slurm 23.11.4.1, we get the following error when running the OSU benchmarks:

```
hut% srun -N2 osu_bw
slurmstepd: error:  mpi/pmix_v5: _dmdx_req: owl1 [0]: pmixp_dmdx.c:319: Bad request from owl1: nspace "slurm.pmix.5245.0" has only 2 ranks, asked for -1
slurmstepd: error:  mpi/pmix_v5: _dmdx_req: owl1 [0]: pmixp_dmdx.c:319: Bad request from owl2: nspace "slurm.pmix.5245.0" has only 2 ranks, asked for -1
```

Running without MPI works fine:

```
hut% srun -N2 hostname
owl1
owl2
```
rarias commented 2024-03-15 11:12:53 +01:00 (Migrated from pm.bsc.es)

assigned to @rarias

rarias commented 2024-03-15 12:41:00 +01:00 (Migrated from pm.bsc.es)
Opened tickets in Slurm (https://bugs.schedmd.com/show_bug.cgi?id=19324) and PMIx (https://github.com/openpmix/openpmix/issues/3320).
rarias commented 2024-03-15 17:20:20 +01:00 (Migrated from pm.bsc.es)

It works with MPICH 4.1.3 but is broken with 4.2.0, so this is probably a bug in MPICH. Opening a ticket there.

rarias commented 2024-03-15 17:28:22 +01:00 (Migrated from pm.bsc.es)
https://github.com/pmodels/mpich/issues/6946
rarias commented 2024-03-18 11:33:44 +01:00 (Migrated from pm.bsc.es)

MPICH provided a workaround, closing.

Reference: rarias/jungle#44