Michael Di Domenico
2017-10-10 15:58:59 UTC
i posted a copy of this to the openmpi mailing list, but i'm curious if
anyone here can offer suggestions on troubleshooting
---
i'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under openmpi 2.0.2 on rhel 7.4
i have two different clusters, one running mellanox fdr10 and one
running qlogic qdr
if i issue
mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv
the job just stalls after IMB-MPI prints the
"List of Benchmarks to run: Alltoallv" line
if i switch it to alltoall the test does progress
often when running alltoalls of various sizes i'll get
"too many retries sending message to <>:<>, giving up"
i'm able to use infiniband just fine (our lustre filesystem mounts
over it) and i have other mpi programs running
the problem only seems to show up when i run alltoall-type primitives
any thoughts on how to debug where the failures are? i might just need to
turn up the debugging, but i'm not sure where
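
for a sense of scale on the 1024-rank run above, here is a rough sketch of how many peer pairs a full all-to-all exchange involves. it assumes every rank exchanges a message with every other rank; the function name alltoall_pairs is made up for illustration and nothing here comes from IMB-MPI or openmpi itself.

```python
# rough count of peer pairs in a full all-to-all exchange
# assumption: every rank sends to every other rank (self-sends excluded)

def alltoall_pairs(ranks: int) -> int:
    """Return the number of directed (sender, receiver) pairs."""
    return ranks * (ranks - 1)

ranks = 1024  # matches -n 1024 in the mpirun line
pairs = alltoall_pairs(ranks)
print(f"{ranks} ranks: {ranks - 1} peers per rank, {pairs} directed pairs")
```

the pair count grows roughly with the square of the rank count, which is one reason alltoall-type tests tend to stress the fabric harder than most other mpi programs.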
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.b