Discussion:
[Beowulf] mpi alltoall help
Michael Di Domenico
2017-10-10 15:58:59 UTC
i posted a copy of this to openmpi mailing list, but i'm curious if
anyone here can lend suggestions on troubleshooting

---

i'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under openmpi 2.0.2 on rhel 7.4

i have two different clusters, one running mellanox fdr10 and one
running qlogic qdr

if i issue

mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv

the job just stalls after IMB-MPI prints the
"List of Benchmarks to run: Alltoallv" line

if i switch it to alltoall the test does progress

often when running alltoalls of various sizes i'll get

"too many retries sending message to <>:<>, giving up"

i'm able to use infiniband just fine (our lustre filesystem mounts
over it) and i have other mpi programs running

the problem only seems to show up when i run alltoall-type primitives

any thoughts on how to track down where the failures are? i might just
need to turn up the debugging, but i'm not sure where
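
For a sense of scale, here is a rough sketch of the per-rank buffer footprint of an alltoall at 1024 ranks. It is only arithmetic, not a description of how IMB allocates memory: it assumes each rank holds one send buffer and one receive buffer of ranks * message size bytes, and that the value given to -mem is a per-process budget in units of 2^30 bytes. Both are assumptions, and the function names are made up for illustration.

```python
def alltoall_buffer_bytes(ranks, msg_bytes):
    """Bytes one rank needs for a send buffer plus a receive buffer,
    each holding one msg_bytes block per peer (assumed layout)."""
    return 2 * ranks * msg_bytes


def largest_msg_within(ranks, mem_gb):
    """Largest power-of-two per-peer message whose two buffers fit in
    mem_gb, taking 1 GB = 2**30 bytes (assumption, see text)."""
    budget = mem_gb * 2**30
    size = 1
    # Keep doubling while the next size up still fits in the budget.
    while alltoall_buffer_bytes(ranks, size * 2) <= budget:
        size *= 2
    return size


if __name__ == "__main__":
    ranks = 1024
    for msg in (1 << 16, 1 << 20, 1 << 22):
        gib = alltoall_buffer_bytes(ranks, msg) / 2**30
        print(f"{msg:>8} bytes per peer: {gib:.3f} GiB of buffers per rank")
    print("largest size under 2.001:", largest_msg_within(ranks, 2.001))
```

Under these assumptions the buffers grow linearly with the rank count, so a message size that is harmless at 64 ranks can need gigabytes per rank at 1024.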
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.b
Christopher Samuel
2017-10-10 22:20:17 UTC
Post by Michael Di Domenico
i'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under openmpi 2.0.2 on rhel 7.4
Did this work on RHEL 7.3?

I've heard rumours of issues with RHEL 7.4 and OFED.

cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545

Michael Di Domenico
2017-10-11 11:35:59 UTC
On Tue, Oct 10, 2017 at 6:20 PM, Christopher Samuel
Post by Christopher Samuel
Post by Michael Di Domenico
i'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under openmpi 2.0.2 on rhel 7.4
Did this work on RHEL 7.3?
I've heard rumours of issues with RHEL 7.4 and OFED.
can't say. we moved from 7.3 to 7.4 pretty fast, and i don't think i ever tested it on 7.3