Michael Di Domenico
2017-10-11 14:12:02 UTC
I'm seeing issues on a Mellanox FDR10 cluster where MPI setup and
teardown take longer than I expect on larger rank-count
jobs. I'm only trying to run ~1000 ranks, and the startup time is over
a minute. I tested this with both Open MPI and Intel MPI; both exhibit
close to the same behavior.
Has anyone else seen this, or might know how to fix it? I expect ~1000
ranks to take some time to set up, but it seems to be taking longer than
I think it should.
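
One way to narrow this down might be to time the same launch command twice: once with a non-MPI binary such as hostname, and once with a trivial MPI program. The difference would roughly separate launcher overhead from MPI initialization and teardown. Below is a minimal sketch of such a timer, not a definitive benchmark. The function name time_command is hypothetical and not part of either MPI distribution.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code).

    Hypothetical helper for illustration: it measures wall-clock time of the
    whole launch, so it includes launcher, MPI setup, and teardown together.
    """
    start = time.perf_counter()
    # Discard the job's output so printing 1000 lines does not skew the timing.
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


# Sanity check with a command that exists everywhere Python does.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={elapsed:.3f}s")
```

For the cluster case, the argument list would be something like ["mpirun", "-np", "1000", "hostname"] for the non-MPI run and ["mpirun", "-np", "1000", "./mpi_hello"] for the MPI run, where ./mpi_hello is an assumed name for a program that only calls MPI_Init and MPI_Finalize.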
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/l