Discussion:
[Beowulf] slow jobs when run through queue
Nick Evans
2017-12-06 00:44:24 UTC
Permalink
Hi all,

Just wondering if anyone else has encountered a similar problem, or has
thoughts on how to track it down.

Queue = PBS / Moab combination

We have found that if we submit a job to the queue then it takes a long
time to process, i.e. >4 hours.
If we run the exact same processing directly on the compute node
then it is significantly faster, <1 hour.

We have tried a number of different variations on the environment
variables and local/remote scratch disk, and can't find any reason
for the difference between submitting the job and just running it
directly on the compute resource.

Any hints / recommendations appreciated

Thanks
Nick
Chris Samuel
2017-12-06 01:58:13 UTC
Permalink
Post by Nick Evans
We have found that if we submit a job to the queue then it takes a long
time to process. ie. >4 hours
If we are to run the exact same processing directly on the compute node
then it is significantly faster < 1 hour.
Some quick ideas

Are you comparing a job that has asked for all cores and all RAM with
it running directly on the node?

Try using "perf top" to get an idea of what's going on with the node
when doing the comparison runs, perhaps "perf record" too, but I can
never remember if an unprivileged user can do that. That might shed
some light.

To me it sounds like it might be something that naively checks how
many cores a node has and then starts that many threads/
processes; if the batch job only asks for a single core, or
fewer than all of them, you might end up with a lot of contention.
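A quick way to check for this class of problem (a minimal sketch, not something from the thread): compare the core count an application would naively detect against the count the batch system actually granted. On Linux, Python's `os.cpu_count()` reports every core in the host, while `os.sched_getaffinity(0)` reflects the CPU set the job has actually been pinned to:

```python
import os

# What a naive application sees: every core in the host.
detected = os.cpu_count()

# What the batch system actually granted: the CPUs this process is
# allowed to run on (respects taskset / cgroup cpuset pinning).
granted = len(os.sched_getaffinity(0))

print(f"naive core count: {detected}, usable cores: {granted}")
if granted < detected:
    print("warning: sizing a thread pool by os.cpu_count() "
          "would oversubscribe this allocation")
```

A job that requests one core but spawns `detected` worker threads will exhibit exactly the contention Chris describes.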

Good luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Nick Evans
2017-12-06 05:47:42 UTC
Permalink
Thanks Brian / Carl / Chris for places to look. It turned out to be what
Chris had mentioned: the job was only requesting 1 CPU but trying to use
all 48 in the machine.

Resubmitted the request asking for all CPUs and the job ran in the
expected amount of time.

Thanks again
Nick
Chris Samuel
2017-12-06 06:27:54 UTC
Permalink
Post by Nick Evans
Thanks Brian / Carl / Chris for places to look. It turned out to be what
Chris had mentioned: the job was only requesting 1 CPU but trying to use
all 48 in the machine.
There's the handy "nproc" command which will tell you how many cores you can
actually use - it's cgroup-aware, so it won't just blindly report all the
cores in the host. It's also part of coreutils, so you should be able to
rely on it being there.
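A job script can sanity-check its allocation this way before spawning workers. A sketch comparing the coreutils `nproc` Chris mentions with Python's equivalent query (the two usually agree, though `nproc` also honours environment variables such as OMP_NUM_THREADS):

```python
import os
import subprocess

# nproc (coreutils) is affinity/cgroup-aware: inside a batch job it
# reports the cores you may use, not all cores in the host.
nproc = int(subprocess.run(["nproc"], capture_output=True,
                           text=True, check=True).stdout)

# Python's equivalent of the same query.
usable = len(os.sched_getaffinity(0))

print(f"nproc: {nproc}, sched_getaffinity: {usable}")
```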
Post by Nick Evans
Resubmitted the request asking for all CPUs and the job ran in the expected
amount of time.
Great news!

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Peter Kjellström
2017-12-06 09:09:20 UTC
Permalink
On Wed, 6 Dec 2017 16:47:42 +1100
Post by Nick Evans
Thanks Brian / Carl / Chris for places to look. It turned out to
be what Chris had mentioned: the job was only requesting 1 CPU but
trying to use all 48 in the machine.
If you got only a 4x performance reduction when running 48x
oversubscribed on 1/48th the number of cores, then something is wrong
with your baseline. That is, the "run without queue system" case seems
suspicious.

/Peter K
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Chris Samuel
2017-12-06 10:39:20 UTC
Permalink
Post by Peter Kjellström
If you got only a 4x performance reduction when running 48x
oversubscribed on 1/48th the number of cores, then something is wrong
with your baseline. That is, the "run without queue system" case seems
suspicious.
If this is, as I suspect is likely, bioinformatics code, it could well be that
it is a pipeline-type application and only part of the application may be able
to make use of parallelism (and then might not be very good at it).
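This hypothesis can be put to a rough number with Amdahl's law (a back-of-envelope sketch, not from the thread; it assumes the >4 h queue run is a fair serial baseline and the <1 h direct run used all 48 cores):

```python
# Amdahl's law: speedup S on N cores with parallel fraction f is
#   S = 1 / ((1 - f) + f / N)
# Solving for f gives:
#   f = (1 - 1/S) / (1 - 1/N)
def parallel_fraction(speedup, cores):
    return (1 - 1 / speedup) / (1 - 1 / cores)

# ~4x speedup going from 1 core to 48 cores implies only about
# three quarters of the runtime is parallelisable.
f = parallel_fraction(speedup=4, cores=48)
print(f"implied parallel fraction: {f:.0%}")  # roughly 77%
```

A ~77% parallel fraction would be consistent with a pipeline where one serial stage dominates, which also explains Peter's suspicion about the baseline.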

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Peter Kjellström
2017-12-06 12:53:55 UTC
Permalink
On Wed, 06 Dec 2017 21:39:20 +1100
Post by Chris Samuel
Post by Peter Kjellström
If you got only a 4x performance reduction when running 48x
oversubscribed on 1/48th the number of cores, then something is wrong
with your baseline. That is, the "run without queue system" case seems
suspicious.
If this is, as I suspect is likely, bioinformatics code it could well
be that it is a pipeline type application and only part of the
application may be able to make use of parallelism (and then might
not be very good at it).
Even more reason not to give it 48x more resources to waste, then :-)

Probably wise to go back to a smaller set of resources with no
oversubscription and evaluate behavior.

/Peter K
Faraz Hussain
2017-12-06 14:07:10 UTC
Permalink
48 cores sounds like a lot. Perhaps hyper-threading is turned on? If
so, try running with 24 CPUs to see whether you get the same or better
performance than with 48.
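Whether the 48 "CPUs" include SMT siblings can be checked from the standard Linux sysfs topology files (a Linux-only sketch; `lscpu` gives the same answer interactively):

```python
import os

def threads_per_core(cpu=0):
    # thread_siblings_list names the hardware threads sharing one
    # physical core, e.g. "0,24" or "0-1".
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        raw = open(path).read().strip()
    except OSError:
        return 1  # topology not exposed; assume no SMT
    count = 0
    for part in raw.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            count += int(hi) - int(lo) + 1
        else:
            count += 1
    return count

tpc = threads_per_core()
logical = os.cpu_count()
print(f"{logical} logical CPUs, {tpc} thread(s) per core "
      f"=> ~{logical // tpc} physical cores")
```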

-FEACluster.com

David Mathog
2017-12-06 18:20:25 UTC
Permalink
Post by Chris Samuel
If this is, as I suspect is likely, bioinformatics code, it could well
be that it is a pipeline-type application and only part of the
application may be able to make use of parallelism (and then might not
be very good at it).
Exactly. Super frustrating to set something like '--cpus=40' and then
watch the resulting heap of programs sit for long periods of time
(hours, not seconds) running only on a single CPU.

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Tim Cutts
2017-12-06 18:35:11 UTC
Permalink
Of course, if you charge for your cluster time, that hurts them in the wallet, since they pay for all the allocated unused time. If you don’t charge (which is the case for us) it’s hard to incentivise them not to do this. Shame works, a bit. We publish cluster analytics showing CPU efficiency and memory efficiency league tables for the users, and that has had some good effects in the past...
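The CPU efficiency in such league tables is typically computed from the scheduler's accounting records; a minimal sketch (the field names here are hypothetical, not Sanger's actual schema):

```python
def cpu_efficiency(cpu_seconds, wall_seconds, cores_allocated):
    """Fraction of the allocated CPU time the job actually used."""
    return cpu_seconds / (wall_seconds * cores_allocated)

# A job that ran for an hour on a 48-core allocation but kept only
# one core busy scores ~2% -- exactly the case in this thread.
eff = cpu_efficiency(cpu_seconds=3600, wall_seconds=3600,
                     cores_allocated=48)
print(f"{eff:.1%}")  # 2.1%
```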

Tim
Post by Chris Samuel
If this is, as I suspect is likely, bioinformatics code it could well be that
it is a pipeline type application and only part of the application may be able
to make use of parallelism (and then might not be very good at it).
Exactly. Super frustrating to set something like '--cpus=40' and then watch the resulting heap of programs sit for long periods of time (hours, not seconds) running only on a single CPU.
Regards,
David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Nick Evans
2017-12-06 20:47:28 UTC
Permalink
Hi all,

Under normal circumstances I would agree that requesting all 48 cores
in the machine is overkill, but this particular machine has a highly
specialised FPGA card in it that does most of the heavy lifting for a
specific set of analyses tuned to run with the card. It can
only run one job at a time, and by design the node doesn't mount the
normal central software tree, so there isn't the temptation to run
normal jobs on it and block the use of the FPGA.



Nick

On 7 Dec 2017 5:35 AM, "Tim Cutts" <***@sanger.ac.uk> wrote:

Of course, if you charge for your cluster time, that hurts them in the
wallet, since they pay for all the allocated unused time. If you don’t
charge (which is the case for us) it’s hard to incentivise them not to do
this. Shame works, a bit. We publish cluster analytics showing CPU
efficiency and memory efficiency league tables for the users, and that has
had some good effects in the past...

Tim
Peter Clapham
2017-12-19 16:41:44 UTC
Permalink
Showing back utilization and use patterns openly also removes admins from being "the police".

Instead, each user of the system can see who is requesting excessive memory, using inappropriate queues, or just running inefficient workloads at scale. This creates a self-policing environment, and certainly both reinforces a community feel and improves communication between the groups of users.
Pete

On 12/6/17, 6:36 PM, "Beowulf on behalf of Tim Cutts" <beowulf-***@beowulf.org on behalf of ***@sanger.ac.uk> wrote:

Of course, if you charge for your cluster time, that hurts them in the wallet, since they pay for all the allocated unused time. If you don’t charge (which is the case for us) it’s hard to incentivise them not to do this. Shame works, a bit. We publish cluster analytics showing CPU efficiency and memory efficiency league tables for the users, and that has had some good effects in the past...

Tim
Nick Evans
2017-12-20 05:37:36 UTC
Permalink
I completely agree. We have a web page where people can see

- where their jobs are running
- what sort of resources were requested
- the peak resources actually used
- wall time remaining (orange highlighted at 20% remaining and red at
10% remaining)
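The highlighting rule in that last item is simple threshold logic; a sketch (the function name is hypothetical, not from Nick's actual dashboard):

```python
def walltime_colour(remaining_fraction):
    """Highlight colour for a job's remaining wall time."""
    if remaining_fraction <= 0.10:
        return "red"
    if remaining_fraction <= 0.20:
        return "orange"
    return "none"

print(walltime_colour(0.15))  # orange
```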
Post by Peter Clapham
Show back of utilization and use patterns openly also removes admins from
being “the Police”.
Instead, each user of the system can see who is requesting excessive
memory, using inappropriate queues, or just running inefficient workloads
at scale. This creates a self-policing environment, and certainly both
reinforces a community feel and improves communication between the groups
of users.
Pete