David Mathog
2018-03-19 20:58:12 UTC
On one of our CentOS 6.9 systems with a PERC H730 controller I just
noticed that file system reads are quite slow. Like 30 MB/s slow. Anybody
care to hazard a guess what might be causing this? We have another quite
similar machine which is fast (A), compared to this one (B), which is
slow:
                A             B
RAM             512           512     GB
CPUs            48            56      (via /proc/cpuinfo; actually threads)
Adapter         H710P         H730
RAID level      *             *       Primary-5, Secondary-0, RAID Level Qualifier-3
Size            7.275         9.093   TB
State           *             *       Optimal
Drives          5             6
Read rate       540           30      MB/s (dd if=largefile bs=8192 of=/dev/null & ; iotop; see sketch below)
SATA disk       ST2000NM0033
SAS disk        ST2000NM0023
Patrol read     No            No      (megacli shows patrol read not running now)
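For reference, the read-rate numbers above were collected roughly like
this (largefile is just a placeholder for any multi-GB file, and the -o
option on iotop is optional; it restricts the display to processes
actually doing I/O):

  # sequential read of a large file; watch the dd process in iotop
  dd if=largefile bs=8192 of=/dev/null &
  iotop -o

and the patrol read state was checked with something along the lines of
(the exact MegaCli binary name varies between installs):

  # patrol read state/schedule for all adapters
  MegaCli64 -AdpPR -Info -aALL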
ulimit -a on both is:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2067196
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 60000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Nothing in the SMART values indicates a read problem, although on "B"
one disk is slowly accumulating events in the rereads/rewrites column of
its write row (it has 2346, accumulating at about 10 per week). The
corresponding value in its read row is 0. For "B" the smartctl error
counter output is:
         Errors Corrected by           Total   Correction     Gigabytes    Total
             ECC          rereads/    errors   algorithm      processed    uncorrected
         fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read: 934353848 0 0 934353848 0 48544.026 0
read: 2017672022 0 0 2017672022 0 48574.489 0
read: 2605398517 3 0 2605398520 3 48516.951 0
read: 3237457411 1 0 3237457412 1 48501.302 0
read: 2028103953 0 0 2028103953 0 14438.132 0
read: 197018276 0 0 197018276 0 48640.023 0
write: 0 0 0 0 0 26394.472 0
write: 0 0 2346 2346 2346 26541.534 0
write: 0 0 0 0 0 27549.205 0
write: 0 0 0 0 0 25779.557 0
write: 0 0 0 0 0 11266.293 0
write: 0 0 0 0 0 26465.227 0
verify: 341863005 0 0 341863005 0 241374.368 0
verify: 866033815 0 0 866033815 0 223849.660 0
verify: 2925377128 0 0 2925377128 0 221697.809 0
verify: 1911833396 6 0 1911833402 6 228054.383 0
verify: 192670736 0 0 192670736 0 66322.573 0
verify: 1181681503 0 0 1181681503 0 222556.693 0
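(For anyone wanting to reproduce that: with the disks behind a
PERC/MegaRAID controller, the per-drive error counter logs are normally
read through the controller with something like the following, where the
/dev node and the megaraid disk IDs are placeholders for whatever your
setup uses.)

  # SCSI error counter log for each physical disk behind the controller
  smartctl -a -d megaraid,0 /dev/sda
  smartctl -a -d megaraid,1 /dev/sda
  # ...and so on for the remaining disk IDs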
If the process doing the I/O runs as root it doesn't go any faster.
Oddly, if a second dd process is started on "B" against another file, it
ALSO reads at 30 MB/s, so the disk system then delivers 60 MB/s total
but only 30 MB/s per process. I added a 3rd and a 4th process doing the
same. At the 4th it seemed to hit some sort of limit, with each process
now consistently below 30 MB/s and the total at maybe 80 MB/s; hard to
say exactly what the total was, as it was jumping around like crazy. On
"A", 2 processes each got 270 MB/s, and 3 got 180 MB/s each. I didn't
try 4.
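(The parallel test was essentially the loop below; the file names are
placeholders, and each dd shows up as a separate row in iotop.)

  # start several sequential readers on different large files, then watch iotop
  for f in largefile1 largefile2 largefile3 largefile4; do
      dd if="$f" bs=8192 of=/dev/null &
  done
  iotop -o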
The only oddness of late on "B" is that a few days ago it loaded too
many memory-hungry processes and the kernel's OOM killer terminated some
of them. I have had that happen before on other systems without any odd
behavior afterwards.
Any ideas what this slowdown might be?
Thanks,
David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.