Discussion:
[Beowulf] Slow RAID reads, no errors logged, why?
David Mathog
2018-03-19 20:58:12 UTC
On one of our Centos 6.9 systems with a PERC H730 controller I just
noticed that file system reads are quite slow. Like 30Mb/s slow. Anybody
care to hazard a guess what might be causing this situation? We have
another quite similar machine which is fast (A), compared to this one
(B), which is slow:
                A        B
RAM             512      512     GB
CPUs            48       56      (via /proc/cpuinfo, actually this is threads)
Adapter         H710P    H730
RAID Level      *        *       Primary-5, Secondary-0, RAID Level Qualifier-3
Size            7.275    9.093   TB
state           *        *       Optimal
Drives          5        6
read rate       540      30      Mb/s (dd if=largefile bs=8192 of=/dev/null &, rate watched with iotop; see the sketch below)
SATA disk (A)   ST2000NM0033
SAS disk  (B)   ST2000NM0023
patrol          No       No      (megacli shows patrol read not going now)
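For reference, the read-rate numbers above came from something along
these lines (the file path and the MegaCli binary location are
placeholders, adjust to your layout):

  # Sequential read of a large existing file on the RAID filesystem;
  # iotop shows the per-process and total read rates while dd runs
  dd if=/path/to/largefile bs=8192 of=/dev/null &
  iotop -o

  # Confirm no patrol read is running on the controller
  /opt/MegaRAID/MegaCli/MegaCli64 -AdpPR -Info -aALL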

ulimit -a on both is:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2067196
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 60000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Nothing in the SMART values indicates a read problem, although on "B"
one disk is slowly accumulating events in the write "rereads/rewrites"
counter (it has 2346, accumulated at about 10 per week). The
corresponding read "rereads/rewrites" value is 0. For "B" the smartctl
output columns are:

        Errors Corrected by           Total   Correction     Gigabytes    Total
            ECC          rereads/     errors   algorithm      processed    uncorrected
        fast | delayed   rewrites   corrected  invocations   [10^9 bytes]  errors

read: 934353848 0 0 934353848 0 48544.026 0
read: 2017672022 0 0 2017672022 0 48574.489 0
read: 2605398517 3 0 2605398520 3 48516.951 0
read: 3237457411 1 0 3237457412 1 48501.302 0
read: 2028103953 0 0 2028103953 0 14438.132 0
read: 197018276 0 0 197018276 0 48640.023 0

write: 0 0 0 0 0 26394.472 0
write: 0 0 2346 2346 2346 26541.534 0
write: 0 0 0 0 0 27549.205 0
write: 0 0 0 0 0 25779.557 0
write: 0 0 0 0 0 11266.293 0
write: 0 0 0 0 0 26465.227 0

verify: 341863005 0 0 341863005 0 241374.368 0
verify: 866033815 0 0 866033815 0 223849.660 0
verify: 2925377128 0 0 2925377128 0 221697.809 0
verify: 1911833396 6 0 1911833402 6 228054.383 0
verify: 192670736 0 0 192670736 0 66322.573 0
verify: 1181681503 0 0 1181681503 0 222556.693 0

If the process doing the IO is root it doesn't go any faster.

Oddly if on "B" a second dd process is started on another file it ALSO
reads at 30Mb/s. So the disk system then does a total of 60Gb/s, but
only 30Gb/s per process. Added a 3rd and a 4th process doing the same.
At the 4th it seemed to hit some sort of limit, with each process now
consistently less than 30Gb/s and the total at maybe 80Gb/s total. Hard
to say what the exact total was as it was jumping around like crazy. On
"A" 2 processes each got 270Mb/s,
and 3 180Mb/s. Didn't try 4.

The only oddness of late on "B" is that a few days ago it loaded too
many memory-hungry processes so the OS killed some. I have had that
happen before on other systems without them doing anything odd
afterwards.

Any ideas what this slowdown might be?

Thanks,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
David Mathog
2018-03-19 21:03:07 UTC
Post by David Mathog
The only oddness of late on "B" is that a few days ago it loaded too
many memory hungry processes so the OS killed some. I have had that
happen before on other systems without them doing anything odd
afterwards.
Sorry, hit return too soon.

The /var/log/messages entries associated with that showed the OOM killer
only killed some user processes; no system processes were removed.
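For anyone wanting to check the same thing, the relevant entries can be
pulled out with something like this (assuming the default CentOS syslog
location):

  # Show OOM-killer activity and which processes were killed
  grep -iE 'oom-killer|out of memory|killed process' /var/log/messages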

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Alex Chekholko via Beowulf
2018-03-19 21:21:57 UTC
Normally I would suggest doing a diagnostic read dd from each disk, but
you may not be able to do that with your RAID controller since it hides
the individual disks.
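Something like this, on setups where the member disks are actually
visible to the OS (the device names below are illustrative, not yours):

  # Read a few GB directly from each member disk and compare rates;
  # a slow outlier points at the problem drive
  for d in /dev/sd{b,c,d,e,f,g}; do
      echo "== $d =="
      dd if="$d" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -1
  done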

My next recommendation would be a full AC cycle; can you power the host off
for a few minutes? It's a bit cargo cult-y but sometimes it works. It may
also help (or not) for you to spin around 3 times while the machine is off.
Post by David Mathog
Post by David Mathog
The only oddness of late on "B" is that a few days ago it loaded too
many memory hungry processes so the OS killed some. I have had that
happen before on other systems without them doing anything odd
afterwards.
Sorry, hit return too soon.
The /var/log/messages entries associated with that showed OOM only killed
some user processes, no system processes were removed.
Regards,
David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
David Mathog
2018-03-19 21:36:18 UTC
Post by Alex Chekholko via Beowulf
Normally I would suggest to do a diagnostic read dd from each disk, but you
may not be able to do that with your RAID controller since it hides the
individual disks.
I run full smartctl scans on the disks once a week. In fact those were
running at the time (on both machines). Nothing turns up other than
what was posted.
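For what it's worth, those per-disk queries go through the controller
rather than against raw /dev/sdX devices, roughly like this (the
megaraid device IDs are illustrative):

  # Query each member disk behind the PERC by its MegaRAID device ID
  for n in 0 1 2 3 4 5; do
      echo "== megaraid,$n =="
      smartctl -H -l error -d megaraid,$n /dev/sda
  done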
Post by Alex Chekholko via Beowulf
My next recommendation would be a full AC cycle; can you power the host off
for a few minutes? It's a bit cargo cult-y but sometimes it works. It may
also help (or not) for you to spin around 3 times while the machine is off.
Did a reboot, but not a full AC cycle. There is a battery backup on the
RAID controller, so power down is not really ever power off for that
board. When it came back up the IO rate was unchanged. Perhaps a few
minutes with the power off might clear some stuck bits elsewhere in the
system.

Checked the speed on a third similar machine, with an H730P controller
and again SAS disks. This one is all SEAGATE ST4000NM0005. It was as
fast as the 'A' machine (more or less).

Thanks,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
David Mathog
2018-03-19 23:50:51 UTC
Found the problem. Well, sort of.

The issue is that when a long SMART test runs on any disk on the system
(A) which has this problem, the IO goes down to 30Mb/s. It doesn't
matter which disk is running the test. The system we have which is most
like it (C) does not have this issue.
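It is easy to reproduce (the device name, file path, and megaraid ID
below are placeholders):

  # Start a long background self-test on any one member disk...
  smartctl -t long -d megaraid,0 /dev/sda
  # ...then rerun the sequential read test; on this machine it drops to ~30Mb/s
  dd if=/path/to/largefile bs=8192 of=/dev/null &
  iotop -o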

               A             C
Centos         6.7           6.9
RAM            512           512      GB
CPUs           56            40       (actually threads)
PowerEdge      T630          T630
Xeon           E5-2695       E5-2650  (both v3)
speed          2.30GHz       2.30GHz
cpufreq?       yes           no
PERC           H730          H730P
SAS disk (A)   ST2000NM0023
SAS disk (C)   ST4000NM0005

There are a bunch of small differences between the two systems so it is
hard to say for sure which is the actual culprit.

I will put this out on the smartmontools list and see if anybody has
seen it before.

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Jörg Saßmannshausen
2018-03-20 00:55:01 UTC
Hi David,

I am wondering if that is not what you would expect.
The long SMART test is quite thorough, so the disc will be subject to
quite a bit of stress. Thus, it could well be that the controller is
slowing down the rest of the RAID because there is a bottleneck. So
from that angle it does make sense to me.
This is, however, just speculation from my side.

All the best from a chilly London

Jörg
Post by David Mathog
Found the problem. Well, sort of.
The issue is that when a long SMART test runs on any disk on the system
(A) which has this problem the IO goes down to 30Mb/s. It doesn't
matter which disk is running the test. The system we have which is most
like it (C) does not have this issue.
A C
Centos 6.7 6.9
RAM 512 512 Gb
CPUs 56 40 (actually threads)
PowerEdge T630 T630
Xeon E5-2695 E5-2650 (both v3)
speed 2.30GHz 2.30Ghz
cpufreq? yes no
PERC H730 H730P
SAS disk ST2000NM0023
SAS disk ST4000NM0005
There are a bunch of small differences between the two systems so it is
hard to say for sure which is the actual culprit.
I will put this out on the smartmontools list and see if anybody has
seen it before.
Regards,
David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
Chris Samuel
2018-03-20 05:44:48 UTC
Post by Jörg Saßmannshausen
Thus, it could well be that the controller is slowing down the
rest of the RAID as there is a bottle neck.
As David is testing reads from a RAID-5 array, (IIRC) the controller
needs to read the stripe from all active drives to retrieve the data,
and if one drive is slower than the rest it will end up waiting for
that drive.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Jörg Saßmannshausen
2018-03-20 09:26:26 UTC
Hi Chris,
Post by Chris Samuel
Post by Jörg Saßmannshausen
Thus, it could well be that the controller is slowing down the
rest of the RAID as there is a bottle neck.
As David is testing reading from a RAID-5 array then (IIRC) it needs to read
the stripe from all active drives to retrieve the data, and if one drive is
slower than the rest then the controller will end up waiting for that drive.
Well, that is basically just confirming my suspicion here. :-)

All the best

Jörg

Skylar Thompson
2018-03-20 00:19:52 UTC
Could it be a patrol read, possibly hitting a marginal disk? We've run into
this on some of our Dell systems, and exporting the RAID HBA logs reveals
what's going on. You can see those with "omconfig storage controller
controller=n action=exportlog" (exports logs in /var/log/lsi_mmdd.log) or
an equivalent MegaCLI command that I can't remember right now. We had a
rash of these problems, along with uncaught media errors (probably a
combination disk/firmware bug), so we ended up sending these logs to
Splunk, but if it's a one-off thing it's pretty easy to spot visually too.
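Spelled out (the controller index is a placeholder):

  # Export the RAID controller log via OpenManage; the file lands in
  # /var/log/lsi_MMDD.log, named for the month/day of the export
  omconfig storage controller controller=0 action=exportlog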

Skylar
Post by David Mathog
On one of our Centos 6.9 systems with a PERC H730 controller I just noticed
that file system reads are quite slow. Like 30Mb/s slow. Anybody care to
hazard a guess what might be causing this situation?
[...]
patrol          No       No      (megacli shows patrol read not going now)
[...]
Any ideas what this slowdown might be?
Thanks,
David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
David Mathog
2018-03-21 16:21:46 UTC
Post by Jörg Saßmannshausen
I am wondering if that is not what you would expect.
The long SMART test is quite thorough, so the disc will be subject to
quite a bit of stress. Thus, it could well be that the controller is
slowing down the rest of the RAID because there is a bottleneck.
Long SMART tests are run on active RAID systems here all the time; this
is the only machine so far where this massive slowdown has been noted.
The others might slow down a few percent, but a drop in the read rate
from ~400Mb/s to 30Mb/s is just horrible. I'm wondering if the disks on
this one system (the only machine here with this model disk) disable
readahead or command queuing or something of that sort when a long SMART
test runs. However, no change in the readahead value is visible with
"blockdev --report" with and without that SMART test running. The disks
are 2TB 6.0Gb/s SAS with 128MB cache and 4.16ms latency.
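The readahead check amounts to this (the device name and megaraid ID are
placeholders):

  # Read-ahead (RA) setting on the RAID virtual disk, checked with and
  # without a long self-test running on a member disk
  blockdev --report /dev/sdb
  smartctl -t long -d megaraid,0 /dev/sda   # start the long test
  blockdev --report /dev/sdb                # check again while it runs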

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Jonathan Engwall
2018-03-22 02:11:15 UTC
I am curious about such things.
The Dell specs show the 710 can handle 32 devices, while the 730 can
handle 255! The PDF spec sheet also says the H730 has "workload
profiles." If you are nowhere near 255 disks perhaps you can change the
controller?
Post by David Mathog
Post by Jörg Saßmannshausen
I am wondering if that is not what you would expect.
The long SMART test is quite thorough, so the disc will be subject to
quite a bit of stress. Thus, it could well be that the controller is
slowing down the rest of the RAID because there is a bottleneck.
Long SMART tests are run on active RAID systems here all the time, this is
the only machine so far where this massive slow down has been noted. The
others might slow down a few percent, but a drop in the read rate from
~400Mb/s to 30Mb/s is just horrible. I'm wondering if the disks on this
one system (the only machine here with this model disk) disable readahead
or command queuing or something of that sort when a long SMART test runs.
However, no change in the read ahead value is visible with "blockdev
--report" with and without that SMART test running. The disks are 2Tb
6.0Gb/s SAS with 128Mb cache and 4.16ms latency.
Regards,
David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
David Mathog
2018-03-22 16:02:20 UTC
Post by Jonathan Engwall
I am curious about such things.
The DELL specs show the 710 can handle 32 devices, while the 730 can handle
255! The
pdf spec sheet also says the h730 has "workload profiles." If you are
nowhere near 255 disks perhaps you can change the controller?
The box only has 8 drive slots, so there really isn't much difference
between 8 and 255 in practical terms. Well, maybe if there is some way
to attach external storage. The specs for both the 710 and 730 say the
controller has "8 internal ports", which for all I know map 1:1 with
those slots.

My best guess is that the problem is in the disk firmware, so changing
the controller most likely would not help with the SMART issue.
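Checking the drive firmware level is at least easy, e.g. (the megaraid
ID is a placeholder):

  # The Revision field in the drive identity block is the firmware level
  smartctl -i -d megaraid,0 /dev/sda | grep -i revision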

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
