Discussion:
[Beowulf] cold spare storage?
mathog
2017-08-17 17:43:33 UTC
(Originally posted here:

https://stackoverflow.com/questions/45719853/enterprise-spare-drives-better-on-shelf-or-spun-down-in-enclosure

but nobody has answered.)

Hi all,

Some Dell servers I recently started managing have spare disks in their
array enclosures. megacli showed the spares as:

Firmware state: Online, Spun Up

so they are not configured as hot spares. Effectively these are cold
spares, but ones whose working lifetime is being wasted spinning for no
good reason. They have now been spun down with:

megacli -PDPrpRmv -PhysDrv[32:5] -a0

which is better for their longevity, I suppose.
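A check like the one described above can be scripted. The following is a minimal sketch, assuming the drive listing comes from something like `megacli -PDList -aALL` and that each drive block carries "Enclosure Device ID", "Slot Number" and "Firmware state" lines. The sample text, the post-spin-down state string and the helper names are hypothetical, made up for illustration rather than captured from a real controller.

```python
# Hypothetical excerpt in the style of a MegaCLI physical-drive listing.
# Real output has many more fields per drive; the second drive's state
# string is an assumption about what a spun-down drive reports.
SAMPLE = """\
Enclosure Device ID: 32
Slot Number: 4
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 5
Firmware state: Unconfigured(good), Spun down
"""


def drive_states(text):
    """Map (enclosure, slot) -> firmware state string."""
    states = {}
    enclosure = slot = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            enclosure = value
        elif key == "Slot Number":
            slot = value
        elif key == "Firmware state":
            states[(enclosure, slot)] = value
    return states


def spun_up(states):
    """Return the (enclosure, slot) pairs whose state ends in 'Spun Up'."""
    return [k for k, v in states.items() if v.lower().endswith("spun up")]


if __name__ == "__main__":
    for enclosure, slot in spun_up(drive_states(SAMPLE)):
        print(f"[{enclosure}:{slot}] is still spinning")
```

The `[enclosure:slot]` pairs it prints have the same shape as the `-PhysDrv[32:5]` argument used above, so the output can be compared by eye against the drives that are meant to be idle.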

The question is: would these spares last longer if they were pulled and
stored in an antistatic bag in a drawer? Even though they are no longer
spinning they are still in a warm environment and receiving the
vibrations from the other disks and fans in these servers. I only found
some articles about storing drives with data on them, but that isn't the
case here; these drives are blank. Has anybody published actual data on
this issue (as opposed to just their opinion)?

Thanks.

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit h
Alex Chekholko
2017-08-17 18:10:02 UTC
The Google paper from a few years ago showed essentially no correlations
between the things you ask about and failure rates. So... do whatever is
most convenient for you.
mathog
2017-08-17 18:54:29 UTC
Post by Alex Chekholko
The Google paper from a few years ago showed essentially no correlations
between the things you ask about and failure rates. So... do whatever is
most convenient for you.
This one?

http://research.google.com/archive/disk_failures.pdf

They didn't do a control where they put some drives on a shelf and then
tested them later. Nor did they (as far as I can tell) do a control with
installed disks powered on but not spun up. Every disk they tested was
fully "live". It is true that they didn't see any big difference based
on usage, temperature, or vibration (to the limited extent they could
measure this).

Also, that study was published in 2007, so the 5-year failure rates are
for disks which were made in 2001 or 2002. That is a long, long time
ago in terms of disk technology and density. I'm not sure that their
results from 10 years ago are still fully applicable to current disks.

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Benson Muite
2017-08-17 21:26:07 UTC
Related study at:
http://www.cs.toronto.edu/~bianca/papers/fast07.pdf

The Computer Failure Data Repository does not seem to be very active
(https://www.usenix.org/cfdr)

It may be worth suggesting that some of these issues be considered in
VI4IO (you might also get an answer to this question there):
https://www.vi4io.org/contribute/start

Bill Broadley via Beowulf
2017-08-18 00:00:31 UTC
Post by Alex Chekholko
The Google paper from a few years ago showed essentially no correlations
between the things you ask about and failure rates. So... do whatever is
most convenient for you.
Backblaze also has a pretty large data set, granted not as big as Google's.
Backblaze has been MUCH more transparent about what was measured, shows some
useful correlations, and has been regularly updating the data. They even *gasp*
mention brands and models.

https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/

Their post on reliability of smart data:
https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/

A quote:
That means that 23.3% of failed drives showed no warning from the SMART stats
we record. Are these stats useful? I’ll let you decide if you’d like to have a
sign of impending drive failure 76.7% of the time. But before you decide, read
on.
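The two percentages in that quote are complements of each other. As a small sketch of the arithmetic, with made-up counts (1000 failed drives, 233 of them with no SMART warning) chosen only to reproduce the quoted figures and not taken from Backblaze's data:

```python
def warning_split(failed_total, failed_without_warning):
    """Return (% of failures with no warning, % with a warning), 1 decimal."""
    no_warning = 100.0 * failed_without_warning / failed_total
    return round(no_warning, 1), round(100.0 - no_warning, 1)


# Hypothetical counts, picked to match the percentages in the quote.
no_warn, warn = warning_split(1000, 233)
print(f"no warning: {no_warn}%  warning: {warn}%")
```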



