Discussion:
[Beowulf] Lustre Upgrades
Paul Edmon
2018-07-23 15:59:31 UTC
We have some old large-scale Lustre installs that are running 2.6.34 and
we want to get these up to the latest version of Lustre.  I was curious
whether people in this group have any experience with doing this and, if
so, whether they could share it.  How do you handle upgrades like this?
How much time do they take?  What are the pitfalls?  How do you manage
them with minimal customer interruption? Or should we just write off
upgrading and stand up new servers on the correct version (in which case
we would need to transfer several PBs' worth of data over to the new system)?

Thanks for your wisdom.

-Paul Edmon-

Michael Di Domenico
2018-07-23 16:51:04 UTC
Post by Paul Edmon
Should we just write off upgrading and stand up new servers on the
correct version (in which case we would need to transfer several PBs'
worth of data over to the new system)?
If you can afford the hardware and the time for the copy, this would
certainly be the best option... :)

I've always done it that way as well. Lustre can be a scary upgrade,
and I've generally found that by the time I'm ready to update the
machines the hardware has been abused for two or three years anyhow, so
swapping out the hardware that's supporting a filesystem has generally
seemed like a good thing to do, though certainly not a necessity.

My understanding (I haven't done it yet) is that with later versions of
Lustre (>2.5) on ZFS, upgrades have become more of a routine thing with
much less concern.
Jeff Johnson
2018-07-23 17:00:36 UTC
Paul,

2.6.34 is a kernel version. What version of Lustre are you at now? Some
updates are easier than others.

--Jeff
--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

***@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001 f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
Paul Edmon
2018-07-23 17:05:21 UTC
My apologies, I meant 2.5.34, not 2.6.34.  We'd like to get up to 2.10.4,
which is what our clients are running.  Recently we upgraded our cluster
to CentOS 7, which necessitated the client upgrade.  Our storage servers,
though, stayed behind on 2.5.34.

-Paul Edmon-
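
(For anyone following along: before planning a jump like this it helps to
confirm what every node actually reports. A minimal sketch, assuming
passwordless SSH and that lctl is installed on each host; the hostnames
are placeholders, not the real machines in this thread.)

#!/usr/bin/env python3
"""Survey which Lustre version each server and client reports."""
import subprocess

HOSTS = ["mds01", "oss01", "oss02", "client01"]  # placeholder names

def lustre_version(host):
    # `lctl get_param -n version` works on 2.x; fall back to /proc for
    # very old releases.
    remote = "lctl get_param -n version 2>/dev/null || cat /proc/fs/lustre/version"
    result = subprocess.run(["ssh", host, remote],
                            capture_output=True, text=True, timeout=30)
    return result.stdout.strip() or result.stderr.strip()

if __name__ == "__main__":
    for h in HOSTS:
        print(f"{h}: {lustre_version(h)}")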
Jeff Johnson
2018-07-23 17:18:20 UTC
You're running 2.10.4 clients against 2.5.34 servers? I believe there are
notable LNet attributes that don't exist in 2.5.34. Maybe a Whamcloud wiz
will chime in, but I think that version mismatch might be problematic.

To be conservative, you could do a testbed upgrade first, taking an
ldiskfs volume from 2.5.34 to 2.10.4.

--Jeff
Paul Edmon
2018-07-23 17:34:57 UTC
Yeah, we've found out firsthand that it's problematic, as we have been
seeing issues :).  Hence the urge to upgrade.

We've begun exploring this, but we wanted to reach out to other people
who may have gone through the same thing to get their thoughts.  We also
need to figure out how significant an outage this will be: if it takes a
day or two of full outage to do the upgrade, that is more acceptable
than a week.  We also wanted to know whether people had experienced data
loss/corruption in the process, and about any other kinks.

We were planning on playing around with VMs to test the upgrade path
before committing to upgrading our larger systems.  One of the questions
we had, though, was whether we need to run e2fsck before/after the
upgrade, as that could add significant time to the outage.

-Paul Edmon-
Jeff Johnson
2018-07-23 17:58:20 UTC
Paul,

How big are your ldiskfs volumes? What type of underlying hardware are
they on? Running e2fsck (the ldiskfs-aware version) is wise and can be
done in parallel. It could finish within a couple of days; the time all
depends on the size and the underlying hardware.

Going from 2.5.34 to 2.10.4 is a significant jump. I would make sure
there isn't an intermediate step upgrade advised. I know there have been
step upgrades in the past; I'm not sure about going between these two versions.

--Jeff
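
(A minimal sketch of the parallel read-only check described above,
assuming the Lustre-patched e2fsprogs is installed and the targets are
unmounted; the device paths are placeholders.)

#!/usr/bin/env python3
"""Run read-only e2fsck passes over several OST devices in parallel."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

OST_DEVICES = ["/dev/mapper/ost0000", "/dev/mapper/ost0001"]  # placeholders

def check(dev):
    # -f forces a full check; -n opens the device read-only and answers
    # "no" to every repair prompt, so this pass changes nothing on disk.
    result = subprocess.run(["e2fsck", "-fn", dev],
                            capture_output=True, text=True)
    return dev, result.returncode

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=len(OST_DEVICES)) as pool:
        for dev, rc in pool.map(check, OST_DEVICES):
            print(f"{dev}: e2fsck exit code {rc}")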
Paul Edmon
2018-07-23 18:11:40 UTC
Yeah, we've pinged Intel/Whamcloud about upgrade paths, as we wanted to
know what the recommended procedure is.

Sure. We have 3 systems that we want to upgrade: 1 that is 1 PB and 2
that are 5 PB each.  I will just give you a description of one and
assume that everything scales linearly with size. They all have the
same hardware.

The head nodes are Dell R620s, while the shelves are M3420 (MDS) and
M3260 (OSS).  The MDT is 2.2T with 466G and 268M inodes used.  Each
OST is 30T, with each OSS hosting 6.  The filesystem itself is 93% full.

-Paul Edmon-
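
(Rough numbers only, not from the thread: the figures above imply
roughly the following geometry for one of the 5 PB systems; the OSS
count is an inference, since it is not stated.)

# Back-of-envelope geometry for one 5 PB system, derived from the
# figures above; the OSS count is an inference, not a stated number.
OST_SIZE_TB = 30
OSTS_PER_OSS = 6
FS_SIZE_PB = 5
FULL_FRACTION = 0.93

total_tb = FS_SIZE_PB * 1000                  # decimal units, rough math
n_osts = round(total_tb / OST_SIZE_TB)        # ~167 OSTs
n_oss = round(n_osts / OSTS_PER_OSS)          # ~28 OSS nodes
occupied_pb = FS_SIZE_PB * FULL_FRACTION      # ~4.65 PB of live data

print(f"~{n_osts} OSTs across ~{n_oss} OSS nodes, ~{occupied_pb:.2f} PB occupied")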
Jörg Saßmannshausen
2018-07-24 08:52:44 UTC
Hi Paul,

with a file system that is 93% full, in my humble opinion it would make
sense to increase the underlying hardware capacity as well. The reasoning
is that usually, over time, there will be more data on any given file
system, so if there is already a downtime, I would increase its size at
the same time. I would rather have a somewhat longer downtime and end up
with both a new version of Lustre (of which I know little) and more
capacity that will last longer, than only upgrade Lustre and then run out
of disc capacity a bit later.
It also means that in your case you could simply install the new system,
test it, and then migrate the data over. Depending on how it is set up,
you could even do that in stages.
As you mentioned 3 different Lustre systems, you could for example start
with the biggest one and use new hardware there. The freed capacity of
the now obsolete hardware could then be used for the other systems.
Of course, I don't know your hardware etc.

Just some ideas from a hot London 8-)

Jörg
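
(One hedged sketch of the staged migration suggested above: pre-copy the
bulk of the data directory by directory with rsync while the old
filesystem is still in service, then do a short final pass at cut-over.
The mount points are placeholders, and rsync does not recreate Lustre
stripe layouts, so default striping should be set on the destination
directories beforehand.)

#!/usr/bin/env python3
"""Staged copy from an old Lustre mount to a new one, a few
top-level directories at a time."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

OLD = Path("/mnt/lustre_old")   # placeholder
NEW = Path("/mnt/lustre_new")   # placeholder

def sync(subdir):
    # -a preserves ownership/permissions/times, -H keeps hard links,
    # -X copies user extended attributes; --delete keeps the copy exact
    # so only a short final pass is needed during the cut-over window.
    cmd = ["rsync", "-aHX", "--delete", f"{subdir}/", str(NEW / subdir.name)]
    return subdir.name, subprocess.run(cmd).returncode

if __name__ == "__main__":
    tops = sorted(p for p in OLD.iterdir() if p.is_dir())
    with ThreadPoolExecutor(max_workers=4) as pool:
        for name, rc in pool.map(sync, tops):
            print(f"{name}: rsync exit {rc}")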
Paul Edmon
2018-07-24 14:19:18 UTC
Yeah, that's my preferred solution, as the hardware we have is nearing
end of life.  In that case, though, we would have to coordinate the
cut-over of the data to the new storage and forklift all those PBs over
to the new system, which brings its own unique challenges.  Plus you
also have to have the budget to buy the new hardware.

Right now we are just exploring our options.

-Paul Edmon-
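
(A back-of-envelope estimate of the copy window for the forklift option;
the sustained throughput figures below are assumptions, not measurements
from these systems.)

# Rough copy-window estimate for the forklift option.
occupied_tb = 5 * 0.93 * 1000          # ~4650 TB of live data on a 5 PB system

for gb_per_s in (5, 10, 20):           # assumed aggregate sustained rates
    seconds = occupied_tb * 1e12 / (gb_per_s * 1e9)
    print(f"{gb_per_s:>3} GB/s sustained -> ~{seconds / 86400:.1f} days")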
John Hearns via Beowulf
2018-07-24 14:31:00 UTC
Forgive me for saying this, but the philosophy of software-defined
storage such as Ceph and Gluster is that forklift-style upgrades should
not be necessary.
When a storage server is to be retired, the data is copied onto the new
server and the old one taken out of service. Well, copied is not the
correct word, as there are erasure-coded copies of the data. Rebalanced
is probably a better word.

Sorry if I am seeming to be a smartarse. I have gone through the pain of
forklift-style upgrades in the past when storage arrays reach end of life.
I just really like the software-defined storage mantra: no component
should be a single point of failure.
Paul Edmon
2018-07-24 14:40:55 UTC
While I agree with you in principle, one also has to deal with the
reality one finds oneself in.  In our case we have more experience
with Lustre than Ceph in an HPC setting, and we got burned pretty badly
by Gluster.  While I like Ceph in principle, I haven't seen it do what
Lustre can do in an HPC setting over IB.  Now, it may be able to do that,
which would be great.  However, you then have to get your system set up
to do that and prove that it can.  After all, users have a funny way of
breaking things that work amazingly well in controlled test environments,
especially when you have no control over how they will actually use the
system (as in a research environment).  Certainly we are working on
exploring this option too, as it would be awesome and save many headaches.

Anyways, no worries about you being a smartarse, it is a valid point.
One just needs to consider the realities on the ground in one's own
environment.

-Paul Edmon-
John Hearns via Beowulf
2018-07-24 15:02:43 UTC
Paul, thanks for the reply.
I would like to ask, if I may. I rather like Gluster, but have not
deployed it in HPC. I have heard a few people comment about Gluster not
working well in HPC. Would you be willing to be more specific?

One research site I talked to did the classic 'converged infrastructure'
idea of attaching storage drives to their compute nodes and distributing
Gluster storage. They were not happy with that, I was told, and I can
very much understand why. But I would be interested to hear about Gluster
on dedicated servers.
Paul Edmon
2018-07-24 15:56:14 UTC
This was several years back, so the current version of Gluster may be in
better shape.  We tried to use it for our primary storage but ran into
scalability problems.  That was especially the case when it came to
healing bricks and doing replication; it just didn't scale well.
Eventually we abandoned it for NFS and Lustre: NFS for deep storage and
Lustre for performance.  We also tried it for hosting VM images, which
worked pretty well, but we've since moved to Ceph for that.

Anyways, I have no idea about current Gluster in terms of scalability, so
the issues we ran into may not be a problem anymore.  However, it has
made us very gun-shy about trying Gluster again.  Instead we've decided
to use Ceph, as we've gained a bunch of experience with Ceph in our
OpenNebula installation.

-Paul Edmon-
John Hearns via Beowulf
2018-07-24 17:15:34 UTC
Thank you for a comprehensive reply.
Jörg Saßmannshausen
2018-07-26 07:14:54 UTC
Dear all,

I once had this idea as well: using the spinning discs I have in the
compute nodes as part of a distributed scratch space. I was using
GlusterFS for that, as I thought it might be a good idea. It was not. The
reason is that as soon as a job creates, say, 700 GB of scratch data (a
real job, not some fictional one!), the performance of the node hosting
part of that data approaches zero due to the high disc I/O. This meant
that the job running there was affected. So in the end this led to an
installation with a separate file server for the scratch space.
I should also add that this was a rather small setup of 8 nodes, and it
was a few years back.
The problem I found in computational chemistry is that some jobs require
either a large amount of memory, i.e. significantly more than the usual
2 GB per core, or a large amount of scratch space (if there is
insufficient memory). You are in trouble if a job requires both. :-)

All the best from a still hot London

Jörg
John Hearns via Beowulf
2018-07-26 07:53:35 UTC
Jörg,
you should look at BeeGFS and BeeOND (BeeGFS On Demand): https://www.beegfs.io/wiki/BeeOND

Jörg Saßmannshausen
2018-07-26 08:24:46 UTC
Hi John,

thanks. I should have said that this was one of the reasons I became
interested in BeeGFS; that experience was some years ago, and I believe
at the time I was not aware of BeeGFS.
In any case, that was at my old workplace, and at the current one we
don't have these demands on the hardware.

All the best

Jörg
Post by John Hearns via Beowulf
Jorg,
you should look at BeeGFS and BeeOnDemand https://www.beegfs.io/wiki/BeeOND
On Thu, 26 Jul 2018 at 09:15, Jörg Saßmannshausen <
Post by Jörg Saßmannshausen
Dear all,
I once had this idea as well: using the spinning discs which I have in the
compute nodes as part of a distributed scratch space. I was using glusterfs
for that as I thought it might be a good idea. It was not. The reason behind
it is that as soon as a job is creating say 700 GB of scratch data (real job
not some fictional one!), the performance of the node which is hosting part of
that data approaches zero due to the high disc IO. This meant that the job
which was running there was affected. So in the end this led to an
installation which got a separate file server for the scratch space.
I also should add that this was a rather small setup of 8 nodes and it was a
few years back.
The problem I found in computational chemistry is that some jobs require
either large amount of memory, i.e. significantly more than the usual 2 GB per
core, or large amount of scratch space (if there is insufficient memory). You
are in trouble if it requires both. :-)
All the best from a still hot London
Jörg
Post by John Hearns via Beowulf
Paul, thanks for the reply.
I would like to ask, if I may. I rather like Gluster, but have not deployed
it in HPC. I have heard a few people comment about Gluster not working well
in HPC. Would you be willing to be more specific?
One research site I talked to did the classic 'converged infrastructure'
idea of attaching storage drives to their compute nodes and distributing
Gluster storage. They were not happy with that, I was told, and I can very
much understand why. But Gluster on dedicated servers I would be interested
to hear about.
Post by Paul Edmon
While I agree with you in principle, one also has to deal with the reality
as you find yourself in. In our case we have more experience with Lustre
than Ceph in an HPC setting and we got burned pretty badly by Gluster. While I
like Ceph in principle I haven't seen it do what Lustre can do in an HPC
setting over IB. Now it may be able to do that, which is great. However
then you have to get your system set up to do that and prove that it can.
After all users have a funny way of breaking things that work amazingly
well in controlled test environs, especially when you have no control over how
they will actually use the system (as in a research environment).
Certainly we are working on exploring this option too as it would be
awesome and save many headaches.
Anyways no worries about you being a smartarse, it is a valid point. One
just needs to consider the realities on the ground in one's own environment.
-Paul Edmon-
Post by John Hearns via Beowulf
Forgive me for saying this, but the philosophy for software defined
storage such as CEPH and Gluster is that forklift style upgrades should
not be necessary.
When a storage server is to be retired the data is copied onto the new
server then the old one taken out of service. Well, copied is not the
correct word, as there are erasure-coded copies of the data. Rebalanced is
probably a better word.
Sorry if I am seeming to be a smartarse. I have gone through the pain of
forklift style upgrades in the past when storage arrays reach End of Life.
I just really like the Software Defined Storage mantra - no component
should be a point of failure.
James Burton
2018-07-26 14:50:33 UTC
Permalink
I have done some research on using local storage on the compute nodes as a
DFS, and I would agree that it is not as good an idea as it sounds. You
gain data locality, but you pay for this with compute node resources and
you still have the problem of having to copy the data in and out. To
summarize: It makes sense if you are network bound or have very high
performance storage (like an NVMe) as your local scratch. It doesn't make a
lot of sense if you have spinning disks on your local scratch. It might
make more sense for a compute intensive workload, but less sense for a
memory intensive one.

Overall GlusterFS provided low latency, but performance that was mediocre
at best and could get ugly depending on configuration and workload. BeeOND
is a really cool product, although the focus seems to be more on making it
easy to get a "quick-and-dirty" BeeGFS system running on the compute nodes
than on maximum performance.


On Thu, Jul 26, 2018 at 3:53 AM, John Hearns via Beowulf <
Post by John Hearns via Beowulf
Jorg,
you should look at BeeGFS and BeeOnDemand https://www.beegfs.io/wiki/BeeOND
[...]
--
James Burton
OS and Storage Architect
Advanced Computing Infrastructure
Clemson University Computing and Information Technology
340 Computer Court
Anderson, SC 29625
(864) 656-9047
John Hearns via Beowulf
2018-07-26 15:00:26 UTC
Permalink
Regarding NVMe storage, we have some pretty spiffing networks these days. I
would say - storage servers for storage. (Yeah, I just flagged up
BeeOnDemand earlier!)
Use NVMe 'drives' and use RDMA to send the data to them.
I have done this with GPFS - six all-NVMe storage servers over a 100Gbps
Infiniband network.

Also look at the fancy BlueField processors which Mellanox is putting into
the network
http://www.mellanox.com/page/products_dyn?product_family=256&mtag=soc_overview
I am a bit out of the loop regarding Mellanox products at the moment. If
anyone from Mellanox would like to give me access to play with one of these
things? Hint. Hint.
Post by James Burton
I have done some research on using local storage on the compute nodes as a
DFS, and I would agree that it is not as good an idea as it sounds. You
gain data locality, but you pay for this with compute node resources and
you still have the problem of having to copy the data in and out. To
summarize: It makes sense if you are network bound or have very high
performance storage (like an NVMe) as your local scratch. It doesn't make a
lot of sense if you have spinning disks on your local scratch. It might
make more sense for a compute intensive workload, but less sense for a
memory intensive one.
Overall GlusterFS provided low latency, but performance that was mediocre
at best and could get ugly depending on configuration and workload. BeeOND
is a really cool product, although the focus seems to be on more making it
easy to get "quick-and-dirty" BeeGFS system running on the compute nodes
than on maximum performance.
[...]
Joe Landman
2018-07-24 14:58:20 UTC
Permalink
Post by John Hearns via Beowulf
Forgive me for saying this, but the philosophy for software defined
storage such as CEPH and Gluster is that forklift style upgrades
should not be necessary.
When a storage server is to be retired the data is copied onto the new
server then the old one taken out of service. Well, copied is not the
correct word, as there are erasure-coded copies of the data.
Rebalanced is probably a better word.
This ^^

I'd seen/helped build/benchmarked some very nice/fast CephFS based
storage systems in $dayjob-1.  While it is a neat system, if you are
focused on availability, scalability, and performance, it's pretty hard
to beat BeeGFS.  We'd ($dayjob-1) deployed several very large/fast file
systems with it on our spinning rust, SSD, and NVMe units.
--
Joe Landman
e: ***@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

John Hearns via Beowulf
2018-07-24 15:06:10 UTC
Permalink
Joe, sorry to split the thread here. I like BeeGFS and have set it up.
I have worked for two companies now that have sites around the world, those
sites being independent research units, while the HPC facilities are at
headquarters.
The sites want to be able to drop files onto local storage yet have them
magically appear on the HPC storage, and the same with the results going back
the other way.

One company did this well with GPFS and AFM volumes.
For the current company, I looked at Gluster, and Gluster geo-replication is
one-way only.
What do you know of the BeeGFS mirroring? Will it work over long distances?
(Note to me - find out yourself you lazy besom)
Post by Joe Landman
Post by John Hearns via Beowulf
Forgive me for saying this, but the philosophy for software defined
storage such as CEPH and Gluster is that forklift style upgrades
should not be necessary.
When a storage server is to be retired the data is copied onto the new
server then the old one taken out of service. Well, copied is not the
correct word, as there are erasure-coded copies of the data.
Rebalanced is probably a better word.
This ^^
I'd seen/helped build/benchmarked some very nice/fast CephFS based
storage systems in $dayjob-1. While it is a neat system, if you are
focused on availability, scalability, and performance, it's pretty hard
to beat BeeGFS. We'd ($dayjob-1) deployed several very large/fast file
systems with it on our spinning rust, SSD, and NVMe units.
--
Joe Landman
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Joe Landman
2018-07-24 15:18:06 UTC
Permalink
Post by John Hearns via Beowulf
Joe, sorry to split the thread here. I like BeeGFS and have set it up.
I have worked for two companies now who have sites around the world,
those sites being independent research units. But HPC facilities are
in headquarters.
The sites want to be able to drop files onto local storage yet have it
magically appear on HPC storage, and same with the results going back
the other way.
One company did this well with GPFS and AFM volumes.
For the current company, I looked at gluster and Gluster
geo-replication is one way only.
What do you know of the BeeGFS mirroring? Will it work over long
distances? (Note to me - find out yourself you lazy besom)
This isn't the use case for most/all cluster file systems.   This is
where distributed object systems and buckets rule.

Take your file, dump it into an S3 like bucket on one end, pull it out
of the S3 like bucket on the other.  If you don't want to use get/put
operations, then use s3fs/s3ql.  You can back this up with replicating
EC minio stores (will take a few minutes to set up ... compare that to
others).

The down side to this is that minio has limits of about 16TiB last I
checked.   If you need more, replace minio with another system (igneous,
ceph, etc.).  Ping me offline if you want to talk more.
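To make that flow concrete, here is a minimal sketch of the put/get round trip,
assuming an S3-compatible endpoint such as a minio server; the endpoint URL,
bucket name, credentials and paths below are placeholders rather than anything
from a real deployment:

    # Sketch: move a results file between sites via an S3-compatible bucket
    # (e.g. a minio server). Endpoint, bucket, credentials and paths are
    # placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://minio.example.org:9000",   # hypothetical endpoint
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Site A: drop the file into the shared bucket.
    s3.upload_file("results.tar.gz", "hpc-transfer", "results.tar.gz")

    # Site B: pull it back out next to the HPC storage.
    s3.download_file("hpc-transfer", "results.tar.gz",
                     "/scratch/incoming/results.tar.gz")

The same bucket can of course be mounted with s3fs/s3ql instead of scripting
the get/put.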

[...]
--
Joe Landman
e: ***@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

James Burton
2018-07-25 02:19:43 UTC
Permalink
Does anyone have any experience with how BeeGFS compares to Lustre? We're
looking at both of those for our next generation HPC storage system.

Is CephFS a valid option for HPC now? Last time I played with CephFS it
wasn't ready for prime time, but that was a few years ago.
Post by Joe Landman
Post by John Hearns via Beowulf
Forgive me for saying this, but the philosophy for software defined
storage such as CEPH and Gluster is that forklift style upgrades should not
be necessary.
When a storage server is to be retired the data is copied onto the new
server then the old one taken out of service. Well, copied is not the
correct word, as there are erasure-coded copies of the data. Rebalanced is
probaby a better word.
This ^^
I'd seen/helped build/benchmarked some very nice/fast CephFS based storage
systems in $dayjob-1. While it is a neat system, if you are focused on
availability, scalability, and performance, it's pretty hard to beat
BeeGFS. We'd ($dayjob-1) deployed several very large/fast file systems
with it on our spinning rust, SSD, and NVMe units.
--
Joe Landman
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
James Burton
OS and Storage Architect
Advanced Computing Infrastructure
Clemson University Computing and Information Technology
340 Computer Court
Anderson, SC 29625
(864) 656-9047
Chris Samuel
2018-07-25 10:16:11 UTC
Permalink
Post by James Burton
Is CephFS a valid option for HPC now? Last time I played with CephFS it
wasn't ready for prime time, but that was a few years ago.
I'm not sure, but I know people who've recently (last month or two) had a
world of pain running CephFS with multiple MDSs when it managed to get into a
split-brain situation (if my understanding of what happened is right).
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Jeff Johnson
2018-07-25 23:11:44 UTC
Permalink
Post by Chris Samuel
I'm not sure, but I know people who've recently (last month or two) had a
world of pain running CephFS with multiple MDS's when it managed to get into a
split brain situation (if my understanding of what happened is right) .
Split brains are nearly always ugly.
--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

***@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001 f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
Prentice Bisbal
2018-07-25 20:36:23 UTC
Permalink
Paging Dr. Joe Landman, paging Dr. Landman...

Prentice
Post by James Burton
Does anyone have any experience with how BeeGFS compares to Lustre?
We're looking at both of those for our next generation HPC storage
system.
Is CephFS a valid option for HPC now? Last time I played with CephFS
it wasn't ready for prime time, but that was a few years ago.
[...]
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Joe Landman
2018-07-25 21:11:55 UTC
Permalink
Post by Prentice Bisbal
Paging Dr. Joe Landman, paging Dr. Landman...
My response was

"I'd seen/helped build/benchmarked some very nice/fast CephFS based
storage systems in $dayjob-1.  While it is a neat system, if you are
focused on availability, scalability, and performance, it's pretty hard
to beat BeeGFS.  We'd ($dayjob-1) deployed several very large/fast file
systems with it on our spinning rust, SSD, and NVMe units."

at the bottom of the post.

Yes, BeeGFS compares very favorably to Lustre across the performance,
management, and resiliency dimensions.  Distributed replicated metadata and
data is possible, atop zfs, xfs, etc.  We sustained > 40GB/s in a
single rack of spinning disk in 2014 at a customer site using it, no
SSD/cache involved, and using 56Gb IB throughout.  The customer wanted to
see us sustain 46+GB/s writes, and we did.

These are some of our other results with it:

https://scalability.org/2014/05/massive-unapologetic-firepower-2tb-write-in-73-seconds/

https://scalability.org/2014/10/massive-unapologetic-firepower-part-2-the-dashboard/
(that was my first effort with Grafana; look at the writes ... the
vertical scale is in 10k MB/s, i.e. 10GB/s, increments.)

W.r.t. BeeGFS, very easy to install, you can set it up trivially on
extra hardware to see it in action.  Won't be as fast as my old stuff,
but that's the price people pay for not buying the good stuff when it
was available.
--
Joe Landman
e: ***@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

Stu Midgley
2018-07-25 03:26:22 UTC
Permalink
let me be clear... you can do this with Lustre as well (we do it all the
time). We also rebalance the OSTs all the time...
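For reference, a minimal sketch of what that rebalancing usually looks like,
using the stock lfs / lfs_migrate client tools (the mount point, OST index and
size threshold are placeholders, and flag spellings vary a little between
Lustre versions):

    # Sketch: drain large files that have objects on one full or retiring OST,
    # restriping them across the remaining OSTs. Mount point, OST index and
    # size threshold are placeholders.
    import subprocess

    MOUNT = "/lustre"
    OST_INDEX = "2"        # OST to drain

    # "lfs find" lists files striped on that OST; "lfs_migrate -y" rewrites
    # them with a new layout, which moves the data off the chosen OST.
    find = subprocess.Popen(
        ["lfs", "find", MOUNT, "--ost", OST_INDEX, "--size", "+1G"],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["lfs_migrate", "-y"], stdin=find.stdout, check=True)
    find.stdout.close()
    find.wait()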


On Tue, Jul 24, 2018 at 10:31 PM John Hearns via Beowulf <
Post by John Hearns via Beowulf
Forgive me for saying this, but the philosophy for software defined
storage such as CEPH and Gluster is that forklift style upgrades should not
be necessary.
When a storage server is to be retired the data is copied onto the new
server then the old one taken out of service. Well, copied is not the
correct word, as there are erasure-coded copies of the data. Rebalanced is
probably a better word.
--
Dr Stuart Midgley
***@gmail.com
Michael Di Domenico
2018-07-23 18:03:36 UTC
Permalink
Yeah we've found out firsthand that it's problematic as we have been seeing
issues :). Hence the urge to upgrade.
what issues are you seeing? I have 2.10.4 clients pointing at 2.5.1
servers, haven't seen any obvious issues and it's been running for
some time now.
Paul Edmon
2018-07-23 18:19:04 UTC
Permalink
The main issue we see is that OSTs get hung up occasionally, which
causes writes to hang as the OST flaps connecting and disconnecting with
the MDS.  Rebooting the OSSs fixes the issue as it forces the remount.
It seems to only happen when the system is full (i.e. above 95% usage)
and under heavy load.  Prior to our CentOS7 upgrade we didn't see
this issue, so we are convinced it is due to the mismatch in Lustre
versions.  Though it is most certainly the case that the fullness of the
filesystem is contributing, as it seems to go away when the filesystem
usage is lower.  Still, I have seen it a few times when the filesystem
was at 85%.
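As an aside, that kind of flapping is usually visible from the client side as
OSC imports dropping out of the FULL state; here is a small monitoring sketch
along those lines (the parameter name and output layout are assumptions based
on recent Lustre clients, so check them against your own lctl before relying
on it):

    # Sketch: flag OSC imports that are not in the FULL (connected) state by
    # parsing "lctl get_param osc.*.import" on a Lustre client.
    # Parameter name and output layout assumed; adjust for your Lustre version.
    import re
    import subprocess

    out = subprocess.run(
        ["lctl", "get_param", "osc.*.import"],
        capture_output=True, text=True, check=True,
    ).stdout

    device = None
    for line in out.splitlines():
        m = re.match(r"^(osc\..*)\.import=", line.strip())
        if m:
            device = m.group(1)
        elif line.strip().startswith("state:") and "FULL" not in line:
            print(f"{device}: {line.strip()}")   # e.g. CONNECTING, DISCONN, EVICTED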

Anyways the obvious culprit is the version mismatch.  It may also be
that some of the additional features/enhancements in 2.5.34 are
conflicting with the mainline version, as the 2.5.34 is something we got
from Intel for the IEEL appliance we have been running.

Odds are your systems are fine as they aren't taking quite the pounding
ours is.  The problem doesn't happen that frequently.

-Paul Edmon-
Post by Michael Di Domenico
Yeah we've found out firsthand that it's problematic as we have been seeing
issues :). Hence the urge to upgrade.
what issues are you seeing? I have 2.10.4 clients pointing at 2.5.1
servers, haven't seen any obvious issues and it's been running for
some time now.
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Jonathan Engwall
2018-07-24 18:04:04 UTC
Permalink
Snowball is the very large scale AWS data service.
Fred Youhanaie
2018-07-24 18:20:42 UTC
Permalink
Nah, that ain't large scale ;-) If you want large scale have a look at snowmobile:

https://aws.amazon.com/snowmobile/

They drive a 45-foot truck to your data centre, fill it up with your data bits, then drive it back to their data centre :-()

Cheers,
Fred
Post by Jonathan Engwall
Snowball is the very large scale AWS data service.
[...]
Lux, Jim (337K)
2018-07-26 20:49:15 UTC
Permalink
So this is the modern equivalent of "nothing beats the bandwidth of a station wagon full of mag tapes".
It *is* a clever idea - I'm sure all the big cloud providers have figured out how to do a "data center in a shipping container", and that's basically what this is.

I wonder what it costs (yeah, I know I can "Contact Sales to order an AWS Snowmobile"... but...)


Jim Lux
(818)354-2075 (office)
(818)395-2714 (cell)

-----Original Message-----
From: Beowulf [mailto:beowulf-***@beowulf.org] On Behalf Of Fred Youhanaie
Sent: Tuesday, July 24, 2018 11:21 AM
To: ***@beowulf.org
Subject: Re: [Beowulf] Lustre Upgrades

Nah, that ain't large scale ;-) If you want large scale have a look at snowmobile:

https://aws.amazon.com/snowmobile/

They drive a 45-foot truck to your data centre, fill it up with your data bits, then drive it back to their data centre :-()

Cheers,
Fred
[...]
Fred Youhanaie
2018-07-27 00:09:38 UTC
Permalink
Yep, this could be considered as a form of COTS high volume data transfer ;-)

from https://aws.amazon.com/snowmobile/faqs/ (the very last item)

"Q: How much does a Snowmobile job cost?

"Snowmobile provides a practical solution to exabyte-scale data migration and is significantly faster and cheaper than any network-based solutions, which can take decades and millions of dollars of
investment in networking and logistics. Snowmobile jobs cost $0.005/GB/month based on the amount of provisioned Snowmobile storage capacity and the end to end duration of the job, which starts when a
Snowmobile departs an AWS data center for delivery to the time when data ingestion into AWS is complete. Please see AWS Snowmobile pricing or contact AWS Sales for an evaluation."

So it seems a fully loaded snowmobile, 100PB at 0.005/GB/month, would cost $524,288.00/month!
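That figure checks out if the 100PB is read as binary petabytes; a quick
back-of-envelope in code, assuming nothing beyond the quoted $0.005/GB/month
rate:

    # Sanity check of the quoted Snowmobile rate: $0.005 per GB per month.
    capacity_gib = 100 * 1024 * 1024     # 100 PB read as binary: PiB -> GiB
    print(capacity_gib * 0.005)          # 524288.0 dollars/month, i.e. $524,288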

Cheers,
Fred.
Post by Lux, Jim (337K)
SO this is the modern equivalent of "nothing beats the bandwidth of a station wagon full of mag tapes"
It *is* a clever idea - I'm sure all the big cloud providers have figured out how to do a "data center in shipping container", and that's basically what this is.
I wonder what it costs (yeah, I know I can "Contact Sales to order a AWS Snowmobile"... but...)
[...]
Lux, Jim (337K)
2018-07-27 00:47:18 UTC
Permalink
A quick calculation shows that the bandwidth is on the order of single digit Tbps, depending on the link length and road conditions. Pasadena to Ann Arbor works out to 7.6 Tbps on I-80

If they charge by fractional months - it's about a 33 hour drive, so call that 1/15th of a month. So about $35k to do the transport.
Significantly cheaper than 4000 km of fiber, coax, or cat 5 cable.
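The same back-of-envelope in code, for anyone who wants to fiddle with the
inputs (a full 100PB load and a 33 hour drive are the assumptions here):

    # Effective bandwidth of a full Snowmobile over a ~33 hour drive.
    payload_bits = 100e15 * 8                    # 100 PB (decimal) in bits
    drive_seconds = 33 * 3600
    print(payload_bits / drive_seconds / 1e12)   # ~6.7 Tbps: single-digit Tbps

    # Transport cost at $0.005/GB/month for roughly 1/15th of a month.
    print(100e6 * 0.005 / 15)                    # ~$33k, in line with ~$35k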


Jim Lux
(818)354-2075 (office)
(818)395-2714 (cell)


-----Original Message-----
From: Beowulf [mailto:beowulf-***@beowulf.org] On Behalf Of Fred Youhanaie
Sent: Thursday, July 26, 2018 5:10 PM
To: ***@beowulf.org
Subject: Re: [Beowulf] Lustre Upgrades

Yep, this could be considered as a form of COTS high volume data transfer ;-)

from https://aws.amazon.com/snowmobile/faqs/ (the very last item)

"Q: How much does a Snowmobile job cost?

"Snowmobile provides a practical solution to exabyte-scale data migration and is significantly faster and cheaper than any network-based solutions, which can take decades and millions of dollars of investment in networking and logistics. Snowmobile jobs cost $0.005/GB/month based on the amount of provisioned Snowmobile storage capacity and the end to end duration of the job, which starts when a Snowmobile departs an AWS data center for delivery to the time when data ingestion into AWS is complete. Please see AWS Snowmobile pricing or contact AWS Sales for an evaluation."

So it seems a fully loaded snowmobile, 100PB at 0.005/GB/month, would cost $524,288.00/month!

Cheers,
Fred.
Jörg Saßmannshausen
2018-07-27 07:10:38 UTC
Permalink
Hi all,

Jim: the flip side of the cable is that once it is installed you can still use it,
whereas with the Snowmobile you have to pay for every use.

So in the long run the cable is cheaper, especially as we do need fast
connections for scientific purposes.

I was at a talk here in London not so long ago where they were talking about
data transfer for the very large telescope. As that is generating a huge amount
of data a week, say, a Snowmobile would simply not be practical here.
Besides, the data is generated on literally the other side of the world.

All the best from a hot, sunny London

Jörg
Post by Lux, Jim (337K)
A quick calculation shows that the bandwidth is on the order of single digit
Tbps, depending on the link length and road conditions. Pasadena to Ann
Arbor works out to 7.6 Tbps on I-80
If they charge by fractional months - it's about a 33 hour drive, so call
that 1/15th of a month. So about $35k to do the transport. Significantly
cheaper than 4000 km of fiber, coax, or cat 5 cable.
John Hearns via Beowulf
2018-07-27 08:01:53 UTC
Permalink
Jörg, then the days of the Tea Clipper Races should be revived. We have
just the ship for it already. Powered by green energy, and built in
Scotland of course.
https://en.wikipedia.org/wiki/Cutty_Sark

Just fill her hold with hard drives and set sail. Aaar me hearties.
I can just see HPC types being made to climb the rigging in a gale...
Jörg Saßmannshausen
2018-07-27 08:31:39 UTC
Permalink
Hi John,

good idea! Especially as the ship has been restored and is in my
neighbourhood. The only flip side here might be that some tourists might not
like the idea, and it might be a wee bit difficult to get it back into the
Thames. :-)

All the best

Jörg
Jörg, then the days of the Tea Clipper Races should be revived. We have
just the ship for it already. Powered by green energy, and built in
Scotland of course.
https://en.wikipedia.org/wiki/Cutty_Sark
Just fill her hold with hard drives and set sail. Aaar me hearties.
I can just see HPC types being made to climb the rigging in a gale...
Fred Youhanaie
2018-07-27 08:49:20 UTC
Permalink
Post by John Hearns via Beowulf
I can just see HPC types being made to climb the rigging in a gale...
... and they would be called DevOpSailors

Happy SysAdmin / DevOpSailor Day :-)

http://sysadminday.com/
Lux, Jim (337K)
2018-07-27 13:06:20 UTC
Permalink
William Henry Dana – “Two Years Before the Mast” – An excellent book describing what it was like to be a sailor in the days just before steam, there’s plenty of climbing the rigging in a sleet storm, while trying to round Cape Horn. – but when they got to California, the weather was a lot nicer. Dana later went on to be a lawyer fighting for sailor’s rights.

Silicon valley wasn’t very developed when Dana was doing his shipboard duties, and there weren’t any disk drives at the time.

They *did*, however, have parallel cluster computing – rooms full of computers grinding out navigation tables. And I suppose they were commodity computers, using commodity interconnects (of the day), so could they fairly be called a Beowulf?

https://www.theatlantic.com/science/archive/2016/06/the-women-behind-the-jet-propulsion-laboratory/482847/ describes a 1953 version of the same.


From: Beowulf <beowulf-***@beowulf.org> on behalf of "***@beowulf.org" <***@beowulf.org>
Reply-To: John Hearns <***@googlemail.com>
Date: Friday, July 27, 2018 at 1:03 AM
To: Jörg Saßmannshausen <sassy-***@sassy.formativ.net>
Cc: "***@beowulf.org" <***@beowulf.org>
Subject: Re: [Beowulf] Lustre Upgrades

Jörg, then the days of the Tea Clipper Races should be revived. We have just the ship for it already. Powered by green energy, and built in Scotland of course.
https://en.wikipedia.org/wiki/Cutty_Sark

Just fill her hold with hard drives and set sail. Aaar me hearties.
I can just see HPC types being made to climb the rigging in a gale...
John Hearns via Beowulf
2018-07-27 13:17:28 UTC
Permalink
Jim, thank you for that link. It is quite helpful! I have a poster accepted
for the Julia Conference in two weeks' time.
My proposal is to discuss computers just like that, on the Manhattan
Project etc., and then to show how Julia can easily be used to solve the
equation for critical mass from the Los Alamos Primer.
I haven't done a damn thing for the poster yet... oops.
I am also arranging a visit to Bletchley Park at the end of the conference.
JuliaCon is sold out, but I am sure you can watch the presentations:
http://juliacon.org/2018/
Post by Lux, Jim (337K)
[...]
Lux, Jim (337K)
2018-07-27 17:52:10 UTC
Permalink
If you need more pictures of the early JPL cluster computers, using acoustic and optical interconnects, let me know. They’ve recently reorganized the photo archives here, and it’s a lot easier to find stuff (like pictures of the foundation of the building my office is in, from the 1950s, when they were digging it).

There’s also a story from Feynman of a pipelined compute chain using EAM equipment (EAM – Electric Accounting Machinery – readers, sorters, punches, programmed with plugboards) and human computers. You might be able to find pictures, but since the whole project was classified, it’s less likely; the pictures were probably declassified 50 years later, but that doesn’t mean someone has spent the time and money to put them online anywhere.

Jim Lux
(818)354-2075 (office)
(818)395-2714 (cell)

From: John Hearns [mailto:***@googlemail.com]
Sent: Friday, July 27, 2018 6:17 AM
To: Lux, Jim (337K) <***@jpl.nasa.gov>
Cc: Beowulf Mailing List <***@beowulf.org>
Subject: Re: [Beowulf] Lustre Upgrades

Jim, thankyou for that link. It is quite helpful! I have a poster accepted for the Julia Conference in two weeks time.
My proposal is to discuss computers just like that - on the Manhattan project etc. Then to show how Julia can easily be used to solve the equation for critical mass from the Los Alamos Primer.
I havent done a damn thing for the poster yet.. ooops.
I am also arranging a visit to Bletchley Park at the end of the conference.
JuliaCon is sold out but I am sure you can watch the presentations http://juliacon.org/2018/




On Fri, 27 Jul 2018 at 15:06, Lux, Jim (337K) <***@jpl.nasa.gov> wrote:
[...]
Prentice Bisbal
2018-07-27 18:37:16 UTC
Permalink
"Top Secret Rosies" is a good documentary on the women computers of
yesteryear:

https://en.wikipedia.org/wiki/Top_Secret_Rosies:_The_Female_%22Computers%22_of_WWII
Post by Lux, Jim (337K)
[...]
Fred Youhanaie
2018-07-27 08:41:17 UTC
Permalink
They do mention up to a 1 Tb/s transfer rate between the truck and the data centre, which, for 100 PB, would take about 10 days to transfer into the truck and another 10 days out of it into AWS storage.
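A back-of-the-envelope check of that 10-day figure, as a rough sketch assuming decimal petabytes and a sustained 1 Tb/s link with no protocol overhead:

# 100 PB moved over a sustained 1 Tb/s link, decimal units throughout
data_bits = 100 * 10**15 * 8        # 100 PB expressed in bits
link_bps = 1 * 10**12               # 1 Tb/s
days = data_bits / link_bps / 86400
print(round(days, 1))               # ~9.3 days each way, i.e. roughly 10 days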

I think the main purpose of the Snowmobile and Snowball services is the initial transfer of a high volume of data to the AWS cloud. You would still need a data link for the data that is
acquired/generated on an ongoing basis.

And you may decide to migrate/copy your data to the cloud so that someone else can take care of the storage and global distribution of the data.


Cheers,
Fred
Post by Jörg Saßmannshausen
Hi all,
Jim: the flip side of the cable is: once it is installed you still can use it,
whereas with the snow mobile you have to pay for every use.
So in the long run the cable is cheaper, specially as we do need fast
connection for scientific purposes.
I was at a talk here in London not so long ago when they were talking about
data transfer of the very large telescope. As that is generating a huge amount
of data a week, say, a snow mobile would simply not be practical here.
Besides, the data is generated on literally the other side of the world.
All the best from a hot, sunny London
Jörg
Post by Lux, Jim (337K)
A quick calculation shows that the bandwidth is on the order of single digit
Tbps, depending on the link length and road conditions. Pasadena to Ann
Arbor works out to 7.6 Tbps on I-80
If they charge by fractional months - it's about a 33 hour drive, so call
that 1/15th of a month. So about $35k to do the transport. Significantly
cheaper than 4000 km of fiber, coax, or cat 5 cable.
Jim Lux
(818)354-2075 (office)
(818)395-2714 (cell)
-----Original Message-----
Youhanaie Sent: Thursday, July 26, 2018 5:10 PM
Subject: Re: [Beowulf] Lustre Upgrades
Yep, this could be considered as a form of COTS high volume data transfer ;-)
from https://aws.amazon.com/snowmobile/faqs/ (the very last item)
"Q: How much does a Snowmobile job cost?
"Snowmobile provides a practical solution to exabyte-scale data migration
and is significantly faster and cheaper than any network-based solutions,
which can take decades and millions of dollars of investment in networking
and logistics. Snowmobile jobs cost $0.005/GB/month based on the amount of
provisioned Snowmobile storage capacity and the end to end duration of the
job, which starts when a Snowmobile departs an AWS data center for delivery
to the time when data ingestion into AWS is complete. Please see AWS
Snowmobile pricing or contact AWS Sales for an evaluation."
So it seems a fully loaded snowmobile, 100PB at 0.005/GB/month, would cost
$524,288.00/month!
Cheers,
Fred.
Post by Lux, Jim (337K)
SO this is the modern equivalent of "nothing beats the bandwidth of a
station wagon full of mag tapes" It *is* a clever idea - I'm sure all the
big cloud providers have figured out how to do a "data center in shipping
container", and that's basically what this is.
I wonder what it costs (yeah, I know I can "Contact Sales to order a
AWS Snowmobile"... but...)
Jim Lux
(818)354-2075 (office)
(818)395-2714 (cell)
-----Original Message-----
Sent: Tuesday, July 24, 2018 11:21 AM
Subject: Re: [Beowulf] Lustre Upgrades
Nah, that ain't large scale ;-) If you want large scale have a look at
https://aws.amazon.com/snowmobile/
They drive a 45-foot truck to your data centre, fill it up with your
data bits, then drive it back to their data centre :-()
Cheers,
Fred
Post by Jonathan Engwall
Snowball is the very large scale AWS data service.
Post by John Hearns via Beowulf
Joe, sorry to split the thread here. I like BeeGFS and have set it up.
I have worked for two companies now who have sites around the world,
those sites being independent research units. But HPC facilities are
in headquarters.
The sites want to be able to drop files onto local storage yet have
it magically appear on HPC storage, and same with the results going
back the other way.
One company did this well with GPFS and AFM volumes.
For the current company, I looked at gluster and Gluster
geo-replication is one way only.
What do you know of the BeeGFS mirroring? Will it work over long
distances? (Note to me - find out yourself you lazy besom)
This isn't the use case for most/all cluster file systems. This is
where distributed object systems and buckets rule.
Take your file, dump it into an S3 like bucket on one end, pull it
out of the S3 like bucket on the other. If you don't want to use
get/put operations, then use s3fs/s3ql. You can back this up with
replicating EC minio stores (will take a few minutes to set up ...
compare that to others).
The down side to this is that minio has limits of about 16TiB last I
checked. If you need more, replace minio with another system
(igneous, ceph, etc.). Ping me offline if you want to talk more.
[...]
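The figures quoted above can be sanity-checked with a rough sketch; the ~30-hour drive time and the binary-unit reading of "100PB" are assumptions made here to reproduce the quoted numbers, not figures published by AWS:

# Effective bandwidth of a truck carrying 100 PB over a ~30-hour drive
payload_bits = 100 * 10**15 * 8
drive_seconds = 30 * 3600
print(payload_bits / drive_seconds / 10**12)   # ~7.4 Tbps, i.e. single-digit Tbps

# Monthly cost at $0.005/GB/month, reading 100 PB as 100 * 1024**2 binary GB
monthly_cost = 0.005 * 100 * 1024**2
print(monthly_cost)                            # 524288.0 dollars per month
print(round(monthly_cost / 15))                # roughly $35k for ~1/15 of a month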
Lux, Jim (337K)
2018-07-27 12:52:46 UTC
Permalink
Indeed.
Interestingly, we were having this discussion (in a similar form) with respect to a radio telescope at work. Modern arrays like SKA, LOFAR, MeerKAT, etc. have *lots of data* being pushed around. I'm working on one that flies in space (so we're not going to have a fiber to the ground <grin>), but there has always been discussion of the difference between a "real time, low latency interconnect" and a "long latency interconnect".

One aspect of this, though, is that buying and installing fiber is a "capital expenditure", while buying lots of Snowmobiles (or mailing lots of disk drives) is an "operating expense", and the two often come out of different buckets of money. Of course, if you can procure your network services "by the month" or "by the bit", then someone else deals with the capital expense.

This comes up for us in NASA missions and projects all the time: do you buy test equipment or rent it? And this is where you can make the correct decision for the short run that may turn out to be more expensive in the (speculative) long run. Consider Spirit and Opportunity: the minimum mission was about 3 months, and the goal was 1 Martian year (I think). And here we are 14 years later with Opportunity still grinding along (well, not right now, because of the dust storm blotting out the sun). Curiosity was also scoped to 1 Martian year (about 2 Earth years), and it landed in 2012.

On 7/27/18, 12:11 AM, "Beowulf on behalf of Jörg Saßmannshausen" <beowulf-***@beowulf.org on behalf of sassy-***@sassy.formativ.net> wrote:

Hi all,

Jim: the flip side of the cable is: once it is installed you still can use it,
whereas with the snow mobile you have to pay for every use.

So in the long run the cable is cheaper, specially as we do need fast
connection for scientific purposes.

I was at a talk here in London not so long ago when they were talking about
data transfer of the very large telescope. As that is generating a huge amount
of data a week, say, a snow mobile would simply not be practical here.
Besides, the data is generated on literally the other side of the world.

All the best from a hot, sunny London

Jörg
Post by Lux, Jim (337K)
[...]