Discussion:
Suggestions on which DFS to use
Tony Brian Albers
2017-02-13 07:55:43 UTC
Hi guys,

So, we're running a small (as in a small number of nodes (10), not
storage (170 TB)) Hadoop cluster here. Right now we're on IBM Spectrum
Scale (GPFS), which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.

Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS with POSIX support (so that users
can move data to and from it with standard tools).

I've looked at MooseFS, which seems to be able to do the trick, but are
there any others that might do the job?

TIA
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
Benson Muite
2017-02-13 08:36:55 UTC
Hi,

Do you have any performance requirements?

Benson
--
Research Fellow of Distributed Systems
Institute of Computer Science
University of Tartu
J.Liivi 2, 50409
Tartu, Estonia
http://kodu.ut.ee/~benson

Tony Brian Albers
2017-02-13 09:32:00 UTC
Post by Benson Muite
Do you have any performance requirements?
Well, we're not going to be doing a huge amount of I/O. So performance
requirements are not high. But ingest needs to be really fast, we're
talking tens of terabytes here.

/tony
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
Justin Y. Shi
2017-02-13 16:30:42 UTC
Maybe you would consider Scality (http://www.scality.com/) for your growth
concerns. If you need speed, DDN is faster in rapid data ingestion and for
extreme HPC data needs.

Justin
Alex Chekholko
2017-02-13 16:39:01 UTC
If you have a preference for Free Software, GlusterFS would work, unless
you have many millions of small files. It would also depend on your
available hardware, as there is not a 1-to-1 correspondence between a
typical GPFS setup and a typical GlusterFS setup. But at least it is free
and easy to try out. The mailing list is active, the software is now mature
(though I last used GlusterFS a few years ago), and you can buy support
from Red Hat if you like.

Take a look at the RH whitepapers about typical GlusterFS architecture.

CephFS, on the other hand, is not yet mature enough, IMHO.
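
For reference, a minimal replicated GlusterFS volume really is just a few
commands to try out. This is only a rough sketch, not a tuned layout: the
host names (server1..server3), the brick device (/dev/sdb) and the volume
name are placeholders, and the XFS inode-size option follows the usual Red
Hat brick recommendation:

  # on each of the three storage nodes: format a brick and mount it
  mkfs.xfs -i size=512 /dev/sdb
  mkdir -p /bricks/brick1
  mount /dev/sdb /bricks/brick1

  # from one node: form the trusted pool and create a 3-way replicated volume
  gluster peer probe server2
  gluster peer probe server3
  gluster volume create gvol0 replica 3 \
      server1:/bricks/brick1/gvol0 \
      server2:/bricks/brick1/gvol0 \
      server3:/bricks/brick1/gvol0
  gluster volume start gvol0

  # on a client: mount it with the FUSE client
  mount -t glusterfs server1:/gvol0 /mnt/gvol0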
John Hanks
2017-02-13 16:54:48 UTC
We've had pretty good luck with BeeGFS lately, running on vanilla SuperMicro
hardware with ZFS as the underlying filesystem. It works well at the cheap
end of the hardware spectrum, and BeeGFS is free and pretty amazing. It has
held up to abuse under a very mixed and heavy workload, and we can stream
large sequential data into it fast enough to saturate a QDR IB link, all
without any in-depth tuning. While we don't have redundancy (other than
raidz3), BeeGFS can be set up with some redundancy between metadata servers
and mirroring between storage targets.
http://www.beegfs.com/content/
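
For a feel of how little there is to it, a rough sketch of the quick-start
steps, roughly as the BeeGFS docs lay them out (the beegfs-setup-* helpers
and package names come from the upstream quick-start guide; the paths,
service IDs and the host name mgmt01 are placeholders, and a real cluster
would split the roles across machines):

  # management, metadata and storage services (after installing the
  # beegfs-mgmtd, beegfs-meta, beegfs-storage and beegfs-client packages)
  /opt/beegfs/sbin/beegfs-setup-mgmtd   -p /data/beegfs/mgmtd
  /opt/beegfs/sbin/beegfs-setup-meta    -p /data/beegfs/meta    -s 1 -m mgmt01
  /opt/beegfs/sbin/beegfs-setup-storage -p /zpool1/beegfs_store -s 1 -i 101 -m mgmt01

  # client: point it at the management host, then start and mount
  /opt/beegfs/sbin/beegfs-setup-client -m mgmt01
  systemctl start beegfs-mgmtd beegfs-meta beegfs-storage beegfs-helperd beegfs-client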

jbh
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
Jon Tegner
2017-02-14 07:02:34 UTC
BeeGFS sounds interesting. Is it possible to say something general about
how it compares to Lustre regarding performance?

/jon
John Hanks
2017-02-14 07:31:00 UTC
I can't compare it to Lustre currently, but in the theme of general, we
have 4 major chunks of storage:

1. (~500 TB) DDN SFA12K running GRIDScaler (GPFS), but without GPFS clients
on the nodes; this is presented to the cluster through cNFS.

2. (~250 TB) SuperMicro 72-bay server running CentOS 6.8, ZFS presented
via NFS.

3. (~460 TB) SuperMicro 90-bay JBOD fronted by a SuperMicro 2U server
with 2 x LSI 3008 SAS/SATA cards, running CentOS 7.2, ZFS and BeeGFS
2015.xx, with BeeGFS clients on all nodes.

4. (~12 TB) SuperMicro 48-bay NVMe server running CentOS 7.2, ZFS
presented via NFS.

Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over
wheezing under load. ZFS/NFS single server falls over wheezing under
slightly less load. BeeGFS tends to fall over a bit more gracefully under
load. Number 4, NVMe doesn't care what you do, your load doesn't impress
it at all, bring more.

We move workloads around to whichever storage has free space and works best
and put anything metadata or random I/O-ish that will fit onto the NVMe
based storage.

Now, in the theme of specific, why are we using BeeGFS and why are we
currently planning to buy about 4 PB of supermicro to put behind it? When
we asked about improving the performance of the DDN, one recommendation was
to buy GPFS client licenses for all our nodes. The quoted price was about
100k more than we wound up spending on the 460 additional TB of SuperMicro
storage and BeeGFS, which performs as well or better. I fail to see the
inherent value of DDN/GPFS that makes it worth that much of a premium in
our environment. My personal opinion is that I'll take hardware over
licenses any day of the week. My general grumpiness towards vendors isn't
improved by the DDN looking suspiciously like a SuperMicro system when I
pull the shiny cover off. Of course, YMMV certainly applies here. But
there's also that incident where we had to do an offline fsck to clean up
some corrupted GPFS foo and the mmfsck tool had an assertion error, not a
warm fuzzy moment...

Last example, we recently stood up a small test cluster built out of
workstations and threw some old 2TB drives in every available slot, then
used BeeGFS to glue them all together. Suddenly there is a 36 TB filesystem
where before there was just old hardware. And as a bonus, it'll do
sustained 2 GB/s for streaming large writes. It's worth a look.

jbh
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
Jörg Saßmannshausen
2017-02-14 10:44:14 UTC
Hi John,

thanks for the very interesting and informative post.
I am looking into large storage right now as well, so this came at just the
right time for me! :-)

One question: I have noticed you were using ZFS on Linux (CentOS 6.8). What
are your experiences with this? Does it work reliably? How did you configure
the file space?
From what I have read, the best way of setting up ZFS is to give ZFS direct
access to the disks and then build the ZFS 'raid5' or 'raid6' (raidz/raidz2)
on top of that. Is that what you do as well?

You can contact me offline if you like.

All the best from London

Jörg
--
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
20 Gordon Street
London
WC1H 0AJ

email: ***@ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
Tony Brian Albers
2017-02-14 13:16:38 UTC
That sounds very interesting, I'd like to hear more about that. How did
you manage to use ZFS on CentOS?

/tony
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
John Hanks
2017-02-14 13:47:07 UTC
Should have included this in my last message:

https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS
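
Roughly, the steps on that wiki page boil down to something like this (the
zfs-release repo package is versioned per CentOS point release, so take the
one the wiki links for your release; the device name in the smoke test is
just an example, and the default DKMS build needs kernel headers):

  # add the ZFS on Linux repo package linked from the wiki, plus EPEL
  yum install epel-release
  yum install <zfs-release-package-from-the-wiki>.rpm

  # default DKMS-style build needs kernel headers
  yum install kernel-devel zfs
  modprobe zfs

  # quick smoke test on a spare disk
  zpool create testpool /dev/sdb
  zpool status testpool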

One other aspect of ZFS I overlooked in my earlier messages is the built-in
compression. At one point I backed up 460 TB of data from our GPFS system
onto ~300 TB of space on a ZFS system using gzip-9 compression on the target
filesystem, thereby gaining compression that was transparent to the users.
The benefits of ZFS are really too numerous to cover, and the flexibility it
adds for managing storage opens up whole new solution spaces to explore. For
me it is the go-to filesystem for the first layer on the disks.
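
Turning that on is a one-liner per dataset; a sketch, with the pool and
dataset names made up for the example:

  # create a backup dataset with gzip-9 compression (or flip it on later
  # with 'zfs set'; only newly written data gets compressed)
  zfs create -o compression=gzip-9 tank/gpfs_backup

  # check how well it is compressing
  zfs get compressratio,compression tank/gpfs_backup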

jbh
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
Jeffrey Layton
2017-02-14 17:00:55 UTC
Of course there are tons of options depending upon what you want and the
I/O patterns of your applications.

Doug's comments about HDFS are great - he's a very good expert in this area.

Depending upon your I/O patterns and workload, NFS may work well. I've found
it works quite well unless you have a bunch of clients really hammering it.
There are some tuning options you can use to improve this behavior (i.e.
more clients beating on it before it collapses). It's good to have lots of
memory in the NFS server. Google for "Dell, NSS" and you should find some
documents on tuning options that Dell created that work VERY well.

Another option for NFS is to consider using async mounts. This can
definitely increase performance but you just have to be aware of the
downside - if the server goes down, you could lose data from the clients
(data in flight). But I've seen some massive performance gains when using
async mounts.

BTW - if you have IB, consider using NFS with IPoIB. This can boost
performance as well. The recent kernels have RDMA capability for NFS.
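
To make the async and IPoIB/RDMA points concrete, a rough sketch (the server
name, export path and network are placeholders; "async" means acknowledged
writes can still be sitting in server memory, which is exactly the
data-in-flight risk mentioned above):

  # /etc/exports on the NFS server: async export, then reload
  /export/data  10.10.0.0/16(rw,async,no_root_squash)
  exportfs -ra

  # client mount over IPoIB (plain TCP on the IB interface)
  mount -t nfs -o vers=3,rsize=1048576,wsize=1048576 nfs-server-ib:/export/data /mnt/data

  # or, with a kernel that has NFS/RDMA enabled, mount over RDMA
  # (20049 is the standard NFS/RDMA port; the server side also needs the
  # svcrdma/rpcrdma module and 'rdma 20049' added to /proc/fs/nfsd/portlist)
  mount -t nfs -o vers=3,proto=rdma,port=20049 nfs-server-ib:/export/data /mnt/data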

If you need encryption over the wire, then consider sshfs. It uses FUSE, so
you can mount directories from any host you have SSH access to (be sure NOT
to use password-less SSH :) ). There are some pretty good tuning options for
it as well.
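
For example, something along these lines (host, paths and the cipher choice
are just illustrative; a cheaper cipher and no compression usually help
throughput on a fast LAN):

  # mount a remote directory over SSH; reconnect if the link drops
  sshfs user@fileserver:/data /mnt/remote-data \
      -o reconnect,ServerAliveInterval=15,Compression=no,Ciphers=aes128-ctr

  # unmount when done
  fusermount -u /mnt/remote-data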

For distributed file systems there are some good options: Lustre, BeeGFS,
OrangeFS, Ceph, Gluster, Moose, OCFS2, etc. (my apologies to any
open-source file systems that I've forgotten). I personally like all of
them :) I've used Lustre, BeeGFS, and OrangeFS in current and past lives.
I've found BeeGFS to be very easy to configure. The performance seems to be
on par with Lustre for the limited testing I did but it's always best to
test your own applications (that's true for any file system or storage
solution).

There are also commercial solutions that should not be ignored if you want
to go that route. There are a bunch of them out there - GPFS, Panasas,
Scality, and others.

I hope some of these pointers help.

Jeff
Douglas O'Flaherty
2017-02-15 00:54:23 UTC
If I can help, I'm inside IBM. I'm the marketing lead for IBM Spectrum
Scale (aka GPFS), but I have solid connections to the field tech support
and development teams.

my corporate email is ***@us.ibm.com

IBM just announced that Hortonworks will be supported on IBM Spectrum
Scale. IBM has a lot of development focus on the Hadoop/Spark use case.
John Hanks
2017-02-14 13:40:18 UTC
All our nodes, even most of our fileservers (non-DDN), boot statelessly
(Warewulf), and all local disks are managed by ZFS, either with JBOD
controllers or with non-JBOD controllers configuring each disk as a
one-drive RAID0. So if at all possible, ZFS gets control of the raw disk.

ZFS has been extremely reliable. The only problems we have encountered were
an underflow that broke quotas on one of our servers and a recent problem
using a zvol as swap on CentOS 7.x. The ZFS on Linux community is pretty
solid at this point, and it's nice to know that anything written to disk is
correct.

Compute nodes use striping with no disk redundancy; storage nodes are
almost all raidz3 (3 parity disks per vdev). Because we tend to use large
drives, raidz3 gives us a cushion should a rebuild from a failed drive take
a long time on a full filesystem. There are some mirrors in a few places;
we even have the occasional workstation where we've set up a 3-disk mirror
to provide extra protection for some critical data and work.
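
As a concrete sketch of that layout (device names, pool names and the vdev
width are placeholders; ashift=12 just pins 4K sectors, and two 11-disk
raidz3 vdevs is only one of many reasonable widths):

  # storage node: two 11-disk raidz3 vdevs striped into one pool
  zpool create -o ashift=12 tank \
      raidz3 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl \
      raidz3 sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw
  zpool status tank

  # compute node scratch: plain stripe, no redundancy
  zpool create scratch sdb sdc sdd sde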

jbh


On Tue, Feb 14, 2017 at 1:45 PM Jörg Saßmannshausen <
Post by Jörg Saßmannshausen
Hi John,
thanks for the very interesting and informative post.
I am looking into large storage space right now as well so this came really
timely for me! :-)
One question: I have noticed you were using ZFS on Linux (CentOS 6.8). What
are you experiences with this? Does it work reliable? How did you configure the
file space?
From what I have read is the best way of setting up ZFS is to give ZFS direct
access to the discs and then install the ZFS 'raid5' or 'raid6' on top of
that. Is that what you do as well?
You can contact me offline if you like.
All the best from London
Jörg
Post by John Hanks
I can't compare it to Lustre currently, but in the theme of general, we
1. (~500 TB) DDN SFA12K running gridscaler (GPFS) but without GPFS
clients
Post by John Hanks
on nodes, this is presented to the cluster through cNFS.
2. (~250 TB) SuperMicro 72 bay server. Running CentOS 6.8, ZFS presented
via NFS
3. (~ 460 TB) SuperMicro 90 dbay JBOD fronted by a SuperMIcro 2u server
with 2 x LSI 3008 SAS/SATA cards. Running CentOS 7.2, ZFS and BeeGFS
2015.xx. BeeGFS clients on all nodes.
4. (~ 12 TB) SuperMicro 48 bay NVMe server, running CentOS 7.2, ZFS
presented via NFS
Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over
wheezing under load. ZFS/NFS single server falls over wheezing under
slightly less load. BeeGFS tends to fall over a bit more gracefully under
load. Number 4, NVMe doesn't care what you do, your load doesn't impress
it at all, bring more.
We move workloads around to whichever storage has free space and works
best
Post by John Hanks
and put anything metadata or random I/O-ish that will fit onto the NVMe
based storage.
Now, in the theme of specific, why are we using BeeGFS and why are we
currently planning to buy about 4 PB of supermicro to put behind it? When
we asked about improving the performance of the DDN, one recommendation
was
Post by John Hanks
to buy GPFS client licenses for all our nodes. The quoted price was about
100k more than we wound up spending on the 460 additional TB of
Supermicro
Post by John Hanks
storage and BeeGFS, which performs as well or better. I fail to see the
inherent value of DDN/GPFS that makes it worth that much of a premium in
our environment. My personal opinion is that I'll take hardware over
licenses any day of the week. My general grumpiness towards vendors isn't
improved by the DDN looking suspiciously like a SuperMicro system when I
pull the shiny cover off. Of course, YMMV certainly applies here. But
there's also that incident where we had to do an offline fsck to clean up
some corrupted GPFS foo and the mmfsck tool had an assertion error, not a
warm fuzzy moment...
Last example, we recently stood up a small test cluster built out of
workstations and threw some old 2TB drives in every available slot, then
used BeeGFS to glue them all together. Suddenly there is a 36 TB
filesystem
Post by John Hanks
where before there was just old hardware. And as a bonus, it'll do
sustained 2 GB/s for streaming large writes. It's worth a look.
jbh
Post by Jon Tegner
BeeGFS sounds interesting. Is it possible to say something general
about
Post by John Hanks
Post by Jon Tegner
how it compares to Lustre regarding performance?
/jon
We've had pretty good luck with BeeGFS lately running on SuperMicro
vanilla hardware with ZFS as the underlying filesystem. It works pretty
well for the cheap end of the hardware spectrum and BeeGFS is free and
pretty amazing. It has held up to abuse under a very mixed and heavy
workload and we can stream large sequential data into it fast enough to
saturate a QDR IB link, all without any in depth tuning. While we don't
have redundancy (other than raidz3), BeeGFS can be set up with some
redundancy between metadata servers and mirroring between storage.
http://www.beegfs.com/content/
jbh
On Mon, Feb 13, 2017 at 7:40 PM Alex Chekholko <
Post by Alex Chekholko
If you have a preference for Free Software, GlusterFS would work, unless
you have many millions of small files. It would also depend on your
available hardware, as there is not a 1-to-1 correspondence between a
typical GPFS setup and a typical GlusterFS setup. But at least it is free
and easy to try out. The mailing list is active, the software is now mature
(I last used GlusterFS a few years ago) and you can buy support from Red
Hat if you like.
Take a look at the RH whitepapers about typical GlusterFS architecture.
CephFS, on the other hand, is not yet mature enough, IMHO.
Post by Justin Y. Shi
Maybe you would consider Scality (http://www.scality.com/) for your
growth concerns. If you need speed, DDN is faster in rapid data ingestion
and for extreme HPC data needs.
Justin
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
Christopher Samuel
2017-02-15 01:04:48 UTC
Permalink
Post by John Hanks
1. (~500 TB) DDN SFA12K running gridscaler (GPFS) but without GPFS
clients on nodes, this is presented to the cluster through cNFS.
[...]
Post by John Hanks
Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over
wheezing under load.
I suspect that's more a limitation of NFS than GPFS though in your setup.
We've got DDN SFA10K's with GPFS all the way down and that works
really well for us (so far).

All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
John Hanks
2017-02-15 06:03:15 UTC
Permalink
When we were looking at a possible GPFS client license purchase we ran the
client on our nodes and did some basic testing. The client did give us a
bit of a boost in performance over NFS, but still we could tip GPFS over
with a small fraction of our available nodes. The improvement was not
enough to be worth the license cost involved. And it's pretty hard to beat
the added performance of a whole new storage server especially given the
relative costs.

Multiple chunks of storage make it possible to isolate workloads as well.
In the end, price, flexibility and overall performance in our environment
as a whole beat out the slick GPFS sales presentation :) My rule of thumb
is that a salesman in a suit that expensive may seem impressive but
probably doesn't have my best interests in mind.

jbh
Post by Christopher Samuel
Post by John Hanks
1. (~500 TB) DDN SFA12K running gridscaler (GPFS) but without GPFS
clients on nodes, this is presented to the cluster through cNFS.
[...]
Post by John Hanks
Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over
wheezing under load.
I suspect that's more a limitation of NFS than GPFS though in your set
up. We've got DDN SFA10K's with GPFS all the way down and that works
really well for us (so far).
All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
Christopher Samuel
2017-02-15 06:11:56 UTC
Permalink
Post by John Hanks
When we were looking at a possible GPFS client license purchase we ran
the client on our nodes and did some basic testing. The client did give
us a bit of a boost in performance over NFS, but still we could tip GPFS
over with a small fraction of our available nodes.
Wow that's odd, how large are your clusters?

We were hitting ours with 2 Intel clusters (1,000+ cores each) and 4
racks of BlueGene/Q (65,536 cores, 4096 nodes).

However, we do have our GPFS metadata on an SSD array connected to 2
dedicated NSD servers (active/active) and our SFA10K's are frontended by
4 NSD servers each (again active/active pairs to give redundancy).

cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
John Hanks
2017-02-15 06:33:01 UTC
Permalink
So "clusters" is a strong word, we have a collection of ~22,000 cores of
assorted systems, basically if someone leaves a laptop laying around
unprotected we might try to run a job on it. And being bioinformatic-y, our
problem with this and all storage is metadata related. The original
procurement did not include dedicated NSD servers (or extra GPFS server
licenses) so we run solely off the SFA12K's.

Could we improve with dedicated NSD frontends and GPFS clients? Yes, most
certainly. But again, we can stand up a PB or more of brand new SuperMicro
storage fronted by BeeGFS that performs as well or better for around the
same cost, if not less. I don't have enough of an emotional investment in
GPFS or DDN to convince myself that suggesting further tuning that requires
money and time is worthwhile for our environment. It more or less serves
the purpose it was bought for, we learn from the experience and move on
down the road.

jbh
Post by Christopher Samuel
Post by John Hanks
When we were looking at a possible GPFS client license purchase we ran
the client on our nodes and did some basic testing. The client did give
us a bit of a boost in performance over NFS, but still we could tip GPFS
over with a small fraction of our available nodes.
Wow that's odd, how large are your clusters?
We were hitting ours with 2 Intel clusters (1,000+ cores each) and 4
racks of BlueGene/Q (65,536 cores, 4096 nodes).
However, we do have our GPFS metadata on an SSD array connected to 2
dedicated NSD servers (active/active) and our SFA10K's are frontended by
4 NSD servers each (again active/active pairs to give redundancy).
cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
Christopher Samuel
2017-02-15 06:53:46 UTC
Permalink
Hi John,
Post by John Hanks
So "clusters" is a strong word, we have a collection of ~22,000 cores of
assorted systems, basically if someone leaves a laptop laying around
unprotected we might try to run a job on it. And being bioinformatic-y,
our problem with this and all storage is metadata related. The original
procurement did not include dedicated NSD servers (or extra GPFS server
licenses) so we run solely off the SFA12K's.
Ah right, so these are the embedded GPFS systems from DDN. Interesting
as our SFA10K's hit EOL in 2019 and so (if our funding continues beyond
2018) we'll need to replace them.
Post by John Hanks
Could we improve with dedicated NSD frontends and GPFS clients? Yes,
most certainly. But again, we can stand up a PB or more of brand new
SuperMicro storage fronted by BeeGFS that performs as well or better
for around the same cost, if not less.
Very nice - and for what you're doing it sounds like just what you need.
Post by John Hanks
I don't have enough of an
emotional investment in GPFS or DDN to convince myself that suggesting
further tuning that requires money and time is worthwhile for our
environment. It more or less serves the purpose it was bought for, we
learn from the experience and move on down the road.
I guess I'm getting my head around how other sites' GPFS performs given I
have a current sample size of 1 and that was spec'd out by IBM as part
of a large overarching contract. :-)

I guess I was assuming that because that was what we had it was how most
sites did it, apologies for that!

All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Tim Cutts
2017-02-15 09:28:17 UTC
Permalink
In my limited knowledge, that's the primary advantage of GPFS, in that it isn't just a DFS, and fits into a much larger ecosystem of other features like HSM and so on, which is something the other DFS alternatives don't tend to do quite so neatly. Normally I'm wary of proprietary filesystems, having been bitten by the demise of AdvFS following the HP/Compaq merger, but GPFS has been central to IBM's strategy for a long time, so I don't think that risk is terribly great in this case.

Sanger has been a Lustre site for 10+ years, but then we have enough DFS storage to justify headcount to look after it. I've played with BeeGFS in the past, and it's certainly very easy to install and configure, but I haven't ever tried it at a large enough scale on real tin to evaluate its performance properly.

Regards,

Tim
--
Head of Scientific Computing
Wellcome Trust Sanger Institute

On 15/02/2017, 06:53, "Beowulf on behalf of Christopher Samuel" <beowulf-***@beowulf.org on behalf of ***@unimelb.edu.au> wrote:

I guess I'm getting my head around how other sites' GPFS performs given I
have a current sample size of 1 and that was spec'd out by IBM as part
of a large overarching contract. :-)
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Scott Atchley
2017-02-15 13:43:51 UTC
Permalink
Hi Chris,

Check with me in about a year.

After using Lustre for over 10 years to initially serve ~10 PB of disk and
now serve 30+ PB with very nice DDN gear, later this year we will be
installing 320 PB (250 PB useable) of GPFS (via IBM ESS storage units) to
support Summit, our next gen HPC system from IBM with Power9 CPUs and
NVIDIA Volta GPUs. Our current Lustre system is capable of 1 TB/s for large
sequential writes, but random write performance is much lower (~400 GB/s or
40% of sequential). The target performance for GPFS will be 2.5 TB/s
sequential writes and 2.2 TB/s random (~90% of sequential). The initial
targets are slightly lower, but we are supposed to achieve these rates by
2019.
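
The gap between those sequential and random numbers is easy to reproduce in
miniature on a single node. A small Python sketch along these lines (the path
is an assumption, and this is nothing like the parallel IOR-style runs behind
TB/s figures) just writes the same blocks first in order and then at shuffled
offsets:

# Toy sequential vs. random write comparison on one node (path is assumed).
import os
import random
import time

PATH = "/mnt/scratch/rw_test.bin"   # assumption: a file on the filesystem under test
BLOCK = 1024 * 1024                 # 1 MiB I/Os
COUNT = 4096                        # 4 GiB file

buf = os.urandom(BLOCK)

def run(offsets):
    mode = "r+b" if os.path.exists(PATH) else "w+b"
    start = time.time()
    with open(PATH, mode) as f:
        for off in offsets:
            f.seek(off * BLOCK)
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    return BLOCK * COUNT / 2**20 / (time.time() - start)

seq = run(range(COUNT))             # ascending offsets
shuffled = list(range(COUNT))
random.shuffle(shuffled)
rnd = run(shuffled)                 # same I/O volume, shuffled offsets
print(f"sequential: {seq:.0f} MiB/s, random: {rnd:.0f} MiB/s")
os.remove(PATH)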

We are very familiar with Lustre, the good and the bad, and ORNL is the
largest contributor to the Lustre codebase outside of Intel. We have
encountered many bugs at our scale that few other sites can match and we
have tested patches for Intel before their release to see how they perform
at scale. We have been testing GPFS for the last three years in preparation
for the change and IBM has been a very good partner to understand our
performance and scale issues. Improvements that IBM are adding to support
the CORAL systems will also benefit the larger community.

People are attracted to the "free" aspect of Lustre (in addition to the
open source), but it is not truly free. For both of our large Lustre
systems, we bought block storage from DDN and we added Lustre on top. We
have support contracts with DDN for the hardware and Intel for Lustre as
well as a large team within our operations to manage Lustre and a full-time
Lustre developer. The initial price is lower, but at this scale running
without support contracts and an experienced operations team is untenable.
IBM is proud of GPFS and their ESS hardware (i.e. licenses and hardware are
expensive) and they also require support contracts, but the requirements
for operations staff are lower. It is probably more expensive than any other
combination of hardware/licenses/support, but we have one vendor to blame,
which our management sees as a value.

As I said, check back in a year or two to see how this experiment works out.

Scott
Post by Christopher Samuel
Hi John,
Post by John Hanks
So "clusters" is a strong word, we have a collection of ~22,000 cores of
assorted systems, basically if someone leaves a laptop laying around
unprotected we might try to run a job on it. And being bioinformatic-y,
our problem with this and all storage is metadata related. The original
procurement did not include dedicated NSD servers (or extra GPFS server
licenses) so we run solely off the SFA12K's.
Ah right, so these are the embedded GPFS systems from DDN. Interesting
as our SFA10K's hit EOL in 2019 and so (if our funding continues beyond
2018) we'll need to replace them.
Post by John Hanks
Could we improve with dedicated NSD frontends and GPFS clients? Yes,
most certainly. But again, we can stand up a PB or more of brand new
SuperMicro storage fronted by BeeGFS that performs as well or better
for around the same cost, if not less.
Very nice - and for what you're doing it sounds like just what you need.
Post by John Hanks
I don't have enough of an
emotional investment in GPFS or DDN to convince myself that suggesting
further tuning that requires money and time is worthwhile for our
environment. It more or less serves the purpose it was bought for, we
learn from the experience and move on down the road.
I guess I'm getting my head around how other sites' GPFS performs given I
have a current sample size of 1 and that was spec'd out by IBM as part
of a large overarching contract. :-)
I guess I was assuming that because that was what we had it was how most
sites did it, apologies for that!
All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Douglas O'Flaherty
2017-02-16 14:55:09 UTC
Permalink
A couple of things I see in this thread:

1. Metadata performance! A lot of bioinformatics code uses metadata
searches. Anyone doing these kinds of workloads should tune/spec for
metadata separately from data storage. In Spectrum Scale there is a choice
of dedicated, fully distributed or partially distributed metadata. Most bio
houses I know choose to throw really good flash at the metadata.
Not familiar with the options for BeeGFS, but I do know superior distributed
metadata performance was a key target for the Fraunhofer team. Sounds like
they are hitting it. My limited history with Ceph & Gluster suggests they
could not support bioinformatic metadata requirements. It wasn't what they
were built to do.

2. Small files! Small file performance is undoubtedly a big gain in IBM
dev from GPFS v 3.x to IBM Spectrum Scale 4.x - don't compare the old
performance with the new. With Spectrum Scale, look into the client-side
caching for reads and/or writes. A local SSD could be a great boon for
performance. There are lots of tuning options for Scale/GPFS. Small file
performance and simplified install were also targets for BeeGFS when they
started development.

3. Spectrum Scale (GPFS) is absolutely fundamental to IBM. There is a lot
of focus now on making it easier to adopt & tune. IBM dev has delivered new
GUI, monitoring, troubleshooting, performance counters, parameterization of
tuning in the last 18 months.

4. IBM Spectrum Scale does differentiate with multi-cluster, tiered storage
support, migrate data to cloud by policy, HDFS support, etc. These may be
overkill for a lot of this mailing list, but really useful in shared
settings.

Sorry about the guy with the suit. IBM also has a good set of Scale/GPFS
people who don't own ties. Passing along your feedback on the client
license costs to guys with ties.

doug
Post by Scott Atchley
Hi Chris,
Check with me in about a year.
After using Lustre for over 10 years to initially serve ~10 PB of disk and
now serve 30+ PB with very nice DDN gear, later this year we will be
installing 320 PB (250 PB useable) of GPFS (via IBM ESS storage units) to
support Summit, our next gen HPC system from IBM with Power9 CPUs and
NVIDIA Volta GPUs. Our current Lustre system is capable of 1 TB/s for large
sequential writes, but random write performance is much lower (~400 GB/s or
40% of sequential). The target performance for GPFS will be 2.5 TB/s
sequential writes and 2.2 TB/s random (~90% of sequential). The initial
targets are slightly lower, but we are supposed to achieve these rates by
2019.
We are very familiar with Lustre, the good and the bad, and ORNL is the
largest contributor to the Lustre codebase outside of Intel. We have
encountered many bugs at our scale that few other sites can match and we
have tested patches for Intel before their release to see how they perform
at scale. We have been testing GPFS for the last three years in preparation
for the change and IBM has been a very good partner to understand our
performance and scale issues. Improvements that IBM are adding to support
the CORAL systems will also benefit the larger community.
People are attracted to the "free" aspect of Lustre (in addition to the
open source), but it is not truly free. For both of our large Lustre
systems, we bought block storage from DDN and we added Lustre on top. We
have support contracts with DDN for the hardware and Intel for Lustre as
well as a large team within our operations to manage Lustre and a full-time
Lustre developer. The initial price is lower, but at this scale running
without support contracts and an experienced operations team is untenable.
IBM is proud of GPFS and their ESS hardware (i.e. licenses and hardware are
expensive) and they also require support contracts, but the requirements
for operations staff are lower. It is probably more expensive than any other
combination of hardware/licenses/support, but we have one vendor to blame,
which our management sees as a value.
As I said, check back in a year or two to see how this experiment works out.
Scott
Post by Christopher Samuel
Hi John,
Post by John Hanks
So "clusters" is a strong word, we have a collection of ~22,000 cores of
assorted systems, basically if someone leaves a laptop laying around
unprotected we might try to run a job on it. And being bioinformatic-y,
our problem with this and all storage is metadata related. The original
procurement did not include dedicated NSD servers (or extra GPFS server
licenses) so we run solely off the SFA12K's.
Ah right, so these are the embedded GPFS systems from DDN. Interesting
as our SFA10K's hit EOL in 2019 and so (if our funding continues beyond
2018) we'll need to replace them.
Post by John Hanks
Could we improve with dedicated NSD frontends and GPFS clients? Yes,
most certainly. But again, we can stand up a PB or more of brand new
SuperMicro storage fronted by BeeGFS that performs as well or better
for around the same cost, if not less.
Very nice - and for what you're doing it sounds like just what you need.
Post by John Hanks
I don't have enough of an
emotional investment in GPFS or DDN to convince myself that suggesting
further tuning that requires money and time is worthwhile for our
environment. It more or less serves the purpose it was bought for, we
learn from the experience and move on down the road.
I guess I'm getting my head around how other sites' GPFS performs given I
have a current sample size of 1 and that was spec'd out by IBM as part
of a large overarching contract. :-)
I guess I was assuming that because that was what we had it was how most
sites did it, apologies for that!
All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Tony Brian Albers
2017-02-14 12:53:26 UTC
Permalink
Post by John Hanks
We've had pretty good luck with BeeGFS lately running on SuperMicro
vanilla hardware with ZFS as the underlying filesystem. It works pretty
well for the cheap end of the hardware spectrum and BeeGFS is free and
pretty amazing. It has held up to abuse under a very mixed and heavy
workload and we can stream large sequential data into it fast enough to
saturate a QDR IB link, all without any in depth tuning. While we don't
have redundancy (other than raidz3), BeeGFS can be set up with some
redundancy between metadata servers and mirroring between
storage. http://www.beegfs.com/content/
jbh
If you have a preference for Free Software, GlusterFS would work,
unless you have many millions of small files. It would also depend
on your available hardware, as there is not a 1-to-1 correspondence
between a typical GPFS setup and a typical GlusterFS setup. But at
least it is free and easy to try out. The mailing list is active,
the software is now mature ( I last used GlusterFS a few years ago)
and you can buy support from Red Hat if you like.
Take a look at the RH whitepapers about typical GlusterFS architecture.
CephFS, on the other hand, is not yet mature enough, IMHO.
Maybe you would consider Scality (http://www.scality.com/) for
your growth concerns. If you need speed, DDN is faster in rapid
data ingestion and for extreme HPC data needs.
Justin
Post by Benson Muite
Hi,
Do you have any performance requirements?
Benson
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
Well, we're not going to be doing a huge amount of I/O. So
performance
requirements are not high. But ingest needs to be really
fast, we're
talking tens of terabytes here.
/tony
--
Best regards,
Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
BeeGFS sounds very promising. I'll have a look at that too.

/tony

--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Bogdan Costescu
2017-02-15 01:02:00 UTC
Permalink
I can second the recommendation for BeeGFS. We have it in use for ~4
years with very good results, by now on 3 different FSes. We also run
it on SuperMicro hardware and Infiniband, but use the "classic"
combination with ext4 for metadata and xfs for storage servers - of
course, with RAID controllers underneath :) With 6TB SATA disks, we
get around 330TB usable capacity per storage server, which is composed
of an actual server and a JBOD unit. Maybe the most useful feature for
us is that it can be very easily expanded by adding another such
storage server, though this could become a bit problematic due to the
lack of rebalancing. (please note: I'm not saying that other FSes
don't have this feature, just that it was very useful for us :)) We
too see high performance under a very mixed load without much tuning,
however this also has a reverse side: it puts a high load on the
hardware and the software stack underneath, exposing faults in them,
such that, for example, we needed to downgrade the IB stack in one case, or
upgrade the firmware of the RAID controller and SSDs on a metadata
server in another case. The users are especially happy about the
metadata performance when working with very many small files;
streaming reads or writes of large data is basically limited by
hardware or underlying stacks, not by BeeGFS.
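
A crude way to see the small-file/metadata behaviour described above is to
time create/stat/unlink on a pile of tiny files from one client; the
directory below is an assumption, and mdtest is the usual tool for doing
this properly across many clients.

# Crude single-client metadata check: create, stat and delete many small files.
import os
import time

TESTDIR = "/mnt/beegfs/md_test"     # assumption: a directory on the DFS under test
NFILES = 10000

os.makedirs(TESTDIR, exist_ok=True)
names = [os.path.join(TESTDIR, f"f{i}") for i in range(NFILES)]

def timed(label, fn):
    start = time.time()
    fn()
    print(f"{label}: {NFILES / (time.time() - start):,.0f} ops/s")

timed("create", lambda: [open(n, "w").close() for n in names])
timed("stat",   lambda: [os.stat(n) for n in names])
timed("unlink", lambda: [os.remove(n) for n in names])
os.rmdir(TESTDIR)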

Cheers,
Bogdan
Post by John Hanks
We've had pretty good luck with BeeGFS lately running on SuperMicro vanilla
hardware with ZFS as the underlying filesystem. It works pretty well for the
cheap end of the hardware spectrum and BeeGFS is free and pretty amazing. It
has held up to abuse under a very mixed and heavy workload and we can stream
large sequential data into it fast enough to saturate a QDR IB link, all
without any in depth tuning. While we don't have redundancy (other than
raidz3), BeeGFS can be set up with some redundancy between metadata servers
and mirroring between storage. http://www.beegfs.com/content/
jbh
Post by Alex Chekholko
If you have a preference for Free Software, GlusterFS would work, unless
you have many millions of small files. It would also depend on your
available hardware, as there is not a 1-to-1 correspondence between a
typical GPFS setup and a typical GlusterFS setup. But at least it is free
and easy to try out. The mailing list is active, the software is now mature
( I last used GlusterFS a few years ago) and you can buy support from Red
Hat if you like.
Take a look at the RH whitepapers about typical GlusterFS architecture.
CephFS, on the other hand, is not yet mature enough, IMHO.
Post by Justin Y. Shi
Maybe you would consider Scality (http://www.scality.com/) for your
growth concerns. If you need speed, DDN is faster in rapid data ingestion
and for extreme HPC data needs.
Justin
Post by Tony Brian Albers
Post by Benson Muite
Hi,
Do you have any performance requirements?
Benson
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
Well, we're not going to be doing a huge amount of I/O. So performance
requirements are not high. But ingest needs to be really fast, we're
talking tens of terabytes here.
/tony
--
Best regards,
Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Joe Landman
2017-02-15 01:31:09 UTC
Permalink
Post by Bogdan Costescu
I can second the recommendation for BeeGFS. We have it in use for ~4
years with very good results, by now on 3 different FSes. We also run
I'll freely admit to being biased here, but BeeGFS is definitely
something you should be evaluating/using. Even for this case, with
HDFS, there is a connector:

http://www.beegfs.com/wiki/HadoopConnector

We've had excellent results with BeeGFS on spinning rust:

https://scalability.org/2014/05/massive-unapologetic-firepower-2tb-write-in-73-seconds/

and the same system at the customer site

https://scalability.org/2014/10/massive-unapologetic-firepower-part-2-the-dashboard/

(look closely at the plot, and note the vertical axes ... I had messed
up the scale on it, but that is in thousands of MB/s).

as well as with an NVMe unit


https://scalability.org/2016/03/not-even-breaking-a-sweat-10gbs-write-to-single-node-forte-unit-over-100gb-net-realhyperconverged-hpc-storage/

Excellent performance, ease of configuration is what you should expect
from BeeGFS.
--
Joe Landman
e: ***@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Tony Brian Albers
2017-02-14 12:43:38 UTC
Permalink
Post by Justin Y. Shi
Maybe you would consider Scality (http://www.scality.com/) for your
growth concerns. If you need speed, DDN is faster in rapid data
ingestion and for extreme HPC data needs.
Justin
Post by Benson Muite
Hi,
Do you have any performance requirements?
Benson
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
Well, we're not going to be doing a huge amount of I/O. So performance
requirements are not high. But ingest needs to be really fast, we're
talking tens of terabytes here.
/tony
--
Best regards,
Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>
Thanks, I'll take a look at these two. Probably has to be open source
though.

/tony
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Greg Lindahl
2017-02-13 19:00:17 UTC
Permalink
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
I don't understand the question. Hadoop comes with HDFS, and HDFS runs
happily on top of shared-nothing, direct-attach storage. Is there
something about your hardware or usage that makes this a non-starter?
If so, that might help folks make better suggestions.

-- greg

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Ellis H. Wilson III
2017-02-13 19:45:05 UTC
Permalink
Post by Greg Lindahl
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
I don't understand the question. Hadoop comes with HDFS, and HDFS runs
happily on top of shared-nothing, direct-attach storage. Is there
something about your hardware or usage that makes this a non-starter?
If so, that might help folks make better suggestions.
I'm guessing the "POSIX support" is the piece that's missing with a
native HDFS installation. You can kinda-sorta get a form of it with
plug-ins, but it's not a first-class citizen like in most DFS and when I
used it last it was not performant. Native HDFS makes large datasets
expensive to work with in anything but Hadoop-ready (largely MR)
applications. If there is a mixed workload, having a filesystem that
can support both POSIX access and HDFS /without/ copies is invaluable.
With extremely large datasets (170TB is not that huge anymore), copies
may be a non-starter. With dated codebases or applications that don't
fit the MR model, complete movement to HDFS may also be a non-starter.

The questions I feel need to be answered here to get good answers rather
than a shotgun full of random DFS's are:

1. How much time and effort are you willing to commit to setup and
administration of the DFS? For many completely open source solutions
(Lustre and HDFS come to mind) setup and more critically maintenance can
become quite heavyweight, and performance tuning can grow to
summer-grad-student-internship level.

2. Are you looking to replace the hardware, or just the DFS? These
days, 170 TB is at the fringes (IMHO) of what can fit reasonably into a
single (albeit rather large) box. It wouldn't be completely unthinkable
to run all of your storage with ZFS/BTRFS, a very beefy server,
redundant 10, 25 or 40GE NICs, some SSD acceleration, a UPS, and
plain-jane NFS (or your protocol of choice out of most Linux distros).
You could even host the HDFS daemons on that node, pointing at POSIX
paths rather than devices. But this falls into the category of "host it
yourself," so that might be too much work.

3. How committed to HDFS are you (i.e., what features of it do your
applications actually leverage)? Many map reduce applications actually
have zero attachment to HDFS whatsoever. You can reasonably re-point
them at POSIX-compliant NAS and they'll "just work." Plus you get
cross-protocol access to the files without any wizardry, copying, etc.
HBase is a notable example of where they've built dependence on HDFS
into the code, but that's more the exception than the norm.
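
A sketch of that last point: the job below only ever sees a directory tree,
so DATA_ROOT can be a NAS export, a BeeGFS/GPFS mount, or an NFS/FUSE-mounted
view of HDFS. The path and the word-count logic are illustrative, not taken
from the thread.

# The analysis code has no idea what is behind the mount point.
from collections import Counter
from pathlib import Path

DATA_ROOT = Path("/mnt/nas/project/logs")   # assumption: any POSIX-visible path

def count_words(root: Path) -> Counter:
    counts = Counter()
    for path in root.rglob("*.txt"):        # walk the tree like an MR input split
        with path.open() as f:
            for line in f:
                counts.update(line.split())
    return counts

if __name__ == "__main__":
    for word, n in count_words(DATA_ROOT).most_common(10):
        print(f"{n:10d}  {word}")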

Best,

ellis

Disclaimer: I work for Panasas, a storage appliance vendor. I don't
think I'm shamelessly plugging anywhere above as I love when people host
themselves, but it's not for everybody.
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Tony Brian Albers
2017-02-14 12:57:55 UTC
Permalink
Post by Ellis H. Wilson III
Post by Greg Lindahl
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
I don't understand the question. Hadoop comes with HDFS, and HDFS runs
happily on top of shared-nothing, direct-attach storage. Is there
something about your hardware or usage that makes this a non-starter?
If so, that might help folks make better suggestions.
I'm guessing the "POSIX support" is the piece that's missing with a
native HDFS installation. You can kinda-sorta get a form of it with
plug-ins, but it's not a first-class citizen like in most DFS and when I
used it last it was not performant. Native HDFS makes large datasets
expensive to work with in anything but Hadoop-ready (largely MR)
applications. If there is a mixed workload, having a filesystem that
can support both POSIX access and HDFS /without/ copies is invaluable.
With extremely large datasets (170TB is not that huge anymore), copies
may be a non-starter. With dated codebases or applications that don't
fit the MR model, complete movement to HDFS may also be a non-starter.
The questions I feel need to be answered here to get good answers rather
1. How much time and effort are you willing to commit to setup and
administration of the DFS? For many completely open source solutions
(Lustre and HDFS come to mind) setup and more critically maintenance can
become quite heavyweight, and performance tuning can grow to
summer-grad-student-internship level.
2. Are you looking to replace the hardware, or just the DFS? These
days, 170 TB is at the fringes (IMHO) of what can fit reasonably into a
single (albeit rather large) box. It wouldn't be completely unthinkable
to run all of your storage with ZFS/BTRFS, a very beefy server,
redundant 10, 25 or 40GE NICs, some SSD acceleration, a UPS, and
plain-jane NFS (or your protocol of choice out of most Linux distros).
You could even host the HDFS daemons on that node, pointing at POSIX
paths rather than devices. But this falls into the category of "host it
yourself," so that might be too much work.
3. How committed to HDFS are you (i.e., what features of it do your
applications actually leverage)? Many map reduce applications actually
have zero attachment to HDFS whatsoever. You can reasonably re-point
them at POSIX-compliant NAS and they'll "just work." Plus you get
cross-protocol access to the files without any wizardry, copying, etc.
HBase is a notable example of where they've built dependence on HDFS
into the code, but that's more the exception than the norm.
Best,
ellis
Disclaimer: I work for Panasas, a storage appliance vendor. I don't
think I'm shamelessly plugging anywhere above as I love when people host
themselves, but it's not for everybody.
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
1) Pretty much whatever it takes. We have the mentioned cluster, a
second one running only HBase (for now), and the third is a storage
cluster for our DSpace installation which will probably grow to tens of
petabytes within a couple of years. To be able to use the same FS on all
would be nice. (Yes, I know, there's probably not a Swiss army knife, but
we are willing to compromise.)

2) Just the DFS (having issues with IBM support(not on the DFS alone)).

3) HBase. Doesn't work without HDFS AFAIK.

/tony
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Skylar Thompson
2017-02-14 01:36:51 UTC
Permalink
Is there anything in particular that is causing you to move away from GPFS?

Skylar
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Tony Brian Albers
2017-02-14 12:58:19 UTC
Permalink
Post by Skylar Thompson
Is there anything in particular that is causing you to move away from GPFS?
Skylar
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Yes, IBM.
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Douglas Eadline
2017-02-14 02:00:17 UTC
Permalink
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
HDFS does have an NFSv3 gateway which helps users move
data around in a familiar fashion (without the -put/-get commands).
If you need to use HDFS for big block local streaming performance
that feature can be useful. If you are doing Spark or MR where data
locality is important, then HDFS is a low cost alternative
to other file systems. Plus if you use something like
Ambari/Hortonworks the management is somewhat integrated
in the web-GUI. (Hortonworks is open source rpm based)
If you don't care about locality, then another file system
will work.

As an aside, having done a handful of Hadoop/Spark workshops
in the last year, I have found the single most difficult
aspect of Hadoop/HDFS and Spark on Hadoop/HDFS is understanding
the "remote" or non-local aspect of HDFS, i.e. the fact that
a copy of the data must be loaded into HDFS before it
can be used. The NFS gateway helps because files can
be seen in a user's local file system. But I digress ...
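
To make the "-put vs. NFS gateway" distinction concrete, here are the two
ingest paths side by side in a small Python sketch; the file names and the
/hdfs mount point are assumptions, and only the "hdfs dfs -put" form is the
stock Hadoop CLI.

# Two ways to land the same file in HDFS.
import shutil
import subprocess

SRC = "/data/incoming/sample.csv"            # assumption: some local file

# 1) Explicit copy into HDFS with the Hadoop CLI (the -put mentioned above).
subprocess.run(["hdfs", "dfs", "-put", SRC, "/user/tony/sample.csv"],
               check=True)

# 2) With the NFSv3 gateway mounted (assumed here at /hdfs), standard tools
#    and plain library calls work; users never see -put/-get.
shutil.copy(SRC, "/hdfs/user/tony/sample.csv")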

--
Doug
Post by Tony Brian Albers
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
--
Best regards,
Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Tony Brian Albers
2017-02-14 13:14:08 UTC
Permalink
Post by Douglas Eadline
Post by Tony Brian Albers
Hi guys,
So, we're running a small(as in a small number of nodes(10), not
storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum
Scale(GPFS) which works fine and has POSIX support. On top of GPFS we
have a GPFS transparency connector so that HDFS uses GPFS.
Now, if I'd like to replace GPFS with something else, what should I use?
It needs to be a fault-tolerant DFS, with POSIX support(so that users
can move data to and from it with standard tools).
HDFS does have a NFSv3 gateway which helps users move
data around in a familiar fashion (without the -put -get commands).
If you need to use HDFS for big block local streaming performance
that feature can be useful. If you are doing Spark or MR where data
locality is important, then HDFS is a low cost alternative
to other file systems. Plus if you use something like
Ambari/Hortonworks the management is somewhat integrated
in the web-GUI. (Hortonworks is open source rpm based)
If you don't care about locality, then another file system
will work.
As an aside, having done a handful of Hadoop/Spark workshops
in the last year, I have found the single most difficult
aspect of Hadoop/HDFS and Spark on Hadoop/HDFS is understanding
the "remote" or non-local aspect of HDFS, i.e. the fact that
a copy of the data must be loaded into HDFS before it
can be used. The NFS gateway helps because files can
be seen in a users local file system. But I digress ...
--
Doug
Post by Tony Brian Albers
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
TIA
--
Best regards,
Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Some very good points there. No doubt the NFS gateway can be useful.
But the NFS gateway in itself is not enough for our purposes.
--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Christopher Samuel
2017-02-15 01:35:30 UTC
Permalink
Post by Tony Brian Albers
I've looked at MooseFS which seems to be able to do the trick, but are
there any others that might do?
There are some folks elsewhere at the university here that are looking
at CephFS, so I'd be glad to hear about any experiences with that.

They're looking at the 4.4.x LTS mainline kernel as their base.

cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf