Discussion:
[Beowulf] Bright Cluster Manager
Robert Taylor
2018-05-01 20:57:40 UTC
Hi Beowulfers.
Does anyone have any experience with Bright Cluster Manager?
My boss has been looking into it, so I wanted to tap into the collective
HPC consciousness and see
what people think about it.
It appears to do node management, monitoring, and provisioning, so we would
still need a job scheduler like LSF, Slurm, etc., as well. Is that correct?

If you have experience with Bright, let me know. Feel free to contact me
off list or on.
Chris Dagdigian
2018-05-01 21:41:44 UTC
Bright Cluster Manager is a great product and the only knock is that it can
be pretty expensive. The most value/love I've seen for it is in the
enterprise / corporate space where there is nobody who can do real
hands-on HPC support/operations and the reduction in
administrative/operational burden it brings is worth 10x the price tag.
Corporate IT shops that are forced to manage a research/HPC environment
love it.

Basically it's fantastic in shops where software dollars are easier to
come by than specialist Linux or HPC support staff, but the hardcore HPC
snobs are suspicious because Bright does a lot of the knob and feature
fiddling that they are used to doing themselves -- and there will always
be legit and valid disagreement over the 'proper' way to do deployment,
provisioning and configuration management.

I tell my clients that Bright is legit and it's worth sitting through
their sales pitch / overview presentation to get a sense of what they
offer. After that the decision is up to them.

My $.02 of course!

Chris
Christopher Samuel
2018-05-01 23:24:13 UTC
Post by Robert Taylor
It appears to do node management, monitoring, and provisioning, so we
would still need a job scheduler like LSF, Slurm, etc., as well. Is
that correct?
I've not used it, but I've heard from others that it can/does supply
schedulers like Slurm, but (at least then) out of date versions.

I've heard from people who like Bright and who don't, so YMMV. :-)
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
John Hearns via Beowulf
2018-05-02 07:32:44 UTC
Robert,
I have had a great deal of experience with Bright Cluster Manager and I
am happy to share my thoughts.


My experience with Bright has been as a system integrator in the UK, where
I deployed Bright for a government defence client,
for a university in London and on our in-house cluster for benchmarking and
demos.
I have a good relationship with the Bright employees in the UK and in
Europe.

Over the last year I have worked with a very big high tech company in the
Netherlands, who use Bright to manage their clusters
which run a whole range of applications.

I would say that Bright is surprisingly easy to install - you should be
going from bare metal to a functioning cluster within an hour.
The node discovery mechanism either has you switch on each node in turn
and confirm its name, or you note down which port on your Ethernet switch
each node is connected to and Bright does a MAC address lookup on that
port.
Hint - do the Ethernet port mapping. Make a sensible choice of node-to-port
numbering on each switch.
You of course also have to identify the switches to Bright.
But it is then a matter of switching all the nodes on at once and going off
for a well deserved coffee. Happy days.

Bright can cope with most network topologies, including booting over
InfiniBand.
If you run into problems their support guys are pretty responsive and very
clueful. If you get stuck they will schedule a Webex
and get you out of whatever hole you have dug for yourself. There is even a
reverse ssh tunnel built into their software,
so you can 'call home' and someone can log in to help diagnose your problem.
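
Something along these lines, as a generic illustration of the mechanism
(plain OpenSSH, not Bright's actual command; host names are made up):

    # from the cluster head node: forward port 2222 on the vendor's
    # support host back to our local sshd
    ssh -N -R 2222:localhost:22 support@support.vendor.example.com
    # a support engineer on that host can then log in with:
    #   ssh -p 2222 admin@localhost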

I back up what Chris Dagdigian says. You pays your money and you takes
your choice.

Regarding the job scheduler, Bright comes with pre-packaged and integrated
Slurm, PBS Pro, Grid Engine and, I am sure, LSF.
So right out of the box you have a default job scheduler set up. All you
have to do is choose which one at install time.
Bright rather likes Slurm, as do I. But I stress that it works
perfectly well with PBS Pro, as I have worked in that environment over the
last year.
Should you wish to install your own version of Slurm/PBS Pro etc. you can
do that - again, I know this works.

I also stress PBS Pro - this is now on a dual support model, so it is open
source if you don't need the formal support from Altair.

Please ask some more questions - I will tune in later.

Also it should be said that if you choose not to go with Bright a good open
source alternative is OpenHPC.
But that is a different beast, and takes a lot more care and feeding.
John Hearns via Beowulf
2018-05-02 07:53:44 UTC
Post by Christopher Samuel
I've not used it, but I've heard from others that it can/does supply
schedulers like Slurm, but (at least then) out of date versions.
Chris, this is true to some extent. When a new release of Slurm or, say,
Singularity is out you need to wait for Bright to package it up and test
that it works with their setup.
This makes sense if you think about it - Bright is a supported product and
no company worth their salt would rush out a bleeding-edge version of X
without testing.
I can say that the versions tend to be up to date but not bleeding edge - I
cannot give a specific example at the moment, sorry.

But as I say above, if it really matters to you, you can install your own
version on the master and in the node images and create a module file which
brings it into the users' PATH.
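
For illustration, a minimal Environment Modules file along these lines
(the install prefix and version number here are made up):

    #%Module1.0
    ## hypothetical module file for a locally built Slurm
    prepend-path PATH    /cm/shared/apps/slurm-local/17.11/bin
    prepend-path MANPATH /cm/shared/apps/slurm-local/17.11/share/man

Users would then pick it up with 'module load slurm-local/17.11', or
whatever you name it.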
Andrew Holway
2018-05-02 12:30:49 UTC
Post by Robert Taylor
Hi Beowulfers.
Does anyone have any experience with Bright Cluster Manager?
I used to work for ClusterVision, from which Bright Cluster Manager was
born. Although my experience is now quite some years out of date I would
still recommend it, mainly because Martijn de Vries is still CTO after 8
years and they have a very stable team of gifted developers. The company
has a single focus and they have been at it for a long time.

Back in the day I was able to deploy a complete cluster within a couple of
hours using BCM. All the nodes would boot over PXE and perform an
interesting "pivot root" operation to switch to the freshly installed HDD
from the PXE target. The software supported roles which would integrate
with SLURM allowing GPU node pools for instance. It was quite impressive
that people were able to get their code running so quickly.
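
For those unfamiliar, the general shape of such a handoff looks roughly
like this (a conceptual sketch, not BCM's actual code; the device name
is made up):

    # inside the PXE-booted initramfs, after installing to local disk:
    mount /dev/sda2 /newroot
    mkdir -p /newroot/mnt/oldroot
    cd /newroot
    # swap the initramfs root for the freshly installed disk root
    pivot_root . mnt/oldroot
    exec chroot . /sbin/init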

I would say that, as a package, it's definitely worth the money unless you
have a team of engineers kicking around. The CLI and API were a bit rough
and ready, but it's been six years since I last used it.

They also managed to successfully integrate OpenStack, which is a bit of a
feat in itself.
Jeff White
2018-05-02 19:52:41 UTC
I never used Bright.  Touched it and talked to a salesperson at a
conference but I wasn't impressed.

Unpopular opinion: I don't see a point in using "cluster managers"
unless you have a very tiny cluster and zero Linux experience.  These
are just Linux boxes with a couple applications (e.g. Slurm) running on
them.  Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way
more than they help IMO.  They are mostly crappy wrappers around free
software (e.g. ISC's dhcpd) anyway.  When they aren't it's proprietary
trash.

I install CentOS nodes and use
Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and
software.  This also means I'm not stuck with "node images" and can
instead build everything as plain old text files (read: write SaltStack
states), update them at will, and push changes any time.  My "base
image" is CentOS and I need no "baby's first cluster" HPC software to
install/PXEboot it.  YMMV
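
For a flavour of what that looks like, a minimal (hypothetical)
SaltStack state - package, config file and service, all plain text, no
node image anywhere; package names vary by distro/repo:

    # /srv/salt/hpc/slurm.sls
    slurm-packages:
      pkg.installed:
        - pkgs:
          - slurm
          - slurm-slurmd

    /etc/slurm/slurm.conf:
      file.managed:
        - source: salt://hpc/slurm.conf
        - user: root
        - group: root
        - mode: 644

    slurmd:
      service.running:
        - enable: True
        - watch:
          - file: /etc/slurm/slurm.conf

Apply it with "salt '*' state.apply hpc.slurm" and every node converges
on the same config.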


Jeff White
Chris Dagdigian
2018-05-02 20:19:48 UTC
Post by Jeff White
Unpopular opinion: I don't see a point in using "cluster managers"
unless you have a very tiny cluster and zero Linux experience. [...]
I install CentOS nodes and use
Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and
software. [...] YMMV
Totally legit opinion and probably not unpopular at all given the user
mix on this list!

The issue here is that this assumes a level of domain expertise with Linux,
bare-metal provisioning, DevOps and (most importantly) HPC-specific
config stuff that may be pervasive or easily available in your
environment but is often not easily available in a
commercial/industrial environment where HPC or "scientific computing"
is just another business area that a large central IT organization must
support.

If you have that level of expertise available then the self-managed DIY
method is best. It's also my preference.

But in the commercial world where HPC is becoming more and more
important you run into stuff like:

- Central IT may not actually have anyone on staff who knows Linux (more
common than you expect; I see this in Pharma/Biotech all the time)

- The HPC user base is not given budget or resource to self-support
their own stack because of a drive to centralize IT ops and support

- And if they do have Linux people on staff they may be novice-level
people or have zero experience with HPC schedulers, MPI fabric tweaking
and app needs (the domain stuff)

- And if miracles occur and they do have expert level linux people then
more often than not these people are overworked or stretched in many
directions


So what happens in these environments is that organizations will
willingly (and happily) pay commercial pricing and adopt closed-source
products if they can deliver a measurable reduction in administrative
burden, operational effort or support burden.

This is where Bright, Univa etc. all come in -- you can buy stuff from
them that dramatically reduces what onsite/local IT has to manage the
care and feeding of.

Just having a vendor to call for support on Grid Engine oddities makes
the cost of Univa licensing worthwhile. Just having a vendor like Bright
be on the hook for "cluster operations" is a huge win for an overworked
IT staff that does not have Linux or HPC specialists on staff or easily
available.

My best example of "paying to reduce operational burden in HPC" comes
from a massive well known genome shop in the Cambridge, MA area. They
often tell this story:

- 300 TB of new data generation per week (many years ago)
- One of the initial storage tiers was ZFS running on commodity server
hardware
- Keeping the DIY ZFS appliances online and running took the FULL TIME
efforts of FIVE STORAGE ENGINEERS

They realized that staff support was not scalable with DIY/ZFS at
300TB/week of new data generation, so they went out and bought a giant
EMC Isilon scale-out NAS platform.

And you know what? After the Isilon NAS was deployed the management of
*many* petabytes of single-namespace storage was now handled by the IT
Director in his 'spare time' -- and the five engineers who used to do
nothing but keep ZFS from falling over were re-assigned to more
impactful and presumably more fun/interesting work.


They actually went on stage at several conferences and told the story of
how Isilon allowed senior IT leadership to manage petabyte volumes of
data "in their spare time" -- this was a huge deal and really resonated.
It really reinforced for me how in some cases it's actually a good idea
to pay $$$ for commercial stuff if it delivers gains in
ops/support/management.


Sorry to digress! This is a topic near and dear to me. I often have to
do HPC work in commercial environments where the skills simply don't
exist onsite. Or more commonly -- they have budget to buy software or
hardware but they are under a hiring freeze and are not allowed to bring
in new humans.

Quite a bit of my work on projects like this is helping people make
sober decisions regarding "build" or "buy" -- and in those environments
it's totally clear that for some things it makes sense for them to pay
for an expensive commercially supported "thing" that they don't have to
manage or support themselves.


My $.02 ...






Jörg Saßmannshausen
2018-05-02 21:04:45 UTC
Dear Chris,
Post by Chris Dagdigian
- And if miracles occur and they do have expert level linux people then
more often than not these people are overworked or stretched in many
directions
This is exactly what has happened to me at the old work place: pulled into too
many different directions.

I am a bit surprised about the ZFS experiences. Although I did not have
petabytes of storage and I did not generate 300 TB per week, I did have a
fairly large storage space running on xfs and ext4 for backups and
provisioning of file space. Some of it was running on old hardware (please sit
down, I am talking about me messing around with SCSI cables) and I gradually
upgraded to newer hardware. So, I am not quite sure what went wrong with the ZFS
storage here.

However, there is a common trend, at least from what I observe here in the
UK, to out-source problems: pass the buck to somebody else and pay for it.
I am personally still more of an in-house expert than an out-sourced person
who may or may not be able to understand what you are doing.
I should add I am working in academia and I know little about the commercial
world here. Having said that, my friends in commerce are telling me that the
company likes to outsource as it is 'cheaper'.
I agree about the Linux expertise. I think I am one of only two Linux
admins at the present work place. The official line is: we do not support
Linux (but we teach it).

Anyhow, I don't want to digress here too much. However, "..do HPC work in
commercial environments where the skills simply don't exist onsite."
Are we a dying art?

My 1 shilling here from a still cold and dark London.

Jörg
John Hearns via Beowulf
2018-05-03 07:23:44 UTC
Jorg, I did not know that you used Bright. Or I may have forgotten!
I thought you were a Debian fan. Of relevance, Bright 8 now supports
Debian.

You commented on the Slurm configuration file being changed.
I found during the install at Greenwich, where we put in a custom
slurm.conf, that Bright has an option to 'freeze' files. This is defined
in the cmd.conf file. So if new nodes are added, or other changes made,
the slurm.conf file is left unchanged and you have to manage it manually.
I am not 100% sure what happens with an update of the RPMs, but I would
imagine the freeze state is respected.
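
From memory the directive looks something like this - treat the exact
name, path and syntax as an assumption and check the Bright admin manual:

    # in cmd.conf on the head node; directive name recalled from memory
    # and may differ between Bright versions
    FrozenFile = { "/etc/slurm/slurm.conf" }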
Post by Jörg Saßmannshausen
I should add I am working in academia and I know little about the commercial
world here. Having said that, my friends in commerce are telling me that the
company likes to outsource as it is 'cheaper'.
I would not say cheaper. However (see below) HPC skills are scarce.
And if you are in industry you commit to your management that HPC resources
will be up and running for XX% of the year - i.e. you have some explaining
to do if there is extended downtime.
HPC is looked upon as something comparable to machine tools - in Formula 1
we competed for budget against five-axis milling machines, for instance.
Can you imagine what would happen if the machine shop supervisor said
"Sorry - no parts being made today. My guys have the covers off and we are
replacing one of the motors with one we got off eBay"?


So yes you do want commercial support for aspects of your setup - let us
say that jobs are going into hold states
on your batch system, or jobs are immediately terminating. Do you:

a) spend all day going through logs with a fine-tooth comb, and send out an
email to the Slurm/PBS/SGE list and hope you get
some sort of answer

b) take a dump of the relevant logs and get a ticket opened with your
support people

Actually in real life you do both, but path (b) is going to get you up and
running quicker.

Also for storage, in industry you really want support on your storage.
Post by Jörg Saßmannshausen
Anyhow, I don't want to digress here too much. However, "..do HPC work in
commercial environments where the skills simply don't exist onsite."
Are we a dying art?
Jorg, yes. HPC skills are rare, as are the people who take the time and
trouble to learn deeply about the systems they operate.
I know this as recruitment consultants tell me this regularly.
I find that often in life people do the minimum they need, and once they
are given instructions they never change,
even when the configuration steps they carry out have lost meaning.
I have met that attitude in several companies. Echoing Richard Feynman, I
call this 'cargo cult systems'.
The people like you who are willing to continually learn and to abandon old
ways of work
are invaluable.

I am consulting at the moment with a biotech firm in Denmark. Replying to
Chris Dagdigian, this company does have excellent in-house
Linux skills, so I suppose it is the exception to the rule!
John Hearns via Beowulf
2018-05-03 07:45:52 UTC
Post by Chris Dagdigian
And you know what? After the Isilon NAS was deployed the management of
*many* petabytes of single-namespace storage was now handled by the IT
Director in his 'spare time' -- and the five engineers who used to do
nothing but keep ZFS from falling over were re-assigned to more
impactful and presumably more fun/interesting work.

The person who runs the huge JASMIN climate research project in the UK
makes the same comment, only with Panasas storage.
He is able to manage petabytes of Panasas storage with himself and one
other person. A lot of that storage was installed by my fair hands.
To be honest, though, installing Panasas is a matter of how fast you can
unbox the blades (*)

(*) Well, that is not so in real life! During that install we had several
'funnies' - all of which were diagnosed and a fix given by the superb
Panasas support.
Including the shelf where, after replacing every component over the period
of two weeks - something like Trigger's Broom
http://foolsandhorses.weebly.com/triggers-broom.html
- we at last found the bent pin in the multiway connector (ahem)
Chris Samuel
2018-05-04 14:15:32 UTC
Post by Chris Dagdigian
- Keeping the DIY ZFS appliances online and running took the FULL TIME
efforts of FIVE STORAGE ENGINEERS
That sounds very fishy. Either they had really flakey hardware or something
else weird was going on there.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Douglas Eadline
2018-05-03 13:04:38 UTC
Here is where I see it going:

1. Compute nodes with a base minimal generic Linux OS
(with PR_SET_NO_NEW_PRIVS in kernel, added in 3.5)

2. A Scheduler (that supports containers)

3. Containers (Singularity mostly)

All "provisioning" is moved to the container. There will be edge cases of
course, but applications will be pulled down from
a container repos and "just run"
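
As a sketch of what that looks like from the job's side (standard Slurm
batch directives; the image name and paths are made up):

    #!/bin/bash
    #SBATCH --job-name=containerized-app
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    # pull the application container, then run it; nothing is installed
    # on the node itself
    singularity pull --name app.simg shub://example/app
    singularity exec app.simg /opt/app/bin/solver input.dat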

--
Doug
John Hearns via Beowulf
2018-05-03 13:53:14 UTC
I agree with Doug. The way forward is a lightweight OS with containers for
the applications.
I think we need to learn from the new kids on the block - the webscale
generation.
They did not go out and look at how massive supercomputer clusters are put
together.
No, they went out and built scale-out applications on public clouds.
We see 'applications designed to fail' and 'serverless'.

Yes, I KNOW that scale-out applications like these are Web-type
applications, and all the application examples you
see are based on the load balancer/web server/database (or whatever style)
paradigm.

The art of this will be deploying the more tightly coupled applications
which HPC has,
which depend upon MPI communications over a reliable fabric, and which
depend upon GPUs etc.

The other hat I will toss into the ring is separating parallel tasks which
require computation on several
servers and MPI communication between them versus 'embarrassingly parallel'
operations which may run on many, many cores
but do not particularly need communication between them.

The best successes I have seen on clusters are where the heavy parallel
applications get exclusive compute nodes.
Cleaner: you get all the memory and storage bandwidth, and it is easy to
clean up. Hell, reboot the things after each job. You got an exclusive node.
I think many designs of HPC clusters still try to cater for all workloads
- oh yes, we can run an MPI weather forecasting/ocean simulation,
but at the same time we have this really fast IO system and we can run your
Hadoop jobs.

I wonder if we are going to see a fork in HPC. With the massively parallel
applications being deployed, as Doug says, on specialised
lightweight OSes which have dedicated high speed, reliable fabrics and with
containers.
You won't really be able to manage those systems like individual Linux
servers. Will you be able to ssh in for instance?
ssh assumes there is an ssh daemon running. Does a lightweight OS have ssh?
Authentication Services? The kitchen sink?

The less parallel applications are being run more and more on cloud-type
installations, either on-premise clouds or public clouds.
I confound myself here, as I can't say what the actual difference between
those two types of machines is, as you always need
an interconnect fabric and storage, so why not have the same for both types
of tasks.

Maybe one further quip to stimulate some conversation. Silicon is cheap.
No, really it is.
Your friendly Intel salesman may wince when you say that. After all those
lovely Xeon CPUs cost north of 1000 dollars each.
But again I throw in some talking points:

power and cooling cost as much as, if not more than, your purchase cost
over several years

are we exploiting all the capabilities of those Xeon CPUs?
Joe Landman
2018-05-03 14:20:02 UTC
I agree with both John and Doug.  I've believed for a long time that
OSes are merely specific details of a particular job, and you should be
ready to change them out at a moment's notice, as part of a job.   This
way you can always start in a pristine and identical state across your
fleet of compute nodes.

Moreover, with the emergence of Docker, k8s, and others on Linux, I've
been of the opinion that most of the value of distributions has been
usurped, in that you can craft an ideal environment for your job, which
is then portable across nodes.

Singularity looks like it has done the job correctly as compared to
Docker et al., so now you can far more securely distribute your jobs as
statically linked black boxes to nodes.   All you need is a good
substrate to run them on.

Not astroturfing here ... have a look at
https://github.com/joelandman/nyble , an early-stage project of mine*,
which is all about building a PXE- or USB-bootable substrate system,
based upon your favorite OS (currently supporting Debian 9 and CentOS 7,
others to be added).  No real docs just yet, though I expect to add them
soon.  Basically,

    # fetch the build tree
    git clone https://github.com/joelandman/nyble
    cd nyble
    # edit makefile to set the DISTRO= variable, and config/all.conf
    # edit urls.conf and OS/${DISTRO}/distro_urls.conf as needed to
    # point to local package and kernel repos
    make

then a sane PXE-bootable kernel and initramfs appear some time later in
/mnt/root/boot.   My goal here is to make sure we view the substrate OS
as a software appliance substrate upon which to run containers, jobs, etc.

Why this is better than other substrates for Singularity/kvm/etc. comes
down to the fact that you start from a known immutable image.  That is,
you always boot the same image, unless you decide to change it.  You
configure everything you need after boot.  You don't need to worry about
various package manager states and collisions.   You only need to
install what you need for the substrate (HVM, containers, drivers,
etc.).  Also, there is no OS disk management, which for very large
fleets, is an issue.  Roll forwards/backs are trivial and exact, testing
is trivial, and can be done in prod on a VM or canary machine.  This can
easily become part of a CD system, so that your hardware and OS
substrate can be treated as if it were code.  Which is what you want.
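
As a hypothetical illustration of consuming that output, a PXELINUX
entry pointing at the generated kernel and initramfs (file names are
assumptions, copied out of /mnt/root/boot into your tftp tree):

    # pxelinux.cfg/default
    DEFAULT nyble
    LABEL nyble
        KERNEL nyble/vmlinuz
        APPEND initrd=nyble/initramfs.img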



* This is a reworking of the SIOS framework from Scalable Informatics
days.  We had used that successfully for years to PXE boot all of our
systems from a single small management node.   It's not indicated there
yet, but it has an Apache 2.0 license.  A commit this afternoon should
show this.
Post by John Hearns via Beowulf
I agree with Doug. The way forward is a lightweight OS with containers
for the applications.
I think we need to learn from the new kids on the block - the webscale
generation.
They did not go out and look at how massive supercomputer clusters are
put together.
No, they went out and built scale-out applications on public clouds.
We see 'applications designed to fail' and 'serverless'.
Yes, I KNOW that scale-out applications like these are Web-type
applications, and all the application examples you
see are based on the load balancer/web server/database (or whatever
style) paradigm.
The art of this will be deploying the more tightly coupled
applications which HPC has,
which depend upon MPI communications over a reliable fabric, which
depend upon GPUs etc.
The other hat I will toss into the ring is separating parallel tasks
which require computation on several
servers and MPI communication between them versus 'embarrassingly
parallel' operations which may run on many, many cores
but do not particularly need communication between them.
The best successes I have seen on clusters are where the heavy parallel
applications get exclusive compute nodes.
Cleaner: you get all the memory and storage bandwidth, and it is easy to
clean up. Hell, reboot the things after each job. You got an exclusive
node.
I think many designs of HPC clusters still try to cater for all
workloads  - Oh Yes, we can run an MPI weather forecasting/ocean
simulation
But at the same time we have this really fast IO system and we can run
your Hadoop jobs.
I wonder if we are going to see a fork in HPC. With the massively
parallel applications being deployed, as Doug says, on specialised
lightweight OSes which have dedicated high speed, reliable fabrics and
with containers.
You won't really be able to manage those systems like individual Linux
servers. Will you be able to ssh in for instance?
ssh assumes there is an ssh daemon running. Does a lightweight OS have
ssh? Authentication Services? The kitchen sink?
The less parallel applications being run more and more on cloud type
installations, either on-premise clouds or public clouds.
I confound myself here, as I can't say what the actual difference
between those two types of machines is, as you always need
an interconnect fabric and storage, so why not have the same for both
types of tasks.
Maybe one further quip to stimulate some conversation. Silicon is
cheap. No, really it is.
Your friendly Intel salesman may wince when you say that. After all,
those lovely Xeon CPUs cost north of 1000 dollars each.
But power and cooling cost the same, if not more than, your purchase cost
over several years.
Are we exploiting all the capabilities of those Xeon CPUs?
Here is where I see it going
1. Compute nodes with a base minimal generic Linux OS
   (with PR_SET_NO_NEW_PRIVS in the kernel, added in 3.5)
2. A scheduler (that supports containers)
3. Containers (Singularity mostly)
All "provisioning" is moved to the container. There will be edge cases of
course, but applications will be pulled down from
container repos and "just run"
--
Doug
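
(As a concrete sketch of steps 2 and 3 in Doug's list, a Slurm batch
script that pulls and runs a containerised application; the image URL
and paths are invented for illustration:)

    #!/bin/bash
    #SBATCH --job-name=containerised-app
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    # nothing is installed on the node itself: pull the application
    # image from a registry, then run it under the scheduler
    singularity pull --name app.simg shub://example/app
    srun singularity exec app.simg /opt/app/bin/solver input.dat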
Post by Jeff White
I never used Bright.  Touched it and talked to a salesperson at a
conference but I wasn't impressed.
Unpopular opinion: I don't see a point in using "cluster managers"
unless you have a very tiny cluster and zero Linux experience.  These
are just Linux boxes with a couple of applications (e.g. Slurm) running on
them.  Nothing special.  xcat/Warewulf/Scyld/Rocks just get in the way
more than they help IMO.  They are mostly crappy wrappers around free
software (e.g. ISC's dhcpd) anyway.  When they aren't, it's proprietary
trash.
I install CentOS nodes and use
Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and
software.  This also means I'm not stuck with "node images" and can
instead build everything as plain old text files (read: write SaltStack
states), update them at will, and push changes any time.  My "base
image" is CentOS and I need no "baby's first cluster" HPC software to
install/PXEboot it.  YMMV
Jeff White
--
Joe Landman
e: ***@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

Chris Samuel
2018-05-04 14:36:03 UTC
Permalink
Post by John Hearns via Beowulf
The best successes I have seen on clusters is where the heavy parallel
applications get exclusive compute nodes. Cleaner, you get all the memory
and storage bandwidth and easy to clean up. Hell, reboot the things after
each job. You got an exclusive node.
You are describing the BlueGene/Q philosophy there John. :-)

This idea tends to break when you throw GPUs into the mix as there
(hopefully) you only need a couple of cores on the node to shovel data around
and the GPU does the gruntwork. That means you'll generally have cores left
over that could be doing something useful.

On the cluster I'm currently involved with we've got 36 cores per node and a
pair of P100 GPUs. We have 2 Slurm partitions per node, one for non-GPU jobs
that can only use up to 32 cores per node and another for GPU jobs that has no
restriction. This means we always keep at least 4 cores per node free for
GPU jobs.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Chris Samuel
2018-05-12 07:33:05 UTC
Permalink
Post by Lux, Jim (337K)
While I’d never claim my pack of beagles is HPC, it does share some aspects
– there’s parallel work going on, the nodes need to be aware of each other
and synchronize their behavior (that is, it’s not an embarrassingly
parallel task that’s farmed out from a queue), and most importantly, the
management has to be scalable. While I might have 4 beagles on the bench
right now – the idea is to scale the approach to hundreds. Typing “sudo
apt-get install tbd-package” on 4 nodes sequentially might be OK (although
pdsh and csshx help a lot); it’s not viable for 100 nodes.
At ${JOB-1} we moved to diskless nodes and booting RAMdisk images from the
management node back in 2013 and it worked really well for us. You no longer
have the issue of nodes getting out of step because one of them was down
when you ran your install of a package across the cluster, it removed HDD
failures from the picture (though that's likely less of an issue with SSDs
these days) and did I mention the peace of mind of knowing everything is the
same? :-)

It's not new, the Blue Gene systems we had (BG/P 2010-2012 and BG/Q 2012-2016)
booted RAMdisks as they were designed to scale up to huge systems from the
beginning and to try and remove as many points of failure as possible - no
moving parts on the node cards, no local storage, no local state.

Where I am now we're pretty much the same, except instead of booting a pure
RAM disk we boot an initrd that pivots onto an image stored on our Lustre
filesystem instead. These nodes do have local SSDs for local scratch, but
again no real local state.

I think the place where this is going to get hard is on the application side
of things; there were things like Fault-Tolerant MPI (which got subsumed into
Open-MPI) but that still relies on the applications being written to use and
cope with that. Slurm includes fault tolerance support too, in that you can
request an allocation and should a node fail you can have "hot-spare" nodes
replace the dead node but again your application needs to be able to cope with
it!

It's a fascinating subject, and the exascale folks have been talking about it
for a while - LLNL's Dona Crawford keynote was about it at the Slurm User
Group in 2013 and is well worth a read.

https://slurm.schedmd.com/SUG13/keynote.pdf

Slide 21 talks about the reliability/recovery side of things:

# Mean time between failures of minutes or seconds for exascale
[...]
# Need 100X improvement in MTTI so that applications
# can run for many hours. Goal is 10X improvement in
# hardware reliability. Local recovery and migration may
# yield another 10X. However, for exascale, applications
# will need to be fault resilient

She also made the point that checkpoint/restart doesn't scale, you will likely
end up spending all your compute time doing C/R at exascale due to failures
and never actually getting any work done.
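
(Young's classic first-order approximation makes that concrete; the
numbers below are purely illustrative, with $C$ the time to write one
checkpoint and $M$ the system MTBF:

$$\tau_{\mathrm{opt}} \approx \sqrt{2CM}$$

With $C = 5$ minutes and $M = 30$ minutes, $\tau_{\mathrm{opt}} \approx
\sqrt{300} \approx 17$ minutes, so roughly a quarter of wall-clock time
goes to writing checkpoints, before you even count lost work and
restarts - and $M$ shrinks as node counts grow.)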

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Michael Di Domenico
2018-05-14 11:53:02 UTC
Permalink
Post by Chris Samuel
Where I am now we're pretty much the same, except instead of booting a pure
RAM disk we boot an initrd that pivots onto an image stored on our Lustre
filesystem instead. These nodes do have local SSDs for local scratch, but
again no real local state.
Can you expand on the "image stored on Lustre" part? I'm pretty sure I
understand the gist, but I'd like to know more.
Michael Di Domenico
2018-05-17 12:32:33 UTC
Permalink
Post by Chris Samuel
The compute nodes boot a RHEL7 kernel with a custom initrd that
includes the necessary OPA and Lustre kernel modules & config
to get the networking working and access the Lustre filesystem;
the kernel then pivots its root filesystem from the initrd to
the master copy on Lustre via overlayfs2 to ensure the compute
node sees it as read/write but without the possibility of it
modifying the master (as the master is read-only in overlayfs2).
Does that help?
It does. The overlayfs part is the interesting bit. I'll have to
read up on that.
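
(For the curious, a minimal sketch of that kind of pivot from an
initramfs; the Lustre image path and mount points are invented, not
Chris's actual layout:)

    # the read-only master image on Lustre becomes the overlay lower
    # layer; a tmpfs upper layer absorbs all writes on this node
    mkdir -p /run/overlay /newroot
    mount -t tmpfs tmpfs /run/overlay
    mkdir -p /run/overlay/upper /run/overlay/work
    mount -t overlay overlay \
        -o lowerdir=/mnt/lustre/images/compute-rootfs,upperdir=/run/overlay/upper,workdir=/run/overlay/work \
        /newroot
    # hand control to the init in the overlaid root
    exec switch_root /newroot /sbin/init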
Lux, Jim (337K)
2018-05-15 22:19:38 UTC
Permalink
Yes, the checkpoint/restart thing was discussed on the list some years ago.

The reason I hadn't looked at "diskless boot from a server" is the size of the image - assume you don't have a high-bandwidth or reliable link.
Post by Chris Samuel
She also made the point that checkpoint/restart doesn't scale, you will likely
end up spending all your compute time doing C/R at exascale due to failures
and never actually getting any work done.

Roland Fehrenbacher
2018-05-17 13:00:39 UTC
Permalink
J> The reason I hadn't looked at "diskless boot from a
J> server" is the size of the image - assume you don't have a high
J> bandwidth or reliable link.

This is not something to worry about with Qlustar. A (compressed)
Qlustar 10.0 image containing e.g. the core OS + Slurm + OFED + Lustre is
a mere 165MB to be transferred from the head node to a node (eating 420MB
of RAM when unpacked as the OS on the node). Qlustar (and its
non-public ancestors) never used anything but RAMdisks (with real
disks for scratch); the first cluster running this at the end of 2001 was on
Athlons ... and 100MB or so of RAM eaten by the OS still mattered a lot
at that time :)

So over the years we perfected our image build mechanism to achieve a
close to minimal (size-wise) OS, minimal in the sense of: given the required
functionality (wanted kernel modules, services, binaries/scripts, libs),
generate an image (module) of minimal size providing it. That is maximally
lightweight by definition.

Yes, I know, you'll probably say "well, but it's just Ubuntu ...". Not for
much longer though: CentOS support (incl. OpenHPC integration) coming
very soon ... And all Open-Source and free.

Best,

Roland

-------
https://www.q-leap.com / https://qlustar.com
--- HPC / Storage / Cloud Linux Cluster OS ---

Roland Fehrenbacher
2018-05-18 08:36:29 UTC
Permalink
JH> Roland, the OpenHPC integration IS interesting. I am on the
JH> OpenHPC list and look forward to the announcement there.

Yes, we'll post there when ready.
Lux, Jim (337K)
2018-05-20 01:36:18 UTC
Permalink
These are compute nodes in space, but booting over a wireless link from another node many km away.

The data rates for space are surprisingly low - it's a "joules/bit" kind of thing and power is precious.

The relay link between rovers on the surface of Mars and orbiters overhead is a few Mbit/sec at the fastest. Links between Earth and Mars are usually a few kbps from Earth to spacecraft (uplink) and a few Mbps back to Earth (downlink).

In Earth orbit, there are some spacecraft with high-rate "crosslinks" - Iridium NEXT is a good example - there's a 12.5 Mbps half-duplex link between the satellites.

The other problem is that most "remote" protocols are fairly error-intolerant - the strategy is usually: if you get an error, just retry the whole thing.
TFTP (used by netboot) runs over UDP, so it has to deal with dropped blocks itself - it does so with lock-step, per-block ACKs and retransmission, which gets painfully slow as latency grows.

For instance, FTP doesn't have a good way to "restart" a file transfer in the middle (although good old Zmodem does <grin>).
I don't know about scp. I think rsync can do restarts.
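
(For what it's worth, a resumable transfer over a flaky link with rsync
might look like the following; the host and file names are made up:)

    # --partial keeps a half-transferred file so a rerun resumes it;
    # --append-verify appends to it and re-checksums the whole file;
    # --timeout abandons a dead link instead of hanging forever
    rsync --partial --append-verify --timeout=60 \
        node-image.sqsh relay-node:/var/lib/images/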
J> The reason I hadn't looked at "diskless boot from a server" is
J> the size of the image - assume you don't have a high bandwidth or
J> reliable link.
Post by Roland Fehrenbacher
This is not something to worry about with Qlustar. A (compressed)
Qlustar 10.0 image containing e.g. the core OS + slurm + OFED +
Lustre is just a mere 165MB to be transferred (eating 420MB of
RAM
J> 165 MB = 1.3 Gbit. At 64 kbps that's about 6 hrs.

Ouch. Sure, with 64 kbps you've had it. Wouldn't have expected that kind
of throughput at NASA in 2018, or are these compute nodes in space that
you want to boot from a head-node in Houston :)
John Hearns via Beowulf
2018-05-09 06:43:02 UTC
Permalink
Post by Lux, Jim (337K)
All of a sudden simple “send the same command to all nodes” just doesn’t
work. And that’s what will inevitably be the case as we scale up in the
HPC world – there will always be dead or malfunctioning nodes.

Jim, this is true. And 'we' should be looking to the webscale generation
for the answers. They thought about computing at scale from the beginning.

Regarding hardware failures, I heard a shaggy dog story that
Microsoft/Amazon/Google order servers ready racked in shipping containers.
When a certain proportion of servers are dead, they simply close it down
and move on.
Can anyone confirm or deny this story?

Which brings me to another one of my hobby horses - the environmental costs
of HPC. When pitching HPC clusters you often put in an option for a
mid-life upgrade. I think upping the RAM is quite common, but processors
and interconnect much less so.

So kit is hopefully worked hard for five years, till the cost of power and
cooling is outweighed by the performance of a new generation. But where
does the kit get recycled? Again, when pitching clusters you have to put in
guarantees about WEEE (or the equivalent in the USA)
Post by Lux, Jim (337K)
And another aspect of this - I’ve been doing stuff with “loose clusters”
of low capability processors (Arduino, Rpi, Beagle) doing distributed
sensing kinds of tasks – leaving aside the Arduino (no OS) – the other two
wind up with some flavor of Debian but often with lots of stuff you don’t
need (e.g. Apache). Once you’ve fiddled with one node to get the
configuration right, you want to replicate it across a bunch of nodes –
right now that means sneakernet of SD cards - although in theory, one
should be able to push an image out to the local file system (typically 4GB
eMMC in the case of beagles), and tell it to write that to the “boot area”
– but I’ve not tried it.
While I’d never claim my pack of beagles is HPC, it does share some
aspects – there’s parallel work going on, the nodes need to be aware of
each other and synchronize their behavior (that is, it’s not an
embarrassingly parallel task that’s farmed out from a queue), and most
importantly, the management has to be scalable. While I might have 4
beagles on the bench right now – the idea is to scale the approach to
hundreds. Typing “sudo apt-get install tbd-package” on 4 nodes
sequentially might be OK (although pdsh and csshx help a lot); it’s not
viable for 100 nodes.
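
(As an illustration, fanning the same install out with pdsh might look
like this; the node names are made up:)

    # run the same command on node01..node99 in parallel,
    # collapsing identical output with dshbak
    pdsh -w node[01-99] 'sudo apt-get -y install tbd-package' | dshbak -c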
The other aspect of my application that’s interesting, and applicable to
exascale kinds of problems, is tolerance to failures – if I have a low data
rate link among nodes (with not necessarily all to all connectivity), one
can certainly distribute a new OS image (or container) over time. There are
some ways to deal with errors in the transfers (other than just retransmit
all – which doesn’t work if the error rate is high enough that you can
guarantee at least one error will occur in a long transfer). But how do
you **manage** a cluster with hundreds or thousands of nodes where some
fail randomly, reset randomly, etc.
All of a sudden simple “send the same command to all nodes” just doesn’t
work. And that’s what will inevitably be the case as we scale up in the
HPC world – there will always be dead or malfunctioning nodes.
Chris Samuel
2018-05-04 14:43:35 UTC
Permalink
Post by Douglas Eadline
Here is where I see it going
1. Computer nodes with a base minimal generic Linux OS
(with PR_SET_NO_NEW_PRIVS in kernel, added in 3.5)
Depends on your containerisation method; some don't need to rely on that as
they proactively disarm containers of dangerous abilities (setuid/setgid/
capabilities) before the user gets near them.

That said, even RHEL6 has support for that, so you'd be hard pressed to find an
up-to-date system that doesn't have that ability.
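
(For reference, a quick way to see no_new_privs in action from a shell,
assuming util-linux's setpriv is available:)

    # run a command with the no_new_privs flag set; setuid binaries
    # such as sudo can no longer elevate privileges inside it
    setpriv --no-new-privs sudo id   # sudo refuses to run
    # the flag is also visible per process on newer kernels
    grep NoNewPrivs /proc/self/status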
Post by Douglas Eadline
2. A Scheduler (that supports containers)
3. Containers (Singularity mostly)
All "provisioning" is moved to the container. There will be edge cases of
course, but applications will be pulled down from
a container repos and "just run"
This then relies on people building containers that have the right libraries
for the hardware you are using. For instance I tried to use some Singularity
containers on our system for MPI work but can't because the base OS is too old
to include support for our OmniPath interconnect.

The other issue is that it encourages people to build generic binaries rather
than optimised binaries to broaden the systems the container can run on and/or
because they don't have a proprietary compiler (or the distro has a version of
GCC too old to optimise for the hardware).

I would argue that there is a place for that sort of work, but that it's the
cloud not so much HPC (as they're not trying to get the most out of the
hardware).

I'm conflicted on this because I also have great sympathies for the
reproducibility side of the coin!

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listin
Douglas Eadline
2018-05-04 20:14:22 UTC
Permalink
Good points. I should have mentioned I was talking more about
"generic mainstream HPC" (like you say "cloud")
and not the performance cases where running
on bare metal is essential.

--
Doug
Chris Samuel
2018-05-04 14:30:26 UTC
Permalink
Post by Jeff White
Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way more than
they help IMO.
To my mind, having built clusters with xCAT and then used systems that were
done in a DIY manner, I always run into tooling that I'm missing with the
latter. Usually around node discovery (and BMC config), centralised logging and
IPMI/HMC tooling (remote power control, SoL console logging, IPMI sensor
information, event logs, etc).

Yes, you can roll your own there, but a consistent toolset that takes
the drudgery out of it - and means you don't need to think "wait,
is this an IPMI v2 node or managed via an HMC?" and then use different methods
depending on the answer - is a big win.
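
(A taste of that consistency with xCAT's remote tooling; the node names
are invented:)

    rpower compute[01-04] stat   # power state, same syntax for IPMI or HMC nodes
    rcons compute01              # attach to the serial-over-LAN console
    rinv compute01 serial        # pull serial numbers over IPMI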

It's the same reason things like EasyBuild and Spack exist; we've spent
decades building software from scratch and creating little shell scripts to do
the config/build for each new version, but abstracting that and building a
framework to make it easy is a good thing at scale. It also means you can
add things like checksums for tarballs and catch projects that re-release
their 1.7.0 tarball with new patches without changing the version number (yes,
TensorFlow, I'm looking at you).
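
(For example, one Spack invocation replaces those per-version build
scripts and verifies the tarball checksum along the way; versions here
are illustrative:)

    # fetch, checksum-verify, configure, build and install with dependencies
    spack install hdf5@1.10.1 %gcc@7.3.0
    # show installed specs with their variants and hashes
    spack find -lv hdf5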

But unpopular opinions are good, and the great thing about the Beowulf
philosophy is that there is the ability to do things your own way. It's like
building a Linux system with Linux From Scratch, yes you could install Ubuntu
or some other distro that makes it easy but you learn a hell of a lot from
doing it the hard way - and anyone with a strong interest in Linux should try
that at least once in their life.

Aside: Be aware if you are using Puppet that some folks on the Slurm list have
found that when it runs it can move HPC jobs out of the Slurm control group.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Jörg Saßmannshausen
2018-05-02 20:48:10 UTC
Permalink
Dear all,

at least something I can contribute here: at the new workplace, the small
cluster I am looking after uses Bright Cluster Manager to manage the 20
nodes and the 10 or so GPU nodes.

I was not around when it all got installed so I cannot comment on how quickly
it can be done or how easily.

I used to do larger installations with up to 112 compute nodes which had
different physical hardware, so I needed at least 2 images. I did all of that
with a bit of scripting and not with a GUI. I did not use LDAP and
authentication was done locally. It all provided a robust system. Maybe not as
easy to manage as a system with a GUI which does it all for you, but on the
flip side I knew exactly what the scripts were doing and what I needed to do
if there was a problem.

By and large I agree with what John Hearns said, for example. To be frank: I
still consider the Bright Cluster Manager tool to be good for people who do
not know about HPC (I stick to that for this argument), don't know much about
Linux etc. So in my personal opinion it is good for those whose day-to-day job
is not HPC but something different, people who are coming from a GUI world (I
don't mean that nastily), and for situations where it does not pay to have
dedicated support staff. For all of this it is fantastic: it works, and there
is good support if things go wrong.
We are using Slurm, and the only issue I had when I first started at the new
place a year ago was that during a routine update Slurm got re-installed and
all the configuration was gone. This could be because it was not installed
properly in the first place, or it could be a bug; we don't know, as the
support did not manage to reproduce it.
I am having some other minor issues with the authentication (we are
authenticating against an external AD) but again that could be the way it was
installed at the time. I don't know who did that.

Having said all of that: I am personally more a hands-on person, so I know
what the system is doing. This usually gets obscured by a GUI which does
things in the background you may or may not want it to do. I had some problems
at the old work place with ROCKS which led me to remove it and install Debian
on the clusters. They were working rock solid, even on hardware which had
issues with the ROCKS installation.

So, for me the answer to the question is: it depends. If you have a capable
HPC admin who is well networked and you have a larger, specialised cluster,
you might be better off using the money to buy some additional compute nodes.
For an installation where you do not have a dedicated admin, and perhaps have
a smaller, homogeneous installation, you might be better off with a cluster
management tool like the one Bright is offering.
If money is an issue, you need to carefully balance the two: a good HPC admin
does more than install software; they do user support as well, for example,
and make sure users can work. If you are lucky, you get one who actually
understands what the users are doing.

I think that is basically what everybody here says in different words: your
mileage will vary.

My two shillings from a rather cold London! :-)

Jörg
Douglas Eadline
2018-05-03 21:12:53 UTC
Permalink
And, I forgot to mention, the other important aspect here is
reproducibility. Create or modify a code, put it in
a signed container (like Singularity), use it,
write the paper. Five years later the machine on which it ran is
gone and your new grad student wants to re-run some data. Easy: because it is
in a container, just run it on any system that supports your
containers. No need to ask a kindly sysadmin to help you track down
libraries, compile, and run an older code.

--
Doug
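
(A sketch of that workflow, assuming a Singularity version with SIF
image signing (3.0 or later); the file names are invented:)

    # build an image from a definition file, then sign it with your PGP key
    singularity build paper-code.sif paper-code.def
    singularity sign paper-code.sif
    # five years on, on a different machine: check provenance, then re-run
    singularity verify paper-code.sif
    singularity exec paper-code.sif /opt/paper/run_analysis.sh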
Post by Douglas Eadline
Here is where I see it going
1. Compute nodes with a base minimal generic Linux OS
(with PR_SET_NO_NEW_PRIVS in the kernel, added in 3.5)
2. A scheduler (that supports containers)
3. Containers (Singularity mostly)
All "provisioning" is moved to the container. There will be edge cases
of course, but applications will be pulled down from
container repos and "just run"
--
Doug

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf