[Beowulf] cluster deployment and config management

Discussion:

Stu Midgley

2017-09-05 05:24:27 UTC

Morning everyone

I am in the process of redeveloping our cluster deployment and config
management environment and wondered what others are doing?

First, everything we currently have is basically home-grown.

Our cluster deployment is a system that I've developed over the years and
is pretty simple - if you know BASH and how pxe booting works. It has
everything from setting the correct parameters in the bios, zfs ram disks
for the OS, lustre for state files (usually in /var) - all in the initrd.

We use it to boot cluster nodes, lustre servers, misc servers and desktops.

We basically treat everything like a cluster.

However... we do have a proliferation of images... and all need to be kept
up-to-date and managed. Most of the changes from one image to the next are
config files.

We don't have a good config management (which might, hopefully, reduce the
number of images we need). We tried puppet, but it seems everyone hates
it. Its too complicated? Not the right tool?

I was thinking of using git for config files, dumping a list of rpm's,
dumping the active services from systemd and somehow munging all that
together in the initrd. ie. git checkout the server to get config files
and systemctl enable/start the appropriate services etc.

It started to get complicated.

Any feedback/experiences appreciated. What works well? What doesn't?

Thanks.

--
Dr Stuart Midgley
***@gmail.com

Lachlan Musicman

2017-09-05 05:46:51 UTC

Permalink

Post by Stu Midgley
Morning everyone
I am in the process of redeveloping our cluster deployment and config
management environment and wondered what others are doing?
First, everything we currently have is basically home-grown.
Our cluster deployment is a system that I've developed over the years and
is pretty simple - if you know BASH and how pxe booting works. It has
everything from setting the correct parameters in the bios, zfs ram disks
for the OS, lustre for state files (usually in /var) - all in the initrd.
We use it to boot cluster nodes, lustre servers, misc servers and desktops.
We basically treat everything like a cluster.
However... we do have a proliferation of images... and all need to be kept
up-to-date and managed. Most of the changes from one image to the next are
config files.
We don't have a good config management (which might, hopefully, reduce the
number of images we need). We tried puppet, but it seems everyone hates
it. Its too complicated? Not the right tool?
I was thinking of using git for config files, dumping a list of rpm's,
dumping the active services from systemd and somehow munging all that
together in the initrd. ie. git checkout the server to get config files
and systemctl enable/start the appropriate services etc.
It started to get complicated.
Any feedback/experiences appreciated. What works well? What doesn't?

We are a small installation, with manageable needs. In our first step up
from where you are, we ended up on:

- Katello/Foreman (in RedHat it's called Satellite) for management of
software repositories, in discrete sets and slices. We started with
Spacewalk but it is a little old and fusty and just isn't appropriate
anymore.
- git for config management of environment module files
- Ansible for easy day to day management of servers

We no longer manage configs as such, since there is a shared data store,
and the Ansible/Katello mix means we can rebuild any server from scratch.

Note that Ansible and Katello/Foreman can be integrated - we haven't gone
that far yet. Are quite happy with the two being apart. That will change in
the near future I think.

Cheers
L.

------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic
about it. It is a shared consciousness that our institutions have failed
and our ecosystem is collapsing, yet we are still here â and we are
creative agents who can shape our destinies. Apocalyptic civics is the
conviction that the only way out is through, and the only way through is
together. "

*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857

Stu Midgley

2017-09-05 06:43:56 UTC

Permalink

Interesting. Ansible has come up a few times.

Our largest cluster is 2000 KNL nodes and we are looking towards 10k... so
it needs to scale well :)

Post by Lachlan Musicman

Post by Stu Midgley
Morning everyone
I am in the process of redeveloping our cluster deployment and config
management environment and wondered what others are doing?
First, everything we currently have is basically home-grown.
Our cluster deployment is a system that I've developed over the years and
is pretty simple - if you know BASH and how pxe booting works. It has
everything from setting the correct parameters in the bios, zfs ram disks
for the OS, lustre for state files (usually in /var) - all in the initrd.
We use it to boot cluster nodes, lustre servers, misc servers and desktops.
We basically treat everything like a cluster.
However... we do have a proliferation of images... and all need to be
kept up-to-date and managed. Most of the changes from one image to the
next are config files.
We don't have a good config management (which might, hopefully, reduce
the number of images we need). We tried puppet, but it seems everyone
hates it. Its too complicated? Not the right tool?
I was thinking of using git for config files, dumping a list of rpm's,
dumping the active services from systemd and somehow munging all that
together in the initrd. ie. git checkout the server to get config files
and systemctl enable/start the appropriate services etc.
It started to get complicated.
Any feedback/experiences appreciated. What works well? What doesn't?

We are a small installation, with manageable needs. In our first step up
- Katello/Foreman (in RedHat it's called Satellite) for management of
software repositories, in discrete sets and slices. We started with
Spacewalk but it is a little old and fusty and just isn't appropriate
anymore.
- git for config management of environment module files
- Ansible for easy day to day management of servers
We no longer manage configs as such, since there is a shared data store,
and the Ansible/Katello mix means we can rebuild any server from scratch.
Note that Ansible and Katello/Foreman can be integrated - we haven't gone
that far yet. Are quite happy with the two being apart. That will change in
the near future I think.
Cheers
L.
------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic
civics is the insistence that we cannot ignore the truth, nor should we
panic about it. It is a shared consciousness that our institutions have
failed and our ecosystem is collapsing, yet we are still here â and we are
creative agents who can shape our destinies. Apocalyptic civics is the
conviction that the only way out is through, and the only way through is
together. "
status/873177525903609857

--
Dr Stuart Midgley
***@gmail.com

Carsten Aulbert

2017-09-05 06:57:48 UTC

Permalink

Interesting. Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit h

John Hearns via Beowulf

2017-09-05 07:21:19 UTC

Permalink

Fusty? Lachlan - you really are fromt he Western Isles aren't you?

Another word: 'oose' - the fluff which collects under the bed. Or inside
servers.

Post by Carsten Aulbert
Hi

Post by Stu Midgley
Interesting. Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

We went with ansible at the end of 2015 until we hit a road block with
it not using a client daemon a fat ferew months. When having a few 1000
states to perform on each client, the lag for initiating the next state
centrally from the server was quite noticeable - in the end a single run
took more than half an hour without any changes (for a single host!).
After that we re-evaluated with salt stack being the outcome scaling
well enough for our O(2500) clients.
Note, I ave not tracked if and how ansible progressed over the past
~2yrs which may or may not exhibit the same problems today.
Cheers
Carsten
--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
CallinstraÃe 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

Lachlan Musicman

2017-09-05 09:06:25 UTC

Permalink

Sure am!

Re scale, I can't speak to that because we just don't have that size. But
Ansible has been bought/absorbed into Redhat now, so the Ansible Tower
infrastructure may scale. You would need to test :)

cheers
L.

------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic
about it. It is a shared consciousness that our institutions have failed
and our ecosystem is collapsing, yet we are still here â and we are
creative agents who can shape our destinies. Apocalyptic civics is the
conviction that the only way out is through, and the only way through is
together. "

*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857

Post by John Hearns via Beowulf
Fusty? Lachlan - you really are fromt he Western Isles aren't you?
Another word: 'oose' - the fluff which collects under the bed. Or inside
servers.

Post by Carsten Aulbert
Hi

Post by Stu Midgley
Interesting. Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

Rémy Dernat

2017-09-05 11:52:50 UTC

Permalink

Hi,

Post by Carsten Aulbert
Hi

Post by Stu Midgley
Interesting. Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

+1 for SaltStack here. It really performs very well on large
infrastructure (from doc.
https://docs.saltstack.com/en/latest/topics/tutorials/intro_scale.html )
and allows complex rules with reactors and orchestrators (including some
ways to manage post-reboot/connections).

There is also a github project which allows to deploy a cluster from
scratch with SaltStack, on a CentOS base, with PXE, dhcp, dns,
kickstart... :
https://github.com/oxedions/banquise/

Personnally, I will use (it works, but it needs some additionnal tests)
SaltStack with FAI ( https://fai-project.org/ ) to deploy my nodes. Or
maybe, I will switch to banquise, but for now, this project is still a
bit too young and I need a debian base OS (but I know it is planned;
waiting for the preseed config management through Salt). I am using
gitfs as a SaltStack backend and I also have some configs files as
another git repository (eg : environment modules files).

Best regards,
Rémy.

Post by Carsten Aulbert
Note, I ave not tracked if and how ansible progressed over the past
~2yrs which may or may not exhibit the same problems today.
Cheers
Carsten

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowul

Arif Ali

2017-09-05 15:31:00 UTC

Permalink

Post by RÃ©my Dernat

Post by Carsten Aulbert

Interesting.Â Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

Very Interesting, as we have been deploying clusters using SaltStack
too, and has played very well for us. We haven't toyed with reactors or
orchestrators yet, but that's probably the next phase for us.

I have looked at Ansible a couple of years back, and we then went the
SaltStack route, after doing some internal testing

Post by RÃ©my Dernat
There is also a github project which allows to deploy a cluster from
scratch with SaltStack, on a CentOS base, with PXE, dhcp, dns,
https://github.com/oxedions/banquise/

Interesting project, for provisioning setup using Salt ;). We use xCAT
to do our provisioning side, and then let SaltStack take over after
that. We have internally developed a xCAT external pillar, that allows
us to use the xCAT information in SaltStack to do various things.

We heavily utilise Salt formulas, and developed home grown formulas too,
specifically for HPC type installations.

It's refreshing to see that there are other people in the community,
that are using SaltStack to do their HPC configurations/installations

--
regards,

Arif Ali

Gavin W. Burris

2017-09-12 16:07:42 UTC

Permalink

Hi, All.

We use a minimal pxe kickstart for hardware nodes, then Ansible after that. It is a thing of beauty to have the entire cluster defined in one git repo. This also lends itself to configuring cloud node images with the exact same code. Reusable roles and conditionals, FTW!

With regards to scaling, Ansible will by default fork only 8 parallel processes. This can be scaled way up, maybe hundreds at a time. If there are thousands of states / idempotent plays to run on a single host, those are going to take some time regardless of the configuration language, correct? A solution would be to tag up the plays and only run required tags during an update, versus a full run on fresh installs. The fact caching feature may help here. SSH accelerated mode or pipelining are newer feature, too, which will reduce the number of new connections required, a big time saver.

Cheers.

Post by Carsten Aulbert
Hi

Interesting. Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

We went with ansible at the end of 2015 until we hit a road block with
it not using a client daemon a fat ferew months. When having a few 1000
states to perform on each client, the lag for initiating the next state
centrally from the server was quite noticeable - in the end a single run
took more than half an hour without any changes (for a single host!).
After that we re-evaluated with salt stack being the outcome scaling
well enough for our O(2500) clients.
Note, I ave not tracked if and how ansible progressed over the past
~2yrs which may or may not exhibit the same problems today.
Cheers
Carsten
--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
Search our documentation: http://research-it.wharton.upenn.edu/about/
Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Andrew Latham

2017-09-12 16:16:25 UTC

Permalink

As Gavin mentioned there are options to configure for Ansible et al.

With any configuration management tool de jour you can configure your DHCP,
TFTP, and HTTP for PXE and then enforce a desired state on the nodes. I am
a developer on several configuration management tools and from the mailing
list the speed issues come from people doing things the hard way.

If you need an API, try Open Stack Ironic

Saltstack has a GUI if you need warm and fuzzy

Python Fabric + expect can do anything at speed but will require some
thinking.

Post by Gavin W. Burris
Hi, All.
We use a minimal pxe kickstart for hardware nodes, then Ansible after
that. It is a thing of beauty to have the entire cluster defined in one
git repo. This also lends itself to configuring cloud node images with the
exact same code. Reusable roles and conditionals, FTW!
With regards to scaling, Ansible will by default fork only 8 parallel
processes. This can be scaled way up, maybe hundreds at a time. If there
are thousands of states / idempotent plays to run on a single host, those
are going to take some time regardless of the configuration language,
correct? A solution would be to tag up the plays and only run required
tags during an update, versus a full run on fresh installs. The fact
caching feature may help here. SSH accelerated mode or pipelining are
newer feature, too, which will reduce the number of new connections
required, a big time saver.
Cheers.

Post by Carsten Aulbert
Hi

Post by Stu Midgley
Interesting. Ansible has come up a few times.
Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
so it needs to scale well :)

http://www.beowulf.org/mailman/listinfo/beowulf
--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
Search our documentation: http://research-it.wharton.upenn.edu/about/
Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
- Andrew "lathama" Latham ***@gmail.com http://lathama.com
<http://lathama.org> -

Joe Landman

2017-09-05 12:20:03 UTC

Permalink

Good morning ...

Nothing wrong with this, if it adequately solves the problem. Many of
the frameworks people use for these things are highly opinionated, and
often, you'll find their opinions grate on your expectations. At
$dayjob-1, I developed our own kit precisely because so many of the
other toolkits did little to big things wrong; not simply from an
opinion point of view, but actively made specific errors that the
developers glossed over as that aspect was unimportant to them ... while
being of critical importance to me and my customers at the time.

Post by Stu Midgley
Our cluster deployment is a system that I've developed over the years
and is pretty simple - if you know BASH and how pxe booting works. It
has everything from setting the correct parameters in the bios, zfs
ram disks for the OS, lustre for state files (usually in /var) - all
in the initrd.
We use it to boot cluster nodes, lustre servers, misc servers and desktops.
We basically treat everything like a cluster.

The most competent baked distro out there for this was (in the past,
haven't used it recently) Warewulf. See https://github.com/warewulf/ .
Still under active development, and Greg and team do a generally great
job. Least opinionated distro around, most flexible, and some of the
best tooling.

Post by Stu Midgley
However... we do have a proliferation of images... and all need to be
kept up-to-date and managed. Most of the changes from one image to
the next are config files.

Ahhh ... One of the things we did with our toolchain (it is open source,
I've just never pushed it to github) was to completely separate booting
from configuration. That is, units booted to an operational state
before we applied configuration. This was in part due to long
experience with nodes hanging during bootup with incorrect
configurations. If you minimize the chance for this, your nodes
(barring physical device failure) always boot. The only specific
opinion we had w.r.t. this system was that the nodes had to be bootable
via PXE, and there fore a working dhcp needed to exist on the network.

Post boot configuration, we drove via a script that downloaded and
launched other scripts. Since we PXE booted, network addresses were
fine. We didn't even enforce final network address determination on PXE
startup.

We looked at the booting process as a state machine. Lower level was
raw hardware, no power. Subsequent levels were bios POST, PXE of
kernel, configuration phase. During configuration phase *everything*
was on the table w.r.t. changes. We could (and did) alter networking,
using programmatic methods, databases, etc. to determine and configure
final network configs. Same for disks, and other resources.

Configuration changes could be pushed post boot by updating a script and
either pushing (not normally recommended for clusters of reasonable
size) or triggering a pull/run cycle for that script/dependencies.

This allowed us to update images and configuration asynchronously.

We had to manage images, but this turned out to be generally simple. I
was in the midst of putting image mappings into a distributed object
store when the company died. Config store is similarly simple, again
using the same mechanisms, and could be driven entirely programmatically.

Of course, for the chef/puppet/ansible/salt/cloudformation/... people,
we could drive their process as well.

Post by Stu Midgley
We don't have a good config management (which might, hopefully, reduce
the number of images we need). We tried puppet, but it seems everyone
hates it. Its too complicated? Not the right tool?

Highly opinionated config management is IMO (and yes, I am aware this is
redundant humor) generally a bad idea. Config management that gets out
of your way until you need it is the right approach. Which is why we
never tried to dictate what config management our users would use. We
simply handled getting the system up to an operational state, and they
could use ours, theirs, or Frankensteinian kludges.

Post by Stu Midgley
I was thinking of using git for config files, dumping a list of rpm's,
dumping the active services from systemd and somehow munging all that
together in the initrd. ie. git checkout the server to get config
files and systemctl enable/start the appropriate services etc.
It started to get complicated.
Any feedback/experiences appreciated. What works well? What doesn't?

IMO things that tie together config and booting are problematic at
scale. Leads to nearly unmanageable piles of images, as you've
experienced. Booting to an operational state, and applying all config
post boot (ask me about my fstab replacement some day), makes for a very
nice operational solution that scales wonderfully .... you can replicate
images to local image servers if you wish, replicate config servers,
load balance the whole thing to whatever scale you need.

Post by Stu Midgley
Thanks.
--
Dr Stuart Midgley
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Joe Landman
e: ***@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www

Tim Cutts

2017-09-05 19:48:13 UTC

Permalink

Sanger made a similar separation with our FAI-based Ubuntu deployments. The FAI part of the installation was kept as minimal as possible, with the task purely of partitioning and formatting the hard disk of the machine, determining the appropriate network card configuration, and unpacking a minimal OS image. Further package installations and configuration was done later by cfengine (and later ansible, when we moved to that)

Regards,

Tim

On 05/09/2017, 08:21, "Beowulf on behalf of Joe Landman" <beowulf-***@beowulf.org on behalf of ***@gmail.com> wrote:

Ahhh ... One of the things we did with our toolchain (it is open source,
I've just never pushed it to github) was to completely separate booting
from configuration. That is, units booted to an operational state
before we applied configuration. This was in part due to long
experience with nodes hanging during bootup with incorrect
configurations. If you minimize the chance for this, your nodes
(barring physical device failure) always boot. The only specific
opinion we had w.r.t. this system was that the nodes had to be bootable
via PXE, and there fore a working dhcp needed to exist on the network.

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://

psc

2017-09-05 18:19:42 UTC

Permalink

Hey everyone, .. any idea what happened with perceus?
http://www.linux-mag.com/id/6386/
https://github.com/perceus/perceus

.. yeah; what happened with Arthur Stevens (Perceus, GravityFS/OS Green
Provisioning, etc. ) where he is now; who is maintain, if anyone, perceus ?

.. and come on Greg K. ... we know you are luring there somewhere being
busy with singularity
http://singularity.lbl.gov/ (kudos .. great job as always !!!)
.. wasn't perceus yours original baby?
https://gmkurtzer.github.io/
.. can you bring some light what happened with the perceus project? ..
I'd love to see it integrated with singularity -- that would made my
day/month/year !!!!!!

thanks!
cheers,
psc

p.s. .. there there used to be rocks clusters (not sure about it's
status these days)
http://www.rocksclusters.org/wordpress/

p.s.s. .. I'd say Warewulf is the "best" bet in most cases .. why keep
reinventing the wheel ?

Send Beowulf mailing list submissions to
To subscribe or unsubscribe via the World Wide Web, visit
http://www.beowulf.org/mailman/listinfo/beowulf
or, via email, send a message with subject or body 'help' to
You can reach the person managing the list at
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beowulf digest..."
1. Re: cluster deployment and config management (Joe Landman)
2. Re: cluster deployment and config management (Arif Ali)
3. RAID5 rebuild, remount with write without reboot? (mathog)
4. Re: RAID5 rebuild, remount with write without reboot?
(John Hearns)
----------------------------------------------------------------------
Message: 1
Date: Tue, 5 Sep 2017 08:20:03 -0400
Subject: Re: [Beowulf] cluster deployment and config management
Content-Type: text/plain; charset=utf-8; format=flowed
Good morning ...

Post by Stu Midgley
Our cluster deployment is a system that I've developed over the years
and is pretty simple - if you know BASH and how pxe booting works. It
has everything from setting the correct parameters in the bios, zfs
ram disks for the OS, lustre for state files (usually in /var) - all
in the initrd.
We use it to boot cluster nodes, lustre servers, misc servers and
desktops.
We basically treat everything like a cluster.

Post by Stu Midgley
However... we do have a proliferation of images... and all need to be
kept up-to-date and managed. Most of the changes from one image to
the next are config files.

Ahhh ... One of the things we did with our toolchain (it is open source,
I've just never pushed it to github) was to completely separate booting
from configuration. That is, units booted to an operational state
before we applied configuration. This was in part due to long
experience with nodes hanging during bootup with incorrect
configurations. If you minimize the chance for this, your nodes
(barring physical device failure) always boot. The only specific
opinion we had w.r.t. this system was that the nodes had to be bootable
via PXE, and there fore a working dhcp needed to exist on the network.
Post boot configuration, we drove via a script that downloaded and
launched other scripts. Since we PXE booted, network addresses were
fine. We didn't even enforce final network address determination on PXE
startup.
We looked at the booting process as a state machine. Lower level was
raw hardware, no power. Subsequent levels were bios POST, PXE of
kernel, configuration phase. During configuration phase *everything*
was on the table w.r.t. changes. We could (and did) alter networking,
using programmatic methods, databases, etc. to determine and configure
final network configs. Same for disks, and other resources.
Configuration changes could be pushed post boot by updating a script and
either pushing (not normally recommended for clusters of reasonable
size) or triggering a pull/run cycle for that script/dependencies.
This allowed us to update images and configuration asynchronously.
We had to manage images, but this turned out to be generally simple. I
was in the midst of putting image mappings into a distributed object
store when the company died. Config store is similarly simple, again
using the same mechanisms, and could be driven entirely programmatically.
Of course, for the chef/puppet/ansible/salt/cloudformation/... people,
we could drive their process as well.

John Hearns via Beowulf

2017-09-05 21:10:02 UTC

Permalink

Regarding Rocks clusters, permit me to vent a little.
In my last employ we provided Rocks clusters.... Rocks is firmly embedded
in the Redhat 6 era with out of date 2.6 kernels.
It uses kickstart for installations (which is OK). However with modern
generations of Intel processors you get a warning from Kickstart that the
architecture is unsupported... so you must add the flag to accept
unsupported architectures to your kickstart.cfg
If at this point the Forbidden Planet "Danger Will Robinson!" soundbite is
not playing on your head then I have a bridge to sell you in London...
Rocks version 7 is supposed to be on the horizon, but I doubt it. It may
have been launched, but I left the Rocks list five months ago and I do not
feel the lack. Sorry Rocks types if you are on here... no intention to give
you a kicking.

Post by psc
Hey everyone, .. any idea what happened with perceus?
http://www.linux-mag.com/id/6386/
https://github.com/perceus/perceus
.. yeah; what happened with Arthur Stevens (Perceus, GravityFS/OS Green
Provisioning, etc. ) where he is now; who is maintain, if anyone, perceus ?
.. and come on Greg K. ... we know you are luring there somewhere being
busy with singularity
http://singularity.lbl.gov/ (kudos .. great job as always !!!)
.. wasn't perceus yours original baby?
https://gmkurtzer.github.io/
.. can you bring some light what happened with the perceus project? .. I'd
love to see it integrated with singularity -- that would made my
day/month/year !!!!!!
thanks!
cheers,
psc
p.s. .. there there used to be rocks clusters (not sure about it's status
these days)
http://www.rocksclusters.org/wordpress/
p.s.s. .. I'd say Warewulf is the "best" bet in most cases .. why keep
reinventing the wheel ?

Post by Stu Midgley
and is pretty simple - if you know BASH and how pxe booting works. It
has everything from setting the correct parameters in the bios, zfs
ram disks for the OS, lustre for state files (usually in /var) - all
in the initrd.
We use it to boot cluster nodes, lustre servers, misc servers and
desktops.
We basically treat everything like a cluster.

Post by Stu Midgley
kept up-to-date and managed. Most of the changes from one image to
the next are config files.

Ahhh ... One of the things we did with our toolchain (it is open source,
I've just never pushed it to github) was to completely separate booting
from configuration. That is, units booted to an operational state
before we applied configuration. This was in part due to long
experience with nodes hanging during bootup with incorrect
configurations. If you minimize the chance for this, your nodes
(barring physical device failure) always boot. The only specific
opinion we had w.r.t. this system was that the nodes had to be bootable
via PXE, and there fore a working dhcp needed to exist on the network.
Post boot configuration, we drove via a script that downloaded and
launched other scripts. Since we PXE booted, network addresses were
fine. We didn't even enforce final network address determination on PXE
startup.
We looked at the booting process as a state machine. Lower level was
raw hardware, no power. Subsequent levels were bios POST, PXE of
kernel, configuration phase. During configuration phase *everything*
was on the table w.r.t. changes. We could (and did) alter networking,
using programmatic methods, databases, etc. to determine and configure
final network configs. Same for disks, and other resources.
Configuration changes could be pushed post boot by updating a script and
either pushing (not normally recommended for clusters of reasonable
size) or triggering a pull/run cycle for that script/dependencies.
This allowed us to update images and configuration asynchronously.
We had to manage images, but this turned out to be generally simple. I
was in the midst of putting image mappings into a distributed object
store when the company died. Config store is similarly simple, again
using the same mechanisms, and could be driven entirely programmatically.
Of course, for the chef/puppet/ansible/salt/cloudformation/... people,
we could drive their process as well.
We don't have a good config management (which might, hopefully, reduce

Post by Stu Midgley
the number of images we need). We tried puppet, but it seems everyone
hates it. Its too complicated? Not the right tool?

Post by Stu Midgley
dumping the active services from systemd and somehow munging all that
together in the initrd. ie. git checkout the server to get config
files and systemctl enable/start the appropriate services etc.
It started to get complicated.
Any feedback/experiences appreciated. What works well? What doesn't?

Post by Stu Midgley
--
Dr Stuart Midgley
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

Douglas Eadline

2017-09-05 21:26:30 UTC

Permalink

I woudl suggest you talk to Arthur Stevens.

Post by psc
.. and come on Greg K. ... we know you are luring there somewhere being
busy with singularity
http://singularity.lbl.gov/ (kudos .. great job as always !!!)
.. wasn't perceus yours original baby?

Perceus was based on Warewulf, the current warewulf is 3.6
and works quite well.

Post by psc
https://gmkurtzer.github.io/
.. can you bring some light what happened with the perceus project? ..
I'd love to see it integrated with singularity -- that would made my
day/month/year !!!!!!

You can do that now.

--
Doug

Post by psc
thanks!
cheers,
psc
p.s. .. there there used to be rocks clusters (not sure about it's
status these days)
http://www.rocksclusters.org/wordpress/
p.s.s. .. I'd say Warewulf is the "best" bet in most cases .. why keep
reinventing the wheel ?

Post by Stu Midgley
Our cluster deployment is a system that I've developed over the years
and is pretty simple - if you know BASH and how pxe booting works. It
has everything from setting the correct parameters in the bios, zfs
ram disks for the OS, lustre for state files (usually in /var) - all
in the initrd.
We use it to boot cluster nodes, lustre servers, misc servers and
desktops.
We basically treat everything like a cluster.

Post by Stu Midgley
However... we do have a proliferation of images... and all need to be
kept up-to-date and managed. Most of the changes from one image to
the next are config files.

Ahhh ... One of the things we did with our toolchain (it is open source,
I've just never pushed it to github) was to completely separate booting
from configuration. That is, units booted to an operational state
before we applied configuration. This was in part due to long
experience with nodes hanging during bootup with incorrect
configurations. If you minimize the chance for this, your nodes
(barring physical device failure) always boot. The only specific
opinion we had w.r.t. this system was that the nodes had to be bootable
via PXE, and there fore a working dhcp needed to exist on the network.
Post boot configuration, we drove via a script that downloaded and
launched other scripts. Since we PXE booted, network addresses were
fine. We didn't even enforce final network address determination on PXE
startup.
We looked at the booting process as a state machine. Lower level was
raw hardware, no power. Subsequent levels were bios POST, PXE of
kernel, configuration phase. During configuration phase *everything*
was on the table w.r.t. changes. We could (and did) alter networking,
using programmatic methods, databases, etc. to determine and configure
final network configs. Same for disks, and other resources.
Configuration changes could be pushed post boot by updating a script and
either pushing (not normally recommended for clusters of reasonable
size) or triggering a pull/run cycle for that script/dependencies.
This allowed us to update images and configuration asynchronously.
We had to manage images, but this turned out to be generally simple. I
was in the midst of putting image mappings into a distributed object
store when the company died. Config store is similarly simple, again
using the same mechanisms, and could be driven entirely
programmatically.
Of course, for the chef/puppet/ansible/salt/cloudformation/... people,
we could drive their process as well.

_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
MailScanner: Clean

--
Doug
--
MailScanner: Clean

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/b

Christopher Samuel

2017-09-05 23:51:46 UTC

Permalink

Post by Stu Midgley
I am in the process of redeveloping our cluster deployment and config
management environment and wondered what others are doing?

--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit ht

Christopher Samuel

2017-09-05 23:58:14 UTC

Permalink

Post by Christopher Samuel
Nothing like your scale, of course, but it works and we know if a node
has booted a particular image it will be identical to any other node
that's set to boot the same image.

I should mention that we set the osimage for nodes via groups, not
directly on the nodes, so on two of our HPC clusters we've just got a
single "compute" group that covers all nodes and on the one other we've
got "phi" and "nophi" groups.

The data infrastructure one has groups for NSD servers connected to IB
vs FC storage, TSM servers, HSM (TSM-SM) servers, GPFS-NFS cluster
servers, etc but those are all more traditional stateful installs
(kickstart installs all scripted via xCAT).

cheers,
Chris

Stu Midgley

2017-09-06 02:14:05 UTC

Permalink

I'm not feeling much love for puppet.

Post by Christopher Samuel

Post by Stu Midgley
I am in the process of redeveloping our cluster deployment and config
management environment and wondered what others are doing?

xCAT here for all HPC related infrastructure. Stateful installs for
GPFS NSD servers and TSM servers, compute nodes are all statelite, so a
immutable RAMdisk image is built on the management node for the compute
cluster and then on boot they mount various items over NFS (including
the GPFS state directory).
Nothing like your scale, of course, but it works and we know if a node
has booted a particular image it will be identical to any other node
that's set to boot the same image.
Healthcheck scripts mark nodes offline if they don't have the current
production kernel and GPFS versions (and other checks too of course)
plus Slurm's "scontrol reboot" lets us do rolling reboots without
needing to spot when nodes have become idle.
I've got to say I really prefer this to systems like Puppet, Salt, etc,
where you need to go and tweak an image after installation.
For our VM infrastructure (web servers, etc) we do use Salt for that. We
used to use Puppet but we switched when the only person who understood
it left. Don't miss it at all...
cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
Dr Stuart Midgley
***@gmail.com

Bill Broadley

2017-09-06 09:46:19 UTC

Permalink

Post by Stu Midgley
I'm not feeling much love for puppet.

I'm pretty fond of puppet for managing clusters. We use cobbler to go from PXE
boot -> installed, then puppet takes over.

Some of my favorite features:
* Inheritance is handy node -> node for a particular cluster -> compute node ->
head node
* Tags for handling users is handy, 1200 users, dozen clusters, and various
other bits of infrastructure makes it really easy to manage who gets access
to what.
* I like the self healing aspect, defining the system state, not how to get
there. That way if I need to repurpose, patch, or mistakenly make a node
unique in some way the next puppet run fixes it.
* Definitely helps with re-use across clusters. Makes for a higher incentive
to do it right the first time.
* Using facts to make decisions is really useful. Things like detecting if you
are a virtual machine, or updating autofs maps if IB is down.

Post by Stu Midgley

Post by Stu Midgley
I am in the process of redeveloping our cluster deployment and config
management environment and wondered what others are doing?

xCAT here for all HPC related infrastructure. Stateful installs for
GPFS NSD servers and TSM servers, compute nodes are all statelite, so a
immutable RAMdisk image is built on the management node for the compute
cluster and then on boot they mount various items over NFS (including
the GPFS state directory).
Nothing like your scale, of course, but it works and we know if a node
has booted a particular image it will be identical to any other node
that's set to boot the same image.
Healthcheck scripts mark nodes offline if they don't have the current
production kernel and GPFS versions (and other checks too of course)
plus Slurm's "scontrol reboot" lets us do rolling reboots without
needing to spot when nodes have become idle.
I've got to say I really prefer this to systems like Puppet, Salt, etc,
where you need to go and tweak an image after installation.
For our VM infrastructure (web servers, etc) we do use Salt for that. We
used to use Puppet but we switched when the only person who understood
it left. Don't miss it at all...
cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
903 55545 <tel:%2B61%20%280%293%20903%2055545>
_______________________________________________
sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>
--
Dr Stuart Midgley
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf