Discussion:
[Beowulf] HPC workflows
John Hearns via Beowulf
2018-11-26 15:26:42 UTC
Permalink
This may not be the best place to discuss this - please suggest a better
forum if you have one.
I have come across this question in a few locations. To be specific, I am a
fan of the Julia language. On the Julia forum a respected developer recently
asked what the options were for keeping code developed on a laptop in sync
with code being deployed on an HPC system.
There was some discussion of having Git style repositories which can be
synced to/from.
My suggestion was an ssh mount of the home directory on the HPC system,
which I have configured effectively in the past when using remote HPC
systems.
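
For illustration, a minimal sketch of such a mount (the hostname and paths
are invented):

    # Mount the HPC home directory locally over ssh (requires sshfs;
    # 'hpc-login' and the paths are illustrative).
    mkdir -p ~/hpc-home
    sshfs user@hpc-login:/home/user ~/hpc-home -o reconnect
    # ...edit and test code in ~/hpc-home as if it were local...
    fusermount -u ~/hpc-home    # unmount when done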

At a big company I worked with recently, the company provided home
directories on NFS Servers. But the /home/username directory on the HPC was
different - on higher performance storage. The 'company' home was mounted -
so you could copy between them. But we did have the inevitable incidents of
jobs being run from company NFS - and pulling code across the head node
interfaces etc.

Developers these days are used to carrying their Mac laptops around and
working at hotdesks, at home, at conferences. Me too - and I love it.
Though I have a lovely HP Spectre Ultrabook.
Again their workflow is to develop on the laptop and upload code to Github
type repositories. Then when running on a cloud service the software is
downloaded from the Repo.
There are of course HPC services on the cloud, with gateways to access them.

This leads me to ask - should we be presenting HPC services as a 'cloud'
service, no matter that it is a non-virtualised on-premise setup?
In which case the way to deploy software would be via downloading from
Repos.
I guess this is actually more common nowadays.

I think out loud that many HPC codes depend crucially on a $HOME directory
being present on the compute nodes as the codes look for dot files etc. in
$HOME. I guess this can be dealt with by fake $HOMEs which again sync back
to the Repo.
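
As a sketch of that fake-$HOME idea (Slurm, the repo URL and the paths are
all assumptions of mine, purely illustrative):

    # Job prologue: fabricate a throwaway $HOME from a repo.
    export HOME=/scratch/$USER/fake-home-$SLURM_JOB_ID
    mkdir -p "$HOME"
    git clone --depth 1 https://github.com/example/dotfiles "$HOME"
    # ...run the job; codes find their dot files in the fake $HOME...
    # Afterwards, sync any changed state back to the repo:
    git -C "$HOME" add -A
    git -C "$HOME" commit -m "state after job $SLURM_JOB_ID"
    git -C "$HOME" push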

And yes I know containerisation may be the saviour here!

Sorry for a long post.
Gerald Henriksen
2018-11-27 02:49:53 UTC
Permalink
Post by John Hearns via Beowulf
This leads me to ask - should we be presenting HPC services as a 'cloud'
service, no matter that it is a non-virtualised on-premise setup?
In which case the way to deploy software would be via downloading from
Repos.
I guess this is actually more common nowadays.
Simple answer: yes.

If on-premise HPC doesn't change to reflect the way software is
developed today, then users will in the future prefer cloud HPC.

I guess it is a brave new world for on-premise HPC, in that users now,
and likely more in the future, will have alternatives, thus forcing
on-premise HPC to "compete" in order to survive.
Michael Di Domenico
2018-11-27 12:51:06 UTC
Permalink
Post by Gerald Henriksen
If on-premise HPC doesn't change to reflect the way software is
developed today, then users will in the future prefer cloud HPC.
I guess it is a brave new world for on-premise HPC, in that users now,
and likely more in the future, will have alternatives, thus forcing
on-premise HPC to "compete" in order to survive.
This seems a bit too stringent a statement for me. I don't dismiss
or disagree with your premise, but I don't entirely agree that HPC
"must" change in order to compete. We've all heard this kind of stuff
in the past: if X doesn't change, Y will take over the world! I'm sure
we could come up with a heck of a list. There is, and I believe always
will be, a large percentage of the "HPC" population that doesn't get
counted on the Top500 list and will not or cannot use the cloud.

I also believe these are two separate issues. In my opinion, how code
is developed shouldn't really have anything to do with how an HPC
resource is run. Having said that, however, I suspect in a few years
there's going to be an "HPC Code" revolution. The generic code base
is getting too complicated (i.e. look at the mess OpenMPI has become).

---
"It's a machine, Schroeder. It doesn't get pissed off. It doesn't get
happy, it doesn't get sad, it doesn't laugh at your jokes. It just
runs programs." (Newton Crosby, 1986)
Gerald Henriksen
2018-11-28 00:41:40 UTC
Permalink
Post by Michael Di Domenico
Post by Gerald Henriksen
If on-premise HPC doesn't change to reflect the way software is
developed today, then users will in the future prefer cloud HPC.
I guess it is a brave new world for on-premise HPC, in that users now,
and likely more in the future, will have alternatives, thus forcing
on-premise HPC to "compete" in order to survive.
This seems a bit too stringent a statement for me. I don't dismiss
or disagree with your premise, but I don't entirely agree that HPC
"must" change in order to compete. We've all heard this kind of stuff
in the past: if X doesn't change, Y will take over the world!
HPC, like most things, exists to get something done.

If HPC doesn't change to reflect the changes in society and the way
software is developed (*), then users will look for more modern
ways to replace traditional HPC. As noted, the software is no longer
developed on workstations that are connected to the lab/company
network but rather on laptops that stay with the user wherever they
go.

This in turn is at least in part what has driven the rise of
distributed version control, git in particular.

If HPC doesn't make it easy for these users to transfer their workflow
to the cluster, and the cloud providers do, then the users will move
to using the cloud even if it costs them 10% or 20% more, because at the
end of the day it is about getting the job done and not about spending
time working with antiquated methods of putting jobs into a cluster.

And of course, if the users would rather spend their department budgets
with Amazon, Azure, Google, or others, then come the next upgrade cycle
there won't be any money for the in-house cluster...


* - note that HPC isn't unique in this regard. The Linux distributions
are facing their own version of this, where much of the software is no
longer packageable in the traditional sense, as it instead relies on
language-specific packaging systems and languages that don't lend
themselves to the older rpm/deb style system.
John Hearns via Beowulf
2018-11-28 08:03:08 UTC
Permalink
Post by Gerald Henriksen
* - note that HPC isn't unique in this regard. The Linux distributions
are facing their own version of this, where much of the software is no
longer packageable in the traditional sense, as it instead relies on
language-specific packaging systems and languages that don't lend
themselves to the older rpm/deb style system.
Gerald, very well said. The term in the UK at the moment would be
'friction'. On-premise HPC has to be as frictionless as cloud HPC.

Note that I referred to Julia, which has a packaging system.
The Julia community has given a lot of thought to the packaging system for
1.0, and it has concepts such as separate environments for different projects.

I hate to single out Python, but I have experience of users using Anaconda,
which means a huge variation in what everyone has.
And, more importantly for HPC systems, the packages are placed in the user's
home directory (by default).
On the system I am thinking about there was very limited space on /home, and
it was an NFS mount, meaning any parallel program startup would
pull lots of data from NFS.
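
For what it's worth, a minimal sketch of pointing Anaconda away from the
NFS $HOME (the /scratch paths are invented):

    # Relocate conda environments and the package cache off $HOME.
    export CONDA_ENVS_PATH=/scratch/$USER/conda/envs
    export CONDA_PKGS_DIRS=/scratch/$USER/conda/pkgs
    conda create -y -n myproject python numpy
    # 'conda activate' assumes conda's shell hook has been initialised.
    conda activate myproject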
John Hearns via Beowulf
2018-11-28 08:03:43 UTC
Permalink
Julia packaging https://docs.julialang.org/en/v1/stdlib/Pkg/index.html
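
As a small illustration of those per-project environments (the project name
is invented), the Project.toml/Manifest.toml pair can be committed to the
repo and reproduced exactly on the cluster:

    # On the laptop: create a project environment and add a package.
    julia -e 'using Pkg; Pkg.activate("MyProject"); Pkg.add("JSON")'
    # Commit MyProject/Project.toml and Manifest.toml, then on the
    # cluster recreate the same environment:
    julia --project=MyProject -e 'using Pkg; Pkg.instantiate()'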
John Pellman
2018-11-28 15:05:06 UTC
Permalink
Post by Gerald Henriksen
If HPC doesn't make it easy for these users to transfer their workflow
to the cluster, and the cloud providers do, then the users will move
to using the cloud even if it costs them 10% or 20% more, because at the
end of the day it is about getting the job done and not about spending
time working with antiquated methods of putting jobs into a cluster.
And of course, if the users would rather spend their department budgets
with Amazon, Azure, Google, or others, then come the next upgrade cycle
there won't be any money for the in-house cluster...
Interestingly enough, Cornell has been adopting a sort of compromise
between traditional HPC and cloud computing by maintaining an
AWS-compatible private cloud on-prem (Red Cloud
<https://www.cac.cornell.edu/services/cloudservices.aspx>). I'd speculate
that this has the advantage of preventing researchers from "going
rogue" and forgoing traditional HPC groups entirely by going directly to
AWS.
Jon Forrest
2018-11-27 22:19:52 UTC
Permalink
Post by Michael Di Domenico
This seems a bit too stringent a statement for me. I don't dismiss
or disagree with your premise, but I don't entirely agree that HPC
"must" change in order to compete.
I agree completely. There is, and always will be, a need for what I call
"pretty high performance computing" (PHPC), which is the highest performance
computing you can achieve given practical limits like funding, space,
time, ... Sure, there will always be people who can figure out how to go
faster, but PHPC is pretty good.

Jon Forrest

John Hanks
2018-11-30 22:02:55 UTC
Permalink
Post by Jon Forrest
I agree completely. There is, and always will be, a need for what I call
"pretty high performance computing" (PHPC), which is the highest performance
computing you can achieve given practical limits like funding, space,
time, ... Sure, there will always be people who can figure out how to go
faster, but PHPC is pretty good.
What a great term, PHPC. That probably describes the bulk of all "HPC"
oriented computing being done today, if you consider all cores in use down
to the lab/workbench level of clustering. Certainly for my userbase
(bioinformatics), the computational part of a project is often a small
subset of the total time spent on it, and time to total solution is the most
important metric for them. It's rare for us to try to get that last 10% or
20% of performance gain.

<rant>This has been a great thread overall, but I think no one is
considering the elephant in the room. Technical arguments are not winning
out in any of these technologies: CI/CD, containers, "devops", etc. All
these things are stacking on arbitrary layers of abstraction in an attempt
to cover up for the underlying, really really crappy software development
practices/models and resulting code. They aren't successful because they
are *good*, they are successful because they are *popular*.

As HPC admins, we tend to report to research-oriented groups. Not always,
but more often than "normal" IT folks do, who are often insulated from
negative user feedback by ticket systems, metrics, etc. Think about the
difference in that reporting chain:

A PI/researcher gets her next grant, tenured position, brilliant new
post-doc, etc., based on her research. Approach them about expanding the
sysadmin staff by 10x people and they'll laugh you out of the room. Ask for
an extra 100% budget to buy Vendor B storage rather than whitebox and
they'll laugh you out of the room. They want as much raw
computation/storage as cheaply as possible and would rather pay a grad
student than a sysadmin to run it because a grad student is more likely to
stumble over a publication and boost the PI's status. Sysadmins are dead
weight in this world, only tolerated.

A CIO or CTO gets his next job based on the headcount and budget under his
control. There is no incentive to be efficient in anything they do. Of
course, there is the *appearance* of efficiency to maintain, but the CIO
101 class's first lecture is on creative accounting and metrics. Pay more
for Vendor B? Of course, they pay for golf and lunch, great people. Think
about all those "migrate/outsource to the cloud" projects you've seen that
were going to save so much money. More often than not, staff *expands* with
"cloud engineers", extra training is required, sysadmin work gets
inefficiently distributed to end users, err, I mean developers. Developers
now need to fork into new FTEs who need training...and so it goes. More
head count, more budget, more power: happy CIO. Time to apply to a larger
institution/company, rinse and repeat.

Think about it from the perspective of your favorite phone app, whatever it
may be:
- app is released, wow this is useful!
- app is updated, wow this is still useful and does 2 more things
- app is updated, ummm..., it's still useful but these 4 new things really
make what I need hard to get to
- app is updated, dammit, my feature has been split and replaced with 8
new menus, none of which do what I want?!?!?

No one goes to the yearly performance review and says "I removed X
features, Y lines of code and simplified the interface down to just the
useful functions, there's nothing else to be done" and gets a raise. People
get raises for *adding* stuff, for *increasing* complexity. You can't tie
your name to a simplification, but an addition goes on the CV quite nicely.
It doesn't matter if in the end any benefit is dwarfed by the extra
complexity and inefficiency.

Ultimately I blame us, the sysadmins.

We could have installed business oriented software and worked with schools
of business, but we laughed at them because they didn't use MPI. Now we
have the Hadoop and Spark abominations to deal with.

We could have handed out a little sudo here and there to give people
*measured* control, but we coveted root and drove them to a more expensive
instance in the cloud where they could have full control.

We could have rounded out node images with a useful set of packages, but we
prided ourselves on optimizing node images to the point that users had to
pretty much rebuild the OS in $HOME to get anything to run, and so now:
containers.

We could have been in a position to say "hey, that's a stupid idea"
(*cough* systemd *cough*) but we squandered our reputation on neckbeard
BOFH pursuits and the enemies of simplicity stormed the gates.

Disclaimer: I'm confessing here. I recognize I played a role in this so
don't think I didn't throw the first stone at myself. Guilty as charged.

Enjoy the technical arguments, but devops and cloud and containers and
whatever next abstraction layers arise don't care. They have crept up on us
under a fog of popularity and fanbois-ism and overwhelmed HPC with sheer
numbers of "developers". Not because any of it is better or more
efficient, but because no one really cares about efficiency. They want to
work and eat and if adding and supporting a half-dozen more layers of
abstraction and APIs keeps the paychecks coming, no one is simplifying
anything. I call it "devops masturbation". The fact that pretty much all of
it could be replaced with a small shell script is irrelevant. devops needs
CI/CD, containers, and cloud to justify existence, and they will not go
quietly into that good night when offered a simpler, more efficient and
cheaper solution which puts them out of a job. Best use of our time now may
well be to 'rm -rf SLURM' and figure out how to install kubernetes. Console
yourself with the realization that people are willing to happily pay more
for less if the abstraction is appealing enough, and start counting the fat
stacks of cash.
</rant>

griznog
John Hearns via Beowulf
2018-12-01 05:43:05 UTC
Permalink
John, your reply makes so many points which could start a whole series of
debates.
Best use of our time now may well be to 'rm -rf SLURM' and figure out
how to install kubernetes.
Well that is something I have given a lot of thought to recently.
The folks over at Sylabs have already released a development version of
Singularity with Kubernetes Container Runtime Interface.
So yes, maybe the future of HPC at the small industrial/departmental level
will be Kubernetes.
Maybe it will be so for the national lab scale systems too.
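
As a taste of that container workflow (the image URI is invented), running
a containerised tool under Singularity already looks like this:

    # Pull an image once, then run it like any other program.
    singularity pull library://example/default/mytool:1.0
    singularity exec mytool_1.0.sif mytool --input data.txt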

Might be worth bringing in one of my adages here - in IT you run with the
herd, or you get trampled. Meaning that you may identify a good technology,
you may find that it fits well with your needs and that you like working
with it.
But if the herd is thundering off with another, even inferior, technology
then you will be left behind.

My own thought on HPC for a tightly coupled, on-premise setup is that we
need a lightweight OS on the nodes, which does the bare minimum. No
general-purpose utilities, no GUIs, nothing but network and storage. And
container support.
The cluster will have the normal login nodes of course but will present
itself as a 'black box' to run containers.
But - given my herd analogy above - will we see that? Or will we see
private OpenStack setups?
Gerald Henriksen
2018-12-02 15:09:19 UTC
Permalink
Post by John Hearns via Beowulf
My own thought on HPC for a tightly coupled, on-premise setup is that we
need a lightweight OS on the nodes, which does the bare minimum. No
general-purpose utilities, no GUIs, nothing but network and storage. And
container support.
One of the latest attempts at this is Fedora CoreOS, the merger of
Fedora Atomic and CoreOS (which Red Hat bought).

https://coreos.fedoraproject.org/
Post by John Hearns via Beowulf
The cluster will have the normal login nodes of course but will present
itself as a 'black box' to run containers.
But - given my herd analogy above - will we see that? Or will we see
private OpenStack setups?
Maybe. Red Hat appears to be moving in that direction as well, with a
Red Hat CoreOS offering alongside OpenShift, though how it all ends up
remains to be seen, I suspect.
John Hanks
2018-12-03 18:12:10 UTC
Permalink
Post by John Hearns via Beowulf
John, your reply makes so many points which could start a whole series of
debates.
I would not deny partaking of the occasional round of trolling.
Post by John Hearns via Beowulf
Best use of our time now may well be to 'rm -rf SLURM' and figure out
how to install kubernetes.
...
My own thought on HPC for a tightly coupled, on-premise setup is that we
need a lightweight OS on the nodes, which does the bare minimum. No
general-purpose utilities, no GUIs, nothing but network and storage. And
container support.
The cluster will have the normal login nodes of course but will present
itself as a 'black box' to run containers.
But - given my herd analogy above - will we see that? Or will we see
private OpenStack setups?
10 years ago, maybe even 5, I would have agreed with you wholeheartedly. I
was never much impressed by early LXC, but for my first year of exposure to
Docker hype I was thinking exactly what you are saying here. And then I
tried CoreOS and started missing having a real OS. And then I started
trying to do things with containers. And then I realized that I was seeing
software which was "easier to containerize" and that "easier to
containerize" really meant "written by people who can't figure out
'./configure; make; make install' and who build on a sand-like foundation
of fragile dependencies to the extent that it only runs on their Ubuntu
laptop so you have to put their Ubuntu laptop in a container." Then I
started asking myself "do I want to trust software of that quality?" And
after that, "do I want to trust the tools written to support that type of
poor-quality software?" And then I started to notice how much containers
actually *increased* the amount of time/complexity it took to manage
software. And then I started enjoying all the container engine bugs... At
that point, reality squished the hype for me because I had other stuff I
needed to get done and didn't have budget to hire a devops person to sit
around mulling these things over.

From the perspective of the software being containerized, I'm even more
skeptical. In my world (bioinformatics) I install a lot of crappy software.
We're talking stuff resulting from "I read the first three days of 'learn
python in 21 days' and now I'm an expert, just run this after installing
these 17 things from pypi...and trust the output" I'm good friends with
crappy software, we hang out together a lot. To me it just doesn't feel
like making crappy software more portable is the *right* thing to do. When
I walk my dog, I follow him with a bag and "containerize" what drops out.
It makes it easier to carry around, but doesn't change what it is. As of
today I see the biggest benefit of containers as this: they force a
developer to actually document the install procedure somewhere, in a way
that actually has to work, so we can see firsthand how ridiculous it is
(*cough* tensorflow *cough*).

I got sidetracked on a rant again. Your proposed solution works fine in an
IT-style computing world; it needs exactly the staff IT wants to grow these
days, and instead of just a self-directed sysadmin it has the potential to
need a project manager. I don't see it showing up on many lab/office
clusters anytime soon, though, because it's a model that embraces hype
first, and in an environment not focused on publishing or press releases
around hype it's a lot of extra work/cost/complexity for very little real
benefit. While you (and many on this list) might be interested in
exploring the technical merits of the approach, its actual utility really
hits home for people who require that extra complexity and layered
abstraction to justify themselves. The understaffed/overworked among us
will just write a shell/job script and move along to the next raging fire
to put out.

griznog
Michael Di Domenico
2018-12-03 19:44:13 UTC
Permalink
Post by John Hanks
From the perspective of the software being containerized, I'm even more
skeptical. In my world (bioinformatics) I install a lot of crappy software.
We're talking stuff resulting from "I read the first three days of 'learn
python in 21 days' and now I'm an expert, just run this after installing
these 17 things from pypi...and trust the output". I'm good friends with
crappy software, we hang out together a lot. To me it just doesn't feel
like making crappy software more portable is the *right* thing to do. When
I walk my dog, I follow him with a bag and "containerize" what drops out.
It makes it easier to carry around, but doesn't change what it is. As of
today I see the biggest benefit of containers as this: they force a
developer to actually document the install procedure somewhere, in a way
that actually has to work, so we can see firsthand how ridiculous it is
(*cough* tensorflow *cough*).
I vote this the single best explanation of containers I've heard all year... :)
Bogdan Costescu
2018-11-28 11:32:33 UTC
Permalink
Post by John Hearns via Beowulf
I have come across this question in a few locations. To be specific, I am
a fan of the Julia language. On the Julia forum a respected developer
recently asked what the options were for keeping code developed on a laptop
in sync with code being deployed on an HPC system.
In keeping with the rest of the buzzwords, where does CI/CD fit between
"code developed" and "code being deployed"? Once you have a mechanism for
this, can't this be used for the final deployment? Or even CD could
automatically take care of that final deployment?
Post by John Hearns via Beowulf
There was some discussion of having Git style repositories which can be
synced to/from.
Yes, that would work fine. Why would git not be compatible with an HPC
setup? And why restrict yourself to git and not talk about distributed
version control systems in general?
Post by John Hearns via Beowulf
My suggestion was an ssh mount of the home directory on the HPC system,
which I have configured effectively in the past when using remote HPC
systems.
I don't quite parse the first part of the phrase - care to
reformulate/elaborate?
Post by John Hearns via Beowulf
Again their workflow is to develop on the laptop and upload code to Github
type repositories. Then when running on a cloud service the software is
downloaded from the Repo.
The way I read it, this is very much restricted to code that can be run
immediately after download, i.e. using a scripting language. That might fit
your HPC universe, but the parallel one I live in still mostly runs code
built and maybe even optimized on the HPC system it runs on. This includes
software delivered in binary form from ISVs, open source code (e.g.
GROMACS), or code developed in-house - they all have in common using an
internode (e.g. MPI) or intranode (OpenMP, CUDA) communication and/or
control library directly, not through a deep stack.
Post by John Hearns via Beowulf
There are of course HPC services on the cloud, with gateways to access them.
This leads me to ask - should we be presenting HPC services as a 'cloud'
service, no matter that it is a non-virtualised on-premise setup?
What's in a name? It's called cloud computing today, but it was called grid
computing 10-15 years ago...

For many years, before the cloud-craze began, scientists might have had
access to some HPC resources in their own institution, in other
institutions in the same city, country, continent or even across
continents. How is this different from having access to an on-premise
install of f.e. OpenStack or a cloud computing offer somewhere else also
using OpenStack? The only advantage in some cases is that the on-premise
stuff might be better integrated with the "home" setup (i.e. common file
systems, common user management, or - why not? - better documentation :)),
which improves the user experience, but the functionality is very similar
or the same.

To come back to your initial topic - a git repo can just as well be sync-ed
to a login node of a cluster (wherever that is located) or to a VM in the
AWS cloud (wherever that is located).
Post by John Hearns via Beowulf
I think out loud that many HPC codes depend crucially on a $HOME directory
being present on the compute nodes as the codes look for dot files etc. in
$HOME. I guess this can be dealt with by fake $HOMEs which again sync back
to the Repo.
I don't follow you here... $HOME, dot files, repo, syncing back? And why
"Repo" with capital letter, is it supposed to be a name or something
special?

In my HPC universe, people actually not only need code, but also data -
usually LOTS of data. Replicating the code (for scripting languages) or the
binaries (for compiled stuff) would be trivial; replicating the data would
not. Also, pulling the data in or pushing it out (e.g. to/from AWS) on the
fly whenever the instance is brought up would be slow and costly. And by
the way, this is in no way a new idea - queueing systems have long had
the concept of "pre" and "post" job stages, which could be used to
pull in code and/or data to the node(s) on which the job would be running
and clean up afterwards.
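
A minimal sketch of that stage-in/stage-out pattern in a batch script
(Slurm, the paths and the bucket are all invented for illustration):

    #!/bin/bash
    #SBATCH --job-name=stage-demo
    # "Pre" stage: pull code and data to node-local scratch.
    WORK=/tmp/$SLURM_JOB_ID
    mkdir -p "$WORK" && cd "$WORK"
    git clone --depth 1 https://github.com/example/mycode .
    aws s3 cp s3://example-bucket/input.dat .
    # The job itself.
    ./mycode input.dat > output.dat
    # "Post" stage: push results out and clean up.
    aws s3 cp output.dat s3://example-bucket/output.dat
    rm -rf "$WORK"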

Cheers,
Bogdan
John Hearns via Beowulf
2018-11-28 12:25:15 UTC
Permalink
Bogdan, Igor, thank you very much for your thoughtful answers. I do not
have much time today to do your replies the justice of a proper answer.
Regarding the ssh filesystem, the scenario was that I was working for a
well-known company.
We were running CFD simulations on remote academic HPC setups. There was
more than one site!
The corporate firewall allowed us an outgoing ssh connection. I found it a
lot easier to configure an sshfs mount so that engineers could transfer
programs and scripts between their local system and the remote system,
rather than using a graphical or a command-line ssh client.
The actual large data files were transferred by yours truly, via a USB disk
drive.

I did not know about gitfs (my bad). That sounds interesting.
Post by John Hearns via Beowulf
I have come across this question in a few locations. To be specific, I
am a fan of the Julia language. On the Julia forum a respected developer
recently asked what the options were for keeping code developed on a laptop
in sync with code being deployed on an HPC system.
Post by John Hearns via Beowulf
I think out loud that many HPC codes depend crucially on a $HOME
directory being present on the compute nodes as the codes look for dot
files etc. in $HOME. I guess this can be dealt with by fake $HOMEs which
again sync back to the Repo.
I don't follow you here... $HOME, dot files, repo, syncing back? And why
"Repo" with capital letter, is it supposed to be a name or something
special?
I think John is talking here about doing version control on whole HOME
directories, but trying to be mindful of dot files such as .bashrc and
others which can be application- or system-specific. The first thing which
comes to mind is to use branches for different cluster systems. However,
this also taps into backup (which is another important topic, since HOME
dirs are not necessarily backed up). There could be a working solution
which makes use of recursive repos and git-lfs support, but pruning old
history could still be desirable. Git would minimize the amount of storage
because it's hash-based. While this could make it possible to replicate
your environment "wherever you go", a/ you would drag a lot of history
around and b/ a significantly different mindset is required to manage the
whole thing. A typical HPC user may know git clone but generally is not a
git adept. Developers are different and, who knows John, maybe someone
will pick up your idea.
Is gitfs at all popular?
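
One minimal sketch of the $HOME-under-git idea, using the common bare-repo
trick (the 'dot' alias, the remote and the file names are assumptions of
mine):

    # Keep dot files in a bare repo while $HOME stays a normal directory.
    git init --bare "$HOME/.dotfiles"
    alias dot='git --git-dir=$HOME/.dotfiles --work-tree=$HOME'
    dot config status.showUntrackedFiles no   # hide the rest of $HOME
    dot add ~/.bashrc ~/.julia/config/startup.jl
    dot commit -m "track shell and Julia startup files"
    dot push origin master    # assuming a remote has been added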
mark somers
2018-11-28 12:51:05 UTC
Permalink
Well, please be careful in naming things:

http://cloudscaling.com/blog/cloud-computing/grid-cloud-hpc-whats-the-diff/

(note: the author only mentions MPI and does not consider SMP-based codes
using e.g. OpenMP, but he did understand that there are different things
being talked about).

Now I am all for connecting diverse and flexible workflows to true HPC
systems and grids, which feel different if you are not experienced with
them (otherwise what is the use of a computer if there are no users making
use of it?), but do not make the mistake of thinking everything is cloud,
or will become cloud all that fast.

Bear with me for a second:

There are some very fundamental problems when dealing with large-scale
parallel programs (OpenMP) on virtual machines (most of the cloud). Google
for papers talking about co-scheduling. All VM specialists I know and have
talked with generally state that using more than 4 cores in a VM is not
smart, and that one should switch to bare metal then. Don't believe it?
Google for it, or just try it yourself by doing a parallel scaling
experiment and fitting Amdahl's law through your measurements.
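
(For reference, Amdahl's law for a code with parallel fraction p on N
cores predicts a speedup of S(N) = 1 / ((1 - p) + p/N); fitting measured
speedups to that curve quickly shows how much effective serial fraction
the virtualisation layer is adding.)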

So, one could say bare-metal clouds have arisen mostly because of this, but
they also come with expenses. Somehow I find that a simple rule always
seems to apply: if more people in a scheme need to be paid, the scheme is
probably more expensive than the alternatives, if available. Or, stated
differently: if you can do things yourself, it is always cheaper than
letting others do them (under normal 'open market' rules, and excluding
the option of slavery :)).

Nice read for some background:

http://staff.um.edu.mt/carl.debono/DT_CCE3013_1.pdf

One has to note that in academia one is often in the situation that grants
are obtained to buy hardware, and that running costs (i.e. electricity and
rack space) are matched by the university, making the case for spending
the grant money on paying Amazon or Google to do your 'compute' not so
sensible if you can do things yourself. Also, given the ease of deploying
an HPC cluster nowadays with OpenHPC or something commercial like Qlustar
or Bright, one will be hard pressed to justify long-term bare-metal cloud
usage in these settings.

Those were some technical and economic considerations that play a role in things.

There is also another aspect when, for example, dealing with sensitive data
you are held responsible for. The cloud model is not so friendly under
those circumstances either. Again, your data is put "on someone else's
computer". Think of the GDPR and such.

So, back to the point: some 'user driven' workloads might end up on clouds
or on bare-metal on-premise clouds (which seem to be the latest fad right
now), but clearly not everything. Especially if the workloads are not
'user driven' but technology driven (or economically or socially driven),
i.e. there is no other way of doing it except using some type of
(specialized) technology (or it is just not allowed). I am therefore also
of the opinion that cloud computing is not true (traditional) HPC, and
that the term HPC has been diluted over the years by commercial interests
/ marketing speak.

BTW, on a side note / rant: the mathematics we are dealing with here are
the constraints to be met in optimising things. The constraints actually
determine the final optimal case
(https://en.wikipedia.org/wiki/Lagrange_multiplier), and people tend to
'ignore' or not specify the constraints in their arguments about what is
the best or optimal thing to do. So what I have done here is give you some
examples of constraints (technical, economic and social) in the
'everything will be cloud' rhetoric to keep an eye on before drawing any
conclusions about what the future might bring :).

just my little opinion though...

Disclaimer; I could be horribly wrong :).
--
mark somers
tel: +31715274437
mail: ***@chem.leidenuniv.nl
web:  http://theorchem.leidenuniv.nl/people/somers
John Hearns via Beowulf
2018-11-28 12:54:53 UTC
Permalink
Mark, again I do not have time to do your answer justice today.
However, as you are in NL, can you send me some oliebollen please? I am a
terrible addict.
Post by mark somers
http://cloudscaling.com/blog/cloud-computing/grid-cloud-hpc-whats-the-diff/
(note; The guy only heard about MPI and does not consider SMP based codes
using i.e. OpenMP, but he did understand there are
different things being talked about).
Now I am all for connecting divers and flexible workflows to true HPC
systems and grids that feel different if not experienced
with (otherwise what is the use of a computer if there are no users making
use of it?), but do not make the mistake of thinking
everything is cloud or will be cloud soon that fast.
There are some very fundamental problems when dealing with large scale
parallel programs (OpenMP) on virtual machines (most of
the cloud). Google for papers talking about co-scheduling. All VM
specialists I know and talked with, state generally that using
more than 4 cores in a VM is not smart and one should switch to bare metal
then. Don't believe it? Google for it or just try it
yourself by doing a parallel scaling experiment and fitting Amdahls law
through your measurements.
So, one could say bare metal cloud have arisen mostly because of this but
they also do come with expenses. Somehow I find that a
simple rule always seems to apply; if more people in a scheme need to be
paid, the scheme is probably more expensive than
alternatives, if available. Or state differently; If you can do things
yourself, it is always a cheaper option than let some
others do things (under normal 'open market' rules and excluding the
option of slavery :)).
http://staff.um.edu.mt/carl.debono/DT_CCE3013_1.pdf
One has to note that in academia one often is in the situation that grants
are obtained to buy hardware and that running costs
(i.e. electricity and rack space) are matched by the university making the
case of spending the grant money on paying amazone or
google to do your 'compute' not so sensible if you can do things yourself.
Also given the ease of deploying an HPC cluster
nowadays with OpenHPC or something commercial like Qlustar or Bright, it
will be hard pressed to justify long term bare metal
cloud usage in these settings.
Those were some technical and economical considerations that play a role
in things.
There is also another aspect when, for example, dealing with sensitive
data you are to be held responsible for. The cloud model is
not so friendly under those circumstances either: again, your data is put
"on someone else's computer". Think of GDPR and
such.
So, back to the point: some 'user-driven' workloads might end up on clouds
or on bare-metal on-premise clouds (which seem to be the
latest fad right now), but clearly not everything. Especially if the
workloads are not 'user-driven' but technologically (or
economically or socially) driven, i.e. there is no other way of doing it
except using some type of (specialised) technology (or
it is just not allowed). I am therefore also of the opinion that cloud
computing is not true (traditional) HPC, and that the
term HPC has been diluted over the years by commercial interests / marketing speak.
BTW, on a side note / rant: the mathematics we are dealing with here is
constrained optimisation. The
constraints actually determine the final optimal case (
https://en.wikipedia.org/wiki/Lagrange_multiplier), and people tend to
'ignore' or not specify the constraints in their arguments about what
the best or optimal thing to do is. So what I have done here is
give you some example constraints (technical, economic and
social) on the 'everything will be cloud' rhetoric to
keep an eye on before drawing any conclusions about what the future might
bring :).
just my little opinion though...
Disclaimer: I could be horribly wrong :).
--
mark somers
tel: +31715274437
web: http://theorchem.leidenuniv.nl/people/somers
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Gerald Henriksen
2018-11-28 15:23:30 UTC
Permalink
Post by mark somers
Now I am all for connecting diverse and flexible workflows to true HPC systems and grids, which feel different to those not experienced
with them (otherwise what is the use of a computer if there are no users making use of it?), but do not make the mistake of thinking
everything is cloud, or will be cloud, that fast.
The "cloud" is a massive business that is currently growing fast.

Will it take over everything or continue its growth forever? Of course
not.

But dismissing it is equally a dangerous thing to do, particularly if
your job relies on something not being in the cloud.
Post by mark somers
So, one could say bare-metal clouds have arisen mostly because of this, but they also come with expenses. Somehow I find that a
simple rule always seems to apply: if more people in a scheme need to be paid, the scheme is probably more expensive than the
alternatives, if available. Or stated differently: if you can do things yourself, it is always cheaper than letting
others do them (under normal 'open market' rules and excluding the option of slavery :)).
But this is one area where the cloud can often win - the scale of the
Azure/Google/AWS operations means that you get 24/7/365 coverage with
essentially the lowest possible labour overhead.

And the fact is that while much of society insists on making decisions
purely based on cost - see airfares for example - there are a lot of
cases where people are willing to pay a premium for a service/product
that "just works".
Post by mark somers
One has to note that in academia one is often in the situation that grants are obtained to buy hardware and that running costs
(i.e. electricity and rack space) are matched by the university, making the case for spending the grant money on paying Amazon or
Google to do your 'compute' not so sensible if you can do things yourself.
Currently.

If on-premise HPC doesn't reflect the ease of use that can be found
elsewhere, then, combined with some lobbying by the existing or
specialized cloud providers, those grants could become a lot more flexible.

And given that many/most/all universities are often short on space,
they may well welcome an opportunity to repurpose an
existing cluster space...
Post by mark somers
There is also another aspect when, for example, dealing with sensitive data you are to be held responsible for. The cloud model is
not so friendly under those circumstances either: again, your data is put "on someone else's computer". Think of GDPR and
such.
I don't think this is as clear an advantage for on-premise as some
think.

I think the fact that we are all on this mailing list in order to
learn and discuss issues makes us outliers - there are very few
people participating on this list, and even allowing for discussions
happening on other sites, I (sadly) suspect you will find that the
majority of people running HPC aren't as informed as they should be.

Who do you trust more to keep your data safe - to keep systems
patched, to keep firewalls up to date, to properly configure
everything, etc.? Is it your local HPC, where maybe they are
struggling to hire staff, or can't afford to offer a "good enough"
salary, or simply can't justify hiring a security specialist? Or
perhaps you go with Google or Microsoft, who have entire departments
of staff dealing with these issues, who monitor their networks full
time looking for flaws?

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
mark somers
2018-11-28 13:22:44 UTC
Permalink
As a follow-up note on workflows,

we also have used 'sshfs-like constructs' to help non-technical users compute things on local clusters, the actual CERN grid
infrastructure, and on (national) supercomputers. We built some middleware suitable for that many moons ago:

http://lgi.tc.lic.leidenuniv.nl/LGI/

It works great for Python-coded workflows on workstations. So, coming back to the 'sshfs trick':

We have some organic chemists here doing many, many, many Gaussian calculations while only knowing Windows. They do this by creating
input files using the Gaussian GUI on their workstations and saving them in a special directory that is synced using
SyncBackPro to a CentOS server. On that server a Python script runs via cron every 5 minutes to push these input files for Gaussian
into our LGI setup. Compute resources hooked up to our LGI that can run Gaussian pick up those jobs, run them using slurm /
torque / glite or whatever is suitable on that compute resource, and eventually upload the results into the LGI repository again. The
cron Python job on the CentOS server notices finished jobs in the LGI queue, downloads the results into a special output
directory, and removes the jobs from the LGI queue. The Windows workstation with SyncBackPro then retrieves the outputs to the
Windows share they all use. This has been running 24x7 for several years now without a glitch, using supercomputers, the actual
grid and local clusters, without these organic chemists having to worry about unix or details like that.
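
In outline, that cron glue looks something like the following sketch (hypothetical: the lgi_* helpers stand in for our LGI
Python API, whose real names differ, and the paths are made up):

    # Hypothetical sketch of the cron-side glue; the lgi_* helpers are
    # placeholders for the LGI Python API, not its real function names.
    from pathlib import Path

    INBOX = Path("/srv/gaussian/inbox")    # synced from the Windows share
    OUTBOX = Path("/srv/gaussian/outbox")  # synced back to the Windows share

    def lgi_submit(path):
        """Placeholder: push one Gaussian input file into the LGI queue."""

    def lgi_finished():
        """Placeholder: list finished jobs in the LGI queue."""
        return []

    def lgi_fetch(job, dest):
        """Placeholder: download a job's results into dest."""

    def lgi_remove(job):
        """Placeholder: remove a finished job from the LGI queue."""

    def main():
        # Push freshly synced input files into the LGI queue.
        for com in INBOX.glob("*.com"):
            lgi_submit(com)
            com.rename(com.with_suffix(".submitted"))
        # Pull finished results back for SyncBackPro to pick up.
        for job in lgi_finished():
            lgi_fetch(job, OUTBOX)
            lgi_remove(job)

    if __name__ == "__main__":
        main()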

So I can concur: a seemingly simple 'sshfs trick' should not be underestimated :).
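
The trick itself amounts to one mount. A minimal sketch (the host and paths are made-up examples, and it assumes sshfs is
installed on the workstation):

    # Minimal sketch: mount an HPC home directory on the workstation so
    # local edits land directly on the cluster. Host and paths are
    # made-up examples; requires sshfs installed locally.
    import subprocess
    from pathlib import Path

    REMOTE = "user@hpc.example.org:/home/user"   # hypothetical cluster account
    MOUNTPOINT = Path.home() / "hpc-home"

    MOUNTPOINT.mkdir(exist_ok=True)
    subprocess.run(
        ["sshfs", REMOTE, str(MOUNTPOINT),
         "-o", "reconnect,ServerAliveInterval=15"],  # ride out network drops
        check=True,
    )
    print(f"{REMOTE} mounted on {MOUNTPOINT}")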

We also have many unix-literate users here using the LGI Python API to build workflows, or the simple LGI CLI to
submit jobs from their workstations.

m.
--
mark somers
tel: +31715274437
mail: ***@chem.leidenuniv.nl
web:  http://theorchem.leidenuniv.nl/people/somers
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Eliot Eshelman
2018-11-28 15:16:29 UTC
Permalink
Those interested in providing user-friendly HPC might want to take a
look at Open OnDemand. I'm not affiliated with this project, but wanted
to make sure it got a plug. I've heard good things so far.

http://openondemand.org/

Eliot
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Jonathan Engwall
2018-11-28 21:42:56 UTC
Permalink
You can probably fork from a central repo.
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf