All complex systems have flaws. It's more a matter of deciding which flaws are acceptable and which aren't, which is driven by economic factors for the most part - the cost of fixing the flaw (and potentially introducing a new one) vs the cost of damage from the flaw.
I'd find it hard to believe that Intel's CPU designers sat around implementing deliberate flaws (the Bosch engine controller for VW model).
I'd not find it hard to believe that someone, somewhere raised a speculation about a potential flaw, among many others. That one just didn't happen to get resources applied to it; others did. Picking which ones to attack and spend resources on is a difficult question, and often gets answered based on totally irrelevant factors.
That's not negligence - that's just "it is impossible to discover and fix all possible bugs".
This is not unusual even in MUCH simpler chips. I have some 8-bit-wide level shifters (from 2.5 to 3.3V logic) with an obscure behavior: depending on the rate at which the two power supplies come up, they sometimes fail to pass data (preventing the system in which they are installed from booting). This happens about 1 out of 500 times. The mfr's response is "yeah, we think we can duplicate that, but we've moved on to a newer version of that chip, why don't you replace the chips with the new ones". This isn't necessarily an issue of the chip not performing to the datasheet specs (essentially, the data sheet is silent on this).
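As a rough illustration of why a 1-in-500 power-up failure matters at scale, here is a minimal sketch of the chance that at least one unit in a fleet fails to boot on a given power cycle. The fleet sizes and the assumption that failures are independent are hypothetical; they are not from the anecdote above.

```python
def p_any_failure(n_units: int, p_fail: float = 1 / 500) -> float:
    """Probability that at least one of n_units power-ups fails.

    Assumes each unit fails independently with probability p_fail
    (an assumption for illustration; real failures may be correlated).
    """
    return 1.0 - (1.0 - p_fail) ** n_units


if __name__ == "__main__":
    # Hypothetical fleet sizes, chosen only to show the trend.
    for n in (1, 100, 500, 2000):
        print(f"{n:5d} units: P(at least one boot failure) = {p_any_failure(n):.3f}")
```

Under these assumptions a flaw that is rare for one board becomes a routine event for a large installation.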
The Errata and Notes lists for complex parts (like CPUs and large FPGAs) run to hundreds of pages, and continuously grow as people find more odd behaviors.
Therefore - you should assume your system has unknown flaws and design your software and operational procedures accordingly.
James Lux
Project Manager, SunRISE - Sun Radio Interferometer Space Experiment
Task Manager, DARPA High Frequency Research (DHFR) Space Testbed
Jet Propulsion Laboratory (Mail Stop 161-213)
4800 Oak Grove Drive
Pasadena CA 91109
(818)354-2075 (office)
(818)395-2714 (cell)
-----Original Message-----
From: Beowulf [mailto:beowulf-***@beowulf.org] On Behalf Of Jörg Saßmannshausen
Sent: Sunday, August 19, 2018 2:00 PM
To: ***@beowulf.org
Subject: Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA
Dear all,
whereas I accept that no system is 100% secure and bug-free, I am beginning to wonder whether the current problems we are having are actually design flaws and whether, and that is the more important bit, Intel and other vendors knew about them. I am thinking of the famous 'diesel-engine' scandal and, continuing this line of thought, of dragging the vendors into the limelight and getting them to pay for this.
I mean, we have to sort out the mess the company was making in the first place, have to judge whether to apply a patch which might decrease the performance of our systems (I am doing HPC, hence my InfiniBand question) versus security.
Where will it stop?
Given that the current and previous 'bugs' are clearly design flaws IMHO, what are the chances of a lawsuit? Any compensation here should go to Open Source projects, in my opinion, which are making software more secure.
Any comments here?
All the best
Jörg
Post by John Hearns via Beowulf
Rather more seriously, this is a topic which is well worth discussing,
What are best practices on patching HPC systems?
Perhaps we need a separate thread here.
I will throw in one thought, which I honestly do not want to see happening.
I recently took a trip to Bletchley Park in the UK. On display there
was an IBM punch card machine and sample punch cards. Back in the day
one prepared a 'job deck' which was collected by an operator in a
metal hopper then wheeled off to the mainframe. You did not ever touch
the mainframe. So effectively an air gapped system. A system like that
would in these days kill productivity.
However, should there be 'virus checking' of executables before they
are run on compute nodes?
One of the advantages lauded for Linux systems is of course that
anti-virus programs are not needed.
Also I should ask - in the jargon of anti-virus is there a 'signature'
for any of these exploit codes? One would guess that bad actors copy
the example codes already published and use these almost in a cut and
paste fashion. So the signature would be tight loops repeatedly
reading or writing to the same memory locations. Can that be
distinguished from innocent code?
Post by John Hearns via Beowulf
*To patch, or not to patch, that is the question:* Whether 'tis
nobler in the mind to suffer The loops and branches of speculative
execution, Or to take arms against a sea of exploits And by opposing
end them. To die—to sleep, No more; and by a sleep to say we end The
heart-ache and the thousand natural shocks That HPC is heir to: 'tis
a consummation Devoutly to be wish'd. To die, to sleep
Post by Chris Samuel
Post by Jeff Johnson
With the spate of security flaws over the past year and the impacts their
fixes have on performance and functionality it might be worthwhile to just
For me none of the HPC systems I've been involved with here in
Australia would have had that option. Virtually all have external
users and/or reliance on external data for some of the work they
are used for (and the sysadmins don't usually have control over the
projects & people who get to use them).
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf