Discussion:
[Beowulf] Update on dealing with Spectre and Meltdown
Prentice Bisbal
2018-03-08 18:10:15 UTC
Beowulfers,

Have any of you updated the kernels on your clusters to fix the Spectre
and Meltdown vulnerabilities? I was following this issue closely for the
first couple of weeks. There seemed to be a lack of consensus on how
much these fixes would impact HPC jobs, and if I recall correctly, some
of the patches really hurt performance, or caused other problems. We
took a wait-and-see approach here. So now that I've waited a while, what
did you see?

--
Prentice

_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Alex Chekholko via Beowulf
2018-03-08 18:13:20 UTC
As an experienced and cynical HPC admin working with trusted users... I
have not installed any kind of firmware or kernel updates. I'll wait for
you to go first :)

Ryan Novosielski
2018-03-08 18:55:34 UTC
I have been waiting, specifically, for Prentice Bisbal to try it. :-P

I actually have not checked to see whether any other upgrades I’ve applied have handled these vulnerabilities (I’d assume RedHat would have patched their distribution). FWIW, however, I did receive a report that one of our newer images seemed slower. Haven’t gotten a chance to fully investigate those reports — could be any number of other things.
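
For anyone else wondering what a given node actually ended up with, a rough sketch of how to check (the sysfs files below only exist on kernels new enough to carry the upstream reporting, and the debugfs knobs are specific to the Red Hat/CentOS backport kernels, need root, and need debugfs mounted):

  # upstream-style reporting, if the running kernel has it
  grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null

  # Red Hat backport tunables (as root)
  mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
  for f in pti_enabled ibrs_enabled ibpb_enabled; do
      printf '%s: ' "$f"; cat /sys/kernel/debug/x86/$f 2>/dev/null || echo n/a
  done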

--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - ***@rutgers.edu<mailto:***@rutgers.edu>
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

Gavin W. Burris
2018-03-08 19:36:27 UTC
Just had our biannual maintenance window. All the things are patched. One could disable the performance-killing patches with the "noibrs noibpb nopti" grub options. Cheers.
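
For anyone wanting to do the same, a minimal sketch on a RHEL/CentOS 7 style node (this assumes grubby is available and that you want the options on every installed kernel; adjust to taste):

  # add the options to every installed kernel's boot entry
  grubby --update-kernel=ALL --args="noibrs noibpb nopti"
  # (or append them to GRUB_CMDLINE_LINUX in /etc/default/grub and
  #  regenerate with: grub2-mkconfig -o /boot/grub2/grub.cfg)
  # confirm after the next reboot:
  cat /proc/cmdline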


--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
Search our documentation: http://research-it.wharton.upenn.edu/about/
Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe
Ryan Novosielski
2018-03-08 19:47:27 UTC

Cheers, Gavin (hope you're well, BTW)!

Any real-world ballpark figures on how "killed" performance is (and
does your site disable those patches)?

On 03/08/2018 02:36 PM, Gavin W. Burris wrote:
> Just had our biannual maintenance window. All the things are
> patched. One could disable the performance-killing patches with
> the "noibrs noibpb nopti" grub options. Cheers.

--
____
|| \\UTGERS, |----------------------*O*------------------------
||_// the State | Ryan Novosielski - ***@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark
`'
Gavin W. Burris
2018-03-09 15:05:37 UTC
Hey, Ryan. Doing well. Hope you weathered the storm OK!

The benchmarks were inconclusive, as they were run on a loaded system. I should have run a variety during the maintenance window, when the queues were clear. With so many different applications, many of them network I/O-bound, we decided to enforce the patches only on the login nodes and to apply the grub options on the compute nodes. Numbers there are good.
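
For the next window, something along these lines on a drained node, run once with and once without the grub options, would at least tie each number to the system state it was measured under (a sketch only; ./stream here is a stand-in for whatever single-node benchmark you trust and is not assumed to be installed):

  # tag the result with kernel, cmdline and mitigation state
  out=bench-$(uname -r)-$(date +%Y%m%d-%H%M).log
  { uname -r; cat /proc/cmdline;
    grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null; } > "$out"
  for i in 1 2 3; do ./stream >> "$out"; done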

Cheers.

On Thu 03/08/18 02:47PM EST, Ryan Novosielski wrote:
> Cheers, Gavin (hope you're well, BTW)!
>
> Any real-world ballpark figures on how "killed" performance is (and
> does your site disable those patches)?

--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
Search our documentation: http://research-it.wharton.upenn.edu/about/
Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe
Ryan Novosielski
2018-03-16 19:16:30 UTC
Thanks again, Gavin. We did the same thing at our site this week (mitigated on login nodes, disabled mitigation on compute nodes), but I intend to go back and test to see if there’s any performance penalty without the kernel arguments.

Am I correct that this is Haswell and newer that is affected?

--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - ***@rutgers.edu<mailto:***@rutgers.edu>
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

Chris Samuel
2018-03-16 23:49:32 UTC
On Saturday, 17 March 2018 6:16:30 AM AEDT Ryan Novosielski wrote:

> Am I correct that this is Haswell and newer that is affected?

Meltdown affects Intel, IBM and ARM CPUs at least, and Spectre (IIRC)
potentially affects any CPU with speculative execution.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Stephen Fralich
2018-03-08 19:46:18 UTC
I did do some testing with the original round of updates, and I found that the supplementary GPFS monitoring tools (mmsysmon), written in Python, were having a profoundly negative effect on performance with the Spectre/Meltdown patches, even with first-pass things like Linpack. You can disable the supplementary monitoring, though, and that cleared up the performance issues I was seeing. I didn't get very deep into it before all the firmware updates were retracted and we decided to wait and see. On the GPFS UG mailing list, IBM had also talked about providing options in newer versions of GPFS to make the querying less intrusive and less frequent.
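
A quick way to see whether the monitoring daemons are eating cycles on your own nodes during a run (a sketch; it assumes sysstat's pidstat is installed and that the daemons show up under the name mmsysmon):

  # sample CPU use of the GPFS monitoring processes every 5 seconds
  pidstat -u -C mmsysmon 5
  # context-switch activity tells a similar story
  pidstat -w -C mmsysmon 5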

Stephen

Skylar Thompson
2018-03-09 00:06:26 UTC
We installed the kernel updates when they became available. Fortunately we
were a little slower on the firmware updates, and managed to roll back the
few we did apply that introduced instability. We're a bioinformatics shop
(data parallel, lots of disk I/O mostly to GPFS, little to no
communication between nodes), and actually had some jobs start
running faster, though the group running them came back to us to report
that they had taken advantage of the maintenance window to make some tweaks
to their pipeline.

That's sort of a long way of saying YMMV.
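
For what it's worth, confirming which microcode the kernel actually ended up with after a rollback is quick (a sketch):

  # microcode revision as seen by the running kernel
  grep -m1 microcode /proc/cpuinfo
  # and anything the kernel reported loading at boot
  dmesg | grep -i microcode | tail -n 3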

Skylar

Michael Di Domenico
2018-03-09 13:07:44 UTC
We've put the kernel updates in place as part of our normal patching,
but not the firmware updates. Nothing overly relevant has been
noticed so far.

Peter Kjellström
2018-03-14 14:30:55 UTC

We updated on day one, as we would for any security-related update.

We ran regression tests across several applications at scale and came
to the rough conclusion that PTI (Meltdown) had essentially no impact
(<1%) but that IBRS/IBPB (Spectre) was more costly (2-5%).

After this analysis, two significant things happened: 1) a new kernel
arrived, with possibly different behaviour, and 2) Intel reverted the
microcode side of IBRS/IBPB. Neither is reflected in my numbers above.

Right now we're looking at adding new microcode for both our old
Sandy Bridge and Haswell systems to enable the Spectre side again. This
will include another round of performance regression testing.

The OS for these systems is CentOS 6.
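
On the Red Hat/CentOS backport kernels the Spectre side can also be flipped at runtime, which makes a regression round cheaper than one reboot per data point. A sketch (the debugfs tunables are specific to those kernels, need root, and ibrs_enabled is only expected to take effect once the new microcode is loaded):

  # as root, with debugfs mounted
  mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
  echo 1 > /sys/kernel/debug/x86/ibrs_enabled   # 0 = off
  echo 1 > /sys/kernel/debug/x86/ibpb_enabled
  echo 1 > /sys/kernel/debug/x86/pti_enabled    # Meltdown/KPTI
  grep . /sys/kernel/debug/x86/*_enabled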

/Peter K
James Cuff
2018-03-16 19:09:02 UTC

Hi all,

Interesting analysis by my new mentor Tim over here at The Next
Platform, especially around I/O challenges:

https://twitter.com/jamesdotcuff/status/974721256481226752

Love his editorial note:

"We will say it again: Run your own tests before and after applying
the Spectre and Meltdown patches"
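
Even a dumb wrapper that tags each run with the kernel and mitigation state makes that before/after comparison much less fuzzy later. A sketch (run_my_job.sh is a placeholder for whatever your representative workload is):

  tag=$(uname -r)-$(date +%s)
  { cat /proc/cmdline;
    grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null; } > env-$tag.txt
  /usr/bin/time -v ./run_my_job.sh > run-$tag.log 2>&1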

Best,

j.

--
Dr. James Cuff
The Next Platform

https://twitter.com/jamesdotcuff
https://linkedin.com/in/jamesdotcuff