Discussion:
[Beowulf] Heterogeneity in a tiny (two-system cluster)?
Tad Slawecki
2018-02-15 13:20:55 UTC
Permalink
Hello, list -

We are at a point where we'd like to explore a tiny cluster of two systems to speed up execution of the FVCOM circulation model. We already have a two-year-old system with two 14-core CPUs (Xeon E-2680), and I have budget to purchase another system at this point, which we plan to directly connect via Infiniband. Should I buy an exact match, or go with the most my budget can handle (for example 2xXeon Gold 1630, 16-cores) under the assumption that the two-system cluster will operate at about the same speed *and* I can reap the benefits of the added performance when running smaller simulations independently?

Our list owner already provided some thoughts:

> I've always preferred homogeneous clusters, but what you say is,
> I think, quite plausible. The issue you will have, though, is
> ensuring that the application is built for the earliest of the
> architectures, so you don't end up using instructions for a newer
> CPU on the older one (which would result in illegal-instruction
> crashes).
>
> But there may be other gotchas that others think of!
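The advice above can be turned into a concrete check. The sketch below (hostnames and file paths are placeholders) lists the ISA flags the newer CPU would have that the older one lacks, then names the older microarchitecture explicitly at build time:

```shell
#!/usr/bin/env bash
# Sketch: list ISA flags present on the newer CPU but missing on the
# older one. Any such instruction compiled into the binary will raise
# SIGILL (illegal instruction) when a rank runs on the older node.
# Inputs are the "flags" line from each node's /proc/cpuinfo, e.g.:
#   ssh node-old "grep -m1 '^flags' /proc/cpuinfo" > flags-old
missing_flags() {   # usage: missing_flags flags-old flags-new
    tr -s ' \t' '\n' < "$1" | sort -u > "/tmp/isa-old.$$"
    tr -s ' \t' '\n' < "$2" | sort -u > "/tmp/isa-new.$$"
    comm -13 "/tmp/isa-old.$$" "/tmp/isa-new.$$"   # flags unique to the new CPU
    rm -f "/tmp/isa-old.$$" "/tmp/isa-new.$$"
}

# Safest build: target the older part explicitly, e.g. for a
# Broadwell-era Xeon with GCC/gfortran:
#   mpif90 -O2 -march=broadwell ...
# rather than -march=native executed on the newer box.
```

Anything `missing_flags` prints (for example `avx512f`) is an instruction set the build must avoid if one binary is to run on both nodes.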

Thank you ...

Tad
_______________________________________________
Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
John Hearns via Beowulf
2018-02-16 08:39:06 UTC
Permalink
Tad,
I would go for the more modern system. You say yourself the first system is
two years old. In one or two years it will be out of warranty, and if a
component breaks you will have to decide whether to buy that component or
just junk the system.


Actually, having said that, you should look at the FVCOM model and see how
well it scales on a multi-core system.
Intel are increasing core counts, but not clock speeds. Paradoxically, in
the past you used to be able to get dual-core parts at over 3 GHz, which
don't have many cores competing for bandwidth to RAM.
The counter-example to this is Skylake, which has more channels to RAM,
making for a more balanced system.

I would go for a Skylake system, populate all the DIMM channels, and quite
honestly forget about running across two systems unless the size of your
models needs this.
Our latest Skylakes have 192 GB of RAM for that reason. In the last
generation this would sound like an unusual amount of RAM, but it makes
sense in the Skylake generation.
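A sketch of that scaling test, assuming the usual `mpirun -np N ./fvcom --casename=...` invocation (the binary name, case name, and rank counts below are placeholders for the local build): time the same case at increasing rank counts, then compute parallel efficiency.

```shell
# Strong-scaling sketch: run one fixed FVCOM case at increasing MPI
# rank counts and record wall time (uses GNU time's -f/-o options).
scaling_sweep() {
    for np in 1 2 4 8 14 28; do
        /usr/bin/time -f "%e" -o "time-${np}.txt" \
            mpirun -np "$np" ./fvcom --casename=test_case
    done
}

# Parallel efficiency in percent: 100 * T1 / (np * Tnp).
# Well below ~70%, the extra cores are mostly fighting over memory bandwidth.
efficiency() {   # usage: efficiency T1 np Tnp
    awk -v t1="$1" -v np="$2" -v tnp="$3" \
        'BEGIN { printf "%.0f", 100 * t1 / (np * tnp) }'
}
```

With 14 physical cores per socket, the interesting comparison is np=14 versus np=28: if efficiency collapses once the second socket joins in, the model is bandwidth-bound and a second box over InfiniBand may help more than extra cores in one box.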
John Hearns via Beowulf
2018-02-16 08:41:02 UTC
Permalink
Oh, and while you are at it:
do a bit of investigation into how well the FVCOM model is optimised for
AVX vectorisation.
Hardware and clock speeds alone don't cut it.
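Two quick checks in that direction, as a sketch (the source-file name in the comment is illustrative): whether the CPU advertises the AVX variants at all, and whether the compiler reports the hot loops as vectorised.

```shell
# Check whether a given ISA extension is advertised by the CPU.
# Reads a /proc/cpuinfo-style "flags" line on stdin; exits 0 if present.
has_isa() {   # usage: has_isa avx2 < /proc/cpuinfo
    grep -m1 '^flags' | tr -s ' \t' '\n' | grep -qx "$1"
}

# Compiler side: ask for a vectorisation report while building, e.g.
# with gfortran (Intel Fortran uses -qopt-report=5 -qopt-report-phase=vec):
#   gfortran -O3 -march=native -fopt-info-vec-missed -c some_module.f90
# Loops listed as "missed" run scalar no matter what the hardware offers.
```

A CPU that reports `avx2` but a compile log full of missed vectorisation is John's point exactly: the hardware capability alone buys nothing.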
Scott Atchley
2018-02-16 17:59:36 UTC
Permalink
If it is memory-bandwidth limited, you may want to consider AMD's EPYC,
which has 33% more memory bandwidth (eight DDR4 channels per socket versus
Skylake-SP's six).