[Beowulf] "NNSA’s first really big heterogeneous supercomputer"
Prentice Bisbal via Beowulf
2018-10-31 17:58:30 UTC
It also makes Sierra the NNSA’s first really big heterogeneous
Does Roadrunner, the first computer to exceed 1 PFLOPS, not count any
more? That was an NNSA system, and it had 3 (yes, 3!) different
processors in it: the AMD Opteron CPU, of course, and then the
PowerXCell 8i processor, which combined a stripped-down POWER
processor core with 8 SPE cores. I read a publication from
LANL on its architecture years ago, and I believe you had to program for
all 3 different processors to take advantage of its architecture. (If
someone knows for sure, please correct me if I'm wrong.) I'd say
that's even more heterogeneous than what we are seeing today (CPU + GPU).

Beowulf mailing list, ***@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.
Chris Samuel
2018-11-01 09:10:54 UTC
I read a publication from LANL on its architecture years ago, and I believe
you had to program for all 3 different processors to take advantage of its
architecture. (If someone knows for sure, please correct me if I'm wrong.)
I'd say that's even more heterogeneous than what we are seeing today
(CPU + GPU).
I think you're bang on the money. Here's a presentation from LANL
about Roadrunner, which covers programming on slide 25.


All the best,
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Prentice Bisbal via Beowulf
2018-11-01 16:45:17 UTC
Post by Chris Samuel
I think you're bang on the money. Here's a presentation from LANL
about Roadrunner, which covers programming on slide 25.
Thanks for this. Slides 25-31 are all good, but I think slide 31,
titled "Programming a Hybrid Computer", really hammers the point
home. Fortunately, it's all text, so I can cut-and-paste its contents:

Decomposition of an application for Cell-acceleration:
1. Opteron code
    - Runs non-accelerated parts of application
    - Participates in usual cluster parallel computations
    - Controls and communicates with Cell PPC code for the accelerated portions
2. Cell PPC code
    - Works with Opteron code on accelerated portions of application
    - Allocates Cell common memory
    - Communicates with Opteron code
    - Controls and works with its 8 SPEs
3. Cell SPE code (8-way parallel)
    - Runs on each SPE (SPMD) (MPMD also possible)
    - Shares Cell common memory with PPC code
    - Manages its Local Store (LS), transferring data blocks in/out as necessary
    - Performs vector computations from its LS data
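
To make the three layers concrete, here is a rough sketch in plain Python of how the pieces on that slide nest. It is only an illustration of the structure, not Roadrunner code: every name in it (opteron_main, ppe_run, spe_kernel, LS_BLOCK) is made up, threads stand in for the SPEs, a Python list stands in for Cell common memory, and the "computation" is just squaring numbers. The real thing used separate compilers and libraries for each layer.

```python
# Illustrative only: plain Python standing in for the three code layers on the
# slide. None of these names come from the Roadrunner or Cell toolchains.
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 8   # SPEs per Cell, per the slide
LS_BLOCK = 4   # hypothetical number of elements that fit in one Local Store block


def spe_kernel(spe_id, common_memory, out):
    """Layer 3: one SPE works through its share of common memory, block by block."""
    n = len(common_memory)
    chunk = (n + NUM_SPES - 1) // NUM_SPES
    start, stop = spe_id * chunk, min((spe_id + 1) * chunk, n)
    for lo in range(start, stop, LS_BLOCK):
        hi = min(lo + LS_BLOCK, stop)
        local_store = common_memory[lo:hi]          # transfer a block in
        local_store = [x * x for x in local_store]  # compute on Local Store data
        out[lo:hi] = local_store                    # transfer the block out


def ppe_run(data):
    """Layer 2: PPC code allocates common memory and drives its 8 SPEs."""
    common_memory = list(data)
    out = [0] * len(common_memory)
    with ThreadPoolExecutor(max_workers=NUM_SPES) as pool:
        futures = [pool.submit(spe_kernel, i, common_memory, out)
                   for i in range(NUM_SPES)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


def opteron_main(data):
    """Layer 1: host code does the non-accelerated work and offloads the rest."""
    prepared = [x + 1 for x in data]  # non-accelerated part of the application
    accelerated = ppe_run(prepared)   # hand the hot loop to the Cell side
    return sum(accelerated)           # final reduction back on the host
```

Calling opteron_main(range(10)) walks through all three layers. Even in this toy version, the programmer has to write the partitioning, the block transfers, and the hand-offs between layers by hand.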

It seems like this would be a nightmare to program for. A few years ago
I worked with a faculty member who did some programming for the Roadrunner
architecture, and he confirmed this.

