
SC 2009 - Birds of a Feather Presentation


HPC systems consist of thousands of nodes and tens of thousands
of cores, all connected via high-speed networking such as
40 Gb/s InfiniBand. Future systems will include even more
nodes and cores, and keeping all of them available for the
duration of long scientific simulation runs will become
harder. One solution to this challenge is to
add scalable fault-tolerance capability as an essential
part of the HPC system architecture. In these videos, Appro
and Mellanox review scalable fault-tolerant architectures
and examples of energy-efficient, scalable supercomputing
clusters that use dual quad data rate (QDR) InfiniBand to combine
capacity computing with network failover capabilities, with
the help of programming models such as MPI and a robust
Linux cluster management package. They also discuss the role
fault tolerance plays in multi-core systems and the
modifications required to sustain long scientific
and engineering simulations on those systems. Special guest
appearance from Tsukuba University.
Scalable Fault-Tolerant HPC Supercomputers -- Part 1 of 2
Presented by: Steve Lyness, VP of HPC Solutions at
Appro; Gilad Shainer, Director of Technical Marketing
at Mellanox; David Race, Director of SW Product Development
at Appro; Shannon Davidson, Chief Software Technologist
at Appro; and Dr. Taisuke Boku from Tsukuba University.
Watch the video below.

Scalable Fault-Tolerant HPC Supercomputers -- Part 2 of 2
Presented by: Steve Lyness, VP of HPC Solutions
at Appro; Gilad Shainer, Director of Technical Marketing
at Mellanox; David Race, Director of SW Product Development
at Appro; Shannon Davidson, Chief Software Technologist
at Appro; and Dr. Taisuke Boku from Tsukuba University.
Watch the video below.