Success Story



Appro at the Forefront of Supercomputing History – Advances Research at Lawrence Livermore National Lab

The Challenge
From the early days of monolithic mainframes, through the eras of vector supercomputers and massively parallel systems, LLNL has been home to many firsts in supercomputing and has been a driving force behind their development. While LLNL still operates some of the largest supercomputers in existence, such as BlueGene/L and Purple, the Laboratory realized that it also needed to field scalable compute resources composed of hundreds and sometimes thousands of clustered commodity compute nodes/servers, linked by high-performance commodity interconnects and running the Linux operating system and cluster tools. These Linux clusters can be created in much less time, built from industry-standard servers, and assembled for significantly fewer dollars per peak teraFLOP/s. Through high-speed interconnects, compute nodes can be clustered together to form a new generation of scalable and highly flexible supercomputers able to meet LLNL’s programmatic need for additional, cost-effective, large-scale simulation capacity.
At LLNL, demand for time on the supercomputers far exceeds capacity. Scientific teams seeking computer time for 2D and 3D simulations of laser plasma interactions, turbulent hydrodynamics, strength of materials and improved equations of state had a difficult time obtaining their full requested allocations. This left LLNL scientists and engineers with the prospect of running only their highest priority calculations and performing less comprehensive parameter studies, which reduced both the quality and the innovation of the scientific research that could be accomplished. In addition, the capability of the older machines was limiting the maximum size of jobs and the fidelity of full system 3D simulations. The LLNL High Performance Computing (HPC) team, responsible for providing best-in-class resources for scientists, realized it was time to create a new paradigm for acquiring commodity Linux compute clusters that are scalable, dynamic, and easy to create, install and maintain.
The HPC team gathered and aggregated requirements from multiple programs at the Laboratory and determined the need to create four new dedicated compute resources. These programs all required the ability to run MPI-based jobs at 16 to 1,024 MPI tasks with 2.0 GB of memory per task, low latency (<3 µs), high bandwidth (>1.8 GB/s bidirectional delivered bandwidth) and more than 2.0 million messages processed per MPI task per second. Another key factor in the LLNL application requirements was high delivered memory bandwidth. These new scalable Linux clusters would be composed of hundreds to thousands of the most powerful compute nodes available, interconnected for high throughput. Because multiple clusters would be purchased, integrated and operated on fairly tight budgets, the LLNL HPC team had to find a way to significantly lower the Total Cost of Ownership (TCO) for these new systems. In addition to the programmatic requirement for a quick infusion of badly needed compute capability, the LLNL HPC team had to contend with a very aggressive time frame of 3 to 6 months to purchase, install, integrate and bring the new resources into production status.
The Solution
With a clear internal mandate, the LLNL HPC team set off to create a new breed of high performance commodity Linux clusters to meet the extreme programmatic demand for large-scale simulations, which are key to advancing multiple scientific programs at LLNL. Due to the large memory capacity and bandwidth requirements of scientific applications at LLNL, the four clusters would need AMD Opteron 4-socket nodes with dual-core processors to handle the sophisticated multi-programmatic, native 64-bit application requirements of the scientists. Given the high bandwidth, low latency and high messaging rate requirements of the LLNL MPI-based applications, an InfiniBand™ 4x Single Data Rate interconnect with Mellanox adapters and Voltaire 24-port and 288-port switches was chosen. In order to significantly lower the TCO and the time to deploy multiple clusters of widely ranging sizes, a new concept in Linux clusters was developed. This concept is based on two fundamental observations: 1) all Linux clusters need to have a scalable infrastructure; 2) the commodity ecosystem is really good at producing a large quantity of relatively simple items.
The solution is to build Linux clusters out of a highly replicated Scalable Unit (SU). The SU is architected to maximize the number of compute nodes while still accommodating the necessary infrastructure nodes, so that well-balanced clusters of widely varying node or SU count can be built simply by interconnecting multiple SUs with one or more InfiniBand Architecture (IBA) second stage switches. Because users log in directly to the clusters to perform job setup, submit jobs, check the progress of jobs, and visualize the scientific output from simulations, there is a need for login nodes that are not part of the compute pool and that have additional 1 and 10 Gb/s Ethernet links to other components of the LLNL simulation environment.
In order to improve hardware Mean Time Between Failures (MTBF), LLNL decided to go diskless on the compute nodes. As a result, the root, swap and tmp directories needed to be served by a remote partition server (RPS) via NFS/Ethernet or SRP/IBA. The RPS node would have a highly reliable, high IOP rate RAID5 array of SATA disks for this purpose. In order to provide high bandwidth access to the Lustre global parallel file system, gateway nodes were required to interface the compute nodes on IBA with the 10 Gb/s Ethernet infrastructure that hosted the Lustre metadata and object storage systems. An SU defined with 138 compute nodes, 1 login node, 4 gateway nodes and 1 RPS node, for a total of 144 nodes, fits naturally into the IBA first stage interconnect infrastructure.
The LLNL Scalable Unit comprises 144 nodes, maximizing the number of compute nodes while allowing the infrastructure nodes (Login, Gateway, Remote Partition Server) to scale as clusters are built up from multiple SUs.
With this SU definition, LLNL could procure a large number of Scalable Units and then build four well balanced clusters for different programmatic missions. For Multi-Programmatic and Institutional Computing, LLNL deployed a capacity cluster called Zeus with 2 SU and a capability Linux cluster called Atlas with 8 SU. For the Stockpile Stewardship and ASC programs LLNL deployed two capacity clusters, Rhea with 4 SU and Minos with 6 SU.
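The node counts implied by the SU definition can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not LLNL or Appro tooling: the class and function names are hypothetical, and the only inputs are the per-SU node counts and the per-cluster SU counts quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalableUnit:
    """Per-SU node counts as stated in the text (names are illustrative)."""
    compute: int = 138
    login: int = 1
    gateway: int = 4
    rps: int = 1  # remote partition server

    @property
    def total(self) -> int:
        return self.compute + self.login + self.gateway + self.rps


SU = ScalableUnit()

# SU counts per cluster, as given in the text.
CLUSTERS = {"Zeus": 2, "Rhea": 4, "Minos": 6, "Atlas": 8}


def cluster_nodes(su_count: int, su: ScalableUnit = SU) -> dict:
    """Scale every node type linearly with the number of SUs."""
    return {
        "total": su_count * su.total,
        "compute": su_count * su.compute,
        "login": su_count * su.login,
        "gateway": su_count * su.gateway,
        "rps": su_count * su.rps,
    }


if __name__ == "__main__":
    for name, su_count in CLUSTERS.items():
        nodes = cluster_nodes(su_count)
        print(f"{name}: {su_count} SU, {nodes['total']} nodes, "
              f"{nodes['compute']} compute")
```

Under these figures the smallest cluster (2 SU) has 288 nodes and the largest (8 SU) has 1,152 nodes, of which 1,104 are compute nodes. The sketch assumes every SU in a cluster carries the full complement of infrastructure nodes.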
Rhea, at 22.2 teraFLOP/s, was the first to be brought into production and Minos, at 33.2 teraFLOP/s, was the last. Together these two Linux clusters deliver over 55 teraFLOP/s of peak processing capacity for scientists with the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) Program. These scientists address the complex challenges of nuclear stockpile stewardship and of ensuring the safety, security and reliability of the nation’s nuclear deterrent without underground testing. Both supercomputers will be used in the second phase of the Reliable Replacement Warhead (RRW) design process. In addition, LLNL will be using these clusters for a complex series of calculations on multiple warheads in order to quantify the margins and uncertainty (QMU) of the designs. A recent QMU study of one weapon on ASC Purple required 4,400 runs and three months to accomplish.
“With this new scalable unit design for Linux clusters we were able to field four clusters ranging in size from 288 to 1,152 nodes in less than six months,” said Mark Seager, who leads LLNL’s Advanced Computing Technology program to develop new platforms. “This represents a breakthrough in our ability to quickly and cost effectively deliver new capacity and capability to our demanding programmatic requirements.”
LLNL’s Atlas was ranked the 19th fastest computer in the world according to the Top500 list of supercomputers.
Zeus, with its 11 teraFLOP/s of peak processing capacity, and Atlas, with its 44 teraFLOP/s of peak processing capability, were deployed shortly after Rhea to provide capacity and capability cycles for multi-programmatic research in climate modeling, protein folding, material modeling, dislocation dynamics, atmospheric ground flow and earthquake simulations. The first round of 7 tier one and 10 tier two Atlas Grand Challenge Projects has been allocated time on the machine to perform breakthrough science with large-scale simulations. These projects include dislocation dynamics to understand the physics and chemistry of how materials fracture, studies of the potential for new states of matter in highly compressed metals, accurate 3D laser backscatter in National Ignition Facility (NIF) ignition targets, 3D studies of Type Ia supernovae, prediction of the properties of plutonium with dynamical mean field theory, initial studies of atomistic simulations with quantum-level accuracy, and Quantum Chromodynamics.
“These LLNL clusters are further proof that 4-socket servers are optimized to power serious research,” said Kevin Knox, vice president, Worldwide Commercial Business, AMD. “AMD and partners like Appro have helped bring powerful solutions that when clustered can form formidable supercomputers. Customers like LLNL are recognizing they can get the price/performance-per-watt and scalability, particularly when they see the seamless upgrade path to native quad-core processing that only AMD64 with Direct Connect Architecture can deliver.”
Results
These clusters have performed far beyond original expectations. For instance, with the Minos 6 SU installation, the last 2 SU were delivered on a Thursday, the whole cluster was up and running large jobs in the LLNL synthetic workload tests by Saturday, and a three-hour full-system Linpack run at 83% of peak was completed on Monday on the first attempt without node failures. Since the Zeus, Rhea and Atlas clusters went into production early in calendar year 2007, LLNL has delivered over 43.4 million CPU hours to the programs and realized, on average, about 95% utilization on all the clusters. These clusters have seen between 100 and 150 hours of non-stop use with only 0-2% downtime. That’s right: in 152 days of continuous service at over 95% utilization, Zeus has never gone down. The largest job run on Atlas used 1,100 of the 1,104 compute nodes (8,800 MPI tasks with one MPI task per core) and ran for 3 weeks doing a very high priority 3D NIF laser plasma interaction simulation. In addition, a dislocation dynamics simulation using 1,024 nodes (8,192 cores with one MPI task per core) ran for ten days. “Scalable Linux clusters built from the Scalable Unit design scale better than machines based on custom interconnects up to 8,192 MPI tasks on our real world applications. That coupled with the exceptionally good MTBF of these clusters allows us to accomplish more science on multiple projects and that is what it is all about,” Seager commented.
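The MPI task counts quoted for the two large Atlas jobs follow directly from the node design described earlier (4 sockets per node, dual-core processors, one MPI task per core). The short Python sketch below reproduces that arithmetic; the constant and function names are hypothetical and the code is an illustration only, not part of any LLNL job scheduler.

```python
# Node design as described in the text: 4-socket nodes with dual-core Opterons.
SOCKETS_PER_NODE = 4
CORES_PER_SOCKET = 2
CORES_PER_NODE = SOCKETS_PER_NODE * CORES_PER_SOCKET  # 8 cores per node

# Atlas compute pool size quoted in the text.
ATLAS_COMPUTE_NODES = 1104


def mpi_tasks(nodes: int, tasks_per_core: int = 1) -> int:
    """MPI task count for a job spanning `nodes` nodes."""
    return nodes * CORES_PER_NODE * tasks_per_core


def fraction_of_pool(nodes: int, pool: int = ATLAS_COMPUTE_NODES) -> float:
    """Share of the compute pool occupied by a job."""
    return nodes / pool


if __name__ == "__main__":
    for label, nodes in [("NIF laser plasma job", 1100),
                         ("dislocation dynamics job", 1024)]:
        print(f"{label}: {nodes} nodes, {mpi_tasks(nodes)} MPI tasks, "
              f"{fraction_of_pool(nodes):.1%} of Atlas compute nodes")
```

With one task per core, 1,100 nodes give 8,800 MPI tasks and 1,024 nodes give 8,192, matching the figures in the text.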
Summary
Lawrence Livermore National Laboratory has been able to make high end computing available to a broader base of researchers. Armed with these new scalable compute resources, scientists are seeing ever shorter times to results and are able to be more productive and accomplish more, cost effectively.
Appro delivered exactly what the Lawrence Livermore National Laboratory HPC IT team and scientists required: a new breed of highly scalable, dynamic, reliable and effective Linux clusters to create the next generation of supercomputers. As this story is being written, Atlas has just been ranked the 19th most powerful supercomputer in the world, and Appro continues to be at the forefront of HPC history with customers like LLNL.
About Lawrence Livermore National Laboratory
Established in 1952, Lawrence Livermore National Laboratory (LLNL) is a premier applied science laboratory, part of the National Nuclear Security Administration within the Department of Energy. LLNL is responsible for ensuring the nation’s nuclear weapons remain safe, secure, and reliable through application of advances in science and engineering. With its unique capabilities, the Laboratory meets other pressing national security needs, including countering the proliferation of weapons of mass destruction and strengthening homeland security against the terrorist use of such weapons.
The Laboratory is an international leader in many areas of science and technology and undertakes significant research programs in energy, environment, bioscience, biotechnology, and basic science and advanced technology. Because so many of its projects require huge volumes of data analysis and the hosting of highly complex proprietary applications, more powerful simulation compute resources have been an integral requirement since the 1980s.
About Appro
Appro accelerates technical applications and business results, unlocking the value of IT for the high-performance and enterprise computing markets through differentiated, performance-balanced architecture, open standards, and engineering expertise. Appro is a leading developer of innovative, high-performance, density-managed servers, cluster solutions, storage subsystems, and high-end workstations for these markets.
Appro’s headquarters is in Milpitas, CA with an R&D and manufacturing partner in Asia and a sales and service office in Houston, Texas.

“Appro is focusing its product design to address the HPC cluster
market and key customer requirements including system management,
high availability and price/performance for HPC applications.
Appro has shown the ability to win highly sought-after, large-scale
HPC deals, positioning the company to benefit from strong market
growth that IDC projects through 2012.”
Earl Joseph,
IDC Program Vice President, Technical Computing

Customer Quote

“Appro not only offered us a cost effective solution but
they also improved our required technical specification through
better reliability, greater fault tolerance and redundancy as
well as more flexibility with regard to system scalability.”
Bob Bell,
Technical Director, ING Renault F1 Team
