|
Overview - NVIDIA® Tesla™ GPU HPC Solutions
Appro offers the option of building scalable supercomputing clusters with NVIDIA Tesla GPUs, whose massively multi-threaded architecture addresses demanding high performance computing workloads. With a 128-processor computing core per GPU, a C-language development environment for the GPU, a suite of developer tools, and the world’s largest GPU computing ISV development community, Tesla lets scientific and technical professionals develop applications faster and deploy them across multiple generations of processors. This was not possible with earlier computing approaches.
NVIDIA developed a parallel architecture from the ground up and designed its Tesla computing products to meet the requirements of HPC software. Features include a Thread Execution Manager, which coordinates the concurrent execution of thousands of computing threads, and a Parallel Data Cache, which lets computing threads share data easily and deliver results in less time. |
|
Offering a Compatible Solution Option -- Xtreme-X1 NVIDIA Tesla Certified
As an industry-standard solution, the Tesla 8 and 10 series GPU computing systems fit easily into existing HPC environments. Used in tandem with multi-core CPU systems, Tesla solutions provide a flexible computing platform that offers a 30 to 50% performance boost in specific types of applications. The Tesla 10 series is the latest processor launched by NVIDIA and offers 64-bit double-precision floating-point support. This upgrade is designed for high performance computing customers who make heavy use of mathematical operations. |
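The source does not say how the 30 to 50% figure was measured. As a rough illustration of why the gain depends on the application, the sketch below applies Amdahl's law to a CPU+GPU system. The function name `overall_speedup`, the offloaded fractions and the 10x GPU speedup are hypothetical inputs chosen for illustration, not Appro or NVIDIA measurements.

```python
def overall_speedup(offloaded_fraction, gpu_speedup):
    """Amdahl's law: whole-application speedup when only part of the
    run time is accelerated (here, the part offloaded to the GPU)."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    # Time left on the CPU plus the (shortened) time spent on the GPU.
    remaining = (1.0 - offloaded_fraction) + offloaded_fraction / gpu_speedup
    return 1.0 / remaining


# Hypothetical inputs: fraction of run time offloaded, with a 10x GPU speedup.
for fraction in (0.25, 0.35, 0.50):
    boost = (overall_speedup(fraction, 10.0) - 1.0) * 100.0
    print(f"{fraction:.0%} offloaded -> {boost:.0f}% faster overall")
```

Under these assumptions the overall gain is limited by the share of work that stays on the CPU, which is one reason a boost would apply only to specific types of applications.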
NVIDIA CUDA C Programming Technology
CUDA simplifies many-core programming and enhances performance by offloading computationally intensive work from the CPU to the GPU. It enables developers to use NVIDIA GPUs to solve complex, computation-intensive problems such as protein docking, molecular dynamics, financial analysis, fluid dynamics, structural analysis and many others. The world's only C-language development environment for the GPU, the NVIDIA CUDA software development kit includes a standard C compiler, hardware debugger tools, and a performance profiler for simplified application development. |
Appro Xtreme-X1 Supercomputer - Configuration and Performance
| Major Supercomputer Components |
| Appro Compute Blade |
- Dual socket, dual- or quad-core processors
- 16 DIMM slots
- Optional on-board HDD
- PCI-E x16 Gen 2
- Dual-port IB and GbE |
NVIDIA S870 - Product Datasheet (pdf) |
- Four Tesla GPUs (128 thread processors per GPU)
- 6 GB of system memory (1.5 GB dedicated memory per GPU)
- Standard 19" 1U rack-mounted chassis
- Connects to the host via cabling to a low-power PCI-E x8 or x16 card
- Configuration: 2 PCI-E connectors for 2 GPUs each (4 GPUs total)
- CUDA technology: the CUDA™ C programming environment |
NVIDIA S1070 - Product Datasheet (pdf) |
- Four Tesla GPUs
- 960 computing cores (240 cores per processor)
- IEEE 754 single- and double-precision floating point
- 16 GB of memory (4 GB dedicated memory per GPU)
- 408 GB/s memory bandwidth (102 GB/s per GPU to local memory)
- Connects to a PCI-E x16 Gen 2 card with extender
- 2 GPUs per PCI-E connector
- Internal PCI-E switch
- CUDA technology: the CUDA™ C programming environment
|
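The system-level figures quoted for the two Tesla units follow from their per-GPU figures. A minimal sketch of that arithmetic uses only numbers from the lists above. The dictionary layout and the function name `aggregate` are illustrative and not part of any NVIDIA tool, and the S870's "thread processors" are recorded under the same `cores_per_gpu` key for convenience.

```python
# Per-GPU figures as quoted in the component lists above.
SYSTEMS = {
    "S870": {"gpus": 4, "cores_per_gpu": 128, "memory_gb_per_gpu": 1.5},
    "S1070": {"gpus": 4, "cores_per_gpu": 240, "memory_gb_per_gpu": 4.0,
              "bandwidth_gb_s_per_gpu": 102},
}


def aggregate(spec):
    """Multiply every per-GPU figure by the number of GPUs in the system."""
    n = spec["gpus"]
    return {key.replace("_per_gpu", "_total"): value * n
            for key, value in spec.items() if key.endswith("_per_gpu")}


for name, spec in SYSTEMS.items():
    print(name, aggregate(spec))
```

The totals for the S1070 (960 cores, 16 GB, 408 GB/s) and the S870 (6 GB) match the quoted system-level values.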

| GPU Supercomputer Configuration |
| Compute Resources | 256 compute servers and 128 S870 GPUs; 2048 x86 dual-core or 4096 quad-core cores; up to 512 GPGPU processors |
| Performance | 13.5 TF peak x86 dual-core performance; 22.8 TF peak x86 quad-core performance; more than 256 TF peak GPU-based performance; 40 Gb/s sustained bandwidth per blade; sub-1.6 µs blade-to-blade communication latency |
| Memory/Storage | Up to 16 TB of DDR2 ECC memory; 96 TB I/O node hard drive storage |
| Power/Cooling | Compute rack: 24-29 kW, 7 tons; GPU rack: 17-18 kW, 5 tons |
| Reference Design Architectures |
| Number of Racks | 3 | 5 | 10 | 18 | 35 |
| Number of Blades | 64 | 128 | 256 | 512 | 1024 |
| Number of GPUs | 32 | 64 | 128 | 256 | 512 |
| Peak GPU Performance | 64 TF | 128 TF | 256 TF | 512 TF | 1024 TF |
| Max Memory Capacity | 4 TB | 8 TB | 16 TB | 33 TB | 74 TB |
| Max Node Latency | <1.6 µs | <1.6 µs | <1.6 µs | <1.6 µs | <1.6 µs |
| Max Node Storage BW | 40 Gb/s | 40 Gb/s | 40 Gb/s | 40 Gb/s | 40 Gb/s |
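The blade, GPU and peak-performance rows of the reference designs scale in fixed proportion. The sketch below checks that relationship against the table. The two ratios (2 blades per GPU system, 2 TF peak per GPU system) are inferred from the table values and are not vendor-stated constants, and `reference_design` is a hypothetical helper name. Rack count and memory capacity do not scale exactly linearly in the table, so they are not modeled.

```python
# Rows copied from the reference design table above.
BLADES = [64, 128, 256, 512, 1024]
GPU_SYSTEMS = [32, 64, 128, 256, 512]
PEAK_GPU_TF = [64, 128, 256, 512, 1024]

# Ratios inferred from the table (assumptions, not vendor-stated constants).
BLADES_PER_GPU_SYSTEM = 2
PEAK_TF_PER_GPU_SYSTEM = 2


def reference_design(blades):
    """Return (GPU systems, peak GPU teraflops) for a given blade count."""
    gpu_systems = blades // BLADES_PER_GPU_SYSTEM
    return gpu_systems, gpu_systems * PEAK_TF_PER_GPU_SYSTEM


# Confirm that the inferred ratios reproduce every column of the table.
for blades, gpus, tf in zip(BLADES, GPU_SYSTEMS, PEAK_GPU_TF):
    assert reference_design(blades) == (gpus, tf)
    print(blades, "blades ->", gpus, "GPU systems,", tf, "TF peak")
```

The GPU row appears to count four-GPU S870 systems rather than individual GPU processors, since the configuration pairs 128 S870 units with up to 512 GPGPU processors. On that reading, 256 TF across 512 processors works out to roughly 0.5 TF peak per GPU processor.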
Benefits of Using NVIDIA Tesla
Massively Multi-Threaded Computing Architecture - Executes thousands of concurrent processing threads for high-throughput parallel processing of mathematically intensive problems.
NVIDIA GPU Computing Drivers - Manage GPU resources and provide an extensive runtime library for data management and program execution. They offer a high-speed data transfer path and a streamlined driver for computing that is independent of the graphics driver.
Multi-GPU Computing - Multiple Tesla GPUs can be controlled by a single CPU via the GPU computing driver, delivering high throughput on computing applications. The power of the GPU to solve large-scale problems can be multiplied by splitting the problem across multiple GPUs.
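Splitting a problem across multiple GPUs usually starts with dividing the data into one contiguous range per device. The sketch below shows only that host-side partitioning arithmetic. The function name `split_work` is hypothetical; it does not call the CUDA driver or runtime, and a real application would hand each range to a device through the GPU computing driver.

```python
def split_work(num_items, num_gpus):
    """Split num_items into contiguous [start, end) ranges, one per GPU.

    Range sizes differ by at most one item, so no GPU is left with a
    much larger share than the others.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    base, extra = divmod(num_items, num_gpus)
    ranges = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs each take one leftover item.
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: one million elements across the four GPUs of a single Tesla system.
for gpu, (start, end) in enumerate(split_work(1_000_000, 4)):
    print(f"GPU {gpu}: elements {start}..{end - 1}")
```

An even split like this assumes every element costs about the same to process. Workloads with uneven cost per element would need a different partitioning scheme.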
For more information on GPU clusters, contact sales@appro.com or submit a quote request.
|