Dell HPC
Dr. Jeffrey Layton ([email protected]), Enterprise Technologist - HPC
GPU Computing at Dell
GPU Computing Approach
• Hardware and software change rapidly
  – New CPUs
  – New GPUs
  – New interconnects
  – New software
• All of these change at different rates and at different times
• GPU applications are evolving very rapidly
• How do you adapt to these changes? How do you protect your investment? How do you adapt to new and evolving applications?
• Be flexible
Great example of flexibility
• From initial development to the “final” code version, performance improves by a factor of 9!
• Software changes during development result in hardware changes
Implementation
• Develop on something smaller, such as a laptop or workstation
• Deploy production applications onto a cluster
• For cluster deployments:
  – Move GPUs to an external PCIe chassis
    › Allows CPUs and GPUs to be changed independently
    › Allows the network to be changed independently
    › Optimize power and cooling for GPUs and CPUs separately
  – Add GPUs to host nodes as applications evolve
    › It may be 1 GPU today and 8 GPUs tomorrow
Dell C410x
• 3U PCIe chassis
  – 16 slots (10 in front, 6 in back), all x16
  – 8 PCIe connections to host nodes (1-8 slots per connection)
• Redundant power supplies (4x 1400W)
• BMC (IPMI 2.0) on-board
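The slot and connection counts above bound how many host nodes a single chassis can feed. The following is a minimal sketch of that arithmetic, not a Dell tool: the function name is hypothetical, and it assumes every host gets the same number of GPUs.

```python
# Illustrative sketch only. The three constants come from the slide
# (16 slots, 8 host connections, 1-8 slots per connection); the even
# split of GPUs across hosts is an assumption made for illustration.
TOTAL_SLOTS = 16
HOST_CONNECTIONS = 8
MAX_SLOTS_PER_CONNECTION = 8


def hosts_served(gpus_per_host: int) -> int:
    """Return how many hosts one chassis can serve at a given GPU count per host."""
    if not 1 <= gpus_per_host <= MAX_SLOTS_PER_CONNECTION:
        raise ValueError("a host connection maps to between 1 and 8 slots")
    # Limited either by the number of host connections or by the slots available.
    return min(HOST_CONNECTIONS, TOTAL_SLOTS // gpus_per_host)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} GPU(s) per host: up to {hosts_served(n)} host(s) per chassis")
```

Under these assumptions the chassis is connection-limited at 1 or 2 GPUs per host and slot-limited beyond that.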
Host nodes:
• C6100:
  – 4-in-2U
  – 2S Intel with IB mezz card (x8)
  – PCIe x16 HIC card
  – Redundant power
• C6145:
  – 2x 4S AMD boards in 2U
  – (4) x16 slots
    › 3 are open
    › 1 has iPASS connector
  – IB mezz card (x8)
  – Redundant power
Host/GPU combinations
• Many combinations are possible
  – Intel or AMD?
  – How many GPUs per node?
  – How many lanes per GPU?
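One way to reason about the "lanes per GPU" question is to look at how a single host link is shared when several GPUs sit behind it. The sketch below is a simplification: it assumes an x16 host link (as with the x16 HIC card listed for the C6100) and an even split when all GPUs transfer at once. The function name is hypothetical.

```python
# Illustrative sketch only. HOST_LINK_LANES reflects the x16 host interface
# card mentioned in the slides; dividing the link evenly among GPUs is a
# simplifying assumption about concurrent host-to-GPU transfers.
HOST_LINK_LANES = 16


def lanes_per_gpu(gpus_on_link: int) -> float:
    """Average share of the host link, in lanes, when all GPUs transfer at once."""
    if gpus_on_link < 1:
        raise ValueError("need at least one GPU on the link")
    return HOST_LINK_LANES / gpus_on_link


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} GPU(s) on one host link: about x{lanes_per_gpu(n):g} each")
```

Applications that rarely move data between host and GPU may tolerate a small share; transfer-heavy codes may not.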
Internal vs. External: NAMD
[Figure: NAMD STMV benchmark, steps/second, for SuperMicro (2) and C410x / C6100 (2). Bar labels in the extracted text: 0.95, 0.82.]
Internal vs. External: CUDASW++
[Figure: CUDASW++ performance in GFLOPS versus query length, for C410x / C6100 (2) and SuperMicro (2).]
Scalability: NAMD
[Figure: NAMD STMV benchmark, steps/second, for CPU, C410x / C6100 (1), C410x / C6100 (2), C410x / C6100 (4), and SuperMicro (2). Bar labels in the extracted text: 1.52, 0.84, 0.47, 0.95, 0.10.]
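The chart values can be turned into speedup and scaling-efficiency figures. The sketch below is only a worked calculation: the pairing of each value with a configuration is an assumption (the values were matched to the legend in ascending order), and the function names are hypothetical.

```python
# Steps/second taken from the chart labels. ASSUMPTION: the pairing of values
# with configurations is inferred from the legend order, not stated in the text.
RESULTS = {
    "CPU": 0.10,
    "C410x / C6100 (1 GPU)": 0.47,
    "C410x / C6100 (2 GPUs)": 0.84,
    "C410x / C6100 (4 GPUs)": 1.52,
}


def speedup(rate: float, baseline: float) -> float:
    """How many times faster `rate` is than `baseline`."""
    return rate / baseline


def scaling_efficiency(rate_n: float, n_gpus: int, rate_1: float) -> float:
    """Fraction of ideal linear scaling achieved with n_gpus GPUs."""
    return rate_n / (n_gpus * rate_1)


if __name__ == "__main__":
    cpu = RESULTS["CPU"]
    one_gpu = RESULTS["C410x / C6100 (1 GPU)"]
    for name, rate in RESULTS.items():
        print(f"{name}: {speedup(rate, cpu):.1f}x over CPU")
    for n, key in ((2, "C410x / C6100 (2 GPUs)"), (4, "C410x / C6100 (4 GPUs)")):
        eff = scaling_efficiency(RESULTS[key], n, one_gpu)
        print(f"{n} GPUs: {eff:.0%} of linear scaling")
```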
Impact of CUDA versions
• Heisenberg Spin Glass (HSG) model
  – Spin glass modeling is a technique used in statistical mechanics to simulate and predict the behavior of various physical phenomena
• HSG is multi-GPU capable using MPI
  – Recent upgrade to CUDA 4.0
• Two code versions:
  – MPI based
    › GPUs communicate by sending data to the host, then to the appropriate GPU
  – CUDA 4.0
    › GPUs communicate directly (no host)
• Compare performance
HSG results
• CUDA 4.0 (GPU Direct) is 15-30% faster than MPI
• For Intel systems, GPU Direct requires all GPUs to be connected to the same IOH
• C410x allows you to expand to multiple GPUs per single IOH
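The same-IOH requirement can be expressed as a simple topology check. The sketch below is illustrative only: the GPU and IOH names, the two example mappings, and the function name are all invented, and a real system would read its topology from the platform rather than from a hand-written dictionary.

```python
from typing import Iterable, Mapping


def share_one_ioh(gpu_to_ioh: Mapping[str, str], gpus: Iterable[str]) -> bool:
    """Return True if every listed GPU is attached to the same I/O hub (IOH)."""
    iohs = {gpu_to_ioh[gpu] for gpu in gpus}
    return len(iohs) == 1


# HYPOTHETICAL topologies, for illustration only.
# Two GPUs split across the two IOHs of a dual-IOH host:
split_gpus = {"gpu0": "ioh0", "gpu1": "ioh1"}
# Four GPUs behind a single host connection, hence a single IOH:
external_gpus = {"gpu0": "ioh0", "gpu1": "ioh0", "gpu2": "ioh0", "gpu3": "ioh0"}

if __name__ == "__main__":
    print("split:", share_one_ioh(split_gpus, split_gpus))
    print("external:", share_one_ioh(external_gpus, external_gpus))
```

A job launcher could run a check like this before choosing between the direct and host-staged code paths.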
Data Management and Storage
Realities
• HPC storage is about 15-25% of the cost of a system but about 90% of the problems
• HPC storage is about solutions, not just hardware
  – Hardware, file system, client, management/monitoring, documentation, best practices, sizing and performance guidance, services and support
• No one, two, or even three file systems/solutions satisfy the various requirements
  – Recent IDC study: 25 customers = 13 file systems
• Applications/processes drive solutions (just like compute), but
  – Very few customers understand the IO characteristics of their apps
• Access frequency requirements don’t match the underlying storage platform
  – A very large percentage of data is never touched again approximately 2-4 weeks after it is created
HPC Storage Solutions Aren’t Easy
• Ignoring cost, name the top 3 storage attributes
  1. Performance
  2. Reliability
  3. Capacity
• Difficult or impossible to get all 3 attributes in a single solution with HPC price constraints
• Can we get all 3 attributes in different solutions and integrate them?
  – Maintains attributes, improves flexibility, and increases options
Flexibility, Adaptability, and Options
• The performance importance of data changes over the life of the data
  – At first, performance is very important
  – After a period of time, performance is less important
• Why keep data that isn’t being used on high-performance storage?
• Based on applications and performance importance, there are three basic categories of data requirements:
  1. Fast Scratch
     • Performance, performance, performance
  2. Primary (/home)
     • Reliability
  3. Long-term
     • Capacity (very little performance)
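The three categories suggest a placement rule driven by how recently data was used. The following is a minimal sketch of such a rule, not a description of any Dell product: the four-week window is an assumed threshold (borrowed from the "2-4 weeks" observation earlier in the deck), and the function and tier names are hypothetical.

```python
from datetime import timedelta

# ASSUMED policy threshold for illustration; a real site would choose its own.
SCRATCH_WINDOW = timedelta(weeks=4)


def choose_tier(time_since_last_access: timedelta, is_home_data: bool) -> str:
    """Pick a storage category for a piece of data.

    Home-directory data stays on reliable primary storage; other data sits on
    fast scratch while it is being used and moves to long-term storage after.
    """
    if is_home_data:
        return "primary"
    if time_since_last_access <= SCRATCH_WINDOW:
        return "fast-scratch"
    return "long-term"


if __name__ == "__main__":
    print(choose_tier(timedelta(days=3), is_home_data=False))
    print(choose_tier(timedelta(weeks=10), is_home_data=False))
    print(choose_tier(timedelta(weeks=10), is_home_data=True))
```

Keeping the rule this small makes the trade-off explicit: only recently used, non-home data pays for high-performance storage.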
Dell’s approach to deliver HPC storage solutions
• Dell is delivering solutions using two approaches:
  – Complete solutions: fully vetted, tested, supported
    › Come with end-to-end support from Dell and partners
    › Detailed documentation including best practices, performance and sizing guidance
    › Deployment services if necessary
  – Roll-it-your-own
    › Dell creates technical whitepapers containing:
      – Recommended configurations
      – Details on configuration
      – Best practices and sizing guidance
    › Customer buys hardware and uses whitepapers as a reference guide
    › Full Dell warranty and support on Dell components
      – Limited or no deployment services; no solution type services
• Over time, deliver building blocks that will integrate into the larger storage ecosystem
Fast Scratch Storage
• Requirements:
  – Very fast (above 1.4 GB/s), more than NFS provides
  – Scalability in performance and capacity
  – Cost effective
  – Reliability is not necessarily a primary requirement
• Roll-Your-Own reference configurations and supporting data
  – Cambridge University Developed Lustre Reference Configuration
    › Detailed whitepaper discussing architecture and performance analysis of the Lustre solution deployed at University of Cambridge
    › The deployment steps and best practices listed in the paper can be used to architect similar Lustre solutions using Dell server and storage products
    › Work is currently in progress to develop a reference architecture using latest generation Dell PowerEdge servers and PowerVault storage
• Complete Dell HPC Fast Scratch Solutions
  – Dell | Terascala High Performance Computing Storage Solution (DT-HSS)
    › Third generation Lustre solution from Dell and Terascala, referred to as DT-HSS3
    › Utilizes Dell’s latest generation 6Gb/s SAS based PowerVault MD series storage
The Dell | Terascala HPC Storage Solution (DT-HSS3)
• Unique scale-out storage appliance for throughput-intensive applications
• Fully supported storage appliance that leverages Lustre, the industry’s leading open-source parallel file system
• Simple, linear scalability
  – Up to 6.2 GB/s read and 4.2 GB/s write throughput per base object pair; scale aggregate performance by adding object pairs
  – 48TB to petabytes in a single namespace
  – Pre-defined configurations from 48TB to 336TB in a single rack
  – Configurations serve as building blocks for larger and faster solutions
• Rich management including hardware and file system monitoring
  – Automated install and maintenance, health monitoring, failover solution, root cause analysis
[Figure labels: Metadata Storage Server (MDS) Pair; Object Storage Server (OSS) Pair]
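The linear-scaling claim lends itself to a quick sizing calculation. The sketch below takes the per-pair figures from the slide as upper bounds ("up to") and assumes perfectly linear scaling, which real deployments may not reach; the function names are hypothetical.

```python
import math
from typing import Tuple

# Per-object-pair peak figures from the slide (GB/s). Treat them as upper bounds.
READ_GBPS_PER_PAIR = 6.2
WRITE_GBPS_PER_PAIR = 4.2


def aggregate_throughput(object_pairs: int) -> Tuple[float, float]:
    """Peak (read, write) GB/s for a number of object pairs, assuming linear scaling."""
    if object_pairs < 1:
        raise ValueError("need at least one object pair")
    return (object_pairs * READ_GBPS_PER_PAIR, object_pairs * WRITE_GBPS_PER_PAIR)


def pairs_needed_for_read(target_read_gbps: float) -> int:
    """Smallest number of object pairs whose peak read rate meets the target."""
    return max(1, math.ceil(target_read_gbps / READ_GBPS_PER_PAIR))


if __name__ == "__main__":
    read, write = aggregate_throughput(3)
    print(f"3 pairs: up to {read:.1f} GB/s read, {write:.1f} GB/s write")
    print("pairs for a 20 GB/s read target:", pairs_needed_for_read(20.0))
```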
Primary Storage
• Requirements:
  – Performance is usually not a big deal
  – Reliability is important
  – Ease of use is important
• Typical usage: home directories, user data, application data and results
• NFS is a widely used protocol for this use case
• Roll-Your-Own reference configurations and supporting data:
  – Dell PowerVault MD1200 as a Network File System Backend Storage Solution
  – Optimizing Dell PowerVault MD1200 Storage Arrays for High Performance Computing (HPC) Deployments
• Complete Dell HPC NFS Storage Solutions
  – Dell HPC NFS Storage Solution (NSS)
    › Leverages Dell PowerEdge and PowerVault storage
    › 24-96TB (raw storage) in a single namespace using Red Hat XFS file system
    › Dell developed tuning and best practices
The Dell HPC NFS Storage Solution
• Takes the guesswork out of NFS configurations
  – Appliance approach to inexpensive NFS solutions
• Range of capacity:
  – Up to 96TB in a single namespace
• HA configuration options
• Good performance
  – Up to 1.47 GB/s for writes and 2.4 GB/s for reads over NFS
  – 6Gbps SAS, optional IB or 10GigE
  – Tuned storage and file system configurations
• Cost effective
• Reliable and supported
  – Proven hardware
  – 3 years support with Dell including XFS support
  – Redundant power supplies, connections, plus drive spares kit
• Easy to install
  – Dell configuration and deployment: whitepaper and Dell PS
  – Affordable installation services available
[Figure labels: NFS Gateway; Storage – MD1200; … Expansion MD1200’s]
Benefits of Dell NSS
• Performance tuned NFS server
  – Best possible performance
  – No need to experiment with tuning options; already tuned
[Figure: NFS throughput (KB/s) versus number of clients (2 to 32), tuned versus not tuned; a 30% difference is annotated.]
NSS Options
Common Aspects
• NFS Gateway
  – Dell Server (R710)
  – RAID-1 for OS (plus 1 hot-spare)
  – RAID-0 for additional swap space
  – 3 years of support on OS, file system, hardware
  – Cold spares (disks)
  – IB, 10GigE options
  – RHEL 5.5 OS
  – Red Hat Scalable File System (XFS)
  – Dell ProSupport
NSS
• Single NFS Gateway
  – PERC H800 RAID card(s) in NFS gateway
    › Dell MD1200 JBODs connected to RAID cards
  – RAID-60 or RAID-60+LVM
NSS-HA
• Two Active-Passive NFS Gateways
  – Dell MD3200 RBOD contains RAID card
  – Dell MD1200 JBODs are connected to RBOD
  – RAID-6 + LVM
NSS Large Solution: 96 TB
[Figure label: QDR IB or 10GigE]
• Summary
  – Raw capacity: 96TB
  – Formatted capacity: ~80TB
  – RAID-60 and LVM
    › RAID-6 within each MD1200
    › RAID-0 across MD1200 pairs
    › LVM to combine LUNs
• 10GigE NFS performance
  – Peak sequential read: 850 MB/s
  – Peak sequential write: 1,180 MB/s
• InfiniBand NFS performance
  – Peak sequential read: 1,350 MB/s
  – Peak sequential write: 1,470 MB/s
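The gap between raw and formatted capacity follows from the RAID-6 parity overhead. The sketch below is a back-of-the-envelope check under stated assumptions: four 12-drive MD1200 enclosures populated with 2TB drives (one combination that yields 96TB raw; the slide does not give the drive size or enclosure count), with file system overhead ignored.

```python
# ASSUMPTIONS for illustration: drive size and enclosure count are not stated
# on the slide; they are one combination that produces 96TB of raw capacity.
DRIVES_PER_ENCLOSURE = 12   # drives in each MD1200 enclosure
DRIVE_TB = 2                # assumed drive size in TB
ENCLOSURES = 4              # assumed number of MD1200 enclosures
RAID6_PARITY_DRIVES = 2     # RAID-6 gives up two drives' worth of space per group


def raw_capacity_tb() -> int:
    """Total capacity of all drives before any RAID overhead."""
    return ENCLOSURES * DRIVES_PER_ENCLOSURE * DRIVE_TB


def raid60_usable_tb() -> int:
    """Usable capacity with one RAID-6 group per enclosure, striped together.

    The RAID-0 stripe and LVM layers add no parity overhead, so only the
    per-enclosure RAID-6 parity is subtracted. File system overhead is ignored.
    """
    data_drives = DRIVES_PER_ENCLOSURE - RAID6_PARITY_DRIVES
    return ENCLOSURES * data_drives * DRIVE_TB


if __name__ == "__main__":
    print(f"raw: {raw_capacity_tb()} TB, usable after RAID-6: {raid60_usable_tb()} TB")
```

Under these assumptions the result lines up with the raw and formatted figures quoted in the summary.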
NSS-HA: Large
• Summary
  – Raw capacity: 96TB
  – Formatted capacity: ~80TB
  – RAID-6 and LVM
    › RAID-6 within each MD3200/1200
    › LVM to combine LUNs
• 10GigE NFS performance
  – Peak sequential read: 560 MB/s
  – Peak sequential write: 1,130 MB/s
• InfiniBand NFS performance
  – Peak sequential read: 2,430 MB/s
  – Peak sequential write: 1,274 MB/s
[Figure labels: two Dell R710 NSS-HA servers; PowerVault MD3200; PowerVault MD1200; connections: GigE, power cords, IB or 10GigE, SAS (6Gbps)]
Summary
• Two most recent trends:
• GPU Computing
  – GPU computing is still evolving
    › Hardware (CPUs, GPUs, interconnect) and software (CUDA)
  – Best course of action is to remain flexible
  – Ability to upgrade CPUs, GPUs, or software independently of each other
  – External PCIe chassis affords flexibility
    › Good host nodes
• Data Management and Storage
  – Overall it’s the largest problem for users today
  – Focus on performance (fast scratch), reliability (primary), and capacity (long-term)
    › Develop a product for each piece and integrate them together
  – Roll-it-your-own and fully supported solutions are available
  – Tools for data management are becoming highly critical
Thanks!