Beyond the Box: Scaling GPUs
Note: This article originally appeared on hpcwire.com; a link to the original piece appears at the end of this article.
November 14, 2023
The performance of GigaIO’s SuperNODE™ was described in a previous HPCwire article. Accessing 32 GPUs or accelerators (from any vendor) over a shared PCIe bus provides a unique method for application acceleration. Recently, the SuperNODE received a 2023 HPCwire Editors’ Choice Award for Best AI Product or Technology and made the Editors’ Top 5 New Products or Technologies to Watch list.
Most commercial motherboards allow four (and sometimes eight) GPU cards on the local PCIe bus. The GigaIO SuperNODE takes this to a new level using GigaIO’s FabreX network, which can present 32 (and now 64) GPUs to the host processor. FabreX provides the ability to create composable systems that allocate PCIe devices across multiple servers.
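Because FabreX attaches the accelerators over PCIe, every GPU appears as a local device to the single host. As a rough illustration (standard PyTorch calls, not GigaIO tooling, and assuming a ROCm- or CUDA-enabled PyTorch build on the host), the device count can be confirmed from that one server:

```python
# Illustrative only: enumerate the accelerators visible to a single host.
# Assumes a PyTorch build with ROCm (or CUDA) support; ROCm devices are
# exposed through the torch.cuda API.
import torch

def list_visible_gpus() -> None:
    count = torch.cuda.device_count()
    print(f"Visible GPUs: {count}")          # e.g., 32 or 64 on a composed node
    for i in range(count):
        print(f"  device {i}: {torch.cuda.get_device_name(i)}")

if __name__ == "__main__":
    list_visible_gpus()
```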
GigaIO has shared recent benchmarks for the new 32-GPU SuperNODE and the just-announced 64-GPU SuperDuperNODE. The system used for the benchmarks contained 32 or 64 AMD Instinct™ MI210 GPUs connected over GigaIO FabreX™ to a 1U server with dual AMD EPYC™ (Milan) processors.
GigaIO’s SuperNODE system was tested with 32 AMD Instinct MI210 accelerators on a Supermicro 1U server powered by dual 3rd Gen AMD EPYC processors. The first set of benchmarks used Hashcat and ResNet-50.
- Hashcat: Workloads that utilize GPUs independently, such as Hashcat, should scale perfectly linearly up to the 32 GPUs tested.
- ResNet-50: For workloads that utilize GPU Direct RDMA or peer-to-peer communication, such as ResNet-50, the scaling factor should be slightly reduced as the GPU count rises (a simple scaling-efficiency calculation is sketched after this list).
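For reference, the scaling factor in both cases is simply the measured speedup divided by the ideal (linear) speedup. A minimal sketch, using hypothetical throughput numbers rather than GigaIO’s measured results:

```python
# Hypothetical numbers, for illustration only; not GigaIO's measured results.

def scaling_efficiency(throughput_1gpu: float, throughput_ngpu: float, n: int) -> float:
    """Measured speedup divided by ideal linear speedup on n GPUs."""
    speedup = throughput_ngpu / throughput_1gpu
    return speedup / n

# Embarrassingly parallel workload (Hashcat-like): near-perfect scaling.
print(f"{scaling_efficiency(1.0, 31.8, 32):.1%}")  # ~99%

# Communication-bound workload (ResNet-50-like): efficiency dips as GPUs are added.
print(f"{scaling_efficiency(1.0, 28.5, 32):.1%}")  # ~89%
```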
The results in Figures 1 and 2 show the scalability of the 32-GPU SuperNODE. The results demonstrate significantly improved scalability for ResNet-50 compared to the legacy alternative of spreading the GPUs across multiple servers and using MPI to communicate between individual nodes. Benchmarks indicate that when testing a multi-node MPI model, GPU scalability is reduced to 50 percent or less.
Double the GPUs
The 32-GPU SuperNODE demonstrated exceptional performance and scaling. The SuperDuperNODE brings 64 GPUs to a single server. Many HPC users look to the Top500 benchmarks to measure system performance, so several standard HPC benchmarks were used to test the SuperDuperNODE.
- HPL-MxP – solves a dense linear system using the mixed-precision arithmetic common to high-performance computing (HPC) and artificial intelligence (AI) workloads.
- HPL – solves a (random) dense linear system in double-precision (64-bit) arithmetic (used for Top500 ranking).
- HPCG – solves a sparse linear system with the conjugate gradient method and stresses various aspects of the system, particularly memory performance (the standard problem formulations are sketched after this list).
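For context, the standard problem formulations behind these benchmarks (not specific to GigaIO’s runs) can be summarized as:

```latex
% Standard benchmark definitions (not specific to GigaIO's runs).
% HPL and HPL-MxP factor and solve a dense system; HPL-MxP does the
% factorization in mixed precision and recovers full accuracy with
% iterative refinement. HPCG solves a sparse system with a
% preconditioned conjugate-gradient iteration.
\[
  \text{HPL / HPL-MxP:}\quad A x = b, \qquad A \in \mathbb{R}^{n \times n} \ \text{dense}
\]
\[
  \text{HPCG:}\quad A x = b, \qquad A \ \text{sparse, solved by preconditioned conjugate gradient}
\]
```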
Results for HPL-MxP
As shown in Figure 3, the results for the HPL-MxP benchmark showed excellent scaling of reduced-precision compute, achieving 99.7% of ideal theoretical scaling.
Results for HPL
The Top500 HPL benchmark achieved 95.2% of ideal theoretical scaling. For parallel systems, the inter-node network directly affects scalability (a slower network results in poorer scalability).
Results for HPCG
The HPCG benchmark showed 88% scaling, an excellent result for a memory-bandwidth-bound workload.
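A note on how to read these percentages: assuming the baseline is an ideal doubling of the 32-GPU result (an assumption on our part, since the baseline is not spelled out here), the arithmetic is:

```latex
% Assumed baseline: ideal doubling of the 32-GPU result (illustrative only).
\[
  E_{64} = \frac{P_{64}}{2\,P_{32}},
  \qquad
  E_{64} = 0.88 \;\Rightarrow\; P_{64} = 1.76\,P_{32}
\]
```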
These results demonstrate significantly improved scalability compared to the legacy alternative of scaling the number of GPUs using MPI to communicate between multiple nodes. For example, to apply 64 GPUs to an HPC problem, a user would need sixteen 4-GPU servers connected by a high-speed network. When running benchmarks in this configuration, GPU scalability is often reduced to 50 percent or less. In other words, half of the GPU computing performance is unavailable to the application.
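Seen another way, scaling efficiency translates directly into the number of GPUs whose compute actually reaches the application. A back-of-the-envelope sketch with hypothetical efficiencies:

```python
# Hypothetical efficiencies, for illustration only.

def effective_gpus(num_gpus: int, efficiency: float) -> float:
    """GPUs whose compute capacity is actually delivered to the application."""
    return num_gpus * efficiency

print(effective_gpus(64, 0.95))  # single composed node at ~95% -> ~60.8 effective GPUs
print(effective_gpus(64, 0.50))  # multi-node MPI cluster at ~50% -> 32 effective GPUs
```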
View source version on hpcwire.com: https://www.hpcwire.com/2023/11/14/beyond-the-box-scaling-gpus/