GigaIO Reports on Interconnect Technology Performance
Carlsbad, California, April 29, 2025 – GigaIO, maker of an edge-to-core AI platform, has unveiled AI training, fine-tuning, and inference benchmarks that demonstrate the performance, cost, and power efficiency of its AI fabric compared with RDMA over Converged Ethernet (RoCE).
According to the company, results include 2x faster training and fine-tuning and 83.5x better time to first token for inference, demonstrating how smarter interconnects can have a transformative impact on AI infrastructure.
As AI models grow more complex, interconnect inefficiency has emerged as a critical bottleneck. Testing showed that GigaIO’s AI fabric outperformed traditional RoCE Ethernet in every AI workload tested, enabling organizations to:
- Train models twice as fast
- Reduce time to first token by 83.5x for instant user response
- Cut power consumption by 35-40% without sacrificing performance
- Deploy multi-GPU clusters faster and more easily
- Reduce infrastructure costs through simpler hardware configurations
Throughout the testing, the same GPUs, servers, operating systems, and application software were used; only the interconnects were varied, isolating their contribution to the results.
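Such an apples-to-apples comparison is possible because common distributed-training stacks are transport-agnostic. As a minimal sketch (assuming a PyTorch/NCCL setup, which the announcement does not specify), the same training script runs unchanged whether gradient traffic crosses RoCE or a PCIe fabric; only the underlying transport differs:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train_sketch.py
# The script itself contains nothing interconnect-specific.
dist.init_process_group(backend="nccl")  # NCCL (RCCL on AMD) selects the transport
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for a real model
ddp_model = DDP(model, device_ids=[local_rank])

x = torch.randn(32, 4096, device="cuda")
ddp_model(x).sum().backward()  # gradient all-reduce traverses the fabric
dist.destroy_process_group()
```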
The PCIe-native design of GigaIO’s AI fabric enables organizations to achieve target performance with fewer GPUs and lower power consumption, and eliminates the need for additional networking hardware such as NICs and Ethernet switches, further reducing energy use. Tests show RoCE systems require 35-40% more hardware (and energy) to provide equivalent performance.
Unlike RoCE, GigaIO’s AI fabric eliminates protocol overhead and complex RDMA tuning, simplifying system setup with seamless GPU discovery and minimal configuration. In contrast, RoCE demands extensive configuration and troubleshooting, and even then often delivers suboptimal performance. “With GigaIO, we spend less time on infrastructure and more time optimizing LLMs,” said Greg Diamos, CTO of Lamini, an enterprise custom AI platform.
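Because the fabric is PCIe-native, attached GPUs enumerate as ordinary local devices. A minimal illustration (assuming a PyTorch environment, which is not part of GigaIO’s published materials): standard device-discovery code works with no NIC- or RDMA-specific setup.

```python
import torch  # assumes PyTorch; the same API covers NVIDIA (CUDA) and AMD (ROCm) builds

# Fabric-attached GPUs appear as ordinary PCIe devices, so the usual
# discovery APIs find them without any interconnect-specific configuration.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
```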
GigaIO’s AI fabric achieved better results than RoCE across the entire AI workflow. In training and fine-tuning, it delivered better GPU utilization in multi-GPU setups, with 104% higher throughput in distributed training scenarios compared with RoCE. In inference, for models like Llama 3.2-90B Vision Instruct, GigaIO’s AI fabric reduced time to first token (TTFT) by 83.5x, with responses arriving in milliseconds rather than seconds, significantly improving responsiveness for interactive AI applications such as chatbots, vision systems, and RAG pipelines.
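TTFT is measured client-side as the delay between sending a request and receiving the first streamed token. A minimal sketch of such a measurement, assuming an OpenAI-compatible streaming endpoint (common to serving stacks such as vLLM; the base URL and model id below are illustrative, not GigaIO’s test harness):

```python
import time
from openai import OpenAI  # OpenAI-compatible client; any conforming server works

# Illustrative endpoint and model id; not GigaIO's published configuration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def time_to_first_token(prompt: str) -> float:
    """Seconds between sending the request and the first streamed token."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="llama-3.2-90b-vision-instruct",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk carrying content marks the first token.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")

print(f"TTFT: {time_to_first_token('Summarize this image pipeline.') * 1000:.1f} ms")
```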
For the large model Llama 3.2-90B Vision Instruct, GigaIO’s AI fabric achieved 47.3% higher throughput and was able to handle the same user load with 30-40% less hardware than RoCE. In a 16-GPU AMD MI300X cluster, GigaIO’s AI fabric delivered 38% higher training throughput and superior GPU utilization, enabling faster convergence on large-scale models.
“Our AI fabric isn’t just faster, it’s cheaper to deploy and operate,” said Alan Benjamin, CEO of GigaIO. “Teams report 30-40% lower power consumption, making it a compelling alternative to traditional Ethernet-based interconnects for organizations facing power constraints or seeking to optimize AI infrastructure costs. Our AI fabric enables faster time-to-value and more scalable AI deployments by delivering superior performance while consuming less power.”
Full test results are available in GigaIO’s “Smarter Interconnects for Power-Constrained AI” white paper.
View source version on insidehpc.com: https://insidehpc.com/2025/04/gigaio-reports-on-interconnect-technology-performance/