GigaIO and d-Matrix Advance Strategic Collaboration to Build World’s Most Efficient Scalable Inference Solution for Enterprise AI Deployment
Note: This article originally appeared on aithority.com; a link to the original piece appears at the end.
SuperNODE™ platform, capable of supporting dozens of Corsair™ AI inference accelerators in a single node, delivers unprecedented scale and efficiency for next-generation AI inference workloads.
GigaIO, a pioneer in scalable, easy-to-deploy-and-manage edge-to-core AI platforms for all accelerator types, announced the next phase of its strategic partnership with d-Matrix to deliver the world’s most efficient scalable inference solution for enterprises deploying AI at scale. Integrating d-Matrix’s Corsair inference platform into GigaIO’s SuperNODE architecture creates a solution that eliminates the complexity and performance bottlenecks traditionally associated with large-scale AI inference deployment.
This joint solution addresses the growing demand from enterprises for high-performance, energy-efficient AI inference capabilities that can scale seamlessly without the typical limitations of multi-node configurations. Combining GigaIO’s industry-leading scale-up AI architecture with d-Matrix’s purpose-built inference acceleration technology produces a solution that delivers unprecedented token generation speeds and memory bandwidth, while significantly reducing power consumption and total cost of ownership.
Revolutionary Performance Through Technological Integration
The new GigaIO SuperNODE platform, capable of supporting dozens of d-Matrix Corsair accelerators in a single node, is now the industry’s most scalable AI inference platform. This integration enables enterprises to deploy ultra-low-latency batched inference workloads at scale without the complexity of traditional distributed computing approaches.
“By combining d-Matrix’s Corsair PCIe cards with the industry-leading scale-up architecture of GigaIO’s SuperNODE, we’ve created a transformative solution for enterprises deploying next-generation AI inference at scale,” said Alan Benjamin, CEO of GigaIO. “Our single-node server eliminates complex multi-node configurations and simplifies deployment, enabling enterprises to quickly adapt to evolving AI workloads while significantly improving their TCO and operational efficiency.”
The combined solution delivers exceptional performance metrics that redefine what’s possible for enterprise AI inference:
- Processing capability of 30,000 tokens per second at just 2 milliseconds per token for models like Llama 3 70B (see the arithmetic sketch after this list)
- Up to 10x faster interactive speed compared with GPU-based solutions
- 3x better performance at a similar total cost of ownership
- 3x greater energy efficiency for more sustainable AI deployments
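To make the first figure concrete: at 2 milliseconds per token, a single generation stream produces 500 tokens per second, so reaching 30,000 tokens per second implies on the order of 60 streams served concurrently in a batch. The minimal Python sketch below walks through that arithmetic; the variable names and the assumption of a uniform 2 ms decode step per stream are ours, for illustration only.

```python
# Back-of-the-envelope check on the quoted SuperNODE/Corsair figures.
# Assumes each generation stream decodes one token every 2 ms (illustrative).

aggregate_tokens_per_s = 30_000   # quoted node-level throughput
per_token_latency_s = 0.002       # quoted 2 ms per token

# One stream at 2 ms/token produces 1 / 0.002 = 500 tokens/s.
per_stream_tokens_per_s = 1 / per_token_latency_s

# Hitting 30,000 tokens/s at that per-token latency therefore implies
# roughly 60 concurrent streams served in a batch on one node.
implied_concurrent_streams = aggregate_tokens_per_s / per_stream_tokens_per_s

print(f"{per_stream_tokens_per_s:.0f} tokens/s per stream")
print(f"~{implied_concurrent_streams:.0f} concurrent streams implied")
```

That is consistent with the batched-inference framing above: the headline number reflects many interactive sessions served at once, not one stream running at 30,000 tokens per second.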
“When we started d-Matrix in 2019, we looked at the landscape of AI compute and made a bet that inference would be the largest computing opportunity of our lifetime,” said Sid Sheth, founder and CEO of d-Matrix. “Our collaboration with GigaIO brings together our ultra-efficient in-memory compute architecture with the industry’s most powerful scale-up platform, delivering a solution that makes enterprise-scale generative AI commercially viable and accessible.”
This integration leverages GigaIO’s PCIe Gen 5-based AI fabric, which enables near-zero-latency communication between multiple d-Matrix Corsair accelerators. This architectural approach eliminates the traditional bottlenecks associated with distributed inference workloads while maximizing the efficiency of d-Matrix’s Digital In-Memory Compute (DIMC) architecture, which delivers an industry-leading 150 TB/s of memory bandwidth.
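For context on that bandwidth figure, the short sketch below compares it against a typical HBM-based GPU accelerator. The ~3.35 TB/s baseline is our illustrative assumption (typical of a current-generation HBM3 part), not a number from this announcement.

```python
# Rough context for the quoted DIMC memory bandwidth.
# The GPU HBM baseline is an illustrative assumption, not vendor data.

dimc_bandwidth_tb_s = 150.0   # quoted d-Matrix DIMC memory bandwidth
hbm_baseline_tb_s = 3.35      # assumed HBM baseline for comparison

ratio = dimc_bandwidth_tb_s / hbm_baseline_tb_s
print(f"DIMC vs. assumed HBM baseline: ~{ratio:.0f}x")
# -> roughly 45x. Token generation is typically memory-bandwidth-bound,
#    which is why on-die in-memory compute can feed decoding so quickly.
```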
Industry Recognition and Performance Validation
This partnership builds on GigaIO’s recent record of the highest tokens per second for a single node in the MLPerf Inference: Datacenter benchmark database, further validating the company’s leadership in scale-up AI infrastructure.
“The market has been demanding more efficient, scalable solutions for AI inference workloads that don’t compromise performance,” added Benjamin. “Our partnership with d-Matrix brings together the tremendous engineering innovation of both companies, resulting in a solution that redefines what’s possible for enterprise AI deployment.”
View source version on aithority.com: https://aithority.com/machine-learning/gigaio-and-d-matrix-advance-strategic-collaboration-to-build-worlds-most-efficient-scalable-inference-solution-for-enterprise-ai-deployment/