SHPCP Talk: What Is Disaggregated Composable Infrastructure and Should You Care?
Note: This article originally appeared on HPCwire.
December 14, 2020
What’s your take on disaggregated composable infrastructure (DCI)? Do you know what it is? Speaking at the Society of HPC Professionals’ annual technology meeting (held virtually) last week, Earl Dodd, HPC and HPDA architect at technology services giant World Wide Technology, delivered a strong pitch for DCI as the wave of the future for mainstream HPC.
DCI is another version of software-defined everything, this time taking aim at the datacenter. Dodd argued DCI can slash latency and improve TCO.
“We’ve done a really good job in HPC of trying to scale out. OK, we used to scale up, [but] we don’t scale up anymore. We used to have the big SMPs and the CC-NUMA machines and then we got away from that. Now, I see a lot of 4-processor, 8-processor, 16-processor systems. I see these big systems growing again. So how do you seamlessly [and] dynamically put this whole scale-up capability together with that scale-out capability? I believe only forms of disaggregated composability will allow us to drive that,” said Dodd.

Leaving aside supercomputers and the hyperscalers – vast creatures in their own right – Dodd took aim at commercial and more mainstream HPC infrastructure. The culprit stymieing modern HPC performance and adoption, he maintained, is latency, which depresses utilization rates and, in turn, worsens TCO.
Dodd cited Nvidia CEO Jensen Huang’s recent mantra – “[Jensen says] in the datacenter world, and that includes HPC, all things computing must be disaggregated, composable and accelerated. I totally agree with that. I’m mainly talking about the capability market and specifically, the capability market in the cooperation that’s going on with HPC, big data, AI, ML, and DL [where] we’re seeing very, very low utilization rates.”
That change is afoot in HPC is something no one can deny. Dodd and his colleague Zach Splaingard have written a brief paper (Primer Series: Rack-Scale Composable Infrastructure) and argue in it:
“As more applications introduce support for accelerators (GPUs and FPGAs), which can reduce time to result from weeks to literally minutes, users are clamoring for more of these expensive resources. Yet, industry data shows they are only utilized 15 to 20 percent of the time, stranded behind the traditional data center’s rigid architecture.
“Legacy data center infrastructures were not designed for today’s workflow requirements. The scalable modern data center needs a solution that can integrate compute, storage and other communication I/O into a single-system cluster fabric, scaling resources up and out across the cluster as needed. This solution should free resources from their silos to be shared with other network users who draw from these resource pools through a disaggregated composable infrastructure (DCI), an emerging category of infrastructure designed to maximize IT resource usage and improve business agility.”
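The core idea the paper describes – devices freed from individual servers and drawn from shared pools as workloads demand – can be illustrated with a toy model. This is a sketch only; the names (`ResourcePool`, `compose`, `release`) are hypothetical and do not correspond to any vendor’s API:

```python
# Illustrative sketch of a disaggregated resource pool: devices (GPUs,
# FPGAs, smart NICs) live in a shared pool and are composed onto nodes
# on demand, then released back. Not any real vendor's API.
from dataclasses import dataclass, field

@dataclass
class ResourcePool:
    free: dict = field(default_factory=dict)      # kind -> count available
    composed: dict = field(default_factory=dict)  # node -> {kind: count}

    def compose(self, node: str, wants: dict) -> bool:
        """Attach devices to a node if the pool can satisfy the request."""
        if any(self.free.get(k, 0) < n for k, n in wants.items()):
            return False                          # insufficient; pool untouched
        held = self.composed.setdefault(node, {})
        for k, n in wants.items():
            self.free[k] -= n
            held[k] = held.get(k, 0) + n
        return True

    def release(self, node: str) -> None:
        """Return a node's devices to the shared pool."""
        for k, n in self.composed.pop(node, {}).items():
            self.free[k] = self.free.get(k, 0) + n

pool = ResourcePool(free={"gpu": 4, "fpga": 2})
assert pool.compose("node-a", {"gpu": 3})        # succeeds
assert not pool.compose("node-b", {"gpu": 2})    # only 1 GPU left in the pool
pool.release("node-a")                           # node-a's GPUs return
assert pool.compose("node-b", {"gpu": 2})        # now it fits
```

The point of the toy model is the failure-then-success sequence: in a rigid architecture node-b’s request would simply strand idle GPUs inside node-a’s chassis, which is exactly the 15-to-20-percent utilization problem the paper describes.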
Broadly, this is not a new idea. It’s virtualization by another name. New software and getting rid of traditional network bottlenecks are among the key enablers, argued Dodd. The PCIe bus (or something like CXL or Gen-Z), contended Dodd, is an excellent candidate for replacing InfiniBand/Ethernet at many points in today’s datacenter.
“In the old days, I’ve got my nodes – thin nodes, fat nodes, high nodes, half-nodes, whatever you want to call them. I’ve got a computing element, a computing system, on the left side (slide above), and I’ve got thingies in it. That’s a very technical term, by the way, thingies. So, compute, memory, GPUs, FPGAs, smart NICs, etc. The idea is those can be disaggregated, and put into other systems, into other enclosures, other boxes, and treated as dynamically available resource pools,” said Dodd in his SHPCP talk.
“If you look at this chart, I’ve got a server on one side and server on the other side. We’re using something [like this] on the right at the labs now. We’ve been actively testing GigaIO‘s FabreX environment. In the old days, you had to go all the way down to that NIC layer, and you had to get across some form of a network and get back up and talk to something. And that’s even when you had to pool multiple nodes together. Okay, so we create fatter nodes, 4U boxes, and 7U boxes and 9U enclosures, and then put a midplane in there and put these things together. That whole thing is eliminated when I can talk PCIe-to-PCIe,” he said.
Currently, said Dodd, there are many misconceptions around DCI – about vendor lock-in, performance penalties, and whether DCI will simply happen on its own.
“A lot of people are very much worried about vendor lock-in. You’ll hear a lot of, ‘Oh, I have a composable infrastructure,’ but then they’ll say, ‘I have a midplane, or I have a backplane, and thou shalt only have stuff that fits into my midplane or backplane.’ That creates vendor lock-in. And there is this concept of a tax, as in a performance penalty, associated with going through another midplane, another set of chipsets, another translation that goes on in there, and that just adds latency. Latency overall is the big killer. It’s not the bandwidth, it’s the latency, latency, latency.
“[Another misbelief] is that composability is going to happen on its own, it’s going to have the Darwin effect; if we’ve gone from a converged infrastructure to hyperconverged infrastructure, [then] composability will naturally come out of that. It’ll happen on its own. Wrong, that is not going to happen. It has to be actually thought [through], because there is a lot of software needed to help make the software-defined datacenter [actually become] software defined. Another misconception is you have to be close: you know, the memory must sit right next to the processor. The GPU must be really close and in the same box as the processor, the FPGA, or the persistent memory. Or you need the smart NIC sitting right next to the box. That is completely wrong. Within reason today I can disaggregate things up to about 30 meters, and still maintain the same performance characteristics. Now, that is a game changer.”
One important issue, said Dodd, was the tendency of the HPC community to think of itself as defined by workloads; today that is anachronistic, he suggested. What matters more, he argued, is workflow, because it directly influences latency. Disaggregating resources in the manner described, he says, will drive down latency and drive up HPC utilization.
There was certainly a product pitch element to Dodd’s presentation and not too much detail. That said, it was fascinating, and he was quick to emphasize that the GigaIO technology he was using as an example was based on open standards including Redfish and PCIe. Others will certainly develop their own versions of DCI technology. Indeed, Liqid’s composable architecture creates a flexible shared resource pool over the PCIe bus.
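For a sense of what the Redfish open standard Dodd mentions looks like in practice: DMTF Redfish defines a Composition Service that exposes shareable devices as resource blocks, and a client composes a system by POSTing a request that links to the blocks it wants. Below is a minimal sketch of building such a request body in Python; the block URIs and system name are made-up illustrations, and a real client would first discover available blocks from the service rather than hard-coding them:

```python
import json

# Illustrative Redfish-style composition request body. The resource-block
# URIs below are invented for the example; an actual client would GET the
# service's ResourceBlocks collection to discover what it can compose.
def compose_request(name: str, block_uris: list) -> dict:
    """Build a JSON body linking a new composed system to resource blocks."""
    return {
        "Name": name,
        "Links": {
            "ResourceBlocks": [{"@odata.id": uri} for uri in block_uris]
        },
    }

body = compose_request(
    "hpc-fat-node-01",  # hypothetical name for the composed system
    [
        "/redfish/v1/CompositionService/ResourceBlocks/ComputeBlock1",
        "/redfish/v1/CompositionService/ResourceBlocks/GPUBlock3",
    ],
)
print(json.dumps(body, indent=2))
```

The takeaway is that composition is expressed declaratively over a standard REST interface, which is what lets multiple vendors (GigaIO, Liqid, and others) build interoperable DCI tooling on top of it.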
“This allows you to take almost any of these nodes and make some of the resources available to other node types in the system via the PCIe switch,” said George Moncrief, chief technologist at ERDC’s DoD Supercomputing Resource Center (DSRC), in an interview with HPCwire.
It will be interesting to watch how DCI writ large develops.