Will CXL Revolutionize Server Architecture? | Checksum: Episode 22

I sat down with Keith Townsend for a special crossover episode of the Gestalt IT Checksum and CTO Dose at VMware Explore 2022. Our conversation focused on CXL, a new technology that promises to revolutionize server architecture by disaggregating memory, I/O, and accelerators from the CPU. Although this isn’t an entirely new concept (consider HPE’s “The Machine”, Liqid, and Nvidia’s NVLink), CXL has broad industry support and the first products are hitting the market today. Will CXL revolutionize server architecture or is it just another tool in the arsenal for datacenter architects?

There’s a new technology coming called CXL, and it promises to revolutionize server architecture. But first it’s going to have a real impact on a few very specific applications. By 2025, servers are not going to look anything like they look today!

At first glance, CXL might not sound all that exciting. PCI Express is already around: it’s the bus that’s inside your computer. You also might have a Mac or PC with Thunderbolt, and that’s actually PCI Express over an external connection. This is pretty much what CXL sounds like, but it’s quite different in practice.

The idea of CXL is to hang different devices off of a PCI Express bus so that they can be accessed by the host as if they were local devices. Sometimes they actually are local devices, like NVMe storage or CXL-based memory expansion devices. Samsung, SK Hynix, and others are already introducing CXL version 1 memory expansion devices. But in the future those devices don’t necessarily need to be inside the server anymore.

Imagine if your NVMe drives, memory expansion, or I/O controllers were not actually in the server anymore. Maybe they’re in another chassis in the rack. Now imagine that the CPU, system memory, I/O, and storage don’t even need to be in the same chassis. What is a server in this scenario? You can “explode” the server and build whatever you need. You can “compose” a server from resources attached over a PCI Express fabric based on CXL. This revolutionizes the entire shape of servers, and that’s what I’m excited about.
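
To picture what “composing” a server means, here is a toy bookkeeping sketch in Python. It only models the idea of carving a logical server out of a shared rack-level pool; it is not how a real CXL fabric manager works, and every name and number in it (Pool, ComposedServer, compose, the rack sizes) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Free resources somewhere in the rack (hypothetical model)."""
    cpu_sockets: int
    memory_gb: int
    nvme_drives: int


@dataclass
class ComposedServer:
    """A logical server assembled from pooled resources."""
    cpu_sockets: int
    memory_gb: int
    nvme_drives: int


def compose(pool, cpu_sockets, memory_gb, nvme_drives):
    """Carve a logical server out of the pool, or fail if the pool is short."""
    if (cpu_sockets > pool.cpu_sockets
            or memory_gb > pool.memory_gb
            or nvme_drives > pool.nvme_drives):
        raise ValueError("not enough free resources in the pool")
    # Deduct what this server claims so later requests see what is left.
    pool.cpu_sockets -= cpu_sockets
    pool.memory_gb -= memory_gb
    pool.nvme_drives -= nvme_drives
    return ComposedServer(cpu_sockets, memory_gb, nvme_drives)


# Assumed rack: 8 CPU sockets, 4 TB of memory, 24 NVMe drives.
rack = Pool(cpu_sockets=8, memory_gb=4096, nvme_drives=24)
database = compose(rack, cpu_sockets=2, memory_gb=1024, nvme_drives=8)
print(database)
print(rack)
```

The point of the sketch is that the “server” is just an allocation record: the same pool could hand out a memory-heavy database host now and a storage-heavy node later without anyone touching hardware.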

We’ve heard about composability and disaggregated servers for years. HPE’s “The Machine” was proposed to fundamentally change the architecture of servers. And IBM and Nvidia have talked a lot about disaggregated servers. CXL promises to do what these companies wanted, which is to create a composable server. The problem was that these previous efforts applied to a very limited set of hardware and software.

The brilliance of CXL is that it is an industry standard. Intel created CXL and shared it with the whole industry, the way that only somebody like Intel really can. Every company in the industry has signed on, and all of the competing ideas (except maybe NVLink) have been subsumed into CXL. CXL very much could be the equivalent of Bluetooth or USB or x86, except for disaggregated servers.

There are many technical challenges with this, however, and technical reasons that it hasn’t been done before. Although the first versions of CXL are built on PCIe 5, it really isn’t going to reach its potential until we get to PCIe 6. We need ultra-high performance and ultra-low latency for it to make sense to have memory on the PCI Express bus instead of on a specialized memory bus. It’s the same with storage and anything else needing low latency and high bandwidth. We also need PCI Express to be not just point-to-point but a switched networking fabric. In order to do that we need a whole new generation of PCI Express chips, and that’s not really coming until PCIe 6. So there is some time still before this takes over.
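
A rough back-of-the-envelope comparison shows why the PCIe generation matters so much for memory. The Python sketch below uses the raw per-lane signalling rates of each PCIe generation and compares an x16 link with a single memory channel. The DDR5-4800 channel is my assumption for the comparison (the discussion above doesn’t name a memory type), the function names are illustrative, and the numbers ignore encoding and protocol overhead and say nothing about latency.

```python
# Raw per-lane signalling rates in GT/s for each PCIe generation.
PCIE_GT_PER_LANE = {3: 8, 4: 16, 5: 32, 6: 64}


def pcie_raw_gb_per_s(gen, lanes=16):
    """Approximate one-direction bandwidth in GB/s, ignoring all overhead."""
    # One transfer on one lane carries one bit, so divide by 8 for bytes.
    return PCIE_GT_PER_LANE[gen] * lanes / 8


def ddr_channel_gb_per_s(mega_transfers=4800, bus_bytes=8):
    """Peak bandwidth of one 64-bit memory channel in GB/s (assumed DDR5-4800)."""
    return mega_transfers * bus_bytes / 1000


for gen in sorted(PCIE_GT_PER_LANE):
    print(f"PCIe {gen} x16: ~{pcie_raw_gb_per_s(gen):.0f} GB/s per direction")
print(f"One DDR5-4800 channel: ~{ddr_channel_gb_per_s():.1f} GB/s")
```

By this estimate a PCIe 3 x16 link carries less than half of what one such memory channel does, while PCIe 5 and PCIe 6 links move well past it, which is why bandwidth stops being the obstacle at those generations even though latency remains a separate question.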

Of course CXL could go the way of Thunderbolt and become a niche product for certain customers, but it does have some momentum. One type of customer is really pushing it forward. Hyperscalers have a fundamental problem: They want to have all the memory channels populated in order to get the most performance out of their CPUs, but memory modules only come in certain sizes, and the big ones are really expensive. So a hyperscaler can easily fill all four channels with 32 GB DIMMs, but if they need more than 128 GB they have to move to larger DIMMs and it gets to be a pretty big portion of the server cost! Right now they have to equip their systems with way too much memory in order to fill all the channels. With CXL they can right-size memory and save a lot of money.
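
To make that arithmetic concrete, here is a small Python sketch. The four channels and 32 GB DIMMs come from the example above; the 160 GB target, the list of available DIMM sizes, and the function names are assumptions chosen for illustration, and the CXL expander is treated as simply supplying whatever capacity is left over.

```python
# Hypothetical DIMM capacities in GB; real product lines vary.
DIMM_SIZES_GB = [16, 32, 64, 128]


def dimm_only_capacity(needed_gb, channels=4):
    """Smallest capacity reachable by filling every channel with identical DIMMs."""
    for size in DIMM_SIZES_GB:
        total = size * channels
        if total >= needed_gb:
            return size, total
    raise ValueError("no DIMM size is large enough")


def with_cxl_expansion(needed_gb, channels=4, dimm_gb=32):
    """Keep modest DIMMs in every channel and cover the rest with CXL memory."""
    local = dimm_gb * channels
    return local, max(0, needed_gb - local)


needed = 160  # assumed workload requirement in GB
size, total = dimm_only_capacity(needed)
print(f"DIMMs only: 4 x {size} GB = {total} GB ({total - needed} GB more than needed)")
local, cxl = with_cxl_expansion(needed)
print(f"With CXL: {local} GB on DIMMs + {cxl} GB of CXL expansion")
```

Under these assumptions, a system that needs 160 GB has to jump to four 64 GB DIMMs and carry 96 GB it never asked for, while the CXL approach keeps the cheaper 32 GB DIMMs in every channel and adds only the missing 32 GB.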

Although hyperscalers are very excited about memory expansion using CXL, that’s not a transformative use case. In order for CXL to really take off we need a real fabric with all sorts of devices talking to each other and real composability.

Another thing to worry about is ownership. If two CPUs both want to talk to the same peripheral, how do we deal with that? The issue is inconsistency between CPU and peripheral memory caches. Cache coherency is the primary challenge of CXL: making sure that devices all see the same data in the same place. There’s also a security aspect, because if you’re composing servers that share the same RAM and two of those servers sit in different security domains, there’s the possibility of leaks. The good news is that these things are specifically being addressed in the CXL spec.

The other issue is that not every server is ready for CXL. Intel and AMD CPUs are ready for the first generation of CXL (memory expansion) but it’s going to be a while before we get PCIe 6 and composable infrastructure. If that is delayed, for example if there were a mismatch between PCIe switching chips and CPUs, it could derail the whole CXL trend.

The last question for me at VMware Explore 2022 was where VMware is when it comes to CXL. Maybe it’s just a group we don’t have access to yet, or something that they just haven’t talked about, but we need to have companies like VMware committed to make this happen. The tiered memory aspect of CXL could be really powerful in vSphere, as could the whole idea of disaggregated servers, so we hope this is something VMware is working on.

Gestalt IT is closely watching the development of CXL and will have lots more CXL-related content here in the future. Watch for our participation in the upcoming CXL Forum events and for a season of the Utilizing Tech podcast focused on CXL, launching next month!

About the author

Stephen Foskett

Stephen Foskett is an active participant in the world of enterprise information technology, currently focusing on enterprise storage, server virtualization, networking, and cloud computing. He organizes the popular Tech Field Day event series for Gestalt IT and runs Foskett Services. A long-time voice in the storage industry, Stephen has authored numerous articles for industry publications, and is a popular presenter at industry events. He can be found online at TechFieldDay.com, blog.FoskettS.net, and on Twitter at @SFoskett.