
Cerebras Leading the Big Chip Race with Second Generation Wafer-Scale Engine

With computing problems becoming more complex and nuanced as AI takes over the picture, datacenter operators are feeling a vacuum building. Only a new kind of chip, one designed for these highly demanding workloads, can fill that space.

Cerebras Systems, a startup founded in 2016 with a focus on AI acceleration, addressed this need for a new class of chips. Three years ago, Cerebras designed a special-purpose AI chip packed with enough power to take AI capabilities to the next level. Its biggest point of difference: its super-size.

In a time of increasing miniaturization, Cerebras Systems took a giant leap in the opposite direction by designing a titanic chip built on TSMC 7nm technology – officially the largest chip ever built. At the recent AI Field Day event, Cerebras presented the Wafer-Scale Engine 2 (WSE-2), which it unveiled last year. In a quick session delivered jointly with DDN, Cerebras showed the audience how the WSE-2 is poised to handle the subtleties and nuances of the most cutting-edge AI computing, overcoming the limitations of conventional microchips.

The Size War

Companies, both industry leaders and fledgling outfits, are competing with each other to deliver the smallest chip on the market. The rate at which silicon processes have shrunk as a result of this competition is pretty amazing. As of 2022, the most advanced node demonstrated is a 2nm process, which can pack tens of billions of transistors onto a chip the size of a fingernail.

But companies sitting on a glut of ever-smaller chips are now starting to see that shrinking process sizes are really a double-edged sword. On one hand, a smaller process fits more transistors into the same silicon area, meaning more dies can be etched onto a single wafer. But on the other hand, it has created much bigger worries for datacenter operators using these chips for AI and high-performance computing.

With neural networks growing at impossibly fast speeds, companies are left wrestling with the ensuing complexity. “The growth of the largest networks is exponential – 12 times quicker than Moore’s law. That’s an absolutely terrifying growth,” explained Rebecca Lewington, Technology Evangelist at Cerebras Systems Inc. “The bigger these things are, the richer they can be. The more insights they can show, the more accurate they can be – the better they can perform downstream tasks.”

AI workloads typically harness the processing power of multiple processors distributed across systems. But at this scale, distributed compute itself becomes the limiting factor. When cores communicate on the same chip, the latency is a few nanoseconds. The further apart the chips are, the higher the latency climbs. With off-chip communication in particular, where data has to travel much longer distances over slower links, the latency numbers are unacceptably high.
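To get a feel for why distance dominates, here is a rough back-of-envelope sketch in Python. The per-hop latencies are illustrative assumptions rather than vendor-published figures, and the linear cost model is deliberately naive, but it shows how synchronization time balloons once communication leaves the chip.

```python
# Back-of-envelope latency comparison. The per-hop latencies below are
# illustrative assumptions, not measured or vendor-published figures.

ON_CHIP_HOP_NS = 1        # assume ~1 ns between neighboring on-chip cores
PCIE_HOP_NS = 1_000       # assume ~1 us to reach a device over PCIe
NETWORK_HOP_NS = 10_000   # assume ~10 us over a datacenter fabric

def sync_cost_ns(devices: int, per_hop_ns: float) -> float:
    """Naive cost model: one sequential hop per device being synchronized."""
    return devices * per_hop_ns

for label, per_hop in [("on-chip", ON_CHIP_HOP_NS),
                       ("PCIe", PCIE_HOP_NS),
                       ("network", NETWORK_HOP_NS)]:
    cost_us = sync_cost_ns(1_000, per_hop) / 1_000
    print(f"{label:8s}: 1,000 devices -> {cost_us:,.0f} us")
# on-chip: 1 us; PCIe: 1,000 us; network: 10,000 us
```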

“There’s only so much you can do when you’ve got to deal with interconnects and memory bottlenecks across hundreds, even thousands of devices,” noted Lewington. This has made clusters of smaller chips impractical for certain types of workloads.

A Super-Chip for Supercomputing

Enter a new class of monolithic chips that are bigger in size and pack a lot more power. In that league, Cerebras has established itself as a forerunner, showing some of the industry’s biggest players how it’s done.

Although Cerebras’ is not the first chip to err on the bigger side – NVIDIA and many smaller outfits have long dabbled in large dies – it definitely is revolutionary in design. Cerebras recognized that it’s not just AI workloads that benefit from a bigger chip. HPC workloads can enjoy the gains too because, as different as they are, they have common needs: both require ultra-high network and memory bandwidths, and linear performance scaling. So go big, they said.

The Second Generation Wafer-Scale Engine

The Wafer-Scale Engine 2 (WSE-2) lays the ideal ground for training large language models. The original WSE was a central processor purpose-built for Cerebras’ deep learning computer systems. Now in its second iteration, the WSE-2 is a monstrous chip occupying a full wafer that Cerebras likens to a dinner plate.

Based on a 7-nanometer process technology and covering a massive 46,225 square millimeters, the WSE-2 is almost the same size as an iPad Pro, and unsurprisingly the world’s largest chip. But the Wafer-Scale Engine 2 is more than that title. It has some very compelling numbers to boast.

Image: Cerebras WSE-2

The WSE-2 packs 2.6 trillion transistors onto its square die. It has 850,000 cores, all of which are optimized for sparse linear algebra computing, one of the key requirements of HPC and AI workloads.

The WSE-2 has 40 GB of on-chip memory and 20 PB/sec of memory bandwidth. Cerebras says that with these numbers, a single chip can deliver cluster-scale performance.

How do these numbers translate in terms of performance? Incredibly well. Compared to the mammoth NVIDIA A100, which its maker calls the “ultimate instrument for AI”, the Cerebras WSE-2 is 56 times bigger and has 123 times more cores.

The WSE-2’s 40 GB of on-chip memory is 1,000 times more than the A100’s 40 MB. In memory and fabric bandwidth too, the WSE-2 is leagues ahead, with massive numbers of 20 PB/sec and 220 Petabits/sec respectively.
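Those multipliers are easy to sanity-check. The short Python snippet below recomputes them from the figures quoted above, plus the A100’s published 826 mm² die size and 6,912 CUDA cores; note that the two architectures count “cores” very differently, so the ratios are scale indicators rather than apples-to-apples comparisons.

```python
# Recompute the WSE-2 vs. A100 ratios quoted in the article.
# A100 reference figures: 826 mm^2 die, 6,912 CUDA cores, 40 MB L2 cache.

wse2 = {"die area (mm^2)": 46_225, "cores": 850_000, "on-chip memory (MB)": 40_000}
a100 = {"die area (mm^2)": 826,    "cores": 6_912,   "on-chip memory (MB)": 40}

for spec, value in wse2.items():
    print(f"{spec:20s}: {value / a100[spec]:>5.0f}x")
# die area (mm^2)     :    56x
# cores               :   123x
# on-chip memory (MB) :  1000x
```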

Cerebras packages the second-generation WSE into a complete system called the CS-2, an AI infrastructure that delivers ground-breaking performance gains and dials down the complexity of large-scale cluster deployment.

The company released its newest supercomputer, Andromeda, in November. A juggernaut, Andromeda houses 16 WSE-2 chips with an eye-popping 13.5 million cores. With the combined power of the chips, the supercomputer delivers up to a breathtaking 1 quintillion AI operations every second.
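A quick bit of arithmetic puts that scale in perspective. The per-core figure below is a derived illustration, not a number Cerebras publishes:

```python
# Rough arithmetic on Andromeda's quoted scale.

systems = 16
cores_per_wse2 = 850_000
total_cores = systems * cores_per_wse2   # 13,600,000 (~13.5 million quoted)

ai_ops_per_second = 1e18                 # "1 quintillion" = 1 exa-op/s
per_core = ai_ops_per_second / total_cores
print(f"{total_cores:,} cores -> ~{per_core:.1e} AI ops per core per second")
# 13,600,000 cores -> ~7.4e+10 AI ops per core per second
```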

Wrapping Up

Cerebras bucked the trend of miniaturization with a replacement that is not just several sizes bigger; the WSE-2 also asserts its individuality with monstrous power, which explains why its makers decided to build a chip of its size. By going wafer-scale instead of linking many small chips, Cerebras has stuffed into a single device all the computing resources required to power the largest neural networks. And instead of relying on several systems, models can now draw all that power from one device without breaks or latencies, sidestepping the challenges that dog its smaller counterparts.

For more information on WSE-2, be sure to check out the full presentation from the recent AI Field Day event, or head over to Cerebras’ website. For more exclusive stories like this one, keep reading here at Gestaltit.com.

About the author

Sulagna Saha

Sulagna Saha is a writer at Gestalt IT where she covers all the latest in enterprise IT. She has written widely on a range of topics. On gestaltit.com she writes about the hottest technologies in Cloud, AI, Security and more.

A writer by day and reader by night, Sulagna can be found busy with a book or browsing through a bookstore in her free time. She also likes cooking fancy things on leisurely weekends. Traveling and movies are other things high on her list of passions. Sulagna works out of the Gestalt IT office in Hudson, Ohio.
