
Connecting Ceph Storage to AI with Clyso | Utilizing Tech 07×06

Many of the largest-scale data storage environments use Ceph, an open-source storage system, and are now connecting it to AI. This episode of Utilizing Tech, sponsored by Solidigm, features Dan van der Ster, CTO of Clyso, discussing Ceph for AI data with Jeniece Wnorowski and Stephen Foskett. Ceph began in research and education but is now widely used in finance, entertainment, and commerce as well. All of these use cases require massive scalability and extreme reliability despite using commodity storage components, and Ceph is increasingly able to deliver high performance too. AI workloads also require scalable metadata performance, an area in which Ceph developers are making great strides. The software has also proved itself adaptable to advanced hardware, including today’s large NVMe SSDs. As data infrastructure development has expanded from academia to HPC to the cloud and now AI, it’s important to see how the community is embracing and improving the software that underpins today’s compute stack.

Apple Podcasts | Spotify | Overcast | More Audio Links | UtilizingTech.com


A Cloud-Like Scalable Storage for On-Premises, with Clyso

Companies moved fast to adopt AI when, nearly two years ago, ChatGPT entered the market, snapping up its first hundred million users in under two months. But AI’s ever-growing appetite for infrastructure is keeping companies from deploying the technology at the same pace.

It’s clear that AI is the tool to capture mindshare and market share in a highly competitive landscape. But businesses looking to take advantage of its potential benefits face the pressure of continually expanding and upgrading their infrastructure.

AI models rely on massive volumes of data to do their job, and their demand for storage seems to grow without bound. In the coming years, LLMs will consume tremendous amounts of data as the models grow and advance. The danger is that preserving this ever-growing body of data will become woefully expensive.

Not Simple Math

The concept of horizontal scaling goes back decades in storage. Need more capacity? Add more servers. But the same trick does not apply to AI.

“When you’re building a large-scale storage system, you don’t want to have to lift and shift and migrate data from one appliance to another every four or five years,” says Dan van der Ster, CTO of Clyso. “You want something like an organic storage system.”

In this episode of the Utilizing Tech podcast, brought to you by Solidigm, hosts Stephen Foskett and Jeniece Wnorowski chat with Dan about Ceph, a groundbreaking open-source storage solution that has entered the AI zeitgeist with a flourish.

The top selling points of a data storage solution for AI are scalability and reliability. Unfortunately, solutions that deliver both also tend to be expensive. Ceph is setting a new benchmark for AI data storage: it is not only supremely adaptable to AI’s unpredictable demand for capacity, but also surprisingly affordable.

Although not as widely known as some proprietary solutions, Ceph has been around for many years and has the approval of countless users who have deployed it with high confidence in demanding HPC environments. Clyso, a premier Ceph partner, provides support, integration, and development services to companies looking to introduce Ceph in large environments.

An Intelligent Approach

Ceph is a unified storage solution that combines multiple storage interfaces – low-level object storage, block storage for cloud platforms, and a POSIX-compliant file system.
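Assuming a running cluster and a pool named `mypool` (a hypothetical name used here for illustration), the three interfaces can be exercised from the command line roughly like this – a sketch, not a deployment guide:

```shell
# Object interface: store and retrieve a raw object in a pool
rados -p mypool put hello.txt ./hello.txt
rados -p mypool get hello.txt /tmp/hello.txt

# Block interface: create an RBD image, e.g. as a VM or container disk
rbd create mypool/vm-disk-1 --size 10240   # size in megabytes

# File interface: mount CephFS as an ordinary POSIX file system
mount -t ceph :/ /mnt/cephfs -o name=admin
```

The same cluster serves all three, which is what lets one Ceph deployment back object stores, cloud block devices, and shared file systems at once.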

“Because it’s so flexible, it’s already quite massively used in cloud environments for object storage, especially in self-hosted, hybrid cloud and multi-cloud environments,” says Dan.

In a recent demo, a team at Clyso achieved a record-breaking milestone with a Ceph cluster. Through intense tuning and optimization, the cluster reached a throughput of 1 TB per second.

Perhaps the most remarkable thing is that the entire experiment ran on low-cost commodity hardware.

“There’s a lot of smarts internally to enable that to happen at high performance speeds,” Dan highlights.

Ceph’s tremendous performance numbers have earned enthusiastic reviews from many, and attracted AI companies in large numbers.

“AI use case is certainly the hot topic with Ceph these days,” Dan says.

The performance and scalability all boil down to Ceph’s fully distributed architecture. The fundamental principle Ceph is built on is “distributed everything”. That is similar to the principle on which AI clusters are designed: avoiding single points of failure, which are the key sources of bottlenecks.

Ceph is built to support massively parallel access patterns, a workload that is highly metadata-intensive.

“It’s very easy to make a file system or a storage system which deals with large objects, and you just stream everything in parallel. In HPC, we call this embarrassingly parallel.”

But when many clients access the same files in the same directories at the same time, as is typical of AI, there is massive contention.

Ceph functions on the strict principle that no client should have an outdated view of a file.

The Ceph storage cluster is based on the Reliable Autonomic Distributed Object Store, or RADOS, which is its foundation for providing data to clients.

A Ceph cluster primarily consists of intelligent, self-healing storage nodes. These nodes replicate and redistribute data dynamically among themselves. Each node is powered by off-the-shelf commodity hardware. Additionally, the cluster requires Ceph daemons – monitors, managers, OSDs and metadata servers – to function.
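On a running cluster, each of those daemon types can be inspected with standard Ceph CLI commands – shown here as a sketch of where each daemon fits, assuming admin access to the cluster:

```shell
ceph status      # overall health: monitor quorum, manager, OSD and PG counts
ceph mon stat    # the monitors, which maintain the authoritative cluster map
ceph osd tree    # OSDs arranged by host and rack (the CRUSH hierarchy)
ceph fs status   # the metadata servers (MDS) backing the CephFS file system
```

The output of `ceph osd tree` in particular makes the self-healing behavior visible: when a node fails, its OSDs are marked down and data is re-replicated across the survivors.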

Here is how these components work under the hood. The data is split up and distributed across numerous machines, unlike the 1:1 mapping seen in centralized architectures. Ceph’s manager and monitor daemons run on multiple machines, from where they orchestrate the systems to work together as a single unit.

Data is stored as objects within logical storage pools, and thousands of clients have access to exabytes of data simultaneously.
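The key to that simultaneous access is that placement is computed, not looked up: every client can independently calculate which nodes hold an object. The toy Python model below illustrates the idea – it hashes an object name to a placement group and a placement group to a set of OSDs. It is a simplified stand-in, not Ceph’s real CRUSH algorithm, which also accounts for failure domains and device weights:

```python
import hashlib

def pg_for_object(obj_name: str, num_pgs: int) -> int:
    # Hash the object name to a placement group (PG), as Ceph does.
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % num_pgs

def osds_for_pg(pg: int, osds: list, replicas: int = 3) -> list:
    # Toy stand-in for CRUSH: rank OSDs by a per-PG hash and take the
    # first N as replicas. Real CRUSH also honors failure domains
    # (host, rack) and per-device weights.
    ranked = sorted(osds, key=lambda o: hashlib.md5(f"{pg}-{o}".encode()).hexdigest())
    return ranked[:replicas]

osds = list(range(12))  # a hypothetical 12-OSD cluster
pg = pg_for_object("model-checkpoint-0001", num_pgs=128)
targets = osds_for_pg(pg, osds)
# Every client running the same computation finds the same OSDs, so
# there is no central lookup table to become a bottleneck.
```

Because placement is deterministic, thousands of clients can locate data in parallel without ever consulting a central metadata broker for object placement.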

“Ceph has about 10,000 tuning options. So you can really manipulate anything about the sector sizes, the block sizes, how data is chopped, diced and sliced and distributed across the cluster, in a way that optimizes the usage of the underlying devices.”
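Since the Octopus release, those options live in a centralized configuration database, so they can be inspected and changed cluster-wide from the CLI. A sketch, with `osd_max_backfills` used purely as an example option:

```shell
# Read an option's current value for the OSD daemons
ceph config get osd bluestore_min_alloc_size

# Change an option cluster-wide, e.g. throttle rebalancing traffic
ceph config set osd osd_max_backfills 2

# List every setting that differs from the compiled-in defaults
ceph config dump
```

In practice, tuning work like Clyso’s 1 TB/s result comes from adjusting a small, carefully chosen subset of these options for the specific hardware underneath.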

Integral to Ceph’s peak performance is solid-state storage technology. Dan told us that many Ceph users rely on high-density SSDs like Solidigm’s for their use cases.

An ongoing effort called Project Crimson, a major rewrite of the low-level storage daemons, focuses on extracting performance from large NVMe systems.

Support and Integration

Dan says, “The beauty of Ceph is that it can do whatever you need and can work in that use case.”

But to get that kind of flexibility and performance from ordinary hardware, organizations need creative solutions. That’s where a company like Clyso comes in. Clyso is a group of technology veterans whose mission is to expand the performance limits of commodity hardware with open-source solutions.

Clyso provides the guided walkthroughs and assistance required to achieve that kind of technical feat with Ceph.

“Ceph may not always work optimally out-of-the-box, but with a little insight and expertise and guidance, you can really get that. That’s how you achieve results like 1 terabyte per second. It’s paying attention to all those stacks.”

Ceph’s early adopters were academic institutions like universities, research centers and labs that typically deploy open-source products in their environments. Despite being identified as an HPC file system in its early years, Ceph evolved rapidly, absorbing new use cases along the way.

Today, it is ubiquitous, no matter the industry. It is deployed in supercomputing centers doing advanced biotech research, in finance for high-frequency trading, and in entertainment.

Clyso is a platinum member of the Ceph Foundation, to which it actively contributes. Established under the Linux Foundation umbrella, the Ceph Foundation exists to bring industry partners together to collaborate on and advance the project.

Head over to Ceph.io for more information on the technology. Also check out Clyso’s website to learn how they are helping organizations get optimal performance and breakthrough economics from Ceph clusters. Keep listening to the Utilizing Tech podcast at Gestalt IT or on the podcast platform of your choice for more discussions on AI data infrastructure with Solidigm.

Podcast Information:

Stephen Foskett is the Organizer of the Tech Field Day Event Series and President of the Tech Field Day Business Unit, now part of The Futurum Group. Connect with Stephen on LinkedIn or on X/Twitter and read more on the Gestalt IT website.

Jeniece Wnorowski is the Datacenter Product Marketing Manager at Solidigm. You can connect with Jeniece on LinkedIn and learn more about Solidigm and their AI efforts on their dedicated AI landing page or watch their AI Field Day presentations from the recent event.

Dan van der Ster is the CTO at Clyso and Ceph Executive Council member. You can connect with Dan on LinkedIn or on X/Twitter. Learn more about Clyso on their website. Learn more about Ceph on their website.



Thank you for listening to Utilizing Tech with Season 7 focusing on AI Data Infrastructure. If you enjoyed this discussion, please subscribe in your favorite podcast application and consider leaving us a rating and a nice review on Apple Podcasts or Spotify. This podcast was brought to you by Solidigm and by Tech Field Day, now part of The Futurum Group. For show notes and more episodes, head to our dedicated Utilizing Tech Website or find us on X/Twitter and Mastodon at Utilizing Tech.

About the author

Sulagna Saha

Sulagna Saha is a writer at Gestalt IT where she covers all the latest in enterprise IT. She has written widely on miscellaneous topics. On gestaltit.com she writes about the hottest technologies in Cloud, AI, Security and sundry.

A writer by day and reader by night, Sulagna can be found busy with a book or browsing through a bookstore in her free time. She also likes cooking fancy things on leisurely weekends. Traveling and movies are other things high on her list of passions. Sulagna works out of the Gestalt IT office in Hudson, Ohio.
