Modern AI infrastructure has exposed the importance of reliability and predictability of storage in addition to performance. This episode of Utilizing Tech, presented by Solidigm, features Kelley Osburn of Graid Technology discussing the challenges of maximizing performance and resiliency of storage for AI with Jeniece Wnorowski and Stephen Foskett. AI servers are optimized for machine learning processing, and Graid Technology SupremeRAID offloads RAID processing to GPUs in much the same way these massively parallel processors offload ML processing. SupremeRAID also has a peer-to-peer DMA feature that sends data directly to the processor rather than forcing all data to pass through a single processor or channel. There is a need for RAID software at many spots in the data pipeline, from ingestion and preparation to processing and consolidation, and each requires performance and availability. In addition to AI processing, many applications require maximum performance and capacity without impacting the host CPU, including military, medical research and diagnostics, and financial services.
Bypassing Performance Bottlenecks with Graid Technology
A debate is shaping up over whether the growing popularity of GPUs is stealing the thunder from other AI infrastructure components.
Discussions around GPUs and specialized accelerators have dominated the topic of AI in the last few months. But AI infrastructure is not a one-trick pony. The storage that underpins the system plays a big part in enabling the GPUs. Poor storage can hold back performance, while great storage can set it free.
Season 7 of Utilizing Tech, presented by Solidigm, puts storage back on the map. In this episode, co-hosts Stephen Foskett and Jeniece Wnorowski meet with Kelley Osburn, senior director of OEM/channel business development at Graid Technology, to explore the role of data storage in AI. The conversation focuses on how enterprises running AI can unlock breakthrough performance from solid-state drives.
“More and more customers are buying expensive GPUs. These servers are going to be $400,000 to $500,000 machines by the time they’re finished being built. They’re very very data hungry and one of the biggest issues is keeping them fully utilized,” Osburn notes.
One way to do that is to put in place a storage system that can keep pace with the GPUs.
QLC SSDs are a strong contender for this. Out of the box, these devices offer great speed and capacity. But extracting all of the promised performance is a separate challenge for enterprises.
“AI is not an event. It’s a workflow,” he comments.
AI processes vary in their read and write intensity. “High speed access to data, both read and write, smooths out those workflows and makes them much more efficient,” Osburn adds.
Also read Graid’s blog post, “Podcast: Utilizing AI Infrastructure Series: Accelerating Storage Infrastructure using GPUs with Graid Technology and Solidigm.”
A Force Multiplier for SSDs
Graid Technology offers a software-defined RAID solution that extracts the full performance of SSDs without hogging CPU cycles.
SupremeRAID has a secret. RAID algorithms, when run on CPUs, consume a lot of resources, leaving very little for applications. SupremeRAID sidesteps this by using a GPU to perform RAID parity computations.
“SupremeRAID is a GPU-accelerated RAID stack. We dedicated GPU to do all of the infrastructure processing for RAID calculations for parity,” Osburn says.
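To make “RAID calculations for parity” concrete, here is a minimal sketch of single-parity protection of the kind used in RAID 5, where the parity block is the byte-wise XOR of the data blocks in a stripe. This is not Graid’s implementation: SupremeRAID runs its calculations on a GPU, while this illustration runs on the CPU in plain Python, and the function names `xor_parity` and `rebuild_missing` are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized data blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) walks the blocks column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Illustrative stripe of three data blocks (made-up contents).
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripe)

# Pretend the second drive failed and rebuild its block from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Every write to a stripe requires this kind of calculation across all of its blocks, which is why the work adds up quickly on servers with many fast drives and why it is a natural candidate for a massively parallel processor.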
Standard RAID levels run from 0 through 6, along with nested levels such as RAID 10. RAID 0 offers the best performance, but there’s a caveat.
“If I said I’m going to do RAID 0 stripe across all the drives and then do read, I’ll get very high read performance. Unfortunately, I’m not going to have any data protection,” he points out.
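The striping Osburn describes can be sketched as a simple address mapping: consecutive logical blocks are dealt out to the drives in round-robin order, so a large read is served by every drive at once. The sketch below assumes a stripe unit of one block and uses a hypothetical helper, `locate_block`. Note that it stores no redundant data at all, which is exactly the caveat he raises.

```python
def locate_block(logical_block, num_drives):
    """Map a logical block number to (drive index, block offset on that drive)."""
    if num_drives < 1:
        raise ValueError("need at least one drive")
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    return logical_block % num_drives, logical_block // num_drives


# Layout of the first eight logical blocks across a four-drive RAID 0 set.
layout = [locate_block(n, 4) for n in range(8)]
```

With four drives, blocks 0 through 3 land on drives 0 through 3 at offset 0, and blocks 4 through 7 wrap around to the same drives at offset 1. Losing any one drive therefore loses a quarter of every large file.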
In a dense server environment with dozens of NVMe drives, data protection can create a significant bottleneck, putting RAID 0’s theoretical performance out of reach.
Traditional hardware RAID controllers do not fare well in these environments. All of the PCIe lanes from the installed NVMe drives feed into one card. It is like a multi-lane freeway funneling through two toll interchanges. Congestion is inevitable.
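A back-of-the-envelope calculation shows why a single card becomes the choke point. The figures below are illustrative assumptions, not measurements and not numbers from the episode: 24 drives at roughly 7 GB/s each, funneled through one slot with roughly 32 GB/s of bandwidth. The helper `funnel_ratio` is hypothetical.

```python
def funnel_ratio(num_drives, drive_gb_per_s, uplink_gb_per_s):
    """Return (aggregate drive bandwidth, oversubscription ratio of the uplink)."""
    aggregate = num_drives * drive_gb_per_s
    return aggregate, aggregate / uplink_gb_per_s


# Assumed, illustrative values only.
aggregate, ratio = funnel_ratio(num_drives=24, drive_gb_per_s=7.0, uplink_gb_per_s=32.0)
print(f"Drives can supply {aggregate:.0f} GB/s; the single card is oversubscribed {ratio:.2f}x")
```

Under these assumptions the drives could supply 168 GB/s in aggregate, more than five times what the single slot can carry, so most of the drives’ potential throughput never reaches the application.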
SupremeRAID tackles bottlenecks in a couple of different ways. Offloading RAID processing from the CPU is one of them. Paired with this is a peer-to-peer DMA (direct memory access) technology that allows data to move directly from the drives to the applications across PCIe. This feature acts like a traffic cop directing traffic through a feeder lane to avoid the holdup.
SupremeRAID is deployed on an NVIDIA GPU as a CUDA-based application. GPUs pack in a large number of CUDA cores and can perform parallel mathematical calculations dramatically faster than CPUs.
Installation is simple because no SSDs are cabled to the RAID card. The NVMe drives attach directly to the motherboard rather than to the card.
“We want to see the drives plugged right into the motherboard so we can see them across the PCIe root complex and communicate and control without forcing data through our card,” Osburn explains. This eliminates performance bottlenecks.
SupremeRAID outperforms both software and hardware RAID technologies, delivering close to RAID 0 performance along with data protection. However, Osburn points out that it is not a solution for lightweight servers with four or fewer drives.
“Where the Graid solution really fits is with much higher densities of NVMe in a single machine,” he says. “Our market is to focus on machines that have more than four NVMe drives where you have to start figuring out how to deliver the maximum performance from those drives to the application.”
Graid’s current customers include military organizations that require high-performance, low-power computing for edge deployments; research hospitals that generate huge amounts of data that must be written quickly; the medical device space, where a limited amount of equipment serves patients every day; and credit card companies that run high-performance databases.
A Best-of-Breed Solution Engineered for Top-Tier Use Cases
Osburn offered a glimpse into some real-world applications. One of them is broadcasting. Media and entertainment houses regularly store and work on massive video files.
The combination of SupremeRAID and Solidigm’s QLC SSDs is well suited to editing and VFX work. Taking it a notch higher, Graid Technology, Solidigm and Cheetah RAID Storage combined their innovations to design a storage solution for creatives.
Cheetah RAID provided a server with removable cartridges that make it easy to pull out drives as they fill up. The drives can then be shipped to a different location for post-processing, while new cartridges are plugged in to continue the work.
The Solidigm P5336 QLC SSDs brought incredible density and raw performance, and Graid’s solution lent RAID protection and high-performance read and write access for the data.
Tuxera Fusion File Share provided the finishing touch. Fusion File Share allows multiple users to work on the same digital files concurrently at high performance.
“This solved the problem of having to wait for data to scroll forward. You can move back and forth very very quickly and have multiple people attacking that data at the same time instead of having to do it sequentially. That saves time,” Osburn says.
Check out Graidtech.com to learn more about SupremeRAID. Both Graid Technology and Solidigm will be participating in the upcoming Flash Memory Summit 2024 in Santa Clara, and Supercomputing Conference 2024 in Atlanta. Be sure to catch their sessions at the events, and for more such conversations around AI, keep listening to the Utilizing Tech Podcast.
Podcast Information:
Stephen Foskett is the Organizer of the Tech Field Day Event Series and President of the Tech Field Day Business Unit, now part of The Futurum Group. Connect with Stephen on LinkedIn or on X/Twitter and read more on the Gestalt IT website.
Jeniece Wnorowski is the Datacenter Product Marketing Manager at Solidigm. You can connect with Jeniece on LinkedIn and learn more about Solidigm and their AI efforts on their dedicated AI landing page or watch their AI Field Day presentations from the recent event.
Kelley Osburn is the Senior Director of OEM and Channel Business Development at Graid Technology. You can connect with Kelley on LinkedIn and learn more about Graid Technology on their website.
Learn More about Graid Technology
- About Graid Technology
- Empowering Performance & Reliability with Supermicro and SupremeRAID
- SupremeRAID™ SR-1010
Thank you for listening to Utilizing Tech with Season 7 focusing on AI Data Infrastructure. If you enjoyed this discussion, please subscribe in your favorite podcast application and consider leaving us a rating and a nice review on Apple Podcasts or Spotify. This podcast was brought to you by Solidigm and by Tech Field Day, now part of The Futurum Group. For show notes and more episodes, head to our dedicated Utilizing Tech Website or find us on X/Twitter and Mastodon at Utilizing Tech.