Pure Storage Unveils Platform Optimized for AI Workloads

Pure Storage this week announced it will add to its portfolio this summer a data storage platform specifically designed for artificial intelligence (AI) applications.

The FlashBlade//EXA data storage platform is designed to support the high concurrency requirements of AI applications in a way that promises to deliver more than 10TB per second read performance in a single namespace, says Chadd Kenney, vice president of technology at Pure Storage.

At the core of the next-generation disaggregated storage platform is an NVMe-based system that uses a key/value database to optimize the storage of metadata, which is then used to direct requests to the specific node where the physical data is located, notes Kenney. “It’s a high-performance metadata engine,” he adds.
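To illustrate the general idea, here is a minimal sketch, not Pure Storage’s implementation, of how a key/value metadata index can route a read request directly to the node that holds the data; the class and node names are hypothetical:

```python
class MetadataEngine:
    """Hypothetical key/value metadata index: object key -> node ID."""

    def __init__(self):
        self._index = {}  # maps each object key to the node storing its data

    def record_placement(self, key: str, node_id: str) -> None:
        # On write, record which node holds the data for this key.
        self._index[key] = node_id

    def locate(self, key: str) -> str:
        # On read, a single lookup returns the node to contact,
        # avoiding a broadcast query across all storage nodes.
        return self._index[key]


engine = MetadataEngine()
engine.record_placement("dataset/shard-0017", "node-a")
engine.record_placement("dataset/shard-0018", "node-b")

# A client resolves the location first, then issues the read to that node only.
print(engine.locate("dataset/shard-0017"))  # node-a
```

Separating the metadata lookup from the data path is what lets many GPU clients read concurrently without funneling every request through a single controller.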

In contrast, existing storage platforms are not designed to support the level of concurrency that AI workloads running on graphics processing units (GPUs) require for reads and writes, unless organizations opt to deploy a parallel storage system rather than continuing to rely on the familiar NFS protocol, says Kenney.

The more efficiently the storage platform balances load, the less time it takes to move data, which in turn increases GPU utilization rates, he adds.
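The relationship between data-movement time and GPU utilization can be shown with a back-of-the-envelope calculation; the numbers below are assumed for illustration, not vendor figures:

```python
def gpu_utilization(compute_s: float, io_wait_s: float) -> float:
    # Utilization = time the GPU spends computing divided by total
    # wall-clock time (compute plus time stalled waiting on storage I/O).
    return compute_s / (compute_s + io_wait_s)


# Same 60 seconds of compute work; halving the I/O wait lifts utilization.
print(round(gpu_utilization(60.0, 30.0), 2))  # 0.67
print(round(gpu_utilization(60.0, 15.0), 2))  # 0.8
```

The compute time is fixed by the model, so the only lever storage has on utilization is shrinking the wait term.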

Longer term, Pure Storage is also working to make its Fusion software available on the FlashBlade//EXA data storage platform to simplify management of a common pool of storage resources, notes Kenney.

It’s not clear to what degree legacy storage systems are holding up the training and deployment of AI models in production environments. However, organizations should not need to deploy two separate storage systems to support those two use cases, Kenney says.

The FlashBlade//EXA data storage platform reduces the total cost of AI storage by providing a platform that can be used to train models faster while ensuring the overall level of throughput required by AI inference engines running in a production environment, he adds.

Responsibility for deciding which data storage platform to employ, however, will vary widely. In some cases, there is a dedicated team managing IT infrastructure on behalf of a data science team. In other instances, that infrastructure is being managed by a centralized IT team alongside existing IT systems.

Regardless of approach, the one thing that is certain is that organizations will need to allocate additional budget to acquire platforms that run AI workloads optimally. There are already multiple options to choose from, but given the costs, choices about which AI projects to prioritize will inevitably need to be made.

Hopefully, the cost of AI infrastructure will steadily decline over time, but for the time being, improving utilization rates will remain a high priority at a time when the cost of the platforms required to run AI workloads remains at a premium. The challenge and the opportunity now is for storage administrators and data engineers to find a way to optimize the usage of that infrastructure to the full extent possible.

About the author

Mike Vizard
