The rise of AI and the importance of data to modern businesses have driven us to recognize that what matters is data, not storage. This episode of the Tech Field Day podcast focuses on AI data infrastructure and features Camberley Bates, Andy Banta, David Klee, and host Stephen Foskett, all of whom will be attending our AI Data Infrastructure Field Day this week. We’ve known for decades that storage solutions must provide the right access method for applications, not just performance, capacity, and reliability. Today’s enterprise storage solutions have specialized data services and interfaces to enable AI workloads, even as capacity is driven beyond anything we’ve seen in the past. Power and cooling are another critical element, since AI systems are optimized to make the most of expensive GPUs and accelerators. AI also requires extensive preparation and organization of data, as well as traceability and metadata records for compliance and reproducibility. Interfaces are another question, with modern storage turning to object stores or even vector database interfaces rather than traditional block and file. AI is driving a profound transformation of storage and data.
Infrastructure Beyond Storage
The rise of AI has fundamentally shifted the way we think about data infrastructure. Historically, storage was the primary focus, with businesses and IT professionals concerned about performance, capacity, and reliability. However, as AI becomes more integral to modern business operations, it’s clear that data infrastructure is about much more than just storage. The focus has shifted from simply storing data to managing, accessing, and utilizing it in ways that support AI workloads and other advanced applications.
One of the key realizations is that storage, in and of itself, is not the end goal. Data is what matters. Storage is merely a means to an end, a place to put data so that it can be accessed and used effectively. This shift in perspective has been driven by the increasing complexity of AI workloads, which require not just vast amounts of data but also the ability to access and process that data in real time or near real time. AI systems are highly dependent on the right data being available at the right time, and this has led to a rethinking of how data infrastructure is designed and implemented.
In the past, storage systems were often designed with a one-size-fits-all approach. Whether you were running a database, a data warehouse, or a simple file system, the storage system was largely the same. But AI has changed that. AI workloads are highly specialized, and they require storage systems that are equally specialized. For example, AI systems often need to access large datasets quickly, which means that traditional storage systems built on spinning disks, or even on slower SSDs, may not be sufficient. Instead, AI systems are increasingly turning to high-performance storage solutions that can deliver the necessary bandwidth and low latency.
Moreover, AI workloads often require specialized data services that go beyond simple storage. These include things like data replication, data reduction, and cybersecurity features. AI systems also need to be able to classify and organize data in ways that make it easy to access and use. This is where metadata management becomes critical. AI systems need to be able to track not just the data itself but also the context in which that data was created and used. This is especially important for compliance and reproducibility, as AI systems are often used in regulated industries where traceability is a legal requirement.
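To make the idea of metadata and lineage tracking concrete, here is a minimal sketch of what recording a dataset's provenance might look like in a simple file-based workflow. The `LineageRecord` fields and `record_lineage` helper are illustrative assumptions, not the API of any particular product discussed on the episode.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class LineageRecord:
    """Minimal metadata describing where a dataset came from and how it was prepared."""
    dataset_name: str
    source_uri: str            # where the raw data originated
    preprocessing_steps: list  # ordered list of transformations applied
    content_sha256: str        # fingerprint of the prepared data for reproducibility
    created_at: str            # ISO-8601 timestamp for audit trails


def record_lineage(dataset_path: Path, source_uri: str, steps: list) -> LineageRecord:
    """Hash the prepared dataset and capture its lineage alongside it."""
    digest = hashlib.sha256(dataset_path.read_bytes()).hexdigest()
    record = LineageRecord(
        dataset_name=dataset_path.name,
        source_uri=source_uri,
        preprocessing_steps=steps,
        content_sha256=digest,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    # Store the metadata next to the data so compliance reviews can find it later.
    metadata_path = dataset_path.with_name(dataset_path.name + ".lineage.json")
    metadata_path.write_text(json.dumps(asdict(record), indent=2))
    return record
```

In a production pipeline this kind of record would more likely live in a metadata catalog or governance service than in a sidecar file, but the principle is the same: fingerprint the data and capture how it was produced so that results can be audited and reproduced.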
Another important aspect of AI data infrastructure is the interface between the storage system and the AI system. Traditional storage systems often relied on block or file-based interfaces, but AI systems are increasingly turning to object storage or even more specialized interfaces like vector databases. These new interfaces are better suited to the needs of AI workloads, which often involve large, unstructured datasets that need to be accessed in non-linear ways.
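The difference in access pattern is easy to illustrate. In the hedged sketch below, a plain dictionary stands in for an object store and a brute-force cosine-similarity search stands in for a vector database; the keys, embeddings, and helper functions are purely illustrative, and real systems expose far richer APIs.

```python
import numpy as np

# Toy stand-ins: a key/value "object store" and a brute-force vector index.
object_store = {
    "docs/report-q1.txt": b"Quarterly revenue grew on strong AI demand.",
    "docs/report-q2.txt": b"Cooling costs rose as GPU clusters expanded.",
}

# Each object also gets an embedding (random vectors here, just for shape).
rng = np.random.default_rng(0)
embeddings = {key: rng.normal(size=8) for key in object_store}


def get_object(key: str) -> bytes:
    """Object-style access: you must already know the exact key."""
    return object_store[key]


def vector_search(query_vec: np.ndarray, top_k: int = 1) -> list:
    """Vector-style access: retrieve the most similar objects, no key required."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(embeddings, key=lambda k: cosine(query_vec, embeddings[k]), reverse=True)
    return [(k, object_store[k]) for k in ranked[:top_k]]


print(get_object("docs/report-q1.txt"))
print(vector_search(rng.normal(size=8)))
```

With a block or file interface you must know exactly where the data lives; with a vector interface you describe what you are looking for and let the system find the closest match, which is the access pattern many AI workloads rely on.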
Power and cooling are also critical considerations in AI data infrastructure. AI systems are highly resource-intensive, particularly when it comes to GPUs and other accelerators. These systems generate significant heat and consume considerable power, which means that the data infrastructure supporting them needs to be optimized for energy efficiency. This has led to a shift away from power-hungry spinning disks and toward more energy-efficient storage solutions like SSDs, and even tape for long-term storage.
The rise of AI has also blurred the lines between storage and memory. With the advent of technologies like CXL (Compute Express Link), the distinction between memory and storage is becoming less clear. AI systems often need to access data so quickly that traditional storage solutions are not fast enough. In these cases, data is often held in memory, which offers much faster access times. However, memory is also more expensive than traditional storage and, unlike it, volatile, which means that data infrastructure needs to be able to balance these competing demands.
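One way to picture this balancing act is a simple read-through cache, where a small, fast in-memory tier sits in front of a slower but persistent tier. The sketch below is a toy illustration of that idea, not anything CXL-specific; the `TieredReader` class and its parameters are hypothetical.

```python
from collections import OrderedDict
from pathlib import Path


class TieredReader:
    """Read-through cache: a small, fast in-memory tier in front of slower persistent files."""

    def __init__(self, data_dir: Path, capacity: int = 4):
        self.data_dir = data_dir
        self.capacity = capacity          # how many items the "memory tier" may hold
        self.memory_tier = OrderedDict()  # LRU ordering: most recently used at the end

    def read(self, name: str) -> bytes:
        if name in self.memory_tier:
            self.memory_tier.move_to_end(name)      # cache hit: fast path
            return self.memory_tier[name]
        data = (self.data_dir / name).read_bytes()  # cache miss: go to the persistent tier
        self.memory_tier[name] = data
        if len(self.memory_tier) > self.capacity:
            self.memory_tier.popitem(last=False)    # evict the least recently used item
        return data
```

Real memory and storage tiering happens much lower in the stack, but the trade-off is the same: keep hot data where access is fastest and let colder data fall back to cheaper, persistent media.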
In addition to the technical challenges, AI data infrastructure also needs to address the growing need for traceability and compliance. As AI systems are increasingly used to make decisions that impact people’s lives, whether in healthcare, finance, or other industries, there is a growing need to be able to trace how those decisions were made. This requires not just storing the data that was used to train the AI system but also keeping detailed records of how that data was processed and used. This is where metadata management becomes critical, as it allows organizations to track the entire lifecycle of the data used in their AI systems.
In conclusion, AI is driving a profound transformation in the way we think about data infrastructure. Storage is no longer just about performance, capacity, and reliability. It’s about managing data in ways that support the unique needs of AI workloads. This includes everything from specialized data services and interfaces to energy-efficient storage solutions and advanced metadata management. As AI continues to evolve, so too will the data infrastructure that supports it, and organizations that can adapt to these changes will be well-positioned to take advantage of the opportunities that AI presents.
Apple Podcasts | Spotify | Overcast | Amazon Music | YouTube Music | Audio
Learn more about AI Data Infrastructure Field Day 1 on the Tech Field Day website. Watch the event live on LinkedIn or on Techstrong TV.
Podcast Information:
Stephen Foskett is the Organizer of the Tech Field Day Event Series, now part of The Futurum Group. Connect with Stephen on LinkedIn or on X/Twitter.
Camberley Bates is the VP and Practice Lead at The Futurum Group. You can connect with Camberley on LinkedIn and find her podcast, Infrastructure Matters, through The Futurum Group.
Andy Banta is a consultant at MagnitionIO and a storage expert promoting simplicity and economy. You can connect with Andy on X/Twitter or on LinkedIn. Learn more about Andy on his Substack.
David Klee is the Founder at Heraflux Technologies. You can connect with David on X/Twitter or on LinkedIn. Learn more about David on his personal website or about Heraflux Technologies on their website.
Thank you for listening to this episode of the Tech Field Day Podcast. If you enjoyed the discussion, please remember to subscribe on YouTube or your favorite podcast application so you don’t miss an episode and do give us a rating and a review. This podcast was brought to you by Tech Field Day, home of IT experts from across the enterprise, now part of The Futurum Group.