
Meeting the Storage Performance Requirements of AI and ML Workloads with WekaIO

Two of the most impressive technological advancements in recent years are Artificial Intelligence (AI) and Machine Learning (ML). But AI and ML workloads require vast amounts of storage and high-performance access from multiple servers. That makes them an ideal market for WekaIO’s distributed storage solution, which was presented at Storage Field Day in February and offers the scale, performance, and compatibility these workloads need.

WekaIO introduces its distributed storage solution at Storage Field Day in February 2019

Introducing Artificial Intelligence and Machine Learning

Artificial Intelligence (AI) describes systems that appear to exhibit intelligence, although they work quite differently from the “natural intelligence” of humans. AI systems perform tasks that emulate cognitive abilities like learning and problem solving, but they do so entirely through digital algorithms. Examples of AI in use today include medical programs that can make diagnoses faster than human doctors, programs that play complicated games against top-level human players, self-driving vehicles, and the speech recognition used by digital assistants.

Machine Learning (ML) is considered a subset of AI and is required for many forms of AI to function. ML uses advanced algorithms and statistical modeling to sift through vast amounts of data and identify patterns. These systems can then analyze new data and make predictions or decisions without having been explicitly programmed to perform the task.
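
To make “identifying patterns without explicit instructions” concrete, here is a deliberately tiny sketch in Python. It uses ordinary least-squares line fitting, one of the simplest statistical models, rather than anything a production AI system would use, and the data points are made up for illustration. The program is never told the rule relating x to y; it estimates one from examples and then applies it to an input it has not seen.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn the slope and intercept that minimize squared error (ordinary least squares)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the best-fit slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned pattern to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical "training data": only examples, no rule written down.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
model = fit_line(xs, ys)
print(predict(model, 6.0))  # roughly 12.91
```

Real ML systems fit far richer models to far more data, which is exactly why the storage feeding them matters.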

Most AI and ML systems place extreme demands on infrastructure. The infrastructure needs to receive data efficiently, often from multiple sources, and typically requires a large amount of storage. AI and ML also often need this data to be accessible from many compute servers at once, over high-bandwidth, low-latency connections. Beyond fast access to the data itself, AI and ML benefit from fast access to metadata: the algorithms use metadata to quickly sort through the data and determine which of it is relevant to the problem being solved.
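
To illustrate how much of this work touches only metadata, the hypothetical Python sketch below picks candidate training files by name and size without ever opening them. Every decision comes from directory entries and `stat()` results, so on a dataset with millions of files the speed of metadata operations, not data throughput, dominates this step. The directory layout, suffix, and size threshold are assumptions chosen for the example.

```python
import os
import tempfile


def select_samples(root: str, suffix: str, min_bytes: int) -> list[str]:
    """Return files under `root` ending in `suffix` and at least `min_bytes` long.

    Only directory listings and stat() metadata are consulted; no file is opened.
    """
    selected = []
    pending = [root]
    while pending:
        with os.scandir(pending.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)  # descend into it later
                elif entry.name.endswith(suffix) and entry.stat().st_size >= min_bytes:
                    selected.append(entry.path)  # metadata alone says it qualifies
    return sorted(selected)


# Build a tiny hypothetical dataset and select from it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cats"))
    for name, size in [("cats/a.jpg", 2048), ("cats/b.jpg", 10), ("notes.txt", 4096)]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"\0" * size)
    picked = [os.path.relpath(p, root) for p in select_samples(root, ".jpg", 1024)]
    print(picked)
```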

To meet the vast capacity, extreme performance, and multiple points of access that AI and ML data requires, many organizations are turning to parallel or distributed file systems. A distributed file system stores data across a network of multiple nodes. It scales easily and transparently (anywhere from 10 nodes to 10,000 or more), and applications can access files without any server needing to know the physical location of the data.
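
The article doesn’t say how any particular product decides which node holds a file. One common technique, shown here purely as an illustration and not as WekaIO’s method, is rendezvous (highest-random-weight) hashing: any client can compute a file’s owner from its path, so no central lookup of physical locations is needed, and adding a node relocates only the files the new node wins. Node names and paths are hypothetical.

```python
import hashlib


def score(node: str, path: str) -> int:
    """Deterministic pseudo-random weight for a (node, file) pair."""
    digest = hashlib.sha256(f"{node}|{path}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(path: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the highest score owns the file."""
    return max(nodes, key=lambda node: score(node, path))


nodes = [f"node{i:02d}" for i in range(10)]
files = [f"/datasets/train/img_{i:05d}.jpg" for i in range(10_000)]
before = {f: locate(f, nodes) for f in files}

# Scale out by one node; a file moves only if node10 outscores its current owner.
nodes.append("node10")
after = {f: locate(f, nodes) for f in files}
moved = [f for f in files if before[f] != after[f]]
print(f"{len(moved)} of {len(files)} files moved")  # roughly 1/11 of them
```

Because every moved file lands on the new node and the rest stay put, growing from ten to eleven nodes relocates only about one file in eleven, which is the kind of property that lets a distributed file system scale without disrupting clients.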

What distinguishes distributed file systems from Network Attached Storage (NAS) is that distributed systems allow access to files using the same interfaces and semantics used to access local storage; the operating system sees the file system as if it were local. High-performance distributed file systems can deliver data at speeds comparable to the server’s own internal storage. This can far outperform a traditional NAS, making distributed file systems a great fit for AI and ML deployments.
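
Because a distributed file system is mounted and accessed like local storage, ordinary file APIs work unchanged. The Python sketch below measures sequential read throughput using nothing but `open()` and `read()`; the same function could be pointed at a file on an internal NVMe drive or at one on a distributed file system mount. It is only a rough illustration: the temporary test file will likely be served from the operating system’s page cache, so a meaningful comparison would need files larger than memory or direct I/O.

```python
import os
import tempfile
import time


def sequential_read_mb_per_s(path: str, block_size: int = 1 << 20) -> tuple[int, float]:
    """Read a file front to back with plain open()/read(); return bytes read and MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered raw reads
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, rate


# Write a 16 MiB scratch file, read it back, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    sample = tmp.name
try:
    size, rate = sequential_read_mb_per_s(sample)
    print(f"read {size} bytes at about {rate:.0f} MB/s")
finally:
    os.remove(sample)
```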

WekaIO for Distributed Storage

At Storage Field Day in February, WekaIO introduced its distributed storage solution. WekaIO offers its own proprietary distributed file system, which is seeing great success in AI and ML environments. Having just come out of stealth in 2018, WekaIO is making a big splash with its impressive performance numbers.

Its file system, Weka, has been shown to perform even better than local disk storage. Weka accomplishes this by accessing the NVMe flash in storage nodes directly, bypassing the operating system’s kernel entirely. As a result, Weka avoids the overhead inherent in legacy storage concepts such as disk cylinders. Weka also gets its extreme performance by striping metadata across all nodes in the system, so metadata queries and access are spread across the whole cluster and happen at extraordinarily high speeds.
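
WekaIO’s exact metadata layout isn’t described here, so the following is a generic Python sketch of why spreading metadata across nodes helps. It compares placing all metadata for a directory on one node (keyed by the parent directory) with hashing each file’s full path across nodes. `NUM_NODES` and the file names are hypothetical; the point is that a single busy directory, common in flat training sets, no longer funnels every lookup to one server.

```python
import hashlib
from collections import Counter

NUM_NODES = 8  # hypothetical cluster size


def bucket(key: str) -> int:
    """Map a string key to one of NUM_NODES metadata shards via a stable hash."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_NODES


def shard_by_parent_dir(path: str) -> int:
    # Directory-based placement: every entry in a directory lands on one node.
    return bucket(path.rsplit("/", 1)[0])


def shard_by_full_path(path: str) -> int:
    # Spread placement: each file's metadata is hashed on its own full path.
    return bucket(path)


# One busy directory with 100,000 files, as in a flat training set.
paths = [f"/data/train/sample_{i:06d}.npy" for i in range(100_000)]
by_dir = Counter(shard_by_parent_dir(p) for p in paths)
by_path = Counter(shard_by_full_path(p) for p in paths)
print("per-directory:", sorted(by_dir.values()))  # one node holds every entry
print("spread:       ", sorted(by_path.values()))  # about 12,500 per node
```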

Weka provides +2 or +4 data protection (able to survive two or four simultaneous failures) using a scheme similar to standard erasure coding, but based on WekaIO’s own proprietary algorithms. Weka also offers multi-protocol support, allowing access over SMB, NFS, HDFS, and S3 in addition to its native Weka client, so the system can ingest data from many different sources.
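
The protection levels translate into straightforward capacity arithmetic. The Python sketch below assumes a generic maximum-distance-separable erasure code (such as Reed-Solomon), where any `parity` chunks of a `data + parity` stripe can be lost without losing data. WekaIO’s proprietary algorithms may behave differently, and the stripe widths shown are illustrative, not WekaIO’s defaults.

```python
from fractions import Fraction


def stripe_profile(data: int, parity: int) -> dict:
    """Generic capacity and resilience figures for a data+parity erasure-coded stripe."""
    if data < 1 or parity < 0:
        raise ValueError("need at least one data chunk and non-negative parity")
    width = data + parity
    return {
        "stripe": f"{data}+{parity}",
        # Assumes an MDS code: any `parity` of the `width` chunks may be lost.
        "failures_tolerated": parity,
        "usable_fraction": Fraction(data, width),
        "raw_per_usable": Fraction(width, data),
    }


# Illustrative stripe widths; the last row is equivalent to three-way replication.
for data, parity in [(8, 2), (16, 2), (16, 4), (1, 2)]:
    prof = stripe_profile(data, parity)
    print(
        f"{prof['stripe']:>5}: survives {prof['failures_tolerated']} failures, "
        f"{float(prof['usable_fraction']):.0%} of raw capacity usable"
    )
```

The 1+2 row is mathematically the same as keeping three full copies: it also survives two failures but leaves only a third of raw capacity usable, which is why wider erasure-coded stripes are attractive for large AI and ML datasets.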

Stephen’s Stance

With more use cases for Artificial Intelligence and Machine Learning being developed, many organizations will be looking to build out infrastructure that meets both the capacity and performance needs of AI and ML. For many, a distributed file system will be just what they’re looking for. Learn more about WekaIO by viewing its presentation from Storage Field Day 18.

About the author

Stephen Foskett

Stephen Foskett is an active participant in the world of enterprise information technology, currently focusing on enterprise storage, server virtualization, networking, and cloud computing. He organizes the popular Tech Field Day event series for Gestalt IT and runs Foskett Services. A long-time voice in the storage industry, Stephen has authored numerous articles for industry publications, and is a popular presenter at industry events. He can be found online at TechFieldDay.com, blog.FoskettS.net, and on Twitter at @SFoskett.
