VAST Data is one of the leading innovators in enterprise storage, winning customers and partnerships with its scale-out storage platform. But the company is emphasizing the “Data” part of its name with a new platform built to support data, analytics, and AI. The VAST DataPlatform is designed to merge storage and data to enable AI-assisted computation. Let’s take a look at the core features of this new VAST DataPlatform and its potential impact on the data and AI landscape.
The Need for an AI-Focused Data Platform
The historic division between storage and data solutions continues to limit the practical impact of data analytics. Even the most advanced storage arrays largely ignore the content they hold, and most use protocols that are incompatible with analytics engines. Similarly, data platforms like Snowflake and Databricks are not inherently designed to support the demands of AI. VAST Data recognized the limitations of existing platforms and set out to create a data computing solution that would address the unique requirements of deep learning and AI-driven applications. The objective was to develop a platform that effectively supports various data types, bridges the gap between transactional and analytical processing, and enables seamless integration of unstructured and structured data.
This is not the first time a storage company has tried to market a data-centric offering. In fact, we have seen many companies try to market storage solutions for data and analytics use cases as well as industry verticals like legal, medical, media and entertainment, and more. But many of these products were not truly designed for this market, did not support the APIs and protocols needed, and did not find adoption. Marketing was another major issue for past efforts, with companies still “talking storage” instead of using terms and concepts that a data or application customer could understand. Our initial conversations with VAST Data have been positive: the company seems to understand the challenge of entering this new market and appears ready to invest to make it happen.
Learn more about VAST Data by watching their Tech Field Day presentations, especially this early overview of their original architecture, their support for AI data, and this overview of the VAST Data advantage.
The VAST DataPlatform Components
VAST DataPlatform comprises several core components that create a unified, scalable, and AI-focused data infrastructure:
- VAST DataStore: The VAST DataStore is a scalable storage architecture engineered for unstructured data. Unlike traditional storage systems, it eliminates storage tiering and also provides enterprise file storage and object storage interfaces. This feature allows for efficient management of vast amounts of unstructured data and serves as a robust foundation for AI model training.
- VAST DataBase: The new VAST DataBase introduces a semantic database layer natively integrated into the system. By combining the characteristics of a database, a data warehouse, and a data lake into one simple, distributed, and unified database management system, VAST is attempting to resolve the divide between real-time analytics and real-time data capture and cataloging.
- VAST DataEngine: The VAST DataEngine acts as a global function execution engine, supporting popular languages such as SQL and Python. It facilitates rapid data capture and fast queries at any scale, streamlining AI pipelines and enhancing the management of AI and ML models.
- VAST DataSpace: The VAST DataSpace provides a global namespace that allows seamless storage, retrieval, and processing of data across various environments, including on-premises data centers, edge environments, and leading public cloud platforms like AWS, Microsoft Azure, and Google Cloud.
One of the fundamental challenges in building an AI-focused data platform is developing a database that supports both transactional and analytical processing. VAST Data approached this challenge with a unique strategy. Transactions are initially written in row form to the write buffer and then converted to a highly efficient columnar format in flash storage. This columnar format stores data in units of just 32KB, compared to the traditional 128MB in Parquet, so a query can read far less data to reach the values it needs, which improves processing efficiency and enables real-time analytics.
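To make the row-to-columnar idea concrete, here is a minimal Python sketch. It is not VAST's implementation: the `WriteBuffer` class, its method names, and the `rows_per_chunk` threshold are all hypothetical, chosen only to illustrate how rows captured in a buffer can be pivoted into small columnar chunks while remaining queryable. Only the 32KB and 128MB figures come from the article.

```python
CHUNK_BYTES = 32 * 1024            # 32KB unit cited in the article
PARQUET_BYTES = 128 * 1024 * 1024  # 128MB figure cited for Parquet


class WriteBuffer:
    """Hypothetical row-oriented buffer that flushes into columnar chunks."""

    def __init__(self, columns, rows_per_chunk):
        self.columns = list(columns)
        self.rows_per_chunk = rows_per_chunk  # illustrative flush threshold
        self.rows = []    # pending transactions, kept in row form (tuples)
        self.chunks = []  # flushed chunks, each a dict: column -> list of values

    def insert(self, row):
        """Capture one transaction in row form; flush once the buffer is full."""
        self.rows.append(tuple(row[c] for c in self.columns))
        if len(self.rows) >= self.rows_per_chunk:
            self.flush()

    def flush(self):
        """Pivot the buffered rows into a single columnar chunk."""
        if not self.rows:
            return
        chunk = {c: [r[i] for r in self.rows] for i, c in enumerate(self.columns)}
        self.chunks.append(chunk)
        self.rows = []

    def scan(self, column):
        """Read one column across flushed chunks and still-buffered rows."""
        i = self.columns.index(column)
        for chunk in self.chunks:
            yield from chunk[column]  # touches only the requested column
        for r in self.rows:
            yield r[i]                # recent rows stay visible to queries


buf = WriteBuffer(["id", "price"], rows_per_chunk=3)
for n in range(7):
    buf.insert({"id": n, "price": n * 10})

total = sum(buf.scan("price"))
granularity_ratio = PARQUET_BYTES // CHUNK_BYTES
```

In this toy run, seven inserts produce two flushed columnar chunks plus one row still in the buffer, and the scan sees all seven values. The last line works out the arithmetic behind the size comparison: 128MB divided by 32KB is 4,096, so the smaller unit is roughly three orders of magnitude finer-grained.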
Stephen’s Stance
The VAST DataPlatform is an interesting new offering that blurs the lines between storage and data. By unifying storage, database, and virtualized compute engine services and supporting various data types, VAST Data has created the sort of platform that ought to appeal to companies developing data and AI applications. The challenge with solutions like these has always been in communicating across this divide, from product development to sales, and engaging developers and data scientists with a new tool. Increased focus on marketing content tailored to these communities will be a sign that the company is set to succeed where others have failed.