Unstructured Data at Scale is Hard
Data is everywhere, and it’s no longer just files or emails. It’s a seemingly infinite number of data types and objects, each tied to specific use cases. Whether it’s the pictures we upload to Instagram, medical records, or telemetry data from industrial IoT sensors, cars, planes, and other means of transportation, unstructured data is all around us.
Objects grew from millions to hundreds of billions and counting, raising storage capacity demands from terabytes into petabytes. Before this “big bang” of unstructured data, organizations would maintain separate storage silos. There were multiple reasons: different unstructured data types would have different performance requirements, some teams preferred a separate platform for file storage, others for object storage.
Modern applications relying on unstructured data need stability, performance and rich media services
The exponential increase in capacity demand makes it impossible to keep treating unstructured data as separate silos based on data types or performance profiles. Operational complexity grows with every additional silo. Data segmentation and isolation prevent the use of data efficiency mechanisms, which leads to wasted capacity. The networking backend, designed primarily for physical-only connectivity, cannot operate at this scale without excessive cabling and configuration. All of this also drives up environmental costs such as power consumption, data center space, and cooling.
To make a long story short, unstructured data at scale is unsustainable and makes costs skyrocket unnecessarily, both from a CapEx and OpEx perspective.
Meeting Unstructured Data Needs
The very first requirement to meet unstructured data needs is a storage platform that converges file and object storage with high-performance characteristics, which led to the creation of a new storage category named Unified Fast File and Object (UFFO).
To meet the needs of unstructured data at scale, a UFFO platform should:
- Scale massively
- Be designed for performance
- Provide consistent performance that matches all use cases
- Deliver enterprise-grade features such as:
  - Multiple replication options
  - Advanced disaster recovery capabilities
The UFFO platform should not only provide massive scalability without disruption, but also provide a consistent degree of performance regardless of the workloads using the storage backend.
The Embodiment of UFFO: Pure Storage FlashBlade
Pure Storage recognized the rise of unstructured data very early and launched its own UFFO system, FlashBlade, as early as 2016. Since then, the company has generated over $1 billion in revenue from FlashBlade alone. FlashBlade is an extremely interesting UFFO platform with unique characteristics: it delivers on all of the unstructured data requirements outlined above.
From a design perspective, FlashBlade consists of a 4U enclosure that holds up to 15 storage blades. Each blade has a capacity of either 17 TB or 52 TB, for up to 1 PB of usable capacity per chassis based on a 2:1 compression ratio. A FlashBlade system can start from 7 blades in a single enclosure and scale to a total of 10 enclosures and 150 blades. Within each FlashBlade system, the blades are connected by up to 16x 100 Gb/s Ethernet ports, and External Fabric Modules allow the system to scale beyond a single enclosure to the 150-blade maximum.
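As a rough illustration of the scaling figures quoted above, the sketch below multiplies out blade counts and blade sizes. It computes raw capacity only: usable capacity also depends on erasure-coding overhead and the achieved compression ratio, which are not broken down here, so these numbers are not directly comparable to the 1 PB usable figure. The function and constant names are illustrative, not part of any Pure Storage tooling.

```python
# Figures taken from the paragraph above; names are illustrative only.
BLADES_PER_ENCLOSURE = 15
MIN_BLADES = 7
MAX_ENCLOSURES = 10
BLADE_CAPACITIES_TB = (17, 52)


def raw_capacity_tb(blades: int, blade_tb: int) -> int:
    """Raw (pre-overhead, pre-compression) capacity in TB."""
    return blades * blade_tb


max_blades = BLADES_PER_ENCLOSURE * MAX_ENCLOSURES

for blade_tb in BLADE_CAPACITIES_TB:
    smallest = raw_capacity_tb(MIN_BLADES, blade_tb)
    one_enclosure = raw_capacity_tb(BLADES_PER_ENCLOSURE, blade_tb)
    full_system = raw_capacity_tb(max_blades, blade_tb)
    print(
        f"{blade_tb} TB blades: {smallest} TB raw at {MIN_BLADES} blades, "
        f"{one_enclosure} TB raw per full enclosure, "
        f"{full_system} TB raw at {max_blades} blades"
    )
```

With 52 TB blades, a full enclosure works out to 780 TB raw and a fully scaled system to 7,800 TB raw, which shows how far the platform stretches between its smallest and largest configurations.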
This performance is made possible by the unique capabilities of the FlashBlade architecture. The solution “cuts out the middleman” by eliminating the bottlenecks created by legacy protocols that are usually stacked one on top of another. The system accesses flash cells directly, eliminating the Flash Translation Layer and handling flash cell management and garbage collection globally, which improves flash durability and reliability. The solution also implements a distributed transaction database and a variable-block metadata engine, and it supports native object storage.
The platform delivers enterprise-grade data services such as file and object replication with low data overhead, which enable disaster recovery capabilities (such as one-click DR) and cross-site replication scenarios. Data efficiency is built into the platform with thin provisioning, compression, fast copy capabilities, and global erasure coding. All of this is delivered in a way that is easy for administrators to use, which reduces the likelihood that features go unused.
Management of the FlashBlade system is done through Pure1, Pure Storage’s unified management platform. Pure1 delivers advanced management capabilities coupled with proactive alerting, predictive analytics, and capacity modeling features. These are provided by Pure1 Meta, an AI/ML engine. Pure1 supports all of Pure Storage’s product lines, providing a consistent experience regardless of the storage platform used.
Modern applications based on cloud-native concepts and the continued embedding of advanced, telemetry-based technology in everyday objects are two of the many trends fueling the growth of unstructured data. At this staggering scale, managing unstructured data in silos is no longer sustainable.
A new approach that allows file and object storage to be converged and managed at scale is needed. This approach cannot compromise on performance either: performance needs to be consistent regardless of the workload profile. Unified Fast File and Object storage arrays are the answer to these challenges.
Organizations managing massive amounts of unstructured data require stability above all, along with consistent performance, massive scalability, and best-in-class support.
Pure Storage have demonstrated their ability to deliver in the following areas:
- First, they anticipated this emerging industry trend in time and built a resilient storage platform designed to address unstructured storage requirements, including performance and scalability.
- Secondly, they tapped into their experience and intellectual property in delivering enterprise-class data services and applied it to the FlashBlade platform while taking into consideration the characteristics of unstructured data.
- Finally, they continue to deliver an outstanding and consistent user experience to their customers, thanks to the advanced management services provided by their Pure1 platform, which delivers proactive alerts and predictive analytics.
All of this makes Pure Storage the premier choice when it comes to selecting the best UFFO platform.