So, here’s my rash statement from Twitter last night: “If FAST isn’t free, I don’t want it! All it’s doing is automating process I could script/do manually”. It’s a bold statement, I know, so is FAST really offering something better than what could be achieved today using EMC’s Symmetrix Optimizer?
EMC’s Symmetrix architecture (18 years old and counting, I believe) uses the concept of disk hypers to present LUNs. Each physical disk is carved into a number of slices, which are then recombined to create LUNs to present to a host. A mirrored (RAID-1) LUN uses two hypers; a RAID-5 (3+1) LUN uses four. EMC ensure general performance by setting standards on how LUNs are created from hypers, and that’s reflected in a “binfile” layout. However, despite this sensible planning, it is possible (especially now that hard drives are much larger and contain many more hypers) that two highly active hypers could sit on a single physical disk and contend with each other: in other words, “hot spots” on disk.
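To make the hyper concept concrete, here’s a minimal sketch of carving disks into hypers and combining them into LUNs. All of the names, sizes and the allocation logic are my own illustration, not the actual Symmetrix binfile format:

```python
# Illustrative model of hypers: each physical disk is sliced into
# fixed-size hypers, and a LUN is built from hypers on distinct disks.
# Sizes and structure are invented for the example, not Symmetrix internals.

def carve_disk(disk_id, disk_gb, hyper_gb):
    """Slice one physical disk into hypers."""
    return [{"disk": disk_id, "slot": s} for s in range(disk_gb // hyper_gb)]

# Four 144 GB drives carved into 9 GB hypers: 16 hypers per disk.
free_hypers = []
for d in range(4):
    free_hypers.extend(carve_disk(d, 144, 9))

def build_lun(free, n_hypers):
    """Build a LUN from n hypers, each on a different physical disk."""
    lun, used_disks = [], set()
    for h in list(free):
        if h["disk"] not in used_disks:
            lun.append(h)
            used_disks.add(h["disk"])
            free.remove(h)
        if len(lun) == n_hypers:
            return lun
    raise ValueError("not enough distinct disks for this LUN")

raid1_lun = build_lun(free_hypers, 2)   # mirrored: two hypers
raid5_lun = build_lun(free_hypers, 4)   # RAID-5 (3+1): four hypers
```

Note the deliberate design choice the sketch mimics: the binfile-style standards exist precisely so that one LUN’s hypers never share a spindle, yet nothing stops two *different* busy LUNs from each landing a hyper on the same disk.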
Optimizer helps alleviate the issue of hot spots by exchanging high-I/O hypers with low-I/O ones, distributing busy LUNs across more physical spindles. This is classic load balancing, where workload is spread across the available infrastructure in order to obtain better overall performance. EMC have now rebranded Optimizer as part of Ionix for Storage Resource Management, but it’s still effectively the same product. Hyper swaps can be managed automatically, based on historical performance data, or user-defined as a manual swap at the user’s request.
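The swap decision itself is conceptually simple. Here’s a rough sketch of how a load balancer might pick an exchange from observed I/O rates. This is my own simplification of the idea, not EMC’s actual algorithm:

```python
# Toy Optimizer-style swap: find the busiest disk, then exchange its
# hottest hyper with the coolest hyper on the quietest disk.
# Pure illustration of the load-balancing idea, not EMC's implementation.

from collections import defaultdict

# Observed I/O rate per hyper; keys are (disk, slot) tuples.
io_rates = {
    (0, 0): 900, (0, 1): 850,   # two hot hypers contending on disk 0
    (1, 0): 120, (1, 1): 80,
    (2, 0): 40,  (2, 1): 10,
}

def pick_swap(io_rates):
    """Choose (hot_hyper, cold_hyper) to exchange."""
    per_disk = defaultdict(int)
    for (disk, _), rate in io_rates.items():
        per_disk[disk] += rate
    busiest = max(per_disk, key=per_disk.get)
    quietest = min(per_disk, key=per_disk.get)
    hot = max((h for h in io_rates if h[0] == busiest), key=io_rates.get)
    cold = min((h for h in io_rates if h[0] == quietest), key=io_rates.get)
    return hot, cold

hot, cold = pick_swap(io_rates)
# Exchanging the hypers' contents moves the hot workload off disk 0.
io_rates[hot], io_rates[cold] = io_rates[cold], io_rates[hot]
```

After the swap, disk 0 carries one hot hyper instead of two, which is exactly the kind of exchange Optimizer proposes from its historical statistics.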
Although tedious (and not as well automated as Hitachi’s HiCommand Tiered Storage Manager), in theory Optimizer could be used to manually move workload between storage tiers. In fact, Optimizer is already aware of a tiered storage infrastructure. Here’s a quote directly from the ControlCenter 6.1 manual:
“Optimizer is also aware of physical drives that operate at different speeds, as well as location of the data on the physical media, which influences the I/O rate. This information is used when determining which logical devices to move.”
So with a little bit of knowledge on the layout of data on a Symmetrix array, it would be possible today to use Optimizer to perform LUN-based FAST.
Load-Balancing Versus Policy
Unfortunately, simple load-balancing of I/O across a storage array doesn’t offer what should be seen as the next generation of storage tiering. Where Storage Tiering 1.0 was about offering multiple layers of storage within the same physical infrastructure and manually placing or moving LUNs to the appropriate tier, Storage Tiering 2.0 will be about establishing policies that define the levels of performance and availability customers should receive as a service.
A policy-based approach would allow rules to be established on how data at the application layer moves between tiers. This is a critical distinction from the load-balancing methodology described earlier. As an example, where an application was known to require higher performance at a certain time of day or day of the week, its data could be moved proactively to a faster tier of storage, returning later once the high I/O workload had completed. Whilst achievable using Optimizer, there’s no doubt the process of application migration would be tedious and time consuming. I expect the v1.0 implementation of FAST will simply package up Optimizer into a tool that automates the migration of related data between tiers. Don’t forget, other vendors have been offering this feature for some time; Hitachi’s Tiered Storage Manager is one example.
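As a sketch of what such a policy might look like in practice, consider rules that pin an application to a faster tier during known busy windows. The rule format, tier names and applications here are entirely invented for illustration; they are not FAST’s (or any vendor’s) actual syntax:

```python
# Illustrative policy engine: rules move an application's data to a
# faster tier during known busy windows, returning it afterwards.
# Rule structure, applications and tier names are invented examples.

from datetime import time

policies = [
    # (application, weekday 0=Mon, window start, window end, tier in window)
    ("billing", 4, time(18, 0), time(23, 0), "SSD"),     # Friday night run
    ("reports", 0, time(6, 0),  time(9, 0),  "FC-15k"),  # Monday mornings
]

def target_tier(app, weekday, now, default="SATA"):
    """Return the tier an application's data should live on right now."""
    for name, day, start, end, tier in policies:
        if name == app and weekday == day and start <= now <= end:
            return tier
    return default

# During the Friday-night billing run, data belongs on SSD...
assert target_tier("billing", 4, time(20, 30)) == "SSD"
# ...and returns to the capacity tier once the window has passed.
assert target_tier("billing", 5, time(20, 30)) == "SATA"
```

The point of the sketch is the shape of the decision: the trigger is a service-level rule about the application, not an after-the-fact I/O statistic.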
Now LUN-based migration has its benefits. Where large numbers of disks exist in an infrastructure, application data can be placed or moved to the most appropriate location as required. However, with the introduction of solid state disks (SSDs), a more granular approach is needed, as the number of SSDs deployed in an array is likely to be low due to their high cost. Moving an entire application (or even a LUN) to SSD will be undesirable unless that application can make full use of the SSD hardware. There are very few, if any, applications that require high-intensity read/write activity from every piece of application data all the time.
Block-level tiering offers a higher level of granularity to the placement of data. A LUN can be split into blocks and placed across multiple layers of storage technology including traditional HDDs and faster SSDs. Selective placement will ensure the more efficient use of expensive SSD media by placing only the highly active data onto it.
With this increased granularity, however, we’re back to Storage Tiering 1.0: data is placed on faster technology purely to increase overall system performance. This is a feature Compellent have been offering for some time; data is migrated up or down the tier hierarchy on a daily basis, based on performance figures gathered over a 12-day period. This level of granular performance management is possible because data is stored in a block-based structure. Unfortunately for EMC, the hyper design legacy represents a technical challenge in making FAST version 2 a reality.
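A sketch of what block-granular placement involves, assuming a simple per-block “heat” counter accumulated over a monitoring window. The thresholds, block names and tier labels are invented for the example; real implementations are considerably more sophisticated:

```python
# Toy block-level tiering: rank blocks by observed I/O over a window,
# promote only the hottest to the small SSD tier, leave the rest on SATA.
# All names and numbers are invented for illustration.

def place_blocks(heat, ssd_blocks):
    """Assign each block to a tier; only the hottest fit on SSD."""
    ranked = sorted(heat, key=heat.get, reverse=True)
    placement = {}
    for i, block in enumerate(ranked):
        placement[block] = "SSD" if i < ssd_blocks else "SATA"
    return placement

# One LUN's blocks and their I/O counts this period: a handful of
# blocks do nearly all the work, the rest are almost idle.
heat = {"blk0": 5000, "blk1": 4800, "blk2": 30, "blk3": 12, "blk4": 3}

placement = place_blocks(heat, ssd_blocks=2)
# Only the two genuinely hot blocks consume expensive SSD capacity;
# a LUN-level mover would have dragged all five blocks up with them.
```

This is the granularity argument in miniature: the LUN as a whole looks “warm”, but only a fraction of it deserves SSD.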
As just mentioned, Compellent already offer block-based data migration in their products. At a recent dinner in London with the Compellent team, they highlighted their strong position in the market, protected by patents covering block-level data migration between tiers. You can find the filed patent here. Compellent use the term “Data Progression” to describe how blocks are moved between tiers based on I/O activity. As I/O activity is monitored over time, it is possible to determine the most appropriate tier of storage to use when expanding capacity; typically these are lower-tier SATA drives, as initial performance requirements are usually over-estimated. This methodology is very much the Storage Tiering 1.0 discussed earlier.
Compellent aren’t the only people claiming rights to block-level tiering within a storage array. I’ve also found the following patent application from IBM, filed by Barry Whyte, Steve Legg and others. If IBM and Compellent both claim to have invented the FAST concept, how does that position EMC? Do they have an earlier patent which trumps these two?
Storage Tiering 1.0 provides performance management of storage arrays; Storage Tiering 2.0 extends this to policy-driven optimisation. Both technologies are available today from existing vendors in one form or another, so EMC will simply be playing catchup once FAST 1 & FAST 2 are released. I’d like to be surprised and see EMC offer something the competition currently don’t. I’m not holding my breath…