I Know It When I See It
The term hyperscaler is a little difficult to pin down with a firm definition. There’s no doubt that there is an implicit understanding of which companies are hyperscalers. Depending on who you ask, you’ll hear Google, Amazon, Alibaba, or Facebook rattled off as possible examples. I believe Supreme Court Justice Potter Stewart aptly summed up the difficulty of nailing down a term like this:
I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so. But I know it when I see it…
Hyperscale Thinks Different
What sets hyperscalers apart from even a very large enterprise deployment? Well, the scale of these customers changes many of the fundamental considerations that are a given for an enterprise. Hyperscalers have the scale to dictate terms to component makers. For them, buying a traditional storage array from Dell EMC or NetApp doesn’t make sense on any number of levels. When traditional enterprises are already starting to adopt software-defined storage to scale more effectively, it’s not surprising to see hyperscalers at the forefront.
SNIA estimates that nearly half of all storage bytes shipped go to hyperscalers. At this volume, the priorities of storage have to be readjusted. When discussing storage, a lot of enterprise storage providers immediately start talking about IOPS. While a baseline of performance is surely important at scale, hyperscalers are often more concerned with latency as a bottleneck than with the raw number of read and write operations.
Tall Tail Latency
SNIA’s Technical Council co-chair Mark Carlson recently presented on one particular latency phenomenon seen in a recent hyperscaler study, so-called “tail latency”. In a study of over 400,000 drives over 87 days, researchers saw that between 0.2% and 0.6% of the time, a given IO response would be 2-10x slower than normal in a given RAID group. On a smaller scale, this might be easy to overlook. But in a massive scale-out environment, this was effectively causing 1% of customer accesses to either slow down or fail.
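To get a feel for why a sub-1% hiccup per drive matters, here is a rough back-of-the-envelope sketch in Python. It is not the study’s methodology. It assumes slow responses are independent across drives and that a striped read is only as fast as its slowest member. The function name and the stripe widths are made up for illustration.

```python
def stripe_tail_probability(per_io_slow_rate: float, stripe_width: int) -> float:
    """Chance that at least one IO in a striped read lands on a slow drive.

    Assumes each drive is slow independently with the same probability,
    which is a simplification: real drives in one chassis may be correlated.
    """
    return 1.0 - (1.0 - per_io_slow_rate) ** stripe_width


if __name__ == "__main__":
    # 0.2% and 0.6% are the bounds quoted from the study; widths are illustrative.
    for rate in (0.002, 0.006):
        for width in (2, 5, 10):
            affected = stripe_tail_probability(rate, width)
            print(
                f"per-IO slow rate {rate:.1%}, stripe width {width}: "
                f"{affected:.2%} of stripe reads affected"
            )
```

Under these assumptions, a 0.2% per-IO slow rate across a five-drive stripe already touches just under 1% of stripe reads, which shows how a rare per-drive event can turn into something customers notice.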
These tail latencies are often the result of background processes and garbage collection happening on an individual disk, which is why the tail latency becomes more pronounced on SSDs than spinning metal. This is one of the reasons SNIA is now seeing increased engagement with hyperscalers on storage standards. These processes are generally needed on smaller scales to prevent data corruption, but at hyperscale that protection becomes a secondary consideration to latency.
Fail Faster
One possible remediation for tail latency is SNIA’s recently approved DePop standard. This would apply an IO tag per operation, indicating that the IO is part of a stripe within a RAID group. Since hyperscalers have their own designed software-defined storage architecture, the tag would allow the IO to fail faster programmatically. If a given drive is running slow, programmers could isolate the slower media area, whether it’s a particular platter section or the platter as a whole. The slight reduction in capacity is easily justified in exchange for less latency.
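The fail-fast idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the DePop standard or any vendor’s interface: the `SimulatedDrive` class, the `deadline_ms` parameter standing in for the per-IO tag, and the single-parity XOR layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass
class SimulatedDrive:
    """Hypothetical drive: stores one chunk and answers after latency_ms."""
    chunk: bytes
    latency_ms: float

    def read(self, deadline_ms: Optional[float]) -> Optional[bytes]:
        # A tagged IO carries a deadline; a drive that cannot meet it gives up
        # immediately (returns None) instead of making the caller wait.
        if deadline_ms is not None and self.latency_ms > deadline_ms:
            return None
        return self.chunk


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks (simple single-parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def read_stripe(
    data_drives: List[SimulatedDrive],
    parity_drive: SimulatedDrive,
    deadline_ms: float,
) -> List[bytes]:
    """Read every data chunk, rebuilding one that failed fast from parity."""
    results = [drive.read(deadline_ms) for drive in data_drives]
    missing = [i for i, chunk in enumerate(results) if chunk is None]
    if not missing:
        return results
    if len(missing) > 1:
        raise IOError("single parity cannot rebuild more than one chunk")
    parity = parity_drive.read(deadline_ms)
    if parity is None:
        raise IOError("parity drive also missed the deadline")
    survivors = [chunk for chunk in results if chunk is not None]
    # XOR of the surviving data chunks and parity yields the missing chunk.
    results[missing[0]] = xor_chunks(survivors + [parity])
    return results


if __name__ == "__main__":
    chunks = [b"AAAA", b"BBBB", b"CCCC"]
    parity = SimulatedDrive(xor_chunks(chunks), latency_ms=2.0)
    drives = [
        SimulatedDrive(chunks[0], latency_ms=2.0),
        SimulatedDrive(chunks[1], latency_ms=50.0),  # the drive having a bad day
        SimulatedDrive(chunks[2], latency_ms=2.0),
    ]
    print(read_stripe(drives, parity, deadline_ms=10.0))
```

The point of the design is that the software layer, not the drive, decides how long is too long. A real system would presumably also keep count of which drives or media regions fail fast most often, which is the information needed to decide what to isolate.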
DePop will need to be enabled by drive vendors, but given the purchasing power of hyperscalers, this would seem to be more a formality than a negotiation. While hyperscalers design their data centers around the routine of component failure, getting more use out of a disk before that failure will almost certainly be welcomed by the players in the space, since it can all be dictated in software.
What Comes After Hyperscale?
With SNIA and the major hyperscalers working together on storage standards, the benefits will hopefully trickle down through large enterprises. As more shops turn to SDS solutions, these standards will probably take on greater significance with IT pros. It’ll be interesting to see how traditional enterprise storage companies respond to this pressure.
Even though traditional storage companies are large purchasers of drives, they still don’t have the singular market pressure that the hyperscalers can command, and they often pay component vendors more per drive. Many enterprises right now don’t have the resources to roll their own SDS solution like hyperscalers, but as DevOps eats the enterprise and SDS services become more robust and viable, traditional storage companies seem destined to feel even more pressure.