The Benefits of Wide Striping — Avoiding A Long Tail

IOPS Per RAID Group, ordered by most to least

I took part in a podcast last night that discussed the XIV platform. One of the “key features” of XIV is the wide striping of data across all spindles. It’s a concept we’re seeing more and more in contemporary storage hardware architectures, and one that’s being shoe-horned into older storage arrays too. Have you ever wondered what the point is? Take a look at the accompanying graphic. It shows the number of write operations per RAID group, ordered from the busiest RAID group to the least active. It’s real data from a real system. What you see is the Long Tail effect, where a small number of RAID groups are doing most of the I/O. In this example, 80% of the workload is performed by 50% of the RAID groups; just 3 RAID groups account for 20% of the workload.
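To put numbers like these on your own array, the calculation is simple enough to sketch. The following is a minimal example, assuming you can export one IOPS figure per RAID group; the function names and the sample figures are made up for illustration and are not the data behind the chart.

```python
def cumulative_share(iops_per_group):
    """Return the running fraction of total I/O, busiest RAID group first."""
    ordered = sorted(iops_per_group, reverse=True)
    total = sum(ordered)
    shares = []
    running = 0
    for iops in ordered:
        running += iops
        shares.append(running / total)
    return shares


def groups_needed(iops_per_group, fraction):
    """Smallest number of the busiest groups carrying at least `fraction` of the I/O."""
    for count, share in enumerate(cumulative_share(iops_per_group), start=1):
        if share >= fraction:
            return count
    return len(iops_per_group)


# Hypothetical per-RAID-group IOPS, NOT the real system's figures.
sample_iops = [900, 700, 400, 300, 250, 200, 150, 100, 60, 40, 20, 10]

if __name__ == "__main__":
    for target in (0.2, 0.8):
        n = groups_needed(sample_iops, target)
        print(f"{n} of {len(sample_iops)} groups carry {target:.0%} of the I/O")
```

The steeper the front of that curve, the longer the tail, and the more spindles are sitting mostly idle.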

The chart shows that in some array designs (typically the older Enterprise arrays), I/O was not evenly balanced across RAID groups, so not all drives were being used to their full capacity. This was mitigated by using tools to move LUNs or sub-LUNs around; alternatively, concatenated devices like metas and LUSEs were employed to spread the load.

The only real solution to the I/O balancing problem is genuine wide striping. Manual or even automated rebalancing and the use of metas are just workarounds. Once wide striping is in place, either more work can be performed, or the number of spindles or their “quality” can be reduced, i.e. you can build a complete SATA array like XIV.
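A toy model makes the difference visible. This is a sketch under loud assumptions: each LUN's I/O is spread uniformly over its extents, extents are dealt round-robin across RAID groups, and the LUN workloads are invented. It is not how XIV or any particular array actually places data.

```python
def place_whole_luns(lun_iops, n_groups):
    """Traditional layout: each LUN lives entirely on one RAID group."""
    load = [0.0] * n_groups
    for lun_index, iops in enumerate(lun_iops):
        load[lun_index % n_groups] += iops
    return load


def place_wide_striped(lun_iops, n_groups, extents_per_lun=64):
    """Wide striping: each LUN is cut into extents dealt round-robin over all groups.

    Assumes I/O is uniform across a LUN's extents (a simplification).
    """
    load = [0.0] * n_groups
    for lun_index, iops in enumerate(lun_iops):
        per_extent = iops / extents_per_lun
        for extent in range(extents_per_lun):
            load[(lun_index + extent) % n_groups] += per_extent
    return load


# Hypothetical LUN workloads in IOPS: a couple of hot LUNs and a long tail.
lun_iops = [2000, 50, 30, 900, 20, 10, 400, 40]

if __name__ == "__main__":
    whole = place_whole_luns(lun_iops, n_groups=8)
    wide = place_wide_striped(lun_iops, n_groups=8)
    print("whole-LUN placement:", whole)
    print("wide striped:       ", wide)
```

In the whole-LUN case the busiest group carries the hottest LUN on its own while others barely tick over; with striping the same total work is shared by every group.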

There are of course disadvantages to having your data more widely spread. The most obvious is the increased risk of data loss when RAID protection fails, for example through a double disk failure in a single RAID group. The wider the striping, the wider the impact, because more LUNs have data on the failed group. The tradeoff is the benefit of increased performance. You have to choose what level of risk/impact you consider acceptable versus the potential gains.
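The size of that impact can be estimated with a few lines. Again this is only a sketch: the placement rule (each LUN striped over a run of consecutive RAID groups) and the counts are assumptions chosen for illustration, not a description of any real array.

```python
def luns_affected(n_luns, n_groups, stripe_width, failed_group=0):
    """Count LUNs that have data on the failed RAID group.

    Hypothetical placement: LUN i starts on group (i mod n_groups) and is
    striped over `stripe_width` consecutive groups, wrapping around.
    """
    affected = 0
    for lun in range(n_luns):
        start = lun % n_groups
        groups = {(start + k) % n_groups for k in range(stripe_width)}
        if failed_group in groups:
            affected += 1
    return affected


if __name__ == "__main__":
    # 40 LUNs on 20 RAID groups: no striping, modest striping, full wide striping.
    for width in (1, 4, 20):
        hit = luns_affected(n_luns=40, n_groups=20, stripe_width=width)
        print(f"stripe width {width:2d}: {hit} of 40 LUNs touched by one failed group")
```

With full-width striping, every LUN has something on the failed group, which is exactly the exposure you are trading for the performance gain.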

If you’re not doing wide striping today then you should seriously consider it. After all, you’re only harnessing performance capacity within the array that you’ve already paid for.