The Dark Ages
I wasn’t around for the consumer transition from tape storage to spinning disk. When I started computing, everything ran on platter-based hard drives (except for the Apple IIe at school).
When flash-based SSDs first came on the market, I remember being fairly unimpressed. For one, they were insanely expensive. You get spoiled in the days when cost per GB on spinning disk seemed to get cut in half every year, then all of a sudden flash hit and everything cost four figures! Added to this was a lack of OS and application optimization, as well as initial batches of storage controllers that were still designed for slow spinning media.
Flash forward (see what I did there) to 2017, and the landscape is very different. Flash is relatively cheap (when there isn’t a shortage), reliable, and fast. We’re starting to see NVMe push what flash can do by giving it PCIe throughput. This isn’t so much a rethinking of flash, but rather the continuing refinement of how we interface with this still-new storage medium.
The initial storage controllers used with flash were originally designed for spinning disk. We saw the first breakthrough in speed when Intel and others built controllers specifically for the needs of flash. Now with NVMe, attaching storage directly to the PCIe bus pushes performance even further. But is this the high end of what we can expect?
Intel thinks the next step forward in flash efficiency isn’t tied to hardware. Don’t get me wrong, Intel also really wants to sell you a bunch of NVMe commodity storage. But with NVMe, getting performance out of the storage and controller itself isn’t so much an issue. They provide IOPS for days. No, the main bottleneck now becomes latency. That’s where Intel thinks they can innovate on software.
Intel’s Storage Performance Development Kit (SPDK) brings a minimalist approach to optimizing for latency with NVMe, allowing you to get more IOPS per dollar. It strips a lot of legacy software out of the data path, with the goal of eliminating needless CPU cycles that add latency. Intel engineered its driver to work at the lowest level possible so that the effects reverberate through all levels of the storage architecture.
What’s interesting is where Intel finds this efficiency. SPDK effectively bypasses the kernel by running in user space. Intel sees the kernel path as having too many costly context switches and interrupts, which have small but snowballing effects on performance. Avoiding them frees up processor cycles that can go toward servicing IO instead.
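To make the idea a little more concrete, here is a toy sketch of a polled completion loop, the general pattern a user-space driver can use instead of sleeping until an interrupt wakes it up. To be clear, none of this is SPDK’s actual API: FakeDevice, run_polled, and the tick-based latency are made-up stand-ins, and the "device" is just an in-memory queue.

```python
from collections import deque


class FakeDevice:
    """Hypothetical stand-in for an NVMe queue pair (not an SPDK API)."""

    def __init__(self, latency_ticks):
        self.latency_ticks = latency_ticks  # assumed fixed service time per IO
        self.inflight = deque()             # (due_tick, tag), in submission order
        self.clock = 0

    def submit(self, tag):
        # Queue an IO that will be finished latency_ticks from now.
        self.inflight.append((self.clock + self.latency_ticks, tag))

    def poll(self):
        # Advance the fake clock one tick and return any tags that finished.
        # No blocking and no interrupt: the caller just asks again later.
        self.clock += 1
        done = []
        while self.inflight and self.inflight[0][0] <= self.clock:
            done.append(self.inflight.popleft()[1])
        return done


def run_polled(device, total_ios, queue_depth):
    """Keep up to queue_depth IOs in flight, reaping completions by polling."""
    submitted = 0
    completed = []

    # Fill the queue before we start polling.
    while submitted < min(queue_depth, total_ios):
        device.submit(submitted)
        submitted += 1

    # Busy-poll: each completion immediately frees a slot for the next IO.
    while len(completed) < total_ios:
        for tag in device.poll():
            completed.append(tag)
            if submitted < total_ios:
                device.submit(submitted)
                submitted += 1
    return completed


if __name__ == "__main__":
    dev = FakeDevice(latency_ticks=3)
    tags = run_polled(dev, total_ios=10, queue_depth=4)
    print(len(tags), "IOs completed in", dev.clock, "polls")
```

The trade-off shows up even in the toy: the loop never waits on anyone, but it also spins through polls that return nothing, which is why this style tends to want a core of its own.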
In practice, Intel is seeing NVMe IOPS scale almost linearly as more drives are added. Intel does this by dedicating a single CPU core to handling IO. That core does nothing else, but it seems worth it given the performance. As a point of comparison, the Linux kernel bottlenecks performance and actually decreases IOPS as more drives are added under the same arrangement. Meanwhile, SPDK saw virtually identical increases in IOPS as each drive was added.
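A rough way to reason about that scaling is a back-of-the-envelope model: total IOPS is capped either by what the drives can deliver or by how many IOs one core can process per second, whichever is lower. The sketch below assumes exactly that and nothing more. The function name and every number in it are hypothetical placeholders, not Intel’s measurements, and a flat CPU ceiling like this can’t reproduce the actual drop in IOPS described for the kernel path.

```python
def aggregate_iops(drives, iops_per_drive, core_hz, cycles_per_io):
    """Toy model: one dedicated core feeding `drives` identical NVMe devices.

    All parameters are illustrative assumptions, not measured values.
    """
    device_limit = drives * iops_per_drive  # what the drives could deliver
    cpu_limit = core_hz / cycles_per_io     # what a single core can push
    return min(device_limit, cpu_limit)


if __name__ == "__main__":
    CORE_HZ = 3_000_000_000    # hypothetical 3 GHz core
    DRIVE_IOPS = 400_000       # hypothetical per-drive figure
    LEAN_PATH = 1_000          # made-up cycles per IO for a lean software path
    HEAVY_PATH = 10_000        # made-up cycles per IO for a heavier path

    for n in range(1, 9):
        lean = aggregate_iops(n, DRIVE_IOPS, CORE_HZ, LEAN_PATH)
        heavy = aggregate_iops(n, DRIVE_IOPS, CORE_HZ, HEAVY_PATH)
        print(n, int(lean), int(heavy))
```

With placeholder numbers like these, the lean path keeps climbing with every drive while the heavy one flattens out almost immediately. The cheaper each IO is in CPU cycles, the longer a single core can keep up with the hardware.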
Granted, the Linux kernel is much more general purpose and has a much wider range of applications. But Intel is making the case that even though we’ve hit a new hardware performance plateau with NVMe for the moment, we still have a long way to go in terms of optimization. SPDK cleverly optimizes to reduce a latency bottleneck that couldn’t be perceived until the speeds of NVMe became available. Ordinarily, drivers aren’t exactly the most exciting releases by a company. But Intel is showing that the nascent state of NVMe means there’s still a lot of performance being left on the table, especially at scale.