Enterprise storage has followed the same basic architecture for a very long time, but it has recently been undergoing major shifts. Instead of large arrays of disks with redundant controllers and expansion shelves, scale-out software-defined storage (SDS) products are increasing in popularity. Sometimes these distributed SDS solutions include compute capacity in addition to storage and are therefore considered part of the hyperconverged infrastructure (HCI) marketplace.
The Software Defined Storage Takeover
SDS solutions sometimes come as fully integrated hardware and software appliances, but they are often available as software-only products that run on the commodity hardware of the customer's choice. As SDS has risen, many distributed filesystems have seen increased adoption not only in high-performance computing (HPC) and hyperscale environments, but in enterprise data centers as well.
As these distributed storage systems grow, however, so too can complexity. Companies that opt for open source solutions such as Ceph, XtreemFS, or GlusterFS quickly find that as storage scales out and grows in capacity, it also grows in complexity, resulting in additional administrative overhead. This runs contrary to the approach of hyperscale storage providers, who aim to build massive storage infrastructure automated to the degree that a very small team can manage it. Bringing that capability to an enterprise can be a challenge.
Hyperscale Storage for the Enterprise
Recently I had the opportunity to speak with Björn Kolbeck, co-founder and CEO of Quobyte, and he shared with me how their data center filesystem can deliver massive scale without massive administration. The software is designed to run on a variety of commodity hardware and stretch the filesystem across multiple nodes, providing high levels of availability and performance without the need for large storage teams. Kolbeck even told me that they have customers managing clusters of more than 100 PB with only two administrators.
Several advanced features of Quobyte's filesystem make this ease of management possible:
- Disks that are part of the cluster each receive a unique identifier and can be moved to any node in the cluster as needed without data loss.
- Hardware failures can be predicted by more indicators than S.M.A.R.T. data alone.
- If a hardware failure does occur, the platform rebuilds the affected data automatically, without human intervention.
- Data redundancy is provided either by maintaining three replicas of the data or by erasure coding it, so that multiple device failures can be tolerated and split-brain scenarios avoided (see the sketch after this list).
- Server generations and makes/models can be mixed within a cluster without negatively impacting manageability or performance.
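To make the redundancy trade-off concrete, here is a minimal sketch of the raw-capacity math. The figures are generic storage-industry parameters, not Quobyte-specific numbers:

```python
# Sketch: raw-capacity overhead of 3x replication vs. a k+m erasure code.
# Generic industry math for illustration; not Quobyte's implementation.

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte with n full copies."""
    return float(copies)

def erasure_coding_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Raw bytes stored per logical byte for a k+m erasure code.

    Each object is split into k data chunks plus m parity chunks;
    any m simultaneous device failures can be tolerated.
    """
    return (data_chunks + parity_chunks) / data_chunks

if __name__ == "__main__":
    # Three replicas: 3.00x raw capacity, survives 2 device failures.
    print(f"3x replication:     {replication_overhead(3):.2f}x raw capacity")
    # An 8+3 code: ~1.38x raw capacity, survives any 3 device failures.
    print(f"8+3 erasure coding: {erasure_coding_overhead(8, 3):.2f}x raw capacity")
```

The takeaway: a k+m erasure code can tolerate more simultaneous device failures than triple replication while consuming far less raw capacity, which is why scale-out filesystems commonly offer both modes.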
Additionally, Quobyte has shipped some recent features specifically for machine learning (ML) workloads. A recently released TensorFlow plugin allows data to be moved directly into the Quobyte filesystem, bypassing the kernel. The result is that CPU and RAM bandwidth are conserved and overall write latency is far lower. This is one application in which traditional enterprise storage solutions cannot compete on performance.
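I have not verified the plugin's exact interface, but TensorFlow filesystem plugins generally register a URI scheme that tf.data and tf.io.gfile then resolve transparently. Here is a minimal sketch of what a training pipeline might look like, assuming a hypothetical quobyte:// scheme and path layout:

```python
import tensorflow as tf

# Hypothetical URI and paths for illustration only; the actual scheme and
# volume layout depend on how the Quobyte plugin registers with TensorFlow.
TRAIN_FILES = "quobyte://cluster/volume/train-*.tfrecord"

# Once the plugin is registered, tf.data resolves the scheme directly,
# reading records without a detour through a kernel-mediated mount.
dataset = (
    tf.data.Dataset.list_files(TRAIN_FILES)
    .interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)

# Writes take the same path, e.g. for logs or checkpoints:
with tf.io.gfile.GFile("quobyte://cluster/volume/run-notes.txt", "w") as f:
    f.write("written through the filesystem plugin\n")
```

Nothing here is Quobyte-specific beyond the assumed scheme; the point is that a registered filesystem plugin lets existing tf.data pipelines use the storage without code changes.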
Ken’s Conclusion
Enterprise IT has matured to the point where tribal knowledge is neither valuable nor differentiating. Reducing operational problems so that more time and resources can be spent solving business problems and focusing on revenue-generating activities is paramount. Storage, like all infrastructure, needs to be easy and low-friction to manage. By creating a platform that meets enterprise requirements for scale and performance without a heavy administrative burden, Quobyte is enabling the enterprise to move past infrastructure micro-management.