I am certain you agree that high availability is a good thing for most systems. We only have to decide how much availability is right for the business needs, then design and deliver it. Easy, right? Yet I still have trouble sleeping well at night, writing on that eventual call about a system that is down. Then, I learned about the difference between available and reliable. Available means a system is running. Reliable means it’s not just running but is working correctly, and we can rely on it remaining that way. My dreams told me I needed more reliability in my life.
Last week, I chatted with Greg Knieriemen and Bart Guglelot of Hitachi Vantara. Bart manages in the Center of Excellence and is responsible for showcasing reliability solutions. We sat down to talk about this amazing video of Kari Byron of Mythbusters fame literally tearing into a Virtual Storage Platform (VSP) setup, unplugging, disconnecting, and taking out 26 components without any system downtime happening. You can watch this video at Unbreakable? With Kari Byron: MythBuster Takes on Hitachi Vantara Challenges | Hitachi Vantara.
Many high designs provide a failover environment with a planned downtime during the failover. Some solutions, such as RAID, should survive a single drive failure with no downtime. Kari wasn’t given instructions on what to break; she was taught how to break things but was not told which ones to remove or disable. She was legitimately choosing what she would break, just like on one of her shows.
What I love about this fact is that this test is more like what can happen in real life. Someone runs a script against the wrong environment. Power is lost. Weather happens. A network element is misconfigured. An update goes wrong. A data center robot runs into a rack. Someone steals a hard drive. The DNS is wrong. You know, real-life failures.
How does this all work? It is apparent from Kari’s work that systems are not failing because redundant components are ensuring that the “failed” components are not missed. VSP Global Active Device features active-active stretch-clustering, allowing replicated data storage to appear as a single LUN between geographically distributed sites. This single LUN approach means minimal impact from component failures.
Not only are components redundant, but their internal elements are also redundant: think processors, paths, and circuits. This inner redundancy means the components themselves have higher reliability.
Another type of reliability we don’t often talk about is malfunctioning components that are technically working but performing poorly. They are available but not reliable. Applications can fail under unexpected latency or irregular performance. Connections can be dropped. That is where the VSP AI-enabled monitoring and management system helps us understand what is happening. Hitachi Ops Center Analyzer is a standard part of VSP. It can monitor data center assets for risks and problems while leveraging machine learning to ensure that data is moving at optimal speeds. Predictive analytics assist infrastructure administrators in planning for changes to business requirements. It can even help with root cause analysis when problems occur.
100% Data Availability Guarantee
Also covered in our roundtable was Hitachi Vantara’s uniqueness of their 100% data availability guarantee. Customers who deploy a high-end series of VSP along with a set of required features can trust that a credit will be provided if there is a loss of data availability due to a failure or malfunction and data cannot be read or written.
As I mentioned in the roundtable, I watched Kari trying to break the VSP well enough to impact the system and felt both horrified and a bit jealous that she was getting to do this. I smiled when she said, “I can do better; I can break this!” I certainly understand that feeling. Nevertheless, most component failures are not due to malicious behavior; they often come down to just time and probabilities. I want to sleep better at night, not just because someone said their systems are resilient, but because they showed me and backed that up with a guarantee.
I can do better; I can break this!– Kari Byron
I hope your data infrastructure is reliable enough that you will never have to worry about Kari Byron showing up in your data center.