It has been two days since HDS surprised the enterprise storage world by not announcing a new storage platform to take on the EMC Symmetrix V-Max. Instead, HDS introduced High Availability Manager (“HAM” to us), disappointing some and confusing others. Now that the dust has settled somewhat, it has become clearer what HAM is and how it works, and we come away more impressed. HDS has taken simple, proven technologies (path management, clustering, synchronous replication) and remixed them into a super-high-availability solution for the largest enterprises. Perhaps this is not what many expected, but it’s certainly a worthwhile addition to the company’s family of products.
High Availability Manager consists of three main components:
- A conventional multi-pathing agent like HDLM on each server. This enables the server to continue accessing the storage if one USP fails. It will “think” it’s talking to a single storage target, but will actually be talking to two USPs that can be metro distances (60-100 miles) away, given proper connectivity. Microsoft MPIO will probably be supported as well, and VMware native multi-pathing (NMP) should come shortly after release. Don’t hold your breath for PowerPath to be officially supported in the short term, but it ought to work fine without changes.
- Existing TrueCopy synchronous storage replication technology will keep the data and quorum disks (see below) in lock-step. This is what limits the distance between systems, since latency is the enemy of storage protocol performance. Once the arrays move too far apart, write performance will suffer on the local array while it waits for each write to be copied to the remote array.
- Conventional clustering technology with a heartbeat and shared quorum disk lets both arrays know what’s going on. The quorum “lives” on the remote side, with that secondary array watching to make sure the primary array is still running. If the heartbeat goes away, the secondary array marks the HAMmed LUNs read/write and starts handling I/O from the servers, whose multi-pathing agents will have just failed over to it.
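The distance limit on synchronous replication comes down to simple arithmetic, and a rough sketch makes it concrete. The figures here are assumptions for illustration, not HDS numbers: light travels through fiber at roughly 200,000 km/s, and each replicated write costs two round trips between the arrays (a common rule of thumb for Fibre Channel writes without any write acceleration). The function name `added_write_latency_ms` is made up for this example.

```python
# Rough estimate of the latency that synchronous replication adds to each write.
# Assumptions (illustrative, not from HDS): ~200 km per millisecond in fiber,
# and two round trips between arrays per replicated write.

KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s expressed per millisecond


def added_write_latency_ms(distance_miles, round_trips=2):
    """Return the extra milliseconds per write for a given one-way distance."""
    distance_km = distance_miles * KM_PER_MILE
    one_way_ms = distance_km / FIBER_KM_PER_MS
    # Each round trip covers the distance twice (there and back).
    return 2 * one_way_ms * round_trips


for miles in (10, 60, 100, 500):
    extra = added_write_latency_ms(miles)
    print(f"{miles:>4} miles: +{extra:.2f} ms per write")
```

Under these assumptions, 100 miles adds a little over 3 ms to every write before switches, queuing, or the remote array's own service time are counted, which is why metro distances are about as far as this approach comfortably stretches.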
So HDS’ secret HAM sauce is ketchup. There’s no amazing new technology here, and maybe that’s for the best in the kind of huge, conservative enterprise environments that will use this product. The big change was in programming the USP controllers to monitor the quorum disk and orchestrate the entire failover. In fact, this might even be considered a feature of TrueCopy, not a standalone product.
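To make the orchestration concrete, here is a minimal sketch of the decision the secondary array has to make. It is a toy model under stated assumptions, not HDS's implementation: the class name, the five-second timeout, and the idea that the replicated LUNs are simply "not writable" until promotion are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SecondaryArray:
    """Hypothetical model of the secondary array watching the quorum heartbeat."""

    heartbeat_timeout_s: float = 5.0  # assumed value, not an HDS default
    last_heartbeat_s: float = 0.0
    luns_writable: bool = False       # replicated LUNs are not writable until failover

    def record_heartbeat(self, now_s):
        # The primary has updated the quorum disk; note when we saw it.
        self.last_heartbeat_s = now_s

    def check(self, now_s):
        """Promote the LUNs to read/write if the primary has been silent too long."""
        if self.luns_writable:
            # Already failed over. There is no automated fail-back in this model.
            return True
        if now_s - self.last_heartbeat_s > self.heartbeat_timeout_s:
            self.luns_writable = True
        return self.luns_writable


secondary = SecondaryArray()
secondary.record_heartbeat(now_s=1.0)
print("writable at t=3:", secondary.check(now_s=3.0))
print("writable at t=7:", secondary.check(now_s=7.0))
```

Note that the promotion is one-way: once the secondary takes over, a returning heartbeat does not hand control back. The choice of timeout is also a trade-off, since a value that is too short risks failing over on a brief hiccup and one that is too long delays recovery.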
HDS ought to reconsider one element of the HAM pitch, however. Although HAM will undoubtedly yield a very highly-available storage architecture, nothing provides 100% availability. There are many moving parts involved, and unplanned outages can still happen. Multipath driver bugs are not unheard of: Back in the day, one version of a certain three-letter company’s product just plain refused to fail over! A momentary outage on one of the host channels or a skipped heartbeat on the USP controllers could also cause a failed failover. Plus, there is no provision for automated fail-back. Once a failover occurs, the system will be operating in a degraded-availability mode and will require a (planned) outage to re-establish normal operations.
All that being said, HAM remains an attractive offering for shops with multiple USPs visible to critical servers. They can turn on the HAM software with existing hardware and add peace of mind, knowing that everything is that much more available. One aspect that really impresses is the fact that each USP can be running a different firmware version, reducing the risk of upgrade-induced outages. Once it ships to customers (in the fourth quarter of this year), we will have to consider the (as-yet unannounced) cost.
And what about the fact that HAM will also allow seamless upgrades to a new generation of USP hardware? HDS still isn’t talking about that possibility!