Cloud infrastructure has often been presented as a great target for enterprise workloads that need disaster recovery capabilities. There’s a lot to like about using the cloud instead of a second data centre as a location for backup data. It can be more cost-effective than running your own facility. You don’t need your own people to run it for you. A number of solutions are available that back up your machines to the cloud and, when you need to recover, convert those backup images into something that can run in the cloud. Happy days, yes?
A number of these solutions provide the ability to protect on-premises VMware workloads in AWS. How do they do that? They make use of the VM Import / Export tool to convert the VMware image to an EC2 image. This is clever and convenient, but it was never really designed to be fast. Indeed, the time the conversion takes depends on the size of the VM being converted. It can take an awfully long time to get your VM up and running – which is not something you really want to see when you’re in the middle of a disaster recovery activity.
Druva takes a different approach to the problem and converts the image in place. When you need to power on a VM, the image is restored onto an EBS volume, where it is then “surgically altered”, and turned into an image compatible with EC2. This means that, instead of taking hours or days to convert your images, you’re looking at a wait time of 15 to 20 minutes – irrespective of the size of the VM you’re converting. The benefit of this is that you can achieve a recovery time objective (RTO) of between 15 and 20 minutes, and you can also offer users a recovery point objective (RPO) of one hour.
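To put some rough numbers on that difference, here’s a back-of-the-envelope sketch in Python. The function names and the 50 GB/hour conversion throughput are my own assumptions for illustration, not measurements of VM Import / Export or of Druva. The 20-minute figure is simply the upper end of the range quoted above.

```python
def size_based_conversion_minutes(vm_size_gb: float,
                                  throughput_gb_per_hour: float) -> float:
    """Conversion time that grows linearly with VM size.

    throughput_gb_per_hour is a hypothetical figure, not a published rate.
    """
    return vm_size_gb / throughput_gb_per_hour * 60


def in_place_conversion_minutes(vm_size_gb: float,
                                fixed_minutes: float = 20.0) -> float:
    """Conversion time that is independent of VM size.

    vm_size_gb is accepted only to mirror the other function's signature.
    """
    return fixed_minutes


# Compare the two models for a few illustrative VM sizes.
for size_gb in (100, 1000, 5000):
    slow = size_based_conversion_minutes(size_gb, throughput_gb_per_hour=50)
    fast = in_place_conversion_minutes(size_gb)
    print(f"{size_gb} GB: size-based {slow:.0f} min, in-place {fast:.0f} min")
```

The point isn’t the exact numbers. It’s that one line grows with the size of the VM and the other stays flat.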
The Disaster Recovery as a Service (DRaaS) offering requires a one-time configuration and gives you the option to specify the configurations of your VM images (so you can assign appropriate resources to your VMs based on their performance requirements). You can also specify the boot order for your VMs and any scripts you want to run.
You can specify how often you want to update your DR cloud image (the default is after every backup). W. Curtis Preston tells me that the very first creation of the cloud image does take a while (as it’s based on the size of a full backup), but the incremental updates normally only take a few minutes.
The great thing about cloud scalability is that you can reconfigure thousands of VMs simultaneously. You can also boot groups of machines, so in a DR event, you won’t need to wait for your VMs to come up in a serial fashion.
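If you want a feel for how a boot order and boot groups fit together, here’s a minimal sketch in Python. It isn’t Druva’s API. `VmPlan`, `start_vm`, and the instance types are hypothetical stand-ins I’ve made up for illustration. VMs that share a boot order start in parallel, and each group finishes before the next one begins.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class VmPlan:
    """Hypothetical per-VM recovery settings."""
    name: str
    boot_order: int      # lower numbers boot first
    instance_type: str   # resources assigned to the recovered VM


def group_by_boot_order(plans):
    """Return lists of VMs, one list per boot order, lowest order first."""
    groups = defaultdict(list)
    for plan in plans:
        groups[plan.boot_order].append(plan)
    return [groups[order] for order in sorted(groups)]


def start_vm(plan):
    """Placeholder for whatever actually powers on the recovered VM."""
    return f"{plan.name} started as {plan.instance_type}"


def boot_in_groups(plans, max_workers=8):
    """Start each group in parallel; wait for it before starting the next."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in group_by_boot_order(plans):
            # list() drains the iterator, so the whole group completes here.
            results.append(list(pool.map(start_vm, group)))
    return results


plans = [
    VmPlan("app1", 2, "m5.large"),
    VmPlan("db", 1, "r5.large"),
    VmPlan("app2", 2, "m5.large"),
]
for group in boot_in_groups(plans):
    print(group)
```

In this example the database comes up first on its own, and the two application servers then start side by side rather than one after the other.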
This all sounds great, but we know that cloud resources aren’t free. You need to pay for the Druva service, and there’s a cost associated with hosting the VM that’s doing the conversion. On top of this, you’ll pay for the EBS volumes and EBS snapshots used, as well as the resources consumed when you declare a disaster. Compared to the price of running a secondary data centre presence, however, you may find that this makes for quite a compelling option.
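If you want to sanity-check that comparison for your own environment, the cost components above can be added up with something like the following Python sketch. Every field name and every rate here is a placeholder I’ve invented for illustration. None of them are real Druva or AWS prices, so plug in figures from your own quotes and bills.

```python
from dataclasses import dataclass


@dataclass
class DrCostInputs:
    """Hypothetical monthly inputs; all rates are placeholders."""
    service_fee: float                 # DRaaS subscription
    conversion_host_hours: float       # hours the conversion VM runs
    conversion_host_rate: float        # cost per hour for that VM
    ebs_gb: float                      # EBS volume capacity used
    ebs_rate_per_gb: float
    snapshot_gb: float                 # EBS snapshot storage used
    snapshot_rate_per_gb: float
    dr_instance_hours: float = 0.0     # only non-zero once a disaster is declared
    dr_instance_rate: float = 0.0


def monthly_dr_cost(c: DrCostInputs) -> dict:
    """Break the monthly bill into standing costs and disaster-time costs."""
    standing = (
        c.service_fee
        + c.conversion_host_hours * c.conversion_host_rate
        + c.ebs_gb * c.ebs_rate_per_gb
        + c.snapshot_gb * c.snapshot_rate_per_gb
    )
    during_disaster = c.dr_instance_hours * c.dr_instance_rate
    return {
        "standing": standing,
        "during_disaster": during_disaster,
        "total": standing + during_disaster,
    }


# Made-up numbers, purely to show the shape of the calculation.
example = DrCostInputs(
    service_fee=1000.0,
    conversion_host_hours=10.0, conversion_host_rate=0.5,
    ebs_gb=100.0, ebs_rate_per_gb=0.1,
    snapshot_gb=200.0, snapshot_rate_per_gb=0.05,
)
print(monthly_dr_cost(example))
```

Splitting out the disaster-time compute is deliberate: in a quiet month that part is zero, which is a big part of why this can compare well with a second data centre that costs money whether you use it or not.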
There’s also support for failback of workloads directly to on-premises vSphere environments. You can also “fail sideways” and migrate those VMs you’ve recovered to the AWS Region of your choice. The capability is VMware-only today, but I understand that Druva is working on making this available for Microsoft Hyper-V as well.
Disaster recovery can be a stressful activity. Declaring a disaster is a serious thing in the first place, and having to deal with complicated recovery processes and time-consuming conversion activities can really take away from many of the potential benefits of using public cloud as a solution for DR workloads. Druva has managed to come up with a way of recovering in the cloud that doesn’t need to be complicated, nor does it need to be time-consuming. This gives you time to focus on getting the rest of your business back up and running. This strikes me as a good thing.