Few days ago I wrote about the impact of vMotion on a Data Center network and the traffic flow issues. Now let’s walk through what happens when you move a running virtual machine (VM) between two data centers (long-distance vMotion). Imagine we’re moving a web server that is:
- Serving a few Internet clients (with firewall/NAT and/or load balancing somewhere in the path);
- Getting most of its data from a database server sitting nearby;
- Reading and writing to a local disk.
The traffic flows are shown in the following diagram:
After you move the VM, its sessions remain intact. The traffic to/from the Internet still has to pass through the original firewall/load balancer (otherwise you’d lose the sessions) and the database traffic is still going to the original database server (otherwise the web applications would generate “a few” database errors).
Even worse, in many cases all disk write requests generated by the VM would have to go back to the primary data center. The resulting traffic flow spaghetti mess was aptly named the traffic trombone by Greg Ferro.
- Storage vMotion can be used to transfer the virtual disk file to another logical disk (LUN) with a primary copy in the second data center, effectively localizing the SAN traffic.
- (Speculative) SAN write requests might be quickly optimized if the virtual disk file (VMDK) is stored in a truly distributed NFS store (as opposed to active/standby block storage).
To give you a real-world (actually a lab) example: Cisco and VMware published a white paper describing how they managed to move a live Microsoft SQL server to a backup data center … resulting in ~15% performance degradation and unspecified increase in WAN traffic.
What do you think?
Now that you know what’s behind the scenes of long-distance vMotion, please tell me why it would make sense to you and where you’d use it in your production network … or, even better, what business problems are your server admins trying to solve with it.