Snapshots show downtime the door
Comment December 22nd, 2007
By Roger Howorth

I’ve recently upgraded several systems to use the latest server virtualisation software from VMware. The business needed the upgrade because the new software enabled them to assign more than 3.6GB of RAM and more than two processors to the accountancy system, which runs in a virtual machine (VM) under VMware Infrastructure 3 (VI3). But while the upgrade was easy to justify, some of the biggest benefits of the move were not apparent until we had the new software up and running.

Besides the much-needed boost in speed for the accountancy system, it turns out that the biggest benefit of the upgrade was that VI3 includes an updated snapshot feature that can make a snapshot of a VM while it is running.

Based on similar principles to the snapshot features found in many disk arrays, VM snapshots capture a ‘crash consistent’ image of a virtual machine – that is, they capture the contents of the disks at a moment in time in much the same way as would occur if the power were unplugged or the operating system crashed. And much like disk array snapshots, it takes only a few seconds to make a snapshot of a VM. This means snapshots can be made while people are working on the system, which is obviously incredibly useful.

For example, the accountancy system is used by people in Europe, South America and the USA, so the system is busy more or less constantly throughout the week. Scheduling weekday downtime to back up the system before applying patches is very difficult, so before upgrading to VI3, patches could only be applied at the weekend.

Now snapshots are used to make an instant backup of the server before applying patches, and once everyone is happy that the patches didn’t cause a problem, the snapshot is deleted, all without taking the server off-line. If there was a problem and the snapshot was restored, someone would need to manage any updates made to the accountancy data since the snapshot was taken, but of course it’s rare that applying patches causes a problem: in the last year there was only one such episode, when a Windows patch caused the Citrix server to stop accepting connections. When a snapshot is restored, the main thing to notice is that disk-check utilities will kick off and fix any file system problems that might arise from the crash-consistent nature of these snapshots.
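The routine described above can be sketched in a few lines of Python. This is only an outline of the order of operations, not VMware's API: the five callables passed in (take_snapshot, apply_patches, system_is_healthy, revert_to, delete_snapshot) are hypothetical stand-ins for whatever management tool or script actually performs each step.

```python
from typing import Callable


def patch_with_snapshot(
    take_snapshot: Callable[[], str],
    apply_patches: Callable[[], None],
    system_is_healthy: Callable[[], bool],
    revert_to: Callable[[str], None],
    delete_snapshot: Callable[[str], None],
) -> bool:
    """Patch a running VM behind a snapshot.

    All five callables are hypothetical placeholders supplied by the caller.
    Returns True if the patches were kept, False if the VM was rolled back.
    """
    # Step 1: take the snapshot while the system is still in use.
    snapshot_id = take_snapshot()

    # Step 2: apply the patches, then check that the system still works.
    try:
        apply_patches()
        healthy = system_is_healthy()
    except Exception:
        # A failed patch run is treated the same as a failed health check.
        healthy = False

    if healthy:
        # Step 3a: everyone is happy, so the snapshot is no longer needed.
        delete_snapshot(snapshot_id)
        return True

    # Step 3b: roll back. Any data entered since the snapshot was taken
    # is lost and has to be re-entered or reconciled by hand.
    revert_to(snapshot_id)
    return False
```

The health check is the part that needs a person in the loop: in the Citrix episode mentioned above, it would amount to confirming that users can still connect.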

Because of the need to manage updates to the data if a snapshot is restored, patches are normally applied while the system is quiet, but this could be a 15-minute period rather than an entire weekend. Making a full backup of a traditional server would probably take several hours and require the server to be switched off first. In the event that patching our system did cause a problem, the snapshot backup could be restored in a few seconds. Because the virtualisation hypervisor has full control over the accountancy server’s access to the VM’s disks, the snapshot will bring the system back to exactly the same state it was in when the snapshot was taken.
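To see why taking and restoring a snapshot can be so quick, it helps to picture a copy-on-write scheme, in which a snapshot copies nothing and simply redirects later writes to a separate delta. The toy model below illustrates that general idea in Python; the ToyDisk class and its method names are invented for this sketch and are not a description of how VI3 is implemented internally.

```python
class ToyDisk:
    """Toy copy-on-write disk, for illustration only."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # block number -> contents
        self.delta = None         # writes made since the snapshot, if one exists

    def write(self, block, data):
        # With a snapshot in place, writes go to the delta and the base is untouched.
        target = self.base if self.delta is None else self.delta
        target[block] = data

    def read(self, block):
        # The newest copy of a block wins: check the delta first, then the base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def take_snapshot(self):
        # Nothing is copied; the base is frozen by starting an empty delta.
        if self.delta is not None:
            raise RuntimeError("this toy model supports only one snapshot")
        self.delta = {}

    def revert(self):
        # Throw away every write made since the snapshot was taken.
        if self.delta is None:
            raise RuntimeError("no snapshot to revert to")
        self.delta = {}

    def delete_snapshot(self):
        # Keep the changes: fold the delta into the base and stop redirecting.
        if self.delta is None:
            raise RuntimeError("no snapshot to delete")
        self.base.update(self.delta)
        self.delta = None


disk = ToyDisk({0: "ledger before patch"})
disk.take_snapshot()
disk.write(0, "ledger after patch")
disk.revert()
print(disk.read(0))
```

In this model, reverting costs the same however much data the disk holds, because only the delta is discarded. That is consistent with a restore taking seconds where a conventional backup takes hours.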

It turns out that the snapshot feature also forms the basis of some other new features that make backing up data from VMs much easier than backing up traditional servers. For example, many VM backup tools now work by making a snapshot and then copying either the entire snapshot or just a few files and directories to tape or other low-cost storage.