Using VMware High Availability or Fault Tolerance with a CC-SG Virtual Appliance

A VM administrator interested in high availability (HA) or fault tolerance (FT) should be familiar with the vSphere Availability Guide for the ESX version in use.

Per the VM and Application Monitoring section of the vSphere Availability Guide ESX 4.1, “Occasionally, virtual machines that are still functioning properly stop sending heartbeats. To avoid unnecessarily resetting such virtual machines, the VM Monitoring service also monitors a virtual machine's I/O activity. If no heartbeats are received within the failure interval, the I/O stats interval (a cluster-level attribute) is checked. The I/O stats interval determines if any disk or network activity has occurred for the virtual machine during the previous two minutes (120 seconds). If not, the virtual machine is reset.”
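The decision described in the quoted passage can be sketched as a small function. This is an illustrative model only, not VMware code: the function name, the parameter names and the 30-second failure interval used in the usage lines are assumptions. Only the 120-second I/O stats interval comes from the guide.

```python
def vm_monitoring_action(seconds_since_heartbeat: float,
                         failure_interval: float,
                         seconds_since_io: float,
                         io_stats_interval: float = 120.0) -> str:
    """Return "ok" or "reset" for one VM (hypothetical helper, not a VMware API)."""
    # A heartbeat arrived within the failure interval: the VM is considered healthy.
    if seconds_since_heartbeat <= failure_interval:
        return "ok"
    # Heartbeats stopped. Before resetting, check for disk or network
    # activity during the I/O stats interval (120 seconds per the guide).
    if seconds_since_io <= io_stats_interval:
        return "ok"
    # No heartbeat and no recent I/O: the VM is reset.
    return "reset"


if __name__ == "__main__":
    # 30 seconds is an assumed failure interval, chosen only for illustration.
    print(vm_monitoring_action(10, 30, 500))   # heartbeats still arriving
    print(vm_monitoring_action(45, 30, 20))    # no heartbeat, but recent I/O
    print(vm_monitoring_action(45, 30, 300))   # no heartbeat, no recent I/O
```

The I/O check is what keeps a busy but non-heartbeating VM, such as a CC-SG appliance under load, from being reset unnecessarily.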

The trade-off between HA and FT is as follows: HA has a longer recovery time and carries a risk of data loss, whereas FT uses more resources and may reduce performance.

FT requires the availability of an HA cluster. HA is built on the following:

The other key requirement is to have sufficient resources available so that HA functions properly when a failure does occur. This can be enforced via admission control. Alternatively, if admission control is disabled and failover capacity is allowed to be over-subscribed, contention for the over-subscribed resources is managed by assigning priorities to VMs and defining policies for VM restart. Resource availability is also monitored to ensure the continued viability of the HA cluster.
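The difference between the two approaches can be illustrated with a toy model. This is a simplified sketch and not the algorithm VMware HA uses: it considers memory only, and the names VM, can_power_on and restart_order, the numeric priority scale and all figures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VM:
    name: str
    restart_priority: int  # assumed scale: lower number restarts first
    memory_mb: int


def can_power_on(vm_mb: int, used_mb: int, total_mb: int,
                 failover_reserve_mb: int, admission_control: bool = True) -> bool:
    """With admission control on, refuse a power-on that would consume the failover reserve."""
    limit = total_mb - failover_reserve_mb if admission_control else total_mb
    return used_mb + vm_mb <= limit


def restart_order(failed_vms: List[VM], available_mb: int) -> Tuple[List[str], List[str]]:
    """When capacity is over-subscribed, restart VMs by priority until memory runs out."""
    restarted, skipped = [], []
    for vm in sorted(failed_vms, key=lambda v: v.restart_priority):
        if vm.memory_mb <= available_mb:
            available_mb -= vm.memory_mb
            restarted.append(vm.name)
        else:
            skipped.append(vm.name)
    return restarted, skipped
```

In this model, giving the CC-SG appliance the highest restart priority means it is restarted first when surviving hosts cannot accommodate every failed VM.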

Once the cluster configuration is complete and the cluster has hosts and VMs assigned, HA failover can be tested.

FT operates on a per-VM basis, provided that an HA cluster has already been configured and is available. There are also additional host, processor and networking requirements for FT.

The vSphere Availability Guide ESX 4.1 has two sections detailing cluster, host and VM requirements for FT compatibility. This includes a key notice regarding mixing ESX and ESXi hosts in an FT pair – even if you get away with it initially, DON’T DO IT.

At least two FT-certified hosts running the same Fault Tolerance version or host build number. The Fault Tolerance version number appears on a host's Summary tab in the vSphere Client.

Note: For hosts prior to ESX/ESXi 4.1, this tab lists the host build number instead. Patches can cause host build numbers to vary between ESX and ESXi installations. To ensure that your hosts are FT compatible, do not mix ESX and ESXi hosts in an FT pair.
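The host pairing rule above can be expressed as a simple check. The sketch below is a hypothetical illustration: the Host record and the ft_pair_problems function are not part of any VMware tool, and the values would have to be read manually from each host's Summary tab.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    product: str                        # "ESX" or "ESXi"
    ft_version: Optional[str] = None    # shown on the Summary tab of 4.1 hosts
    build_number: Optional[str] = None  # shown instead on hosts prior to 4.1


def ft_pair_problems(a: Host, b: Host) -> List[str]:
    """Return the reasons why two hosts should not be used as an FT pair."""
    problems = []
    # Patches can make build numbers differ between ESX and ESXi installations.
    if a.product != b.product:
        problems.append("do not mix ESX and ESXi hosts in an FT pair")
    if a.ft_version is not None and b.ft_version is not None:
        # Both hosts report a Fault Tolerance version: they must match.
        if a.ft_version != b.ft_version:
            problems.append("Fault Tolerance versions differ")
    elif a.build_number is None or a.build_number != b.build_number:
        # Otherwise fall back to comparing host build numbers.
        problems.append("host build numbers differ or are unknown")
    return problems
```

An empty list means that no problem was found by this check alone; the processor, licensing and networking requirements still apply.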

A key requirement is that the hosts must have FT-compatible processors, and be licensed and certified for Fault Tolerance. Make sure that each host has hardware virtualization support enabled in the BIOS. The vSphere Client host Summary tab provides access to the version and FT configuration information.

If the host is not configured for FT, but is known to be compatible, check the BIOS settings. For example, on the Dell R610, ensure that BIOS > Processor Settings > Virtualization Technology is set to Enabled.

FT can be enabled by right-clicking the VM node and selecting Fault Tolerance > Turn On Fault Tolerance. If the items noted above are not configured correctly, you will receive errors and must correct the affected settings.

Refer to Table 3-1, Features and Devices Incompatible with Fault Tolerance and Corrective Actions, in the vSphere Availability Guide ESX 4.1 for details.

Symmetric multiprocessor (SMP) virtual machines. Only virtual machines with a single vCPU are compatible with Fault Tolerance.

Reconfigure the virtual machine as a single vCPU. Many workloads have good performance configured as a single vCPU.

When enabling Fault Tolerance for the VM, the following error is received: “The virtual machine has more than one virtual CPU.”

Reduce the number of vCPUs to 1 by accessing Edit Settings.

CD-ROM or floppy virtual devices backed by a physical or remote device.

Remove the CD-ROM or floppy virtual device or reconfigure the backing with an ISO installed on shared storage.

When enabling Fault Tolerance for the VM, the following error is received: “Device ‘CD-ROM1’ has a backing type that is not supported.”

Remove the device from the list of hardware devices by accessing Edit Settings. If it is ever needed, this device can be re-added to perform maintenance functions after disabling FT.

The virtual machine is running in a monitor mode that is incompatible with Fault Tolerance.

Power off the VM, then enable Fault Tolerance. This is a limitation based on the CPU version and the type of guest you are running.

Once these settings have been corrected, go back to the VM and enable FT.
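The three corrections listed above can be collected into a checklist to run through before turning on FT. The following is a minimal sketch that works on manually entered values. The VMConfig and VirtualDevice records, the backing values and the function name are assumptions for illustration; it does not query vSphere.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualDevice:
    label: str    # for example "CD-ROM1"
    kind: str     # assumed values: "cdrom", "floppy", "disk", ...
    backing: str  # assumed values: "physical", "remote", "iso"


@dataclass
class VMConfig:
    name: str
    num_vcpus: int
    powered_on: bool
    devices: List[VirtualDevice] = field(default_factory=list)


def ft_corrective_actions(vm: VMConfig) -> List[str]:
    """List the corrective actions from this section that apply to the VM."""
    actions = []
    # Only single-vCPU virtual machines are compatible with FT.
    if vm.num_vcpus > 1:
        actions.append("Reduce the number of vCPUs to 1 in Edit Settings")
    # CD-ROM or floppy devices must not be backed by a physical or remote device.
    for dev in vm.devices:
        if dev.kind in ("cdrom", "floppy") and dev.backing in ("physical", "remote"):
            actions.append(f"Remove {dev.label} or back it with an ISO on shared storage")
    # A running VM may be in an incompatible monitor mode, depending on
    # CPU version and guest type; powering off first avoids that error.
    if vm.powered_on:
        actions.append("Power off the VM before turning on Fault Tolerance")
    return actions
```

An empty result only means that none of the three conditions described here was found; refer to Table 3-1 in the guide for the full list of incompatible features.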