Before going into configuring customized isolation response settings, it's worth doing a quick recap of VMware HA and what triggers an isolation response. As explained in the vSphere Availability Guide, a vSphere 5 cluster is made up of a master host and a number of slave hosts. The master is elected when the cluster is created (and HA is enabled), with each additional host becoming a slave. Only one master is elected per cluster (which is a change from vSphere 4.x). A new master is elected should the existing master fail.
The master has a number of responsibilities. These include:
- Monitoring the slave hosts – identifying virtual machines that need to be restarted if a slave were to experience a failure
- Monitoring the power state of all protected virtual machines
- Reporting the cluster status back to vCenter
Meanwhile, the responsibility of the slave hosts is to monitor their virtual machines, and report their states back to the cluster’s master.
You can easily see whether a given host in a cluster is a master or a slave by looking at the host's Summary tab in vCenter.
As stated above, the cluster master is responsible for detecting any failure of the slave hosts. It will detect a failure if:
- A slave host stops working – e.g. power loss, hardware failure or a PSOD (purple screen of death)
- A slave host becomes isolated on the network
- A slave host loses connectivity to the master
The master host monitors the state of the slave hosts by exchanging network heartbeats with them. If the master stops receiving heartbeats from a slave host, it checks whether that host is still exchanging heartbeats with one of the datastores. It also checks whether the host responds to a ping. If all of these checks fail, the slave host is deemed to have failed, and its virtual machines will be restarted on an alternate host.
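To make the master's decision explicit, here is a minimal Python sketch of the rule just described: a slave is only treated as failed when network heartbeats, datastore heartbeats and ping all fail. This is a simplified model for illustration, not VMware's implementation; the `SlaveChecks` class, the function names and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaveChecks:
    """Outcome of the master's checks against one slave host (hypothetical model)."""
    network_heartbeat: bool    # master still receives network heartbeats from the slave
    datastore_heartbeat: bool  # slave still exchanges heartbeats with a datastore
    responds_to_ping: bool     # slave answers the master's ping


def is_failed(checks: SlaveChecks) -> bool:
    """A slave is deemed failed only when every check fails."""
    return not (checks.network_heartbeat
                or checks.datastore_heartbeat
                or checks.responds_to_ping)


def hosts_needing_restart(slaves: dict[str, SlaveChecks]) -> list[str]:
    """Names of slaves whose VMs the master would restart on an alternate host."""
    return sorted(name for name, checks in slaves.items() if is_failed(checks))


# Hypothetical cluster: esx03 has lost network heartbeats but still writes datastore heartbeats
slaves = {
    "esx01": SlaveChecks(network_heartbeat=True, datastore_heartbeat=True, responds_to_ping=True),
    "esx02": SlaveChecks(network_heartbeat=False, datastore_heartbeat=False, responds_to_ping=False),
    "esx03": SlaveChecks(network_heartbeat=False, datastore_heartbeat=True, responds_to_ping=False),
}
print(hosts_needing_restart(slaves))  # ['esx02']
```

Note that esx03 is not declared failed: a surviving datastore heartbeat is enough to stop the master restarting its virtual machines.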
If the host is unreachable through network heartbeats and ICMP ping, but is still exchanging datastore heartbeats, the slave host is considered to be either in a network partition or isolated. The master will then monitor the VMs on that host; if they power off, the master will power them on elsewhere.
From the slave's point of view, if it stops exchanging network heartbeats, it will try to ping its isolation address. If that fails, it will trigger the configured isolation response. By default, a host uses its default gateway as the isolation address that it will attempt to ping.
To add further resilience, you can add more isolation addresses. To do so, in your cluster's settings, go to the vSphere HA section, then click Advanced Options. Add "das.isolationaddressX" in the Option column (where X is between 0 and 9, as you can have up to 10 isolation addresses), and the IP address you wish to ping in the Value column.
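The slave-side behaviour can be sketched in the same way. The Python below is a simplified model, not how the HA agent is actually written: it builds the list of addresses a slave would ping (its default gateway plus any `das.isolationaddress0` to `das.isolationaddress9` values) and triggers the isolation response only when network heartbeats have stopped and none of those addresses answers. The function names, the injected `ping` callable and the IP addresses are hypothetical.

```python
import re
from typing import Callable

# Advanced option keys described above: das.isolationaddress0 .. das.isolationaddress9
_ISOLATION_KEY = re.compile(r"das\.isolationaddress([0-9])")


def isolation_addresses(default_gateway: str, advanced_options: dict[str, str]) -> list[str]:
    """Addresses a slave would ping: its default gateway plus any das.isolationaddressX values."""
    extra = []
    for key, value in advanced_options.items():
        match = _ISOLATION_KEY.fullmatch(key)
        if match:  # ignore unrelated advanced options
            extra.append((int(match.group(1)), value))
    # Order the additional addresses by their X suffix
    return [default_gateway] + [addr for _, addr in sorted(extra)]


def should_trigger_isolation_response(network_heartbeats_ok: bool,
                                      addresses: list[str],
                                      ping: Callable[[str], bool]) -> bool:
    """Trigger only if heartbeats have stopped and no isolation address answers a ping."""
    if network_heartbeats_ok:
        return False
    return not any(ping(addr) for addr in addresses)


# Hypothetical advanced options and a fake ping that only reaches 10.0.0.20
options = {"das.isolationaddress1": "10.0.0.20", "das.isolationaddress0": "10.0.0.10"}
addresses = isolation_addresses("10.0.0.1", options)
print(addresses)  # ['10.0.0.1', '10.0.0.10', '10.0.0.20']
reachable = {"10.0.0.20"}
print(should_trigger_isolation_response(False, addresses, lambda a: a in reachable))  # False
```

This is why extra isolation addresses add resilience: a single unreachable default gateway is no longer enough on its own to make the host consider itself isolated.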
Host Monitoring Status
Before looking at the types of isolation response, it's worth mentioning the 'Enable Host Monitoring' setting. This is enabled by default, and while it is enabled, the master monitors the slaves for isolation. If it is disabled, the heartbeats are not monitored, so a network failure will not trigger an isolation response.
Disabling host monitoring is useful when you are performing planned network maintenance, but it should otherwise be left enabled.
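A minimal Python sketch of the maintenance pattern above, assuming a plain dictionary stands in for the cluster's 'Enable Host Monitoring' setting (the `settings` dictionary and its `host_monitoring` key are hypothetical; in practice you change the setting in the vSphere HA section of the cluster settings). The context manager switches host monitoring off for the maintenance window and always restores the previous value afterwards, even if the maintenance work fails part way through.

```python
from contextlib import contextmanager
from typing import Iterator


def network_failure_triggers_isolation_response(settings: dict[str, bool]) -> bool:
    """With host monitoring disabled, heartbeats are not monitored, so nothing is triggered."""
    return settings.get("host_monitoring", True)


@contextmanager
def host_monitoring_paused(settings: dict[str, bool]) -> Iterator[dict[str, bool]]:
    """Disable host monitoring for planned network maintenance, then restore the previous value."""
    previous = settings.get("host_monitoring", True)
    settings["host_monitoring"] = False
    try:
        yield settings
    finally:
        # Restore even if the maintenance work raises an exception
        settings["host_monitoring"] = previous


cluster = {"host_monitoring": True}  # enabled by default
with host_monitoring_paused(cluster):
    print(network_failure_triggers_isolation_response(cluster))  # False
print(network_failure_triggers_isolation_response(cluster))  # True
```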
Configuring Isolation Response
There are three options when choosing the cluster's isolation response (what an isolated host will do with its virtual machines):
- Leave powered on – If the host is isolated, the power state of the virtual machines will not change
- Power off – When the host is isolated, the virtual machines will be powered off. The cluster master will be responsible for powering them on elsewhere
- Shut down – When the host is isolated, an attempt will be made to shut down the VMs gracefully using VMware Tools. If the shutdown has not completed after a period of time, the VMs will be powered off instead.
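The three options can be modelled as a small Python enum with a function that returns what happens to a single VM on an isolated host. This is an illustrative sketch only: the names are hypothetical, and the 300-second timeout is an assumption (I believe it matches the vSphere 5 default of the `das.isolationshutdowntimeout` advanced option, but check your own cluster rather than relying on this value).

```python
from enum import Enum
from typing import Optional


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "Leave powered on"
    POWER_OFF = "Power off"
    SHUT_DOWN = "Shut down"


# Assumed wait before a graceful shutdown falls back to a power off
# (believed to be the vSphere 5 default of das.isolationshutdowntimeout).
SHUTDOWN_TIMEOUT_SECONDS = 300


def isolated_vm_outcome(response: IsolationResponse,
                        guest_shutdown_seconds: Optional[float],
                        timeout: float = SHUTDOWN_TIMEOUT_SECONDS) -> str:
    """Outcome for one VM on an isolated host.

    guest_shutdown_seconds is how long the guest OS needs to shut down through
    VMware Tools, or None if the graceful shutdown never completes.
    """
    if response is IsolationResponse.LEAVE_POWERED_ON:
        return "left powered on"
    if response is IsolationResponse.POWER_OFF:
        return "powered off"  # the master is then responsible for powering it on elsewhere
    if guest_shutdown_seconds is not None and guest_shutdown_seconds <= timeout:
        return "shut down gracefully"
    return "powered off after shutdown timeout"


print(isolated_vm_outcome(IsolationResponse.SHUT_DOWN, 45))    # shut down gracefully
print(isolated_vm_outcome(IsolationResponse.SHUT_DOWN, None))  # powered off after shutdown timeout
```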
The default isolation response in vSphere 5 is 'Leave powered on'.
Customized Isolation Response Settings
It's possible to override the cluster's isolation response setting on a per-virtual-machine basis.
In my example, the cluster setting is 'Leave powered on'; however, I have set a number of virtual machines to 'Shut down' and 'Power off'.
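The override rule amounts to "a per-VM setting wins, otherwise use the cluster setting". Here is a minimal Python sketch of that lookup, mirroring the example of a cluster set to 'Leave powered on' with a few VMs overridden. The VM names and the function are hypothetical, and a missing entry or `None` stands in for a VM that simply follows the cluster setting.

```python
from enum import Enum
from typing import Optional


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "Leave powered on"
    POWER_OFF = "Power off"
    SHUT_DOWN = "Shut down"


def effective_response(cluster_setting: IsolationResponse,
                       vm_overrides: dict[str, Optional[IsolationResponse]],
                       vm_name: str) -> IsolationResponse:
    """A per-VM override wins; a missing entry or None means 'use the cluster setting'."""
    override = vm_overrides.get(vm_name)
    return cluster_setting if override is None else override


cluster_setting = IsolationResponse.LEAVE_POWERED_ON
overrides = {"db01": IsolationResponse.SHUT_DOWN, "web01": IsolationResponse.POWER_OFF}
for vm in ["db01", "web01", "app01"]:
    print(vm, "->", effective_response(cluster_setting, overrides, vm).value)
# db01 -> Shut down
# web01 -> Power off
# app01 -> Leave powered on
```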
Keep up to date with new posts on Buildvirtual.net - follow us on Twitter: @buildvirtual