This post covers the VCAP-DCA objective on configuring HA redundancy. It looks at making the management network resilient, configuring datastore heartbeats, and how HA clusters respond to network partitions.
Management Network Resilience
A common way to make a host's management network resilient is to assign multiple network adapters to the vSwitch where the management port group resides. An often-seen configuration is to share this vSwitch with the vMotion port group, for example a vSwitch with two uplinks, vmnic0 and vmnic1, carrying both the management and vMotion port groups.
The two physical NICs are connected to separate physical switches (which may be members of the same switch stack) to protect against a switch failure. Both the management VLAN and the vMotion VLAN are presented to the host over an 802.1Q trunk.
In this example, the management port group has vmnic0 set as active and vmnic1 as standby. Conversely, the vMotion port group has vmnic1 as active and vmnic0 as standby. Load balancing should be left at ‘Route based on originating port ID’, which is the default.
In the event of a switch, NIC or link failure, the affected port group fails over to its standby NIC. The management VMkernel interface should have ‘Management Traffic’ enabled, because HA heartbeat traffic is sent out of interfaces with this setting.
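To make the active/standby arrangement concrete, here is a small Python sketch of how an explicit failover order picks an uplink. It is a simplified model, not ESXi code: TeamingPolicy and select_uplink are hypothetical names, failure detection is assumed to be link state only, and always choosing the first healthy NIC in order models failback being enabled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TeamingPolicy:
    """Hypothetical model of a port group's explicit failover order."""
    active: list[str]
    standby: list[str] = field(default_factory=list)

    def select_uplink(self, link_up: dict[str, bool]) -> str | None:
        # Walk the active uplinks first, then the standby uplinks, and use the
        # first one whose link is up. Always preferring the first healthy NIC
        # in order means the port group fails back once an active NIC recovers.
        for nic in self.active + self.standby:
            if link_up.get(nic, False):
                return nic
        return None  # no healthy uplink left for this port group


# The example configuration: the two port groups mirror each other.
management = TeamingPolicy(active=["vmnic0"], standby=["vmnic1"])
vmotion = TeamingPolicy(active=["vmnic1"], standby=["vmnic0"])

if __name__ == "__main__":
    for label, links in [
        ("both links up", {"vmnic0": True, "vmnic1": True}),
        ("vmnic0 down", {"vmnic0": False, "vmnic1": True}),
    ]:
        print(label, management.select_uplink(links), vmotion.select_uplink(links))
```

With both links up, management uses vmnic0 and vMotion uses vmnic1. With vmnic0 down, both port groups land on vmnic1, which is the trade-off of this design: during a failure, management and vMotion traffic share a single uplink.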
Datastore Heartbeats
I briefly mentioned datastore heartbeating in my post on host isolation detection and response. Datastore heartbeating is used when the cluster master is no longer exchanging network heartbeats with a slave host. If the slave has also stopped updating its datastore heartbeats, it is deemed to have failed, and its virtual machines are restarted on other hosts in the cluster. If the slave is still updating its datastore heartbeats, the master knows the host is running but isolated or partitioned.
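The decision described above can be sketched in Python as follows. This is a deliberately simplified model, assuming each heartbeat is reduced to a yes/no flag; the real HA agent also applies timers and other checks that are left out here, and SlaveState, classify_slave and restart_vms_elsewhere are hypothetical names.

```python
from enum import Enum


class SlaveState(Enum):
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_slave(network_heartbeat: bool, datastore_heartbeat: bool) -> SlaveState:
    """Simplified view of how the master interprets a slave's heartbeats."""
    if network_heartbeat:
        return SlaveState.LIVE
    if datastore_heartbeat:
        # The host is still updating its heartbeat datastores, so it is running
        # but cannot be reached over the management network.
        return SlaveState.ISOLATED_OR_PARTITIONED
    # Neither network nor datastore heartbeats: treat the host as failed.
    return SlaveState.FAILED


def restart_vms_elsewhere(state: SlaveState) -> bool:
    # In this sketch only the FAILED verdict leads the master to restart the
    # host's VMs; the isolation response configured on the slave is out of scope.
    return state is SlaveState.FAILED
```

The point of the second check is that a missing network heartbeat alone is not enough to declare a host dead, which avoids restarting virtual machines that are still running.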
Datastore heartbeating is configured in the HA cluster settings.
By default, vCenter selects the datastores used for heartbeating, but you can override this and select preferred datastores. In addition to these settings, you can use the ‘das.heartbeatdsperhost’ advanced option to configure the number of datastores each host uses for heartbeating. The default is 2 datastores, and it can be increased to a maximum of 5.
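As an illustration of how the advanced option and the preferred datastores interact, here is a Python sketch. The option name and its 2 to 5 range come from the text above; everything else is assumed. The selection heuristic (preferred datastores first, then those mounted by the most hosts) is a simplified stand-in, not vCenter's actual algorithm, and the function names are hypothetical. This sketch rejects out-of-range values; it does not show how vCenter itself treats them.

```python
from __future__ import annotations

DEFAULT_HB_DS_PER_HOST = 2
MIN_HB_DS_PER_HOST = 2
MAX_HB_DS_PER_HOST = 5


def heartbeat_ds_count(advanced_options: dict[str, str]) -> int:
    """Read das.heartbeatdsperhost from a dict of cluster advanced options."""
    raw = advanced_options.get("das.heartbeatdsperhost")
    if raw is None:
        return DEFAULT_HB_DS_PER_HOST
    value = int(raw)
    if not MIN_HB_DS_PER_HOST <= value <= MAX_HB_DS_PER_HOST:
        raise ValueError(f"das.heartbeatdsperhost must be 2-5, got {value}")
    return value


def pick_heartbeat_datastores(
    hosts_per_datastore: dict[str, int], preferred: list[str], count: int
) -> list[str]:
    """Hypothetical selection: honour preferences, then favour widely mounted datastores."""
    # Take preferred datastores first, ignoring any the cluster cannot see.
    chosen = [ds for ds in preferred if ds in hosts_per_datastore][:count]
    # Fill the remaining slots with the datastores mounted by the most hosts
    # (ties broken by name so the result is deterministic).
    remaining = sorted(
        (ds for ds in hosts_per_datastore if ds not in chosen),
        key=lambda ds: (-hosts_per_datastore[ds], ds),
    )
    chosen.extend(remaining[: count - len(chosen)])
    return chosen
```

For example, with das.heartbeatdsperhost set to 3, ds03 marked as preferred, and ds01 and ds02 mounted on all four hosts, the sketch picks ds03, ds01 and ds02.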
Datastore heartbeating uses the .vSphere-HA directory, located in the root of each datastore.
Network Partitions
A network partition occurs when a subset of an HA cluster's hosts cannot communicate with the other hosts in the cluster over the management network. A partitioned cluster cannot protect virtual machines effectively: a virtual machine is protected only if it is running on a host in the same network partition as the master host responsible for it, and that master must also be able to communicate with vCenter. Network partitions should be corrected as soon as possible.
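The protection rule can be expressed as a short Python sketch. It assumes a single master is responsible for all the listed virtual machines and that partitions are given as sets of host names; Vm and protected_vms are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    name: str
    host: str


def protected_vms(
    vms: list[Vm],
    partitions: list[set[str]],
    master_host: str,
    master_sees_vcenter: bool,
) -> set[str]:
    """Names of VMs the responsible master can protect (simplified model)."""
    if not master_sees_vcenter:
        # In this model, a master that cannot reach vCenter protects nothing.
        return set()
    # Find the partition containing the master; empty if the master is unknown.
    master_partition = next((p for p in partitions if master_host in p), set())
    # Only VMs on hosts in the master's partition are protected.
    return {vm.name for vm in vms if vm.host in master_partition}
```

For a cluster split into {esx01, esx02} and {esx03} with the master on esx01, a VM running on esx03 drops out of the protected set until the partition is fixed.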
Useful Links and Resources
https://www.vmware.com/files/pdf/techpaper/vmw-vsphere-high-availability.pdf