I’ve written about Storage IO Control previously, when covering the VCAP5-DCA objectives. I wanted to revisit the topic as part of my preparations for the design certification.
So to start, what is Storage IO Control or SIOC? VMware state that:
“vSphere Network I/O Control (NIOC) and vSphere Storage I/O Control (SIOC) monitors your network and storage and automatically shifts resources to your high-priority applications according to the rules and policies you’ve set up. It extends the familiar constructs of shares and limits, which exist for CPU and memory, to address network or storage utilization through a dynamic allocation of I/O capacity across a cluster of vSphere hosts. It increases administrator productivity by reducing active performance management.”
As discussed here, SIOC is a tool that helps manage IO congestion on the datastores where it is enabled. Note that it is disabled by default and is enabled on a per-datastore basis. As with other share-based methods of controlling resources, it only kicks in when a threshold (the congestion threshold) is breached; under ‘normal’ operation it takes no action. I want to focus this article mostly on SIOC from a design perspective, so I will start with some of the requirements for running SIOC. In summary, these include:
- Datastores that have SIOC enabled must be managed by a single vCenter Server. This is because the SIOC settings apply to every host connected to the datastore, so it is not a good idea to have hosts outside of vCenter connected to it. Hosts connected to an SIOC-enabled datastore write to a file called IORMSTATS.SF, which is present on each datastore.
- SIOC is not supported for RDMs
- SIOC is supported on FC, iSCSI and NFS presented storage
- SIOC is not supported on datastores that are made up of multiple extents
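The requirements above lend themselves to a pre-flight check when planning which datastores will use SIOC. The sketch below is purely illustrative: `StorageTarget`, its fields, and `sioc_blockers` are hypothetical names of my own, not part of any VMware API, and the model is deliberately simplified to the four rules listed in this post.

```python
from dataclasses import dataclass

# Storage protocols the post lists as supported by SIOC.
SUPPORTED_PROTOCOLS = {"FC", "iSCSI", "NFS"}


@dataclass
class StorageTarget:
    """Hypothetical, simplified description of a datastore (or RDM) for illustration."""
    name: str
    protocol: str                        # "FC", "iSCSI" or "NFS"
    vcenters: frozenset = frozenset()    # vCenter Servers managing the connected hosts
    unmanaged_hosts: int = 0             # connected hosts not managed by any vCenter
    extents: int = 1                     # more than 1 means a multi-extent datastore
    is_rdm: bool = False                 # raw device mapping rather than a datastore


def sioc_blockers(target: StorageTarget) -> list:
    """Return the reasons SIOC cannot be used; an empty list means none were found."""
    reasons = []
    if len(target.vcenters) != 1 or target.unmanaged_hosts > 0:
        reasons.append("all connected hosts must be managed by a single vCenter Server")
    if target.is_rdm:
        reasons.append("SIOC is not supported for RDMs")
    if target.protocol not in SUPPORTED_PROTOCOLS:
        reasons.append(f"protocol {target.protocol!r} is not FC, iSCSI or NFS")
    if target.extents > 1:
        reasons.append("multi-extent datastores are not supported")
    return reasons


ds01 = StorageTarget("ds01", "iSCSI", vcenters=frozenset({"vc01"}))
ds02 = StorageTarget("ds02", "FC", vcenters=frozenset({"vc01", "vc02"}), extents=3)
print(sioc_blockers(ds01))  # []
print(sioc_blockers(ds02))  # two reasons: multiple vCenters and multiple extents
```

In a real design you would populate these fields from your inventory rather than by hand; the point is simply that each bullet maps to one yes/no check.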
Generally it is recommended to enable SIOC on your datastores if you are able to; Enterprise Plus is a requirement. I covered how to enable SIOC in this post, so I won’t cover that here. There are some things to be aware of, though, as there is a decision to make on how to set the congestion threshold.
There are two ways you can configure how SIOC detects congestion: using a manual latency threshold (which defaults to 30ms), or using a percentage of peak throughput.
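To make the difference between the two modes concrete, here is a rough model. My understanding is that in the percentage mode SIOC still ends up comparing latency against a threshold, but derives that threshold from the latency at which the datastore reaches the chosen percentage of its estimated peak throughput. The sketch assumes a made-up set of (latency, IOPS) samples, linear interpolation between them, and a 90% figure; all of these, and the function names, are assumptions for illustration rather than VMware's implementation.

```python
# Manual-mode default latency threshold mentioned in the post.
DEFAULT_MANUAL_THRESHOLD_MS = 30.0


def threshold_from_peak(curve, percent=90.0):
    """Estimate the latency (ms) at which throughput first reaches `percent` of peak.

    `curve` is a list of (latency_ms, throughput_iops) samples sorted by latency.
    Values between samples are linearly interpolated. `percent` must be in (0, 100].
    """
    peak = max(tp for _, tp in curve)
    target = peak * percent / 100.0
    prev_lat, prev_tp = curve[0]
    if prev_tp >= target:
        return prev_lat
    for lat, tp in curve[1:]:
        if tp >= target:
            # Interpolate between the last sample below target and this one.
            frac = (target - prev_tp) / (tp - prev_tp)
            return prev_lat + frac * (lat - prev_lat)
        prev_lat, prev_tp = lat, tp
    return prev_lat  # not reached: the peak sample always meets the target


def is_congested(latency_ms, mode="peak", curve=None, percent=90.0,
                 manual_ms=DEFAULT_MANUAL_THRESHOLD_MS):
    """True if the observed latency is above the threshold for the chosen mode."""
    if mode == "manual":
        threshold = manual_ms
    else:
        threshold = threshold_from_peak(curve, percent)
    return latency_ms > threshold


# Hypothetical measurements for one datastore: (latency ms, IOPS).
curve = [(1, 1000), (5, 4000), (10, 5000), (20, 5200), (40, 5250)]
print(threshold_from_peak(curve))       # about 8.6 ms
print(is_congested(12, curve=curve))    # True: 12 ms is above ~8.6 ms
print(is_congested(12, mode="manual"))  # False: 12 ms is below 30 ms
```

The same 12ms of latency counts as congestion in one mode and not in the other, which is why the choice of mode matters: the percentage mode adapts to what the particular array can actually deliver, while a fixed 30ms may be far too high for fast storage.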
The default congestion threshold setting on vSphere 6 is to use the percentage of peak throughput, which is generally recommended. Once SIOC is enabled, as stated here, “disk shares are evaluated globally and the portion of the datastore’s resources each host receives depends on the sum of the shares of the virtual machines running on that host relative to the sum of the shares of all the virtual machines accessing that datastore”.
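The quoted rule is a simple ratio, so a short calculation shows how it plays out. The VM names, hosts, and share values below are hypothetical; 1000 and 2000 correspond to what I understand to be the Normal and High disk share presets.

```python
from collections import defaultdict


def host_entitlements(vms):
    """Fraction of a datastore's resources each host receives under contention.

    `vms` is an iterable of (vm_name, host_name, disk_shares) for every VM
    accessing the datastore. A host's fraction is the sum of its VMs' shares
    divided by the sum of all VMs' shares.
    """
    per_host = defaultdict(int)
    total = 0
    for _vm, host, shares in vms:
        per_host[host] += shares
        total += shares
    return {host: host_shares / total for host, host_shares in per_host.items()}


vms = [
    ("web01", "esx01", 1000),  # Normal
    ("web02", "esx01", 1000),  # Normal
    ("db01", "esx02", 2000),   # High
]
print(host_entitlements(vms))  # {'esx01': 0.5, 'esx02': 0.5}
```

Because shares are evaluated across the whole datastore, a host running two Normal VMs is entitled to the same portion as a host running a single High VM, regardless of how many hosts are connected.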
Once the congestion threshold is breached, SIOC throttles IO according to the disk shares of the VMs accessing the datastore. This effectively helps prevent a particularly demanding virtual machine from causing problems for others. Disk shares are set per virtual disk, by editing the VM’s settings.
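The throttling behaviour described above can be approximated with a proportional-share model. As I understand it, SIOC actually achieves this by adjusting per-host device queue depths rather than by handing out IOPS figures, so treat the following only as a model of the outcome: the capacity, demands, and share values are hypothetical, and `allocate_iops`/`throttle` are names of my own. The model does nothing below the threshold and, above it, splits capacity by shares while passing any unused portion on to VMs that still want more.

```python
def allocate_iops(capacity, demands, shares):
    """Divide `capacity` IOPS in proportion to shares, capped at each VM's demand.

    Any capacity a VM does not need is redistributed to the others by shares
    (simple water-filling), so shares only matter when VMs actually compete.
    """
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active:
        total = sum(shares[vm] for vm in active)
        # VMs whose whole demand fits inside their proportional slice.
        fits = {vm for vm in active if demands[vm] <= remaining * shares[vm] / total}
        if not fits:
            # Everyone left wants more than their slice: split by shares and stop.
            for vm in active:
                alloc[vm] = remaining * shares[vm] / total
            break
        for vm in fits:
            alloc[vm] = float(demands[vm])
            remaining -= demands[vm]
        active -= fits
    return alloc


def throttle(latency_ms, threshold_ms, capacity, demands, shares):
    """Return per-VM IOPS allocations only when the datastore is congested."""
    if latency_ms <= threshold_ms:
        return None  # below the threshold SIOC takes no action
    return allocate_iops(capacity, demands, shares)


demands = {"noisy": 10000, "app": 1500, "db": 4000}  # hypothetical IOPS wanted
shares = {"noisy": 1000, "app": 1000, "db": 2000}    # Normal, Normal, High
print(throttle(20, 30, 6000, demands, shares))       # None: not congested
print(throttle(45, 30, 6000, demands, shares))
# {'noisy': 1500.0, 'app': 1500.0, 'db': 3000.0}
```

Even though the "noisy" VM asks for far more than anyone else, under congestion it is held to its share, and the High-share "db" VM gets twice as much as a Normal one.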
Some best practices to keep in mind when using SIOC:
- Avoid mixing vSphere LUNs and non-vSphere LUNs on the same physical storage. SIOC can detect this and will raise an alarm.
- Configure the host IO queue depth to the highest allowed value
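- Keep the congestion threshold conservative, but set it lower if latency is more important than throughput (a lower threshold makes SIOC intervene earlier, protecting latency at the cost of some aggregate throughput)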
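- Be aware that a lower priority VM with a lot of VMDKs may end up with a higher effective priority, because shares are set per virtual disk and a VM’s total is the sum across all of its disks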
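The last point is easy to underestimate, so here is a quick worked example. It assumes the preset per-disk share values (Low 500, Normal 1000, High 2000, as I understand them) and two hypothetical VMs; in practice you would compare the actual share values configured on each virtual disk.

```python
# Preset per-disk share values as I understand them (Low / Normal / High).
PRESET_DISK_SHARES = {"low": 500, "normal": 1000, "high": 2000}


def vm_total_shares(disk_levels):
    """Sum the shares of every virtual disk attached to one VM."""
    return sum(PRESET_DISK_SHARES[level] for level in disk_levels)


batch_vm = ["low"] * 10  # hypothetical low-priority VM with ten VMDKs
sql_vm = ["high"] * 2    # hypothetical high-priority VM with two VMDKs

batch, sql = vm_total_shares(batch_vm), vm_total_shares(sql_vm)
print(batch, sql)                                                   # 5000 4000
print(f"{batch / (batch + sql):.0%} vs {sql / (batch + sql):.0%}")  # 56% vs 44%
```

Despite every one of its disks being set to Low, the batch VM's combined entitlement on a shared, congested datastore ends up larger than that of the VM whose disks are all set to High.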