Tuesday, November 12, 2013

VMware vSphere Storage APIs – Array Integration (VAAI)

In a virtualized environment, storage operations have traditionally been expensive from a resource perspective. 
Functions such as cloning and snapshots can be performed more efficiently by the storage device than by 
the host.
VMware vSphere® Storage APIs – Array Integration (VAAI), also referred to as hardware acceleration or 
hardware offload APIs, are a set of APIs that enable communication between VMware vSphere ESXi™ hosts and 
storage devices. The APIs define a set of “storage primitives” that enable the ESXi host to offload certain storage 
operations to the array. Offloading reduces resource overhead on the ESXi hosts and can significantly improve 
performance for storage-intensive operations such as cloning and zeroing. The goal of VAAI is to 
help storage vendors provide hardware assistance to speed up VMware® I/O operations that are more efficiently 
accomplished in the storage hardware.
Without the use of VAAI, cloning or migration of virtual machines by the vSphere VMkernel Data Mover involves 
software data movement. The Data Mover issues I/O to read and write blocks to and from the source and 
destination datastores. With VAAI, the Data Mover can use the API primitives to offload operations to the array if 
possible. For example, if the desired operation were to copy a virtual machine disk (VMDK) file from one 
datastore to another inside the same array, the array would be directed to make the copy completely inside the 
array. Whenever a data movement operation is invoked and the corresponding hardware offload operation is 
enabled, the Data Mover will first attempt to use the hardware offload. If the hardware offload operation fails, 
the Data Mover reverts to the traditional software method of data movement. 
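
The try-then-fall-back behavior described above can be illustrated with a minimal Python sketch. It is not VMkernel code: the names `move_data`, `OffloadError`, `hardware_copy`, and `software_copy` are hypothetical and exist only to show the control flow.

```python
from typing import Callable


class OffloadError(Exception):
    """Hypothetical error raised when the array rejects or fails an offload request."""


def move_data(
    hardware_copy: Callable[[], None],
    software_copy: Callable[[], None],
    offload_enabled: bool,
) -> str:
    """Copy data, preferring the array offload; return the name of the path used."""
    if offload_enabled:
        try:
            hardware_copy()
            return "hardware"
        except OffloadError:
            # The offload failed, so fall through to the host-based copy.
            pass
    # Software data movement: the host reads and writes the blocks itself.
    software_copy()
    return "software"


def failing_offload() -> None:
    """Stand-in for an array that cannot complete the offload."""
    raise OffloadError("array rejected the copy request")


def noop() -> None:
    """Stand-in for a copy that succeeds."""
```

Calling `move_data(failing_offload, noop, offload_enabled=True)` returns `"software"`, which mirrors the fallback case. With a working offload it returns `"hardware"`, and with the offload disabled the hardware path is never attempted.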
In nearly all cases, hardware data movement will perform significantly better than software data movement. It 
also consumes fewer CPU cycles and less bandwidth on the storage fabric. Improvements in performance can be 
observed by timing operations that use the VAAI primitives and by using esxtop to track values such as CMDS/s, 
READS/s, WRITES/s, MBREAD/s, and MBWRTN/s of storage adapters during the operation.
In the initial VMware vSphere 4.1 implementation, three VAAI primitives were released. These primitives applied 
only to block (Fibre Channel, iSCSI, FCoE) storage. There were no VAAI primitives for NAS storage in this 
initial release.
In vSphere 5.0, VAAI primitives for NAS storage and VMware vSphere Thin Provisioning were introduced. 


VAAI Block Primitives

In VMware vSphere VMFS, many operations must establish a lock on the volume when updating a resource. 
Because VMFS is a clustered file system, many ESXi hosts can share the volume. When one host must make an 
update to the VMFS metadata, a locking mechanism is required to maintain file system integrity and prevent 
another host from updating the same metadata at the same time. The following operations require this lock:
1. Acquire on-disk locks 
2. Upgrade an optimistic lock to an exclusive/physical lock 
3. Unlock a read-only/multiwriter lock 
4. Acquire a heartbeat 
5. Clear a heartbeat 
6. Replay a heartbeat 
7. Reclaim a heartbeat 
8. Acquire on-disk lock with dead owner
It is not essential to understand all of these operations in the context of this whitepaper. It is sufficient to 
understand that various VMFS metadata operations require a lock.

Atomic Test & Set (ATS) is an enhanced locking mechanism designed to replace the use of SCSI reservations on 
VMFS volumes when doing metadata updates. A SCSI reservation locks a whole LUN and prevents other hosts from 
doing metadata updates of a VMFS volume while one host sharing the volume holds the lock. This can lead to 
contention issues when many virtual machines are using the same datastore, and it is a limiting factor for 
scaling to very large VMFS volumes. ATS, by contrast, is a lock mechanism that needs to modify only a disk 
sector on the VMFS volume. 
When successful, it enables an ESXi host to perform a metadata update on the volume. This includes allocating 
space to a VMDK during provisioning, because certain characteristics must be updated in the metadata to 
reflect the new size of the file. The introduction of ATS addresses the contention issues with SCSI reservations 
and enables VMFS volumes to scale to much larger sizes.
In vSphere 4.0, VMFS3 used SCSI reservations for establishing the lock, because there was no VAAI support in 
that release. In vSphere 4.1, on a VAAI-enabled array, VMFS3 used ATS only for operations 1 and 2 in the list 
above, and only when there was no contention for disk lock acquisitions. VMFS3 reverted to using SCSI 
reservations if there was a multihost collision when acquiring an on-disk lock using ATS.
In the initial VAAI release, the ATS primitives had to be implemented differently on each storage array, requiring 
a different ATS opcode depending on the vendor. ATS is now a standard T10 SCSI command and uses opcode 
0x89 (COMPARE AND WRITE).
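
To make the compare-and-write idea concrete, here is a small in-memory sketch in Python. It assumes a 512-byte sector and an invented lock record layout (owner name padded with zeros). `SimulatedLun`, `lock_record`, and the host names are hypothetical and do not reflect the real VMFS on-disk format or any array firmware.

```python
import threading

SECTOR_SIZE = 512  # assumed sector size for this sketch


class SimulatedLun:
    """In-memory stand-in for a LUN that supports a compare-and-write operation."""

    def __init__(self, sector_count: int) -> None:
        self._sectors = [bytes(SECTOR_SIZE) for _ in range(sector_count)]
        # A real array guarantees atomicity internally; a mutex models that here.
        self._mutex = threading.Lock()

    def compare_and_write(self, lba: int, expected: bytes, new: bytes) -> bool:
        """Write `new` to sector `lba` only if the sector currently equals `expected`."""
        if len(expected) != SECTOR_SIZE or len(new) != SECTOR_SIZE:
            raise ValueError("buffers must be exactly one sector long")
        with self._mutex:
            if self._sectors[lba] != expected:
                return False  # miscompare: another host changed the sector first
            self._sectors[lba] = new
            return True


def lock_record(owner: str) -> bytes:
    """Hypothetical lock record: the owner name padded with zeros to one sector."""
    return owner.encode("ascii").ljust(SECTOR_SIZE, b"\x00")


FREE = bytes(SECTOR_SIZE)  # an all-zero sector stands for "unlocked" in this sketch
LOCK_LBA = 0

lun = SimulatedLun(sector_count=8)
# Both hosts read the sector as FREE and race to claim it; only the first wins.
host_a_won = lun.compare_and_write(LOCK_LBA, FREE, lock_record("esxi-a"))
host_b_won = lun.compare_and_write(LOCK_LBA, FREE, lock_record("esxi-b"))
# The owner releases the lock by swapping its own record back to FREE.
released = lun.compare_and_write(LOCK_LBA, lock_record("esxi-a"), FREE)
```

The losing host learns of the conflict from the failed comparison alone, and only one sector is touched. No other sector of the simulated LUN is blocked, which is the contrast with a SCSI reservation that locks the whole LUN.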
For VMFS5 datastores formatted on a VAAI-enabled array, all the critical section functionality from operations 1 
to 8 is done using ATS. There should no longer be any SCSI reservations on VAAI-enabled VMFS5. ATS continues 
to be used even if there is contention. On non-VAAI arrays, SCSI reservations continue to be used for 
establishing critical sections in VMFS5.
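
The rules above can be condensed into a small decision function. The Python sketch below encodes only the cases stated in this section; the function name, its parameters, and the returned strings are hypothetical, and `vaai_available` is assumed to mean that both the vSphere release and the array support ATS.

```python
ATS = "ATS"
SCSI_RESERVATION = "SCSI reservation"


def lock_mechanism(
    vmfs_version: int, vaai_available: bool, operation: int, contention: bool
) -> str:
    """Return the locking mechanism for one of the eight lock operations listed earlier."""
    if vmfs_version not in (3, 5):
        raise ValueError("this sketch only models VMFS3 and VMFS5")
    if operation not in range(1, 9):
        raise ValueError("operation must be between 1 and 8")
    if not vaai_available:
        # vSphere 4.0, or any non-VAAI array: SCSI reservations are always used.
        return SCSI_RESERVATION
    if vmfs_version == 5:
        # VMFS5 on a VAAI-enabled array: ATS for all operations, even under contention.
        return ATS
    # VMFS3 on a VAAI-enabled array (the vSphere 4.1 behavior described above):
    # ATS only for operations 1 and 2, and only when there is no contention.
    if operation in (1, 2) and not contention:
        return ATS
    return SCSI_RESERVATION
```

For example, `lock_mechanism(3, True, 1, contention=True)` returns `"SCSI reservation"`, while `lock_mechanism(5, True, 8, contention=True)` returns `"ATS"`.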