15 Disk Cache Modes
15.1 Disk Interface Cache Modes
Hypervisors allow for various storage caching strategies to be specified when configuring a VM Guest. Each guest disk interface can have one of the following cache modes specified: writethrough, writeback, none, directsync, or unsafe. If no cache mode is specified, an appropriate default cache mode is used. These cache modes influence how host-based storage is accessed, as follows:
- Read/write data may be cached in the host page cache. 
- The guest's storage controller is informed whether a write cache is present, allowing for the use of a flush command. 
- Synchronous write mode may be used, in which write requests are reported complete only when committed to the storage device. 
- Flush commands (generated by the guest storage controller) may be ignored for performance reasons. 
If a disorderly disconnection between the guest and its storage occurs, the cache mode in use will affect whether data loss occurs. The cache mode can also affect disk performance significantly. Additionally, some cache modes are incompatible with live migration, depending on several factors. There are no simple rules about what combination of cache mode, disk image format, image placement, or storage sub-system is best. The user should plan each guest's configuration carefully and experiment with various configurations to determine the optimal performance.
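For illustration, consider how a cache mode might be specified when a guest is started. The following sketch uses the QEMU command line directly; the image path and memory size are placeholders. Under libvirt, the equivalent setting is the cache attribute of the disk's <driver> element in the domain XML.

# A minimal sketch, assuming a qcow2 image at a placeholder path:
qemu-system-x86_64 -m 2048 \
  -drive file=/var/lib/libvirt/images/guest.qcow2,format=qcow2,if=virtio,cache=writeback
# The libvirt equivalent, inside the guest's <disk> definition:
#   <driver name='qemu' type='qcow2' cache='writeback'/>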
15.2 Description of Cache Modes
- cache mode unspecified
- In older QEMU versions, not specifying a cache mode meant that writethrough was used as the default. With modern versions, as shipped with SUSE Linux Enterprise Server, the various guest storage interfaces have been fixed to handle writeback or writethrough semantics more correctly, allowing the default caching mode to be switched to writeback. The guest driver for each of ide, scsi, and virtio can disable the writeback cache, causing the caching mode to revert to writethrough. The typical guest's storage drivers will maintain the default caching mode as writeback, however.
- writethrough
- This mode causes the hypervisor to interact with the disk image file or block device with O_DSYNC semantics. Writes are reported as completed only when the data has been committed to the storage device. The host page cache is used in what can be termed a writethrough caching mode. The guest's virtual storage adapter is informed that there is no writeback cache, so the guest would not need to send down flush commands to manage data integrity. The storage behaves as if there is a writethrough cache.
- writeback
- This mode causes the hypervisor to interact with the disk image file or block device with neither O_DSYNC nor O_DIRECT semantics. The host page cache is used, and writes are reported to the guest as completed when they are placed in the host page cache. The normal page cache management will handle commitment to the storage device. Additionally, the guest's virtual storage adapter is informed of the writeback cache, so the guest would be expected to send down flush commands as needed to manage data integrity. This is analogous to a RAID controller with a RAM cache.
- none
- This mode causes the hypervisor to interact with the disk image file or block device with O_DIRECT semantics. The host page cache is bypassed, and I/O happens directly between the hypervisor user space buffers and the storage device. Because the actual storage device may report a write as completed as soon as it is placed in its write queue, the guest's virtual storage adapter is informed that there is a writeback cache. The guest would be expected to send down flush commands as needed to manage data integrity. Performance-wise, this mode is equivalent to direct access to the host's disk.
- unsafe
- This mode is similar to the writeback mode discussed above. The key aspect of this “unsafe” mode is that all flush commands from the guests are ignored. Using this mode implies that the user has accepted the trade-off of performance over the risk of data loss in case of a host failure. It is useful, for example, during guest installation, but not for production workloads.
- directsync
- This mode causes the hypervisor to interact with the disk image file or block device with both O_DSYNC and O_DIRECT semantics. This means that writes are reported as completed only when the data has been committed to the storage device, and the host page cache is bypassed. Like writethrough, it is helpful to guests that do not send flushes when needed. It was the last cache mode added, completing the possible combinations of caching and direct access semantics (illustrated by the sketch after this list).
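The open flag combinations behind these modes can be sketched with dd, which exposes the same flags through its oflag option. This is an analogy on a scratch file, not how QEMU itself is invoked; the test file path is a placeholder.

dd if=/dev/zero of=/tmp/cachetest bs=1M count=64                    # writeback: page cache only, no sync flags
dd if=/dev/zero of=/tmp/cachetest bs=1M count=64 oflag=dsync        # writethrough: O_DSYNC
dd if=/dev/zero of=/tmp/cachetest bs=1M count=64 oflag=direct       # none: O_DIRECT
dd if=/dev/zero of=/tmp/cachetest bs=1M count=64 oflag=direct,dsync # directsync: O_DSYNC and O_DIRECT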
15.3 Data Integrity Implications of Cache Modes
- writethrough, none, directsync
- These are the safest modes, and they are considered equally safe, given that the guest operating system is “modern and well behaved”, which means that it uses flushes as needed. If you have a suspect guest, use writethrough or directsync. Note that some file systems are not compatible with none or directsync, as they do not support O_DIRECT, which these cache modes rely on (see the probe sketched after this list).
- writeback
- This mode informs the guest of the presence of a write cache, and relies on the guest to send flush commands as needed to maintain data integrity within its disk image. This is a common storage design, which is completely accounted for within modern file systems. This mode exposes the guest to data loss in the unlikely case of a host failure, because there is a window of time between when a write is reported as completed and when that write is committed to the storage device.
- unsafe
- This mode is similar to writeback caching, except that the guest flush commands are ignored, nullifying the data integrity control of these flush commands and resulting in a higher risk of data loss because of host failure. The name “unsafe” should serve as a warning that there is a much higher potential for data loss because of a host failure than with the other modes. When the guest terminates, the cached data is flushed.
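Whether the file system backing an image directory supports O_DIRECT can be probed with a short test; the directory below is a placeholder.

# If this dd fails, the file system (for example tmpfs) lacks O_DIRECT
# support, and the cache modes none and directsync cannot be used there.
dd if=/dev/zero of=/var/lib/libvirt/images/.direct_test bs=4k count=1 oflag=direct
rm -f /var/lib/libvirt/images/.direct_test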
15.4 Performance Implications of Cache Modes
The choice to make full use of the page cache, or to write through it, or to bypass it altogether can have dramatic performance implications. Other factors that influence disk performance include the capabilities of the actual storage system, what disk image format is used, the potential size of the page cache, and the I/O scheduler used. Additionally, not flushing the write cache increases performance, but with risk, as noted above. As a general rule, high-end systems typically perform best with the cache mode none, because of the reduced data copying that occurs. The potential benefit of having multiple guests share the common host page cache, the ratio of reads to writes, and the use of AIO mode native (see below) should also be considered.
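One way to experiment, as suggested above, is to benchmark a scratch image under each cache mode. The sketch below uses qemu-img bench; the image path and request count are arbitrary placeholders, and the test should never be pointed at an image that is in use.

qemu-img create -f qcow2 /tmp/bench.qcow2 1G
for mode in writeback writethrough none directsync; do
  echo "cache mode: $mode"
  # -w runs a write test; -t selects the cache mode; -c is the request count
  qemu-img bench -w -f qcow2 -t $mode -c 4096 /tmp/bench.qcow2
done
rm -f /tmp/bench.qcow2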
  
15.5 Effect of Cache Modes on Live Migration
The caching of storage data and metadata restricts the configurations that support live migration. Currently, only the raw, qcow2, and qed image formats can be used for live migration. If a clustered file system is used, all cache modes support live migration. Otherwise the only cache mode that supports live migration on read/write shared storage is none.
  
The libvirt management layer includes checks for migration compatibility based on several factors. If the guest storage is hosted on a clustered file system, is read-only, or is marked shareable, then the cache mode is ignored when determining if migration can be allowed. Otherwise libvirt will not allow migration unless the cache mode is set to none. However, this restriction can be overridden with the “unsafe” option to the migration APIs, which virsh also supports, for example:
  
virsh migrate --live --unsafe
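A complete invocation additionally names the guest and a destination URI; both values below are placeholders.

virsh migrate --live --unsafe sles-guest qemu+ssh://desthost/system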
The cache mode none is required for the AIO mode setting native. If another cache mode is used, then the AIO mode will silently be switched back to the default threads. The guest flush within the host is implemented using fdatasync().
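As a sketch of the combination just described (the image path is a placeholder), the QEMU command line pairs the two options as follows; under libvirt, the AIO mode is the io attribute of the disk's <driver> element.

qemu-system-x86_64 -m 2048 \
  -drive file=/var/lib/libvirt/images/guest.raw,format=raw,if=virtio,cache=none,aio=native
# libvirt equivalent: <driver name='qemu' type='raw' cache='none' io='native'/>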