27 Ceph as a back-end for QEMU KVM instance
The most frequent Ceph use case involves providing block device images to virtual machines. For example, a user may create a 'golden' image with an OS and any relevant software in an ideal configuration. Then, the user takes a snapshot of the image. Finally, the user clones the snapshot (usually many times, see Section 20.3, “Snapshots” for details). The ability to make copy-on-write clones of a snapshot means that Ceph can provision block device images to virtual machines quickly, because the client does not need to download an entire image each time it spins up a new virtual machine.
Ceph block devices integrate with QEMU virtual machines. For more information on QEMU KVM, see https://documentation.suse.com/sles/15-SP3/html/SLES-all/part-virt-qemu.html.
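The golden-image workflow described above can be sketched as a small shell helper. This is only an illustration: the function name `print_clone_commands` and the pool, image, snapshot, and clone names are hypothetical, and the helper prints the `rbd` commands instead of running them so that they can be reviewed first.

```shell
#!/bin/sh
# print_clone_commands POOL GOLDEN_IMAGE SNAPSHOT CLONE...
# Hypothetical helper: prints (does not run) the rbd commands for the
# snapshot-and-clone workflow.
print_clone_commands() {
    pool=$1
    golden=$2
    snap=$3
    shift 3
    # Take a snapshot of the golden image.
    echo "rbd snap create ${pool}/${golden}@${snap}"
    # Protect the snapshot so it cannot be removed while clones depend on it.
    echo "rbd snap protect ${pool}/${golden}@${snap}"
    # Create one copy-on-write clone per remaining argument.
    for clone in "$@"; do
        echo "rbd clone ${pool}/${golden}@${snap} ${pool}/${clone}"
    done
}

# Example with assumed names: one golden image, two clones.
print_clone_commands pool1 golden base vm1 vm2
```

After reviewing the printed commands, they could be run one by one on a node with access to the cluster.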
27.1 Installing qemu-block-rbd
In order to use Ceph block devices, QEMU needs to have the appropriate driver installed. Check whether the qemu-block-rbd package is installed, and install it if needed:
# zypper install qemu-block-rbd
27.2 Using QEMU
The QEMU command line expects you to specify the pool name and image name. You may also specify a snapshot name.
qemu-img command options \
rbd:pool-name/image-name@snapshot-name:option1=value1:option2=value2...
For example, specifying the id and conf options might look like the following:
qemu-img command options \
rbd:pool_name/image_name:id=glance:conf=/etc/ceph/ceph.conf
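To make the structure of the `rbd:` specification concrete, the following sketch assembles it from its parts. The helper name `rbd_spec` is hypothetical and is not part of QEMU or Ceph. It assumes that an argument without an equals sign directly after the image name is a snapshot name, and that all remaining arguments are option=value pairs.

```shell
#!/bin/sh
# rbd_spec POOL IMAGE [SNAPSHOT] [OPTION=VALUE ...]
# Hypothetical helper: prints an rbd: specification string for qemu-img.
rbd_spec() {
    pool=$1
    image=$2
    shift 2
    spec="rbd:${pool}/${image}"
    # An optional next argument without '=' is treated as a snapshot name.
    if [ $# -gt 0 ]; then
        case $1 in
            '') shift ;;
            *=*) ;;
            *) spec="${spec}@$1"; shift ;;
        esac
    fi
    # Each remaining option=value pair is appended after a colon.
    for opt in "$@"; do
        spec="${spec}:${opt}"
    done
    printf '%s\n' "$spec"
}

# Reproduces the specification used in the example above.
rbd_spec pool_name image_name id=glance conf=/etc/ceph/ceph.conf
```

The printed string can then be passed to qemu-img, for example as `qemu-img info "$(rbd_spec pool1 image1)"`.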
27.3 Creating images with QEMU
You can create a block device image from QEMU. You must specify rbd, the pool name, and the name of the image you want to create. You must also specify the size of the image.
qemu-img create -f raw rbd:pool-name/image-name size
For example:
qemu-img create -f raw rbd:pool1/image1 10G
Formatting 'rbd:pool1/image1', fmt=raw size=10737418240 nocow=off cluster_size=0
The raw data format is really the only sensible format option to use with RBD. Technically, you could use other QEMU-supported formats such as qcow2, but doing so would add overhead, and would also render the volume unsafe for virtual machine live migration when caching is enabled.
27.4 Resizing images with QEMU
You can resize a block device image from QEMU. You must specify rbd, the pool name, and the name of the image you want to resize. You must also specify the new size of the image.
qemu-img resize rbd:pool-name/image-name size
For example:
qemu-img resize rbd:pool1/image1 9G
Image resized.
27.5 Retrieving image info with QEMU
You can retrieve block device image information from QEMU. You must specify rbd, the pool name, and the name of the image.
qemu-img info rbd:pool-name/image-name
For example:
qemu-img info rbd:pool1/image1
image: rbd:pool1/image1
file format: raw
virtual size: 9.0G (9663676416 bytes)
disk size: unavailable
cluster_size: 4194304
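The byte counts in the outputs above follow from qemu-img treating the G suffix as 1024 x 1024 x 1024 bytes. As a quick sanity check, the following sketch converts a size in G to bytes. The helper name `gib_to_bytes` is hypothetical.

```shell
#!/bin/sh
# gib_to_bytes SIZE
# Hypothetical helper: converts a whole-number size with the G suffix
# (as passed to qemu-img) into bytes, using 1G = 1024 * 1024 * 1024 bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 10   # size reported when the 10G image was created
gib_to_bytes 9    # size reported by qemu-img info after resizing to 9G
```

The two results match the `size=10737418240` and `9663676416 bytes` values shown in the example outputs.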
27.6 Running QEMU with RBD
QEMU can access an image as a virtual block device directly via librbd. This avoids an additional context switch, and can take advantage of RBD caching.
You can use qemu-img to convert existing virtual machine images to Ceph block device images. For example, if you have a qcow2 image, you could run:
qemu-img convert -f qcow2 -O raw sles12.qcow2 rbd:pool1/sles12
To run a virtual machine booting from that image, you could run:
# qemu -m 1024 -drive format=raw,file=rbd:pool1/sles12
RBD caching can significantly improve performance. QEMU’s cache options control librbd caching:
# qemu -m 1024 -drive format=raw,file=rbd:pool1/sles12,cache=writeback
For more information on RBD caching, refer to Section 20.5, “Cache settings”.
27.7 Enabling discard and TRIM
Ceph block devices support the discard operation. This means that a guest can send TRIM requests to let a Ceph block device reclaim unused space. This can be enabled in the guest by mounting XFS with the discard option.
For this to be available to the guest, it must be explicitly enabled for the block device. To do this, you must specify a discard_granularity associated with the drive:
# qemu -m 1024 -drive format=raw,file=rbd:pool1/sles12,id=drive1,if=none \
  -device driver=ide-hd,drive=drive1,discard_granularity=512
The above example uses the IDE driver. The virtio driver does not support discard.
If using libvirt, edit your libvirt domain’s configuration file using virsh edit to include the xmlns:qemu value. Then, add a qemu:commandline block as a child of that domain. The following example shows how to set two devices with qemu id= to different discard_granularity values.
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/>
  </qemu:commandline>
</domain>
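When several devices need their own discard_granularity, the qemu:commandline block can be generated instead of typed by hand. The following sketch prints such a block from ID=GRANULARITY pairs. The helper name `discard_args` is hypothetical, and the device IDs must match the ones in your own domain configuration. The output is meant to be pasted into the domain via virsh edit; the helper does not modify any configuration itself.

```shell
#!/bin/sh
# discard_args ID=GRANULARITY ...
# Hypothetical helper: prints a qemu:commandline block that sets
# discard_granularity for each given device ID.
discard_args() {
    printf '<qemu:commandline>\n'
    for pair in "$@"; do
        id=${pair%%=*}      # text before the first '='
        gran=${pair#*=}     # text after the first '='
        # Each setting is a '-set' argument followed by its value.
        printf "  <qemu:arg value='-set'/>\n"
        printf "  <qemu:arg value='block.%s.discard_granularity=%s'/>\n" "$id" "$gran"
    done
    printf '</qemu:commandline>\n'
}

# Reproduces the two settings from the example above.
discard_args scsi0-0-0=4096 scsi0-0-1=65536
```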
27.8 Setting QEMU cache options
QEMU’s cache options correspond to the following Ceph RBD Cache settings.
Writeback:
rbd_cache = true
Writethrough:
rbd_cache = true
rbd_cache_max_dirty = 0
None:
rbd_cache = false
QEMU’s cache settings override Ceph’s default settings (settings that are not explicitly set in the Ceph configuration file). If you explicitly set RBD Cache settings in your Ceph configuration file (refer to Section 20.5, “Cache settings”), your Ceph settings override the QEMU cache settings. If you set cache settings on the QEMU command line, the QEMU command line settings override the Ceph configuration file settings.
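The correspondence between QEMU cache modes and RBD cache settings listed in this section can be expressed as a small lookup. The helper name `rbd_cache_settings` is hypothetical, and the sketch only prints the corresponding settings. It does not change any QEMU or Ceph configuration.

```shell
#!/bin/sh
# rbd_cache_settings MODE
# Hypothetical helper: prints the Ceph RBD cache settings that correspond
# to a QEMU cache mode (writeback, writethrough, or none).
rbd_cache_settings() {
    case $1 in
        writeback)
            printf 'rbd_cache = true\n' ;;
        writethrough)
            printf 'rbd_cache = true\nrbd_cache_max_dirty = 0\n' ;;
        none)
            printf 'rbd_cache = false\n' ;;
        *)
            # Any other mode is outside the mapping described here.
            printf 'unknown cache mode: %s\n' "$1" >&2
            return 1 ;;
    esac
}

rbd_cache_settings writethrough
```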