18 High Availability for virtualization
This chapter explains how to configure virtual machines as highly available cluster resources.
18.1 Overview
Virtual machines can take different roles in a High Availability cluster:
A virtual machine can be managed by the cluster as a resource, without the cluster managing the services that run on the virtual machine. In this case, the VM is opaque to the cluster. This is the scenario described in this document.
A virtual machine can be a cluster resource and run pacemaker_remote, which allows the cluster to manage services running on the virtual machine. In this case, the VM is a guest node and is transparent to the cluster. For this scenario, see Section 4, “Use case 2: setting up a cluster with guest nodes”.
A virtual machine can run a full cluster stack. In this case, the VM is a regular cluster node and is not managed by the cluster as a resource. For this scenario, see Installation and Setup Quick Start.
The following procedures describe how to set up highly available virtual machines on block storage, with another block device used as an OCFS2 volume to store the VM lock files and XML configuration files. The virtual machines and the OCFS2 volume are configured as resources managed by the cluster, with resource constraints to ensure that the lock file directory is always available before a virtual machine starts on any node. This prevents the virtual machines from starting on multiple nodes.
18.2 Requirements
A running High Availability cluster with at least two nodes and a fencing device such as SBD.
Passwordless root SSH login between the cluster nodes.
A network bridge on each cluster node, to be used for installing and running the VMs. This must be separate from the network used for cluster communication and management.
Two or more shared storage devices (or partitions on a single shared device), so that all cluster nodes can access the files and storage required by the VMs:
A device to use as an OCFS2 volume, which will store the VM lock files and XML configuration files. Creating and mounting the OCFS2 volume is explained in the following procedure.
A device containing the VM installation source (such as an ISO file or disk image).
Depending on the installation source, you might also need another device for the VM storage disks.
To avoid I/O starvation, these devices must be separate from the shared device used for SBD.
Stable device names for all storage paths, for example, /dev/disk/by-id/DEVICE_ID. A shared storage device might have mismatched /dev/sdX names on different nodes, which will cause VM migration to fail.
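As a quick check for the last requirement, the following minimal sketch tests whether a storage path uses the stable /dev/disk/by-id/ naming scheme. The function name is_stable_device_path is hypothetical and not part of any cluster tool. It inspects only the path string and does not verify that the device exists.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for paths below /dev/disk/by-id/.
# It checks the string only, not whether the device actually exists.
is_stable_device_path() {
    case "$1" in
        /dev/disk/by-id/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage with placeholder paths.
for path in /dev/disk/by-id/DEVICE_ID /dev/sdb; do
    if is_stable_device_path "$path"; then
        echo "$path: stable name"
    else
        echo "$path: unstable name, use /dev/disk/by-id/ instead"
    fi
done
```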
18.3 Configuring cluster resources to manage the lock files
Use this procedure to configure the cluster to manage the virtual machine lock files. The lock file directory must be available on all nodes so that the cluster is aware of the lock files no matter which node the VMs are running on.
You only need to run the following commands on one of the cluster nodes.
Create an OCFS2 volume on one of the shared storage devices:
# mkfs.ocfs2 /dev/disk/by-id/DEVICE_ID
Run crm configure to start the crm interactive shell.
Create a primitive resource for DLM:
crm(live)configure# primitive dlm ocf:pacemaker:controld \
  op monitor interval=60 timeout=60
Create a primitive resource for the OCFS2 volume:
crm(live)configure# primitive ocfs2 Filesystem \
  params device="/dev/disk/by-id/DEVICE_ID" directory="/mnt/shared" fstype=ocfs2 \
  op monitor interval=20 timeout=40
Create a group for the DLM and OCFS2 resources:
crm(live)configure# group g-virt-lock dlm ocfs2
Clone the group so that it runs on all nodes:
crm(live)configure# clone cl-virt-lock g-virt-lock \
  meta interleave=true
Review your changes with show.
If everything is correct, submit your changes with commit and leave the crm live configuration with quit.
Check the status of the group clone. It should be running on all nodes:
# crm status
[...]
Full List of Resources:
[...]
  * Clone Set: cl-virt-lock [g-virt-lock]:
    * Started: [ alice bob ]
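If you prefer to prepare the configuration as a file instead of typing it into the interactive shell, the definitions from this procedure can be generated with a small script. This is a minimal sketch, not the documented method: the function name emit_lock_config and the file name lock.crm are hypothetical, and the device and mount point are the placeholders used above.

```shell
#!/bin/sh
# Hypothetical helper: print the crmsh definitions from the procedure above
# for a given shared device and mount point.
emit_lock_config() {
    device=$1
    directory=$2
    cat <<EOF
primitive dlm ocf:pacemaker:controld \\
  op monitor interval=60 timeout=60
primitive ocfs2 Filesystem \\
  params device="$device" directory="$directory" fstype=ocfs2 \\
  op monitor interval=20 timeout=40
group g-virt-lock dlm ocfs2
clone cl-virt-lock g-virt-lock \\
  meta interleave=true
EOF
}

# Print the configuration for the placeholder device used in this chapter.
emit_lock_config /dev/disk/by-id/DEVICE_ID /mnt/shared

# Assumption: the output could then be saved and loaded on one node, e.g.
#   emit_lock_config /dev/disk/by-id/DEVICE_ID /mnt/shared > lock.crm
#   crm configure load update lock.crm
```

Review the generated text before loading it, in the same way you would review the output of show in the interactive shell.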
18.4 Preparing the cluster nodes to host virtual machines
Use this procedure to install and start the required virtualization services, and to configure the nodes to store the VM lock files on the shared OCFS2 volume.
This procedure uses crm cluster run to run commands on all nodes at once. If you prefer to manage each node individually, you can omit the crm cluster run portion of the commands.
Install the virtualization packages on all nodes in the cluster:
# crm cluster run "zypper install -y -t pattern kvm_server kvm_tools"
On one node, find and enable the lock_manager setting in the file /etc/libvirt/qemu.conf:
lock_manager = "lockd"
On the same node, find and enable the file_lockspace_dir setting in the file /etc/libvirt/qemu-lockd.conf, and change the value to point to a directory on the OCFS2 volume:
file_lockspace_dir = "/mnt/shared/lockd"
Copy these files to the other nodes in the cluster:
# crm cluster copy /etc/libvirt/qemu.conf
# crm cluster copy /etc/libvirt/qemu-lockd.conf
Enable and start the libvirtd service on all nodes in the cluster:
# crm cluster run "systemctl enable --now libvirtd"
This also starts the virtlockd service.
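To confirm that both settings are enabled on a node after copying the files, a check along the following lines might help. This is a minimal sketch: the function names setting_enabled and check_lock_settings are hypothetical, and the check only looks for an uncommented line that sets each key. It does not validate the value.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an uncommented "KEY = ..." line.
setting_enabled() {
    key=$1
    file=$2
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"
}

# Report whether the two settings from the procedure above are enabled.
# The default paths are the files named in this chapter.
check_lock_settings() {
    qemu_conf=${1:-/etc/libvirt/qemu.conf}
    lockd_conf=${2:-/etc/libvirt/qemu-lockd.conf}
    status=0
    setting_enabled lock_manager "$qemu_conf" ||
        { echo "lock_manager is not enabled in $qemu_conf"; status=1; }
    setting_enabled file_lockspace_dir "$lockd_conf" ||
        { echo "file_lockspace_dir is not enabled in $lockd_conf"; status=1; }
    return $status
}
```

Calling check_lock_settings without arguments on a cluster node checks the default file locations and prints one message for each missing setting.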
18.5 Adding virtual machines as cluster resources
Use this procedure to add virtual machines to the cluster as cluster resources, with resource constraints to ensure the VMs can always access the lock files. The lock files are managed by the resources in the group g-virt-lock, which is available on all nodes via the clone cl-virt-lock.
Install your virtual machines on one of the cluster nodes, with the following restrictions:
The installation source and storage must be on shared devices.
Do not configure the VMs to start on host boot.
For more information, see Virtualization Guide for SUSE Linux Enterprise Server.
If the virtual machines are running, shut them down. The cluster will start the VMs after you add them as resources.
Dump the XML configuration to the OCFS2 volume. Repeat this step for each VM:
# virsh dumpxml VM1 > /mnt/shared/VM1.xml
Make sure the XML files do not contain any references to unshared local paths.
Run crm configure to start the crm interactive shell.
Create primitive resources to manage the virtual machines. Repeat this step for each VM:
crm(live)configure# primitive VM1 VirtualDomain \
  params config="/mnt/shared/VM1.xml" remoteuri="qemu+ssh://%n/system" \
  meta allow-migrate=true \
  op monitor timeout=30s interval=10s
The option allow-migrate=true enables live migration. If the value is set to false, the cluster migrates the VM by shutting it down on one node and restarting it on another node.
If you need to set utilization attributes to help place VMs based on their load impact, see Section 7.10, “Placing resources based on their load impact”.
Create a colocation constraint so that the virtual machines can only start on nodes where cl-virt-lock is running:
crm(live)configure# colocation col-fs-virt inf: ( VM1 VM2 VMX ) cl-virt-lock
Create an ordering constraint so that cl-virt-lock always starts before the virtual machines:
crm(live)configure# order o-fs-virt Mandatory: cl-virt-lock ( VM1 VM2 VMX )
Review your changes with show.
If everything is correct, submit your changes with commit and leave the crm live configuration with quit.
Check the status of the virtual machines:
# crm status
[...]
Full List of Resources:
[...]
  * Clone Set: cl-virt-lock [g-virt-lock]:
    * Started: [ alice bob ]
  * VM1 (ocf::heartbeat:VirtualDomain): Started alice
  * VM2 (ocf::heartbeat:VirtualDomain): Started alice
  * VMX (ocf::heartbeat:VirtualDomain): Started alice
The virtual machines are now managed by the High Availability cluster, and can migrate between the cluster nodes.
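The check for unshared local paths in the dumped XML files can be partly automated. The following minimal sketch lists every file= or dev= path in a domain XML file that does not start with one of the prefixes you consider shared. The function name list_unshared_paths is hypothetical. The sketch assumes single-quoted attribute values, as in the output of virsh dumpxml, and it does not inspect paths stored as element text, so a manual review is still advisable.

```shell
#!/bin/sh
# Hypothetical helper: print every file= or dev= path in a libvirt domain XML
# file that does not start with one of the given shared prefixes.
# Usage: list_unshared_paths XML_FILE SHARED_PREFIX...
list_unshared_paths() {
    xml=$1
    shift
    # Extract absolute paths from file='...' and dev='...' attributes.
    grep -Eo "(file|dev)='/[^']*'" "$xml" | cut -d"'" -f2 | sort -u |
    while IFS= read -r path; do
        shared=no
        for prefix in "$@"; do
            case "$path" in
                "$prefix"*) shared=yes ;;
            esac
        done
        # Print only the paths that match none of the shared prefixes.
        [ "$shared" = yes ] || printf '%s\n' "$path"
    done
}
```

For example, list_unshared_paths /mnt/shared/VM1.xml /dev/disk/by-id/ /mnt/shared/ prints the paths that lie outside the two shared locations used in this chapter. Empty output means that no such paths were found in the checked attributes.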
After adding virtual machines as cluster resources, do not manage them manually. Only use the cluster tools as described in Chapter 8, Managing cluster resources.
To perform maintenance tasks on cluster-managed VMs, see Section 28.2, “Different options for maintenance tasks”.
18.6 Testing the setup
Use the following tests to confirm that the virtual machine High Availability setup works as expected.
Perform these tests in a test environment, not a production environment.
The virtual machine VM1 is running on node alice.
On node bob, try to start the VM manually with virsh start VM1.
Expected result: The virsh command fails. VM1 cannot be started manually on bob when it is running on alice.
The virtual machine VM1 is running on node alice.
Open two terminals.
In the first terminal, connect to VM1 via SSH.
In the second terminal, try to migrate VM1 to node bob with crm resource move VM1 bob.
Run crm_mon -r to monitor the cluster status until it stabilizes. This might take a short time.
In the first terminal, check whether the SSH connection to VM1 is still active.
Expected result: The cluster status shows that VM1 has started on bob. The SSH connection to VM1 remains active during the whole migration.
The virtual machine VM1 is running on node bob.
Reboot bob.
On node alice, run crm_mon -r to monitor the cluster status until it stabilizes. This might take a short time.
Expected result: The cluster status shows that VM1 has started on alice.
The virtual machine VM1 is running on node alice.
Simulate a crash on alice by forcing the machine off or unplugging the power cable.
On node bob, run crm_mon -r to monitor the cluster status until it stabilizes. VM failover after a node crashes usually takes longer than VM migration after a node reboots.
Expected result: After a short time, the cluster status shows that VM1 has started on bob.
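When repeating these tests, it can be convenient to read the node name from the status output instead of scanning it by eye. The following minimal sketch does this for output in the format shown in this chapter. The function name vm_node is hypothetical, and the parsing assumes that resource lines look like "* VM1 (ocf::heartbeat:VirtualDomain): Started alice". Other output formats would need a different pattern.

```shell
#!/bin/sh
# Hypothetical helper: read "crm status" style output on stdin and print the
# node on which the given resource is reported as Started.
vm_node() {
    awk -v vm="$1" \
        '$1 == "*" && $2 == vm && $(NF-1) == "Started" { print $NF }'
}

# Demonstration with sample lines in the format shown in this chapter.
vm_node VM1 <<'EOF'
  * Clone Set: cl-virt-lock [g-virt-lock]:
    * Started: [ alice bob ]
  * VM1 (ocf::heartbeat:VirtualDomain): Started alice
  * VM2 (ocf::heartbeat:VirtualDomain): Started alice
EOF
```

On a cluster node, the same function could be used as crm status | vm_node VM1. Empty output means that no matching Started line was found for that resource.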