15 Configuring virtual machines with virsh
#
You can use virsh
to configure virtual machines (VM) on the command
line as an alternative to using the Virtual Machine Manager. With virsh
, you can
control the state of a VM, edit the configuration of a VM or even
migrate a VM to another host. The following sections describe how to
manage VMs by using virsh
.
15.1 Editing the VM configuration #
The configuration of a VM is stored in an XML file in
/etc/libvirt/qemu/
and looks like this:
<domain type='kvm'> <name>sles15</name> <uuid>ab953e2f-9d16-4955-bb43-1178230ee625</uuid> <memory unit='KiB'>2097152</memory> <currentMemory unit='KiB'>2097152</currentMemory> <vcpu placement='static'>2</vcpu> <os> <type arch='x86_64' machine='pc-q35-2.0'>hvm</type> </os> <features>...</features> <cpu mode='custom' match='exact' check='partial'> <model fallback='allow'>Skylake-Client-IBRS</model> </cpu> <clock>...</clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <pm> <suspend-to-mem enabled='no'/> <suspend-to-disk enabled='no'/> </pm> <devices> <emulator>/usr/bin/qemu-system-x86_64</emulator> <disk type='file' device='disk'>...</disk> </devices> ... </domain>
To edit the configuration of a VM Guest, check if it is offline:
>
sudo
virsh list --inactive
If your VM Guest is in this list, you can safely edit its configuration:
>
sudo
virsh edit NAME_OF_VM_GUEST
Before saving the changes, virsh
validates your input against a RelaxNG
schema.
15.2 Changing the machine type #
When installing with the virt-install
tool, the
machine type for a VM Guest is pc-q35 by default.
The machine type is stored in the VM Guest's configuration file in the
type
element:
<type arch='x86_64' machine='pc-q35-2.3'>hvm</type>
As an example, the following procedure shows how to change this value to
the machine type q35
. The value q35
is an Intel* chipset and includes
PCIe, supports up to 12 USB ports, and
has support for SATA and
IOMMU.
Check whether your VM Guest is inactive:
>
sudo
virsh list --inactive
Id Name State ---------------------------------------------------- - sles15 shut offEdit the configuration for this VM Guest:
>
sudo
virsh edit sles15
Replace the value of the
machine
attribute withpc-q35-2.0
:<type arch='x86_64' machine='pc-q35-2.0'>hvm</type>
Restart the VM Guest:
>
sudo
virsh start sles15
Check if the machine type has changed. Log in to the VM Guest and run the following command:
>
sudo
dmidecode | grep Product
Product Name: Standard PC (Q35 + ICH9, 2009)
Whenever the QEMU version on the host system is upgraded, for example,
when upgrading the VM Host Server to a new service pack, upgrade the machine
type of the VM Guests to the latest available version. To check, use
the command qemu-system-x86_64 -M help
on the
VM Host Server.
The default machine type pc-i440fx
, for example, is
regularly updated. If your VM Guest still runs with a machine type of
pc-i440fx-1.X
, we
strongly recommend an update to
pc-i440fx-2.X
. This
allows taking advantage of the most recent updates and corrections in
machine definitions, and ensures better future compatibility.
15.3 Configuring hypervisor features #
libvirt
automatically enables a default set of
hypervisor features that are sufficient in most circumstances, but also
allows enabling and disabling features as needed. As an example, Xen does
not support enabling PCI pass-through by default. It must be enabled with
the passthrough
setting. Hypervisor features can be
configured with virsh
. Look for the <features>
element
in the VM Guest's configuration file and adjust its features as
required. Continuing with the Xen pass-through example:
>
sudo
virsh edit sle15sp1 <features> <xen> <passthrough/> </xen> </features>
Save your changes and restart the VM Guest.
See the Hypervisor features section of the libvirt Domain XML format manual at https://libvirt.org/formatdomain.html#elementsFeatures for more information.
15.4 Configuring CPU #
Many aspects of the virtual CPUs presented to VM Guests are configurable
with virsh
. The number of current and maximum CPUs allocated to a
VM Guest can be changed, as well as the model of the CPU and its feature
set. The following subsections describe how to change the common CPU
settings of a VM Guest.
15.4.1 Configuring the number of CPUs #
The number of allocated CPUs is stored in the VM Guest's XML
configuration file in /etc/libvirt/qemu/
in the
vcpu
element:
<vcpu placement='static'>1</vcpu>
In this example, the VM Guest has only one allocated CPU. The following procedure shows how to change the number of allocated CPUs for the VM Guest:
Check whether your VM Guest is inactive:
>
sudo
virsh list --inactive
Id Name State ---------------------------------------------------- - sles15 shut offEdit the configuration for an existing VM Guest:
>
sudo
virsh edit sles15
Change the number of allocated CPUs:
<vcpu placement='static'>2</vcpu>
Restart the VM Guest:
>
sudo
virsh start sles15
Check if the number of CPUs in the VM has changed.
>
sudo
virsh vcpuinfo sled15
VCPU: 0 CPU: N/A State: N/A CPU time N/A CPU Affinity: yy VCPU: 1 CPU: N/A State: N/A CPU time N/A CPU Affinity: yy
You can also change the number of CPUs while the VM Guest is running. CPUs can be hotplugged until the maximum number configured at VM Guest start is reached. Likewise, they can be hot-unplugged until the lower limit of 1 is reached. The following example shows changing the active CPU count from 2 to a predefined maximum of 4.
Check the current live vcpu count:
>
sudo
virsh vcpucount sles15 | grep live
maximum live 4 current live 2Change the current, or active, number of CPUs to 4:
>
sudo
virsh setvcpus sles15 --count 4 --live
Check that the current live vcpu count is now 4:
>
sudo
virsh vcpucount sles15 | grep live
maximum live 4 current live 4
With KVM, it is possible to define a VM Guest with more than 255
CPUs. However, additional configuration is necessary to start and run
the VM Guest. The ioapic
feature needs to be
tuned and an IOMMU device needs to be added to the VM Guest. Below
is an example configuration for 288 CPUs.
<domain> <vcpu placement='static'>288</vcpu> <features> <ioapic driver='qemu'/> </features> <devices> <iommu model='intel'> <driver intremap='on' eim='on'/> </iommu> </devices> </domain>
15.4.2 Configuring the CPU model #
The CPU model exposed to a VM Guest can often influence the workload
running within it. The default CPU model is derived from a CPU mode
known as host-model
.
<cpu mode='host-model'/>
When starting a VM Guest with the CPU mode host-model
,
libvirt
copies its model of the host CPU into the VM Guest
definition. The host CPU model and features copied to the VM Guest
definition can be observed in the output of the virsh
capabilities
.
Another interesting CPU mode is host-passthrough
.
<cpu mode='host-passthrough'/>
When starting a VM Guest with the CPU mode
host-passthrough
, it is presented with a CPU that is
exactly the same as the VM Host Server CPU. This can be useful when the
VM Guest workload requires CPU features not available in libvirt
's
simplified host-model
CPU. The
host-passthrough
CPU mode comes with the
disadvantage of reduced migration flexibility. A VM Guest with
host-passthrough
CPU mode can only be migrated to a
VM Host Server with identical hardware.
When using the host-passthrough
CPU mode, it is
still possible to disable undesirable features. The following
configuration presents the VM Guest with a CPU that is exactly the
same as the host CPU but with the vmx
feature
disabled.
<cpu mode='host-passthrough'> <feature policy='disable' name='vmx'/> </cpu>
The custom
CPU mode is another common mode used to
define a normalized CPU that can be migrated throughout dissimilar
hosts in a cluster. For example, in a cluster with hosts containing
Nehalem, IvyBridge and SandyBridge CPUs, the VM Guest can be
configured with a custom
CPU mode that contains a
Nehalem CPU model.
<cpu mode='custom' match='exact'> <model fallback='allow'>Nehalem</model> <feature policy='require' name='vme'/> <feature policy='require' name='ds'/> <feature policy='require' name='acpi'/> <feature policy='require' name='ss'/> <feature policy='require' name='ht'/> <feature policy='require' name='tm'/> <feature policy='require' name='pbe'/> <feature policy='require' name='dtes64'/> <feature policy='require' name='monitor'/> <feature policy='require' name='ds_cpl'/> <feature policy='require' name='vmx'/> <feature policy='require' name='est'/> <feature policy='require' name='tm2'/> <feature policy='require' name='xtpr'/> <feature policy='require' name='pdcm'/> <feature policy='require' name='dca'/> <feature policy='require' name='rdtscp'/> <feature policy='require' name='invtsc'/> </cpu>
For more information on libvirt
's CPU model and topology options, see
the CPU model and topology documentation at
https://libvirt.org/formatdomain.html#cpu-model-and-topology.
15.6 Configuring memory allocation #
The amount of memory allocated for the VM Guest can also be configured
with virsh
. It is stored in the memory
element and defines
the maximum allocation of memory for the VM Guest at boot time. The
optional currentMemory
element defines the actual memory
allocated to the VM Guest. currentMemory
can be less than
memory
, allowing for increasing (or
ballooning) the memory while the VM Guest is
running. If currentMemory
is omitted, it defaults to the same
value as the memory
element.
You can adjust memory settings by editing the VM Guest configuration, but be aware that changes do not take place until the next boot. The following steps demonstrate changing a VM Guest to boot with 4G of memory, but allow later expansion to 8G:
Open the VM Guest's XML configuration:
>
sudo
virsh edit sles15
Search for the
memory
element and set to 8G:... <memory unit='KiB'>8388608</memory> ...
If the
currentMemory
element does not exist, add it below thememory
element, or change its value to 4G:[...] <memory unit='KiB'>8388608</memory> <currentMemory unit='KiB'>4194304</currentMemory> [...]
Changing the memory allocation while the VM Guest is running can be done
with the setmem
subcommand. The following example
shows increasing the memory allocation to 8G:
Check VM Guest existing memory settings:
>
sudo
virsh dominfo sles15 | grep memory
Max memory: 8388608 KiB Used memory: 4194608 KiBChange the used memory to 8G:
>
sudo
virsh setmem sles15 8388608
Check the updated memory settings:
>
sudo
virsh dominfo sles15 | grep memory
Max memory: 8388608 KiB Used memory: 8388608 KiB
VM Guests with memory requirements of 4 TB or more must either
use the host-passthrough
CPU mode, or explicitly
specify the virtual CPU address size when using
host-model
or custom
CPU modes.
The default virtual CPU address size may not be sufficient for memory
configurations of 4 TB or more. The following example shows how to
use the VM Host Server's physical CPU address size when using the
host-model
CPU mode.
[...] <cpu mode='host-model' check='partial'> <maxphysaddr mode='passthrough'> </cpu> [...]
For more information on specifying virtual CPU address size, see the
maxphysaddr
option in the CPU model and
topology documentation at
https://libvirt.org/formatdomain.html#cpu-model-and-topology.
15.7 Adding a PCI device #
To assign a PCI device to VM Guest with virsh
, follow these steps:
Identify the host PCI device to assign to the VM Guest. In the following example, we are assigning a DEC network card to the guest:
>
sudo
lspci -nn
[...] 03:07.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip \ 21140 [FasterNet] [1011:0009] (rev 22) [...]Write down the device ID,
03:07.0
in this example.Gather detailed information about the device using
virsh nodedev-dumpxml ID
. To get the ID, replace the colon and the period in the device ID (03:07.0
) with underscores. Prefix the result with “pci_0000_”:pci_0000_03_07_0
.>
sudo
virsh nodedev-dumpxml pci_0000_03_07_0
<device> <name>pci_0000_03_07_0</name> <path>/sys/devices/pci0000:00/0000:00:14.4/0000:03:07.0</path> <parent>pci_0000_00_14_4</parent> <driver> <name>tulip</name> </driver> <capability type='pci'> <domain>0</domain> <bus>3</bus> <slot>7</slot> <function>0</function> <product id='0x0009'>DECchip 21140 [FasterNet]</product> <vendor id='0x1011'>Digital Equipment Corporation</vendor> <numa node='0'/> </capability> </device>Write down the values for domain, bus and function (see the previous XML code printed in bold).
Detach the device from the host system before attaching it to the VM Guest:
>
sudo
virsh nodedev-detach pci_0000_03_07_0
Device pci_0000_03_07_0 detachedTip: Multi-function PCI devicesWhen using a multi-function PCI device that does not support FLR (function level reset) or PM (power management) reset, you need to detach all its functions from the VM Host Server. The whole device must be reset for security reasons.
libvirt
refuses to assign the device if one of its functions is still in use by the VM Host Server or another VM Guest.Convert the domain, bus, slot, and function value from decimal to hexadecimal. In our example, domain = 0, bus = 3, slot = 7, and function = 0. Ensure that the values are inserted in the right order:
>
printf "<address domain='0x%x' bus='0x%x' slot='0x%x' function='0x%x'/>\n" 0 3 7 0
This results in:
<address domain='0x0' bus='0x3' slot='0x7' function='0x0'/>
Run
virsh edit
on your domain, and add the following device entry in the<devices>
section using the result from the previous step:<hostdev mode='subsystem' type='pci' managed='yes'> <source> <address domain='0x0' bus='0x03' slot='0x07' function='0x0'/> </source> </hostdev>
Tip:managed
compared tounmanaged
libvirt
recognizes two modes for handling PCI devices: they can bemanaged
orunmanaged
. In the managed case,libvirt
handles all details of unbinding the device from the existing driver if needed, resetting the device, binding it tovfio-pci
before starting the domain, etc. When the domain is terminated or the device is removed from the domain,libvirt
unbinds fromvfio-pci
and rebinds to the original driver when using a managed device. If the device is unmanaged, the user must ensure that all these management aspects of the device are done before assigning it to a domain, and after the device is no longer used by the domain.In the example above, the
managed='yes'
option means that the device is managed. To switch the device mode to unmanaged, setmanaged='no'
in the listing above. If you do so, you need to take care of the related driver with thevirsh nodedev-detach
andvirsh nodedev-reattach
commands. Before starting the VM Guest, you need to detach the device from the host by runningvirsh nodedev-detach pci_0000_03_07_0
. In case the VM Guest is not running, you can make the device available for the host by runningvirsh nodedev-reattach pci_0000_03_07_0
.Shut down the VM Guest and disable SELinux if it is running on the host.
>
sudo
setsebool -P virt_use_sysfs 1
Start your VM Guest to make the assigned PCI device available:
>
sudo
virsh start sles15
On a newer QEMU machine type (pc-i440fx-2.0 or higher) with SLES 11
SP4 KVM guests, the acpiphp
module is not loaded by default in the guest. This module must be
loaded to enable hotplugging of disk and network devices. To load the
module manually, use the command modprobe acpiphp
.
It is also possible to autoload the module by adding install
acpiphp /bin/true
to the
/etc/modprobe.conf.local
file.
KVM guests using the QEMU Q35 machine type have a PCI topology that
includes a pcie-root
controller and seven
pcie-root-port
controllers. The
pcie-root
controller does not support hotplugging.
Each pcie-root-port
controller supports hotplugging
a single PCIe device. PCI controllers cannot be hotplugged, so plan
accordingly and add more pcie-root-port
s to hotplug
more than seven PCIe devices. A pcie-to-pci-bridge
controller can be added to support hotplugging legacy PCI devices. See
https://libvirt.org/pci-hotplug.html for more
information about PCI topology between QEMU machine types.
15.7.1 PCI Pass-Through for IBM Z #
To support IBM Z, QEMU extended PCI representation by allowing the user to
configure extra attributes. Two more
attributes—uid
and
fid
—were added to the
<zpci/>
libvirt
specification.
uid
represents user-defined identifier, while
fid
represents PCI function identifier. These
attributes are optional and if you do not specify them, they are
automatically generated with non-conflicting values.
To include zPCI attribute in your domain specification, use the following example definition:
<controller type='pci' index='0' model='pci-root'/> <controller type='pci' index='1' model='pci-bridge'> <model name='pci-bridge'/> <target chassisNr='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'> <zpci uid='0x0001' fid='0x00000000'/> </address> </controller> <interface type='bridge'> <source bridge='virbr0'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'> <zpci uid='0x0007' fid='0x00000003'/> </address> </interface>
15.8 Adding a USB device #
To assign a USB device to VM Guest using virsh
, follow these steps:
Identify the host USB device to assign to the VM Guest:
>
sudo
lsusb
[...] Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon [...]Write down the vendor and product IDs. In our example, the vendor ID is
0557
and the product ID is2221
.Run
virsh edit
on your domain, and add the following device entry in the<devices>
section using the values from the previous step:<hostdev mode='subsystem' type='usb'> <source startupPolicy='optional'> <vendor id='0557'/> <product id='2221'/> </source> </hostdev>
Tip: Vendor/product or device's addressInstead of defining the host device with
<vendor/>
and<product/>
IDs, you can use the<address/>
element as described for host PCI devices in Section 15.7, “Adding a PCI device”.Shut down the VM Guest and disable SELinux if it is running on the host:
>
sudo
setsebool -P virt_use_sysfs 1
Start your VM Guest to make the assigned PCI device available:
>
sudo
virsh start sles15
15.9 Adding SR-IOV devices #
Single Root I/O Virtualization (SR-IOV) capable PCIe devices can replicate their resources, so they appear as multiple devices. Each of these “pseudo-devices” can be assigned to a VM Guest.
SR-IOV is an industry specification that was created by the Peripheral Component Interconnect Special Interest Group (PCI-SIG) consortium. It introduces physical functions (PF) and virtual functions (VF). PFs are full PCIe functions used to manage and configure the device. PFs also can move data. VFs lack the configuration and management part—they only can move data and a reduced set of configuration functions. As VFs do not have all PCIe functions, the host operating system or the Hypervisor must support SR-IOV to access and initialize VFs. The theoretical maximum for VFs is 256 per device (consequently the maximum for a dual-port Ethernet card would be 512). In practice, this maximum is much lower, since each VF consumes resources.
15.9.1 Requirements #
The following requirements must be met to use SR-IOV:
An SR-IOV-capable network card (as of SUSE Linux Enterprise Server 15, only network cards support SR-IOV)
An AMD64/Intel 64 host supporting hardware virtualization (AMD-V or Intel VT-x), see Section 7.1.1, “KVM hardware requirements” for more information
A chipset that supports device assignment (AMD-Vi or Intel VT-d)
libvirt
0.9.10 or betterSR-IOV drivers must be loaded and configured on the host system
A host configuration that meets the requirements listed at Important: Requirements for VFIO and SR-IOV
A list of the PCI addresses of the VFs assigned to VM Guests
The information whether a device is SR-IOV-capable can be obtained
from its PCI descriptor by running lspci
. A device
that supports SR-IOV reports a capability
similar to the following:
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
Before adding an SR-IOV device to a VM Guest when initially setting it up, the VM Host Server already needs to be configured as described in Section 15.9.2, “Loading and configuring the SR-IOV host drivers”.
15.9.2 Loading and configuring the SR-IOV host drivers #
To access and initialize VFs, an SR-IOV-capable driver needs to be loaded on the host system.
Before loading the driver, make sure the card is properly detected by running
lspci
. The following example shows thelspci
output for the dual-port Intel 82576NS network card:>
sudo
/sbin/lspci | grep 82576
01:00.0 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 01:00.1 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 04:00.0 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 04:00.1 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01)In case the card is not detected, the hardware virtualization support in the BIOS/EFI may not have been enabled. To check if hardware virtualization support is enabled, look at the settings in the host's BIOS.
Check whether the SR-IOV driver is already loaded by running
lsmod
. In the following example, a check for the igb driver (for the Intel 82576NS network card) returns a result. That means the driver is already loaded. If the command returns nothing, the driver is not loaded.>
sudo
/sbin/lsmod | egrep "^igb "
igb 185649 0Skip the following step if the driver is already loaded. If the SR-IOV driver is not yet loaded, the non-SR-IOV driver needs to be removed first, before loading the new driver. Use
rmmod
to unload a driver. The following example unloads the non-SR-IOV driver for the Intel 82576NS network card:>
sudo
/sbin/rmmod igbvf
Load the SR-IOV driver subsequently using the
modprobe
command—the VF parameter (max_vfs
) is mandatory:>
sudo
/sbin/modprobe igb max_vfs=8
As an alternative, you can also load the driver via SYSFS:
Find the PCI ID of the physical NIC by listing Ethernet devices:
>
sudo
lspci | grep Eth
06:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) 06:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)To enable VFs, echo the number of desired VFs to load to the
sriov_numvfs
parameter:>
sudo
echo 1 > /sys/bus/pci/devices/0000:06:00.1/sriov_numvfs
Verify that the VF NIC was loaded:
>
sudo
lspci | grep Eth
06:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) 06:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) 06:08.0 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)Obtain the maximum number of VFs available:
>
sudo
lspci -vvv -s 06:00.1 | grep 'Initial VFs'
Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 01Create a
/etc/systemd/system/before.service
file which loads VF via SYSFS on boot:[Unit] Before= [Service] Type=oneshot RemainAfterExit=true ExecStart=/bin/bash -c "echo 1 > /sys/bus/pci/devices/0000:06:00.1/sriov_numvfs" # beware, executable is run directly, not through a shell, check the man pages # systemd.service and systemd.unit for full syntax [Install] # target in which to start the service WantedBy=multi-user.target #WantedBy=graphical.target
Before starting the VM, it is required to create another service file (
after-local.service
) pointing to the/etc/init.d/after.local
script that detaches the NIC. Otherwise the VM would fail to start:[Unit] Description=/etc/init.d/after.local Compatibility After=libvirtd.service Requires=libvirtd.service [Service] Type=oneshot ExecStart=/etc/init.d/after.local RemainAfterExit=true [Install] WantedBy=multi-user.target
Copy it to
/etc/systemd/system
.#! /bin/sh # ... virsh nodedev-detach pci_0000_06_08_0
Save it as
/etc/init.d/after.local
.Reboot the machine and check if the SR-IOV driver is loaded by re-running the
lspci
command from the first step of this procedure. If the SR-IOV driver was loaded successfully you should see additional lines for the VFs:01:00.0 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 01:00.1 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 01:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01) 01:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01) 01:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01) [...] 04:00.0 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 04:00.1 Ethernet controller: Intel Corporation 82576NS Gigabit Network Connection (rev 01) 04:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01) 04:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01) 04:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01) [...]
15.9.3 Adding a VF network device to a VM Guest #
When the SR-IOV hardware is properly set up on the VM Host Server, you can add VFs to VM Guests. To do so, you need to collect specific data first.
The following procedure uses example data. Replace it with appropriate data from your setup.
Use the
virsh nodedev-list
command to get the PCI address of the VF you want to assign and its corresponding PF. Numerical values from thelspci
output shown in Section 15.9.2, “Loading and configuring the SR-IOV host drivers”, for example,01:00.0
or04:00.1
, are transformed by adding the prefixpci_0000_
and by replacing colons and dots with underscores. So a PCI ID listed as04:00.0
bylspci
is listed aspci_0000_04_00_0
by virsh. The following example lists the PCI IDs for the second port of the Intel 82576NS network card:>
sudo
virsh nodedev-list | grep 0000_04_
pci_0000_04_00_0 pci_0000_04_00_1 pci_0000_04_10_0 pci_0000_04_10_1 pci_0000_04_10_2 pci_0000_04_10_3 pci_0000_04_10_4 pci_0000_04_10_5 pci_0000_04_10_6 pci_0000_04_10_7 pci_0000_04_11_0 pci_0000_04_11_1 pci_0000_04_11_2 pci_0000_04_11_3 pci_0000_04_11_4 pci_0000_04_11_5The first two entries represent the PFs, whereas the other entries represent the VFs.
Run the following
virsh nodedev-dumpxml
command on the PCI ID of the VF you want to add:>
sudo
virsh nodedev-dumpxml pci_0000_04_10_0
<device> <name>pci_0000_04_10_0</name> <parent>pci_0000_00_02_0</parent> <capability type='pci'> <domain>0</domain> <bus>4</bus> <slot>16</slot> <function>0</function> <product id='0x10ca'>82576 Virtual Function</product> <vendor id='0x8086'>Intel Corporation</vendor> <capability type='phys_function'> <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> </capability> </capability> </device>The following data is needed for the next step:
<domain>0</domain>
<bus>4</bus>
<slot>16</slot>
<function>0</function>
Create a temporary XML file, for example,
/tmp/vf-interface.xml
, containing the data necessary to add a VF network device to an existing VM Guest. The minimal content of the file needs to look like the following:<interface type='hostdev'>1 <source> <address type='pci' domain='0' bus='11' slot='16' function='0'2/>2 </source> </interface>
VFs do not get a fixed MAC address; it changes every time the host reboots. When adding network devices the “traditional” way with
hostdev
, it would require to reconfigure the VM Guest's network device after each reboot of the host, because of the MAC address change. To avoid this kind of problem,libvirt
introduced thehostdev
value, which sets up network-specific data before assigning the device.Specify the data you acquired in the previous step here.
In case a device is already attached to the host, it cannot be attached to a VM Guest. To make it available for guests, detach it from the host first:
>
sudo
virsh nodedev-detach pci_0000_04_10_0
Add the VF interface to an existing VM Guest:
>
sudo
virsh attach-device GUEST /tmp/vf-interface.xml --OPTION
GUEST needs to be replaced by the domain name, ID or UUID of the VM Guest. --OPTION can be one of the following:
--persistent
This option always adds the device to the domain's persistent XML. If the domain is running, the device is hotplugged.
--config
This option affects the persistent XML only, even if the domain is running. The device appears in the VM Guest on next boot.
--live
This option affects a running domain only. If the domain is inactive, the operation fails. The device is not persisted in the XML and becomes available in the VM Guest on next boot.
--current
This option affects the current state of the domain. If the domain is inactive, the device is added to the persistent XML and becomes available on next boot. If the domain is active, the device is hotplugged but not added to the persistent XML.
To detach a VF interface, use the
virsh detach-device
command, which also takes the options listed above.
15.9.4 Dynamic allocation of VFs from a pool #
If you define the PCI address of a VF into a VM Guest's configuration statically as described in Section 15.9.3, “Adding a VF network device to a VM Guest”, it is hard to migrate such guest to another host. The host must have identical hardware in the same location on the PCI bus, or the VM Guest configuration must be modified before each start.
Another approach is to create a libvirt
network with a device pool
that contains all the VFs of an SR-IOV device.
The VM Guest then references this network, and each time it is
started, a single VF is dynamically allocated to it. When the VM Guest
is stopped, the VF is returned to the pool, available for another
guest.
15.9.4.1 Defining network with pool of VFs on VM Host Server #
The following example of network definition creates a pool of all VFs
for the SR-IOV device with its physical
function (PF) at the network interface eth0
on the
host:
<network> <name>passthrough</name> <forward mode='hostdev' managed='yes'> <pf dev='eth0'/> </forward> </network>
To use this network on the host, save the above code to a file, for
example /tmp/passthrough.xml
, and execute the
following commands. Remember to replace eth0
with
the real network interface name of your SR-IOV
device's PF:
>
sudo
virsh net-define /tmp/passthrough.xml
>
sudo
virsh net-autostart passthrough
>
sudo
virsh net-start passthrough
15.9.4.2 Configuring VM Guests to use VF from the pool #
The following example of VM Guest device interface definition uses a
VF of the SR-IOV device from the pool created
in Section 15.9.4.1, “Defining network with pool of VFs on VM Host Server”. libvirt
automatically derives the list of all VFs associated with that PF the
first time the guest is started.
<interface type='network'> <source network='passthrough'> </interface>
After the first VM Guest starts that uses the network with the pool
of VFs, verify the list of associated VFs. Do so by running
virsh net-dumpxml passthrough
on the host.
<network connections='1'> <name>passthrough</name> <uuid>a6a26429-d483-d4ed-3465-4436ac786437</uuid> <forward mode='hostdev' managed='yes'> <pf dev='eth0'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x1'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x3'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x5'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x7'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x1'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x3'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x5'/> </forward> </network>
15.10 Listing attached devices #
Although there is no mechanism in virsh
to list all VM Host Server's devices
that have already been attached to its VM Guests, you can list all
devices attached to a specific VM Guest by running the following
command:
virsh dumpxml VMGUEST_NAME | xpath -e /domain/devices/hostdev
For example:
>
sudo
virsh dumpxml sles12 | -e xpath /domain/devices/hostdev Found 2 nodes: -- NODE -- <hostdev mode="subsystem" type="pci" managed="yes"> <driver name="xen" /> <source> <address domain="0x0000" bus="0x0a" slot="0x10" function="0x1" /> </source> <address type="pci" domain="0x0000" bus="0x00" slot="0x0a" function="0x0" /> </hostdev> -- NODE -- <hostdev mode="subsystem" type="pci" managed="yes"> <driver name="xen" /> <source> <address domain="0x0000" bus="0x0a" slot="0x10" function="0x2" /> </source> <address type="pci" domain="0x0000" bus="0x00" slot="0x0b" function="0x0" /> </hostdev>
<interface type='hostdev'>
For SR-IOV devices that are attached to the VM Host Server via
<interface type='hostdev'>
, you need to use a
different XPath query:
virsh dumpxml VMGUEST_NAME | xpath -e /domain/devices/interface/@type
15.11 Configuring storage devices #
Storage devices are defined within the disk
element. The usual
disk
element supports several attributes. The following two
attributes are the most important:
The
type
attribute describes the source of the virtual disk device. Valid values arefile
,block
,dir
,network
, orvolume
.The
device
attribute shows how the disk is exposed to the VM Guest OS. As an example, possible values can includefloppy
,disk
,cdrom
, and others.
The following child elements are the most important:
driver
contains the driver and the bus. These are used by the VM Guest to work with the new disk device.The
target
element contains the device name under which the new disk is shown in the VM Guest. It also contains the optional bus attribute, which defines the type of bus on which the new disk should operate.
The following procedure shows how to add storage devices to the VM Guest:
Edit the configuration for an existing VM Guest:
>
sudo
virsh edit sles15
Add a
disk
element inside thedevices
element together with the attributestype
anddevice
:<disk type='file' device='disk'>
Specify a
driver
element and use the default values:<driver name='qemu' type='qcow2'/>
Create a disk image as a source for the new virtual disk device:
>
sudo
qemu-img create -f qcow2 /var/lib/libvirt/images/sles15.qcow2 32G
Add the path for the disk source:
<source file='/var/lib/libvirt/images/sles15.qcow2'/>
Define the target device name in the VM Guest and the bus on which the disk should work:
<target dev='vda' bus='virtio'/>
Restart your VM:
>
sudo
virsh start sles15
Your new storage device should be available in the VM Guest OS.
15.12 Configuring controller devices #
libvirt
manages controllers automatically
based on the type of virtual devices used by the VM Guest. If the
VM Guest contains PCI and SCSI devices, PCI and SCSI controllers are
created and managed automatically. libvirt
also models
controllers that are hypervisor-specific, for example, a
virtio-serial
controller for KVM VM Guests or a
xenbus
controller for Xen VM Guests. Although the
default controllers and their configuration are generally fine, there may
be use cases where controllers or their attributes need to be adjusted
manually. For example, a virtio-serial controller may need more ports, or
a xenbus controller may need more memory or more virtual interrupts.
The xenbus controller is unique in that it serves as the controller for
all Xen paravirtual devices. If a VM Guest has many disk and/or network
devices, the controller may need more memory. Xen's
max_grant_frames
attribute sets how many grant frames,
or blocks of shared memory, are allocated to the
xenbus
controller for each VM Guest.
The default of 32 is enough in most circumstances, but a VM Guest with
multiple I/O devices and an I/O-intensive workload may experience
performance issues because of grant frame exhaustion. The
xen-diag
can check the current and maximum
max_grant_frames
values for dom0 and your VM Guests.
The VM Guests must be running:
>
sudo
virsh list Id Name State -------------------------------- 0 Domain-0 running 3 sle15sp1 running>
sudo
xen-diag gnttab_query_size 0 domid=0: nr_frames=1, max_nr_frames=256>
sudo
xen-diag gnttab_query_size 3 domid=3: nr_frames=3, max_nr_frames=32
The sle15sp1
guest is using only three frames out of
32. If you are seeing performance issues, and log entries that point to
insufficient frames, increase the value with virsh
. Look for the
<controller type='xenbus'>
line in the guest's
configuration file and add the maxGrantFrames
control
element:
>
sudo
virsh edit sle15sp1 <controller type='xenbus' index='0' maxGrantFrames='40'/>
Save your changes and restart the guest. Now it should show your change:
>
sudo
xen-diag gnttab_query_size 3 domid=3: nr_frames=3, max_nr_frames=40
Similar to maxGrantFrames, the xenbus controller also supports
maxEventChannels
. Event channels are like paravirtual
interrupts, and in conjunction with grant frames, form a data transfer
mechanism for paravirtual drivers. They are also used for inter-processor
interrupts. VM Guests with a large number of vCPUs and/or many
paravirtual devices may need to increase the maximum default value of
1023. maxEventChannels can be changed similarly to maxGrantFrames:
>
sudo
virsh edit sle15sp1 <controller type='xenbus' index='0' maxGrantFrames='128' maxEventChannels='2047'/>
See the Controllers section of the libvirt Domain XML format manual at https://libvirt.org/formatdomain.html#elementsControllers for more information.
15.13 Configuring video devices #
When using the Virtual Machine Manager, only the Video device model can be defined. The amount of allocated VRAM or 2D/3D acceleration can only be changed in the XML configuration.
15.13.1 Changing the amount of allocated VRAM #
Edit the configuration for an existing VM Guest:
>
sudo
virsh edit sles15
Change the size of the allocated VRAM:
<video> <model type='vga' vram='65535' heads='1'> ... </model> </video>
Check if the amount of VRAM in the VM has changed by looking at the amount in the Virtual Machine Manager.
15.13.2 Changing the state of 2D/3D acceleration #
Edit the configuration for an existing VM Guest:
>
sudo
virsh edit sles15
To enable/disable 2D/3D acceleration, change the value of
accel3d
andaccel2d
accordingly:<video> <model> <acceleration accel3d='yes' accel2d='no'> </model> </video>
Only virtio
and vbox
video
devices are capable of 2D/3D acceleration. You cannot enable it on
other video devices.
15.14 Configuring network devices #
This section describes how to configure specific aspects of virtual
network devices by using virsh
.
Find more details about libvirt
network interface specification in
https://libvirt.org/formatdomain.html#elementsDriverBackendOptions.
15.14.1 Scaling network performance with multiqueue virtio-net #
The multiqueue virtio-net feature scales the network performance by allowing the VM Guest's virtual CPUs to transfer packets in parallel. Refer to Section 35.3.3, “Scaling network performance with multiqueue virtio-net” for more general information.
To enable multiqueue virtio-net for a specific VM Guest, edit its XML configuration as described in Section 15.1, “Editing the VM configuration” and modify its network interface as follows:
<interface type='network'> [...] <model type='virtio'/> <driver name='vhost' queues='NUMBER_OF_QUEUES'/> </interface>
15.15 Using macvtap to share VM Host Server network interfaces #
Macvtap provides direct attachment of a VM Guest virtual interface to a host network interface. The macvtap-based interface extends the VM Host Server network interface and has its own MAC address on the same Ethernet segment. Typically, this is used to make both the VM Guest and the VM Host Server show up directly on the switch that the VM Host Server is connected to.
Macvtap cannot be used with network interfaces already connected to a Linux bridge. Before attempting to create the macvtap interface, remove the interface from the bridge.
When using macvtap, a VM Guest can communicate with other VM Guests, and with other external hosts on the network. But it cannot communicate with the VM Host Server on which the VM Guest runs. This is the defined behavior of macvtap, because of the way the VM Host Server's physical Ethernet is attached to the macvtap bridge. Traffic from the VM Guest into that bridge that is forwarded to the physical interface cannot be bounced back up to the VM Host Server's IP stack. Similarly, traffic from the VM Host Server's IP stack that is sent to the physical interface cannot be bounced back up to the macvtap bridge for forwarding to the VM Guest.
Virtual network interfaces based on macvtap are supported by libvirt by
specifying an interface type of direct
. For example:
<interface type='direct'> <mac address='aa:bb:cc:dd:ee:ff'/> <source dev='eth0' mode='bridge'/> <model type='virtio'/> </interface>
The operation mode of the macvtap device can be controlled with the
mode
attribute. The following list shows its possible
values and a description for each:
vepa
: all VM Guest packets are sent to an external bridge. Packets whose destination is a VM Guest on the same VM Host Server as where the packet originates from are sent back to the VM Host Server by the VEPA capable bridge (today's bridges are typically not VEPA capable).bridge
: packets whose destination is on the same VM Host Server as where they originate from are directly delivered to the target macvtap device. Both origin and destination devices need to be inbridge
mode for direct delivery. If either of them is invepa
mode, a VEPA capable bridge is required.private
: all packets are sent to the external bridge and delivered to a target VM Guest on the same VM Host Server if they are sent through an external router or gateway and that device sends them back to the VM Host Server. This procedure is followed if either the source or destination device is in private mode.passthrough
: a special mode that gives more power to the network interface. All packets are forwarded to the interface, allowing virtio VM Guests to change the MAC address or set promiscuous mode to bridge the interface or create VLAN interfaces on top of it. A network interface is not shareable inpassthrough
mode. Assigning an interface to a VM Guest disconnects it from the VM Host Server. For this reason SR-IOV virtual functions are often assigned to the VM Guest inpassthrough
mode.
15.16 Disabling a memory balloon device #
Memory Balloon has become a default option for KVM. The device is added to
the VM Guest explicitly, so you do not need to add this element in the
VM Guest's XML configuration. To disable Memory Balloon in the VM Guest
for any reason, set model='none'
as shown below:
<devices> <memballoon model='none'/> </device>
15.17 Configuring multiple monitors (dual head) #
libvirt
supports a dual head configuration to display the video output
of the VM Guest on multiple monitors.
The Xen hypervisor does not support dual head configuration.
While the virtual machine is running, verify that the xf86-video-qxl package is installed in the VM Guest:
>
rpm -q xf86-video-qxlShut down the VM Guest and start editing its configuration XML as described in Section 15.1, “Editing the VM configuration”.
Verify that the model of the virtual graphics card is “qxl”:
<video> <model type='qxl' ... />
Increase the
heads
parameter in the graphics card model specification from the default1
to2
, for example:<video> <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='2' primary='yes'/> <alias name='video0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> </video>
Configure the virtual machine to use the Spice display instead of VNC:
<graphics type='spice' port='5916' autoport='yes' listen='0.0.0.0'> <listen type='address' address='0.0.0.0'/> </graphics>
Start the virtual machine and connect to its display with
virt-viewer
, for example:>
virt-viewer --connect qemu+ssh://USER@VM_HOST/systemFrom the list of VMs, select the one whose configuration you have modified and confirm with
.After the graphical subsystem (Xorg) loads in the VM Guest, select
› › to open a new window with the second monitor's output.
15.18 Crypto adapter pass-through to KVM guests on IBM Z #
15.18.1 Introduction #
IBM Z machines include cryptographic hardware with useful functions such as random number generation, digital signature generation, or encryption. KVM allows dedicating these crypto adapters to guests as pass-through devices. The means that the hypervisor cannot observe communications between the guest and the device.
15.18.2 What is covered #
This section describes how to dedicate a crypto adapter and domains on an IBM Z host to a KVM guest. The procedure includes the following basic steps:
Mask the crypto adapter and domains from the default driver on the host.
Load the
vfio-ap
driver.Assign the crypto adapter and domains to the
vfio-ap
driver.Configure the guest to use the crypto adapter.
15.18.3 Requirements #
You need to have the QEMU /
libvirt
virtualization environment correctly installed and functional.The
vfio_ap
andvfio_mdev
modules for the running kernel need to be available on the host operating system.
15.18.4 Dedicate a crypto adapter to a KVM host #
Verify that the
vfio_ap
andvfio_mdev
kernel modules are loaded on the host:>
lsmod | grep vfio_If any of them is not listed, load it manually, for example:
>
sudo
modprobe vfio_mdevCreate a new MDEV device on the host and verify that it was added:
uuid=$(uuidgen) $ echo ${uuid} | sudo tee /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create dmesg | tail [...] [272197.818811] iommu: Adding device 24f952b3-03d1-4df2-9967-0d5f7d63d5f2 to group 0 [272197.818815] vfio_mdev 24f952b3-03d1-4df2-9967-0d5f7d63d5f2: MDEV: group_id = 0
Identify the device on the host's logical partition that you intend to dedicate to a KVM guest:
>
ls -l /sys/bus/ap/devices/ [...] lrwxrwxrwx 1 root root 0 Nov 23 03:29 00.0016 -> ../../../devices/ap/card00/00.0016/ lrwxrwxrwx 1 root root 0 Nov 23 03:29 card00 -> ../../../devices/ap/card00/In this example, it is card
0
queue16
. To match the Hardware Management Console (HMC) configuration, you need to convert from16
hexadecimal to22
decimal.Mask the adapter from the
zcrypt
use:>
lszcrypt CARD.DOMAIN TYPE MODE STATUS REQUEST_CNT ------------------------------------------------- 00 CEX5C CCA-Coproc online 5 00.0016 CEX5C CCA-Coproc online 5Mask the adapter:
>
cat /sys/bus/ap/apmask 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff echo -0x0 | sudo tee /sys/bus/ap/apmask 0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffMask the domain:
>
cat /sys/bus/ap/aqmask 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff echo -0x0 | sudo tee /sys/bus/ap/aqmask 0xfffffdffffffffffffffffffffffffffffffffffffffffffffffffffffffffffAssign adapter 0 and domain 16 (22 decimal) to
vfio-ap
:>
sudo
echo +0x0 > /sys/devices/vfio_ap/matrix/${uuid}/assign_adapter>
echo +0x16 | sudo tee /sys/devices/vfio_ap/matrix/${uuid}/assign_domain>
echo +0x16 | sudo tee /sys/devices/vfio_ap/matrix/${uuid}/assign_control_domainVerify the matrix that you have configured:
>
cat /sys/devices/vfio_ap/matrix/${uuid}/matrix 00.0016Either create a new VM (refer to Chapter 10, Guest installation) and wait until it is initialized, or use an existing VM. In both cases, make sure the VM is shut down.
Change its configuration to use the MDEV device:
>
sudo
virsh edit VM_NAME [...] <hostdev mode='subsystem' type='mdev' model='vfio-ap'> <source> <address uuid='24f952b3-03d1-4df2-9967-0d5f7d63d5f2'/> </source> </hostdev> [...]Restart the VM:
>
sudo
virsh reboot VM_NAMELog in to the guest and verify that the adapter is present:
>
lszcrypt CARD.DOMAIN TYPE MODE STATUS REQUEST_CNT ------------------------------------------------- 00 CEX5C CCA-Coproc online 1 00.0016 CEX5C CCA-Coproc online 1
15.18.5 Further reading #
The installation of virtualization components is detailed in Chapter 6, Installation of virtualization components.
The
vfio_ap
architecture is detailed in https://www.kernel.org/doc/Documentation/s390/vfio-ap.txt.A general outline together with a detailed procedure is described in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1787405.
The architecture of VFIO Mediated devices (MDEVs) is detailed in https://www.kernel.org/doc/html/latest/driver-api/vfio-mediated-device.html.