SUSE Linux Enterprise Server 15 SP5

NVIDIA Virtual GPU for KVM Guests

Publication Date: December 13, 2024

1 Introduction

NVIDIA virtual GPU (vGPU) is a graphics virtualization solution that provides multiple virtual machines (VMs) with simultaneous access to one physical Graphics Processing Unit (GPU) on the VM Host Server. This article covers the Volta and Ampere GPU architectures.

2 Configuring vGPU manager in VM Host Server

2.1 Prepare VM Host Server environment

  1. Verify that you have a compatible server and GPU cards. Check the NVIDIA product specifications for details.

  2. Verify that VM Host Server is SUSE Linux Enterprise Server 15 SP3 or newer:

    > cat /etc/issue
    Welcome to SUSE Linux Enterprise Server 15 SP3  (x86_64) - Kernel \r (\l).
  3. Get the vGPU drivers from NVIDIA. In order to get the software, please follow the steps at https://docs.nvidia.com/grid/latest/grid-software-quick-start-guide/index.html#redeeming-pak-and-downloading-grid-software. For example, for vGPU 13.0 installation, you will need the following files:

    NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run  # vGPU manager for the VM host
    NVIDIA-Linux-x86_64-470.63.01-grid.run   # vGPU driver for the VM guest
  4. If you are using Ampere architecture GPU cards, verify that VM Host Server supports VT-D/IOMMU and SR-IOV technologies, and that they are enabled in BIOS.
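
    You can check this from the running system as well. The following is a quick sketch; the BDF b1:00.0 is a placeholder for your GPU's address:

    > lscpu | grep -i virtualization
    > sudo lspci -vvs b1:00.0 | grep -i sr-iov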

  5. Enable IOMMU. Verify that it is included in the boot command line:

    > cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-default [...] intel_iommu=on [...]

    If not, add one of the following lines to /etc/default/grub:

    • For Intel CPUs:

      GRUB_CMDLINE_LINUX="intel_iommu=on"

    • For AMD CPUs:

      GRUB_CMDLINE_LINUX="amd_iommu=on"

    Then generate a new GRUB 2 configuration file and reboot:

    > sudo grub2-mkconfig -o /boot/grub2/grub.cfg
    > sudo systemctl reboot
    Tip

    You can verify that IOMMU is loaded by running the following command:

    > sudo dmesg | grep -e IOMMU
  6. Enable SR-IOV. Refer to https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#vgpu-types-tesla-v100-pcie for useful information.

  7. Disable the nouveau kernel module by adding the following line to the top of the /etc/modprobe.d/50-blacklist.conf file:

    blacklist nouveau

2.2 Install the NVIDIA KVM driver

  1. Exit from the graphical mode:

    > sudo init 3
  2. Install kernel-default-devel and gcc packages and their dependencies:

    > sudo zypper in kernel-default-devel gcc
  3. Download the vGPU software from the NVIDIA portal. Make the NVIDIA vGPU driver executable and run it:

    > chmod +x NVIDIA-Linux-x86_64-450.55-vgpu-kvm.run
    > sudo ./NVIDIA-Linux-x86_64-450.55-vgpu-kvm.run

    You can find detailed information about the installation process in the log file /var/log/nvidia-installer.log.

    Tip

    To enable dynamic kernel-module support, and thus have the module rebuilt automatically when a new kernel is installed, add the --dkms option:

    > sudo ./NVIDIA-Linux-x86_64-450.55-vgpu-kvm.run --dkms
  4. When the driver installation is finished, reboot the system:

    > sudo systemctl reboot

2.3 Verify the driver installation

  1. Verify loaded kernel modules:

    > lsmod | grep nvidia
    nvidia_vgpu_vfio       49152  9
    nvidia              14393344  229 nvidia_vgpu_vfio
    mdev                   20480  2 vfio_mdev,nvidia_vgpu_vfio
    vfio                   32768  6 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

    The modules containing the vfio string are required dependencies.

  2. Print the GPU device status with the nvidia-smi command. The output should be similar to the following:

    > nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.63       Driver Version: 470.63       CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A40          Off  | 00000000:31:00.0 Off |                    0 |
    |  0%   46C    P0    39W / 300W |      0MiB / 45634MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
  3. Check the sysfs file system. For Volta and earlier GPU cards, a new mdev_supported_types directory is added, for example:

    > cd /sys/bus/pci/devices/0000\:31\:00.0/mdev_supported_types

    For Ampere GPU cards, the directory will be created automatically for each virtual function after SR-IOV is enabled.

3 Creating a vGPU device

3.1 Create a legacy vGPU device without support for SR-IOV

All NVIDIA Volta and earlier architecture GPUs work in this mode.

  1. Obtain the Bus/Device/Function (BDF) numbers of the host GPU device:

    > lspci | grep NVIDIA
    84:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
  2. Check for the mdev supported devices and detailed information:

    > ls /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/
    nvidia-105  nvidia-106  nvidia-107  nvidia-108  nvidia-109  nvidia-110 [...]

    The mapping of vGPU mdev devices to vGPU types is as follows:

    • nvidia-105 to nvidia-109: 1Q 2Q 4Q 8Q 16Q

    • nvidia-110 to nvidia-114: 1A 2A 4A 8A 16A

    • nvidia-115, nvidia-163, nvidia-217, nvidia-247: 1B 2B 2B4 1B4

    • nvidia-299 to nvidia-301: 4C 8C 16C

    Refer to https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-tesla-v100-pcie for more details.

  3. Inspect a vGPU device:

    > cd /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/
    > ls nvidia-105
    > cat nvidia-105/description
    num_heads=2, frl_config=60, framebuffer=1024M, max_resolution=4096x2160, max_instance=16
    > cat nvidia-105/name
    GRID V100-1Q
  4. Generate a unique ID and create an mdev device based on it:

    > uuidgen
    4f3b6e47-0baa-4900-b0b1-284c1ecc192f
    > echo "4f3b6e47-0baa-4900-b0b1-284c1ecc192f" | sudo tee nvidia-105/create
  5. Verify the new mdev device. You can inspect the content of the /sys/bus/mdev/devices directory:

    > cd /sys/bus/mdev/devices
    > ls -l
    lrwxrwxrwx 1 root root 0 Aug 30 23:03 86380ffb-8f13-4685-9c48-0e0f4e65fb87 \
     -> ../../../devices/pci0000:80/0000:80:02.0/0000:84:00.0/86380ffb-8f13-4685-9c48-0e0f4e65fb87
    lrwxrwxrwx 1 root root 0 Aug 30 23:03 86380ffb-8f13-4685-9c48-0e0f4e65fb88 \
     -> ../../../devices/pci0000:80/0000:80:02.0/0000:84:00.0/86380ffb-8f13-4685-9c48-0e0f4e65fb88
    lrwxrwxrwx 1 root root 0 Aug 30 23:03 86380ffb-8f13-4685-9c48-0e0f4e65fb89 \
     -> ../../../devices/pci0000:80/0000:80:02.0/0000:84:00.0/86380ffb-8f13-4685-9c48-0e0f4e65fb89
    lrwxrwxrwx 1 root root 0 Aug 30 23:03 86380ffb-8f13-4685-9c48-0e0f4e65fb90 \
     -> ../../../devices/pci0000:80/0000:80:02.0/0000:84:00.0/86380ffb-8f13-4685-9c48-0e0f4e65fb90

    Or you can use the mdevctl command:

    > sudo mdevctl list
    86380ffb-8f13-4685-9c48-0e0f4e65fb90 0000:84:00.0 nvidia-299
    86380ffb-8f13-4685-9c48-0e0f4e65fb89 0000:84:00.0 nvidia-299
    86380ffb-8f13-4685-9c48-0e0f4e65fb87 0000:84:00.0 nvidia-299
    86380ffb-8f13-4685-9c48-0e0f4e65fb88 0000:84:00.0 nvidia-299
  6. Query the new vGPU device capability:

    > sudo nvidia-smi vgpu -q
    GPU 00000000:84:00.0
    Active vGPUs                      : 1
    vGPU ID                           : 3251634323
       VM UUID                       : ee7b7a4b-388a-4357-a425-5318b2c65b3f
       VM Name                       : sle15sp3
       vGPU Name                     : GRID V100-4C
       vGPU Type                     : 299
       vGPU UUID                     : d471c7f2-0a53-11ec-afd3-38b06df18e37
       MDEV UUID                     : 86380ffb-8f13-4685-9c48-0e0f4e65fb87
       Guest Driver Version          : 460.91.03
       License Status                : Licensed
       GPU Instance ID               : N/A
       Accounting Mode               : Disabled
       ECC Mode                      : N/A
       Accounting Buffer Size        : 4000
       Frame Rate Limit              : N/A
       FB Memory Usage
           Total                     : 4096 MiB
           Used                      : 161 MiB
           Free                      : 3935 MiB
       Utilization
           Gpu                       : 0 %
           Memory                    : 0 %
           Encoder                   : 0 %
           Decoder                   : 0 %
       Encoder Stats
           Active Sessions           : 0
           Average FPS               : 0
           Average Latency           : 0
       FBC Stats
           Active Sessions           : 0
           Average FPS               : 0
           Average Latency           : 0

3.2 Create a vGPU device with support for SR-IOV

All NVIDIA Ampere and newer architecture GPUs work in this mode.

  1. Obtain the Bus/Device/Function (BDF) numbers of the host GPU device:

    > lspci | grep NVIDIA
    b1:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
  2. Enable virtual functions:

    > sudo /usr/lib/nvidia/sriov-manage -e 0000:b1:00.0
    Note

    This configuration is not persistent and must be re-enabled after each host reboot.
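
    One way to re-enable the virtual functions automatically at boot is a small systemd unit. The following is a minimal sketch only, not part of the NVIDIA software; the unit name and the BDF are placeholders:

    # /etc/systemd/system/nvidia-sriov.service
    [Unit]
    Description=Enable SR-IOV virtual functions for the NVIDIA GPU

    [Service]
    Type=oneshot
    ExecStart=/usr/lib/nvidia/sriov-manage -e 0000:b1:00.0

    [Install]
    WantedBy=multi-user.target

    Enable it with:

    > sudo systemctl daemon-reload
    > sudo systemctl enable nvidia-sriov.service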

  3. Obtain the Bus/Device/Function (BDF) numbers of the virtual functions on the GPU:

    > ls -l /sys/bus/pci/devices/0000:b1:00.0/ | grep virtfn
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn0 -> ../0000:b1:00.4
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn1 -> ../0000:b1:00.5
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn10 -> ../0000:b1:01.6
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn11 -> ../0000:b1:01.7
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn12 -> ../0000:b1:02.0
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn13 -> ../0000:b1:02.1
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn14 -> ../0000:b1:02.2
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn15 -> ../0000:b1:02.3
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn2 -> ../0000:b1:00.6
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn3 -> ../0000:b1:00.7
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn4 -> ../0000:b1:01.0
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn5 -> ../0000:b1:01.1
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn6 -> ../0000:b1:01.2
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn7 -> ../0000:b1:01.3
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn8 -> ../0000:b1:01.4
    lrwxrwxrwx 1 root root           0 Sep 21 11:58 virtfn9 -> ../0000:b1:01.5
  4. Create a vGPU device. Select the virtual function (VF) that you want to use to create the vGPU device and assign it a unique ID.

    Important

    Each VF can host only one vGPU instance. To create more vGPU instances, use additional VFs.

    > cd /sys/bus/pci/devices/0000:b1:00.0/virtfn1/mdev_supported_types
    > for i in *; do echo "$i" $(cat $i/name) available: $(cat $i/avail*); done
    nvidia-468 GRID A100-4C available: 0
    nvidia-469 GRID A100-5C available: 0
    nvidia-470 GRID A100-8C available: 0
    nvidia-471 GRID A100-10C available: 1
    nvidia-472 GRID A100-20C available: 0
    nvidia-473 GRID A100-40C available: 0
    nvidia-474 GRID A100-1-5C available: 0
    nvidia-475 GRID A100-2-10C available: 0
    nvidia-476 GRID A100-3-20C available: 0
    nvidia-477 GRID A100-4-20C available: 0
    nvidia-478 GRID A100-7-40C available: 0
    nvidia-479 GRID A100-1-5CME available: 0
    > uuidgen
    f715f63c-0d00-4007-9c5a-b07b0c6c05de
    > echo "f715f63c-0d00-4007-9c5a-b07b0c6c05de" | sudo tee nvidia-471/create
    > sudo dmesg | tail
    [...]
    [ 3218.491843] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: Adding to iommu group 322
    [ 3218.499700] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: MDEV: group_id = 322
    [ 3599.608540] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: Removing from iommu group 322
    [ 3599.616753] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: MDEV: detaching iommu
    [ 3626.345530] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: Adding to iommu group 322
    [ 3626.353383] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: MDEV: group_id = 322
  5. Verify the new vGPU device:

    > cd /sys/bus/mdev/devices/
    > ls
    f715f63c-0d00-4007-9c5a-b07b0c6c05de
  6. Query the new vGPU device capability:

    > sudo nvidia-smi vgpu -q
    GPU 00000000:B1:00.0
    Active vGPUs                      : 1
    vGPU ID                           : 3251634265
      VM UUID                       : b0d9f0c6-a6c2-463e-967b-06cb206415b6
      VM Name                       : sles15sp2-gehc-vm1
      vGPU Name                     : GRID A100-10C
      vGPU Type                     : 471
      vGPU UUID                     : 444f610c-1b08-11ec-9554-ebd10788ee14
      MDEV UUID                     : f715f63c-0d00-4007-9c5a-b07b0c6c05de
      Guest Driver Version          : N/A
      License Status                : N/A
      GPU Instance ID               : N/A
      Accounting Mode               : N/A
      ECC Mode                      : Disabled
      Accounting Buffer Size        : 4000
      Frame Rate Limit              : N/A
      FB Memory Usage
          Total                     : 10240 MiB
          Used                      : 0 MiB
          Free                      : 10240 MiB
      Utilization
          Gpu                       : 0 %
          Memory                    : 0 %
          Encoder                   : 0 %
          Decoder                   : 0 %
      Encoder Stats
          Active Sessions           : 0
          Average FPS               : 0
          Average Latency           : 0
      FBC Stats
          Active Sessions           : 0
          Average FPS               : 0
          Average Latency           : 0

3.3 Create a MIG-backed vGPU

Important

SR-IOV must be enabled if you want to create MIG-backed vGPUs and assign them to VM Guests.
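
If the virtual functions are not enabled yet, enable them first as described in Section 3.2, “Create a vGPU device with support for SR-IOV”, for example:

> sudo /usr/lib/nvidia/sriov-manage -e 0000:b1:00.0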

  1. Enable MIG mode for a GPU:

    > sudo nvidia-smi -i 0 -mig 1
    Enabled MIG Mode for GPU 00000000:B1:00.0
    All done.
  2. Query the available GPU instance profiles:

    > sudo nvidia-smi mig -lgip
    +-----------------------------------------------------------------------------+
    | GPU instance profiles:                                                      |
    | GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
    |                              Free/Total   GiB              CE    JPEG  OFA  |
    |=============================================================================|
    |   0  MIG 1g.5gb        19     7/7        4.75       No     14     0     0   |
    |                                                             1     0     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 1g.5gb+me     20     1/1        4.75       No     14     1     0   |
    |                                                             1     1     1   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 2g.10gb       14     3/3        9.75       No     28     1     0   |
    |                                                             2     0     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 3g.20gb        9     2/2        19.62      No     42     2     0   |
    |                                                             3     0     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 4g.20gb        5     1/1        19.62      No     56     2     0   |
    |                                                             4     0     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 7g.40gb        0     1/1        39.50      No     98     5     0   |
    |                                                             7     1     1   |
    +-----------------------------------------------------------------------------+
  3. Create a GPU instance, specifying 5 as the GPU instance profile ID, and optionally create a compute instance on it, either on the host server or within the guest:

    > sudo nvidia-smi mig -cgi 5
    Successfully created GPU instance ID  1 on GPU  0 using profile MIG 4g.20gb (ID  5)
    > sudo nvidia-smi mig -cci -gi 1
    Successfully created compute instance ID  0 on GPU  0 GPU instance ID  1 using profile MIG 4g.20gb (ID  3)
  4. Verify the GPU instance:

    > sudo nvidia-smi
    Tue Sep 21 11:19:36 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.63       Driver Version: 470.63       CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A100-PCI...  On   | 00000000:B1:00.0 Off |                   On |
    | N/A   38C    P0    38W / 250W |      0MiB / 40536MiB |     N/A      Default |
    |                               |                      |              Enabled |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | MIG devices:                                                                |
    +------------------+----------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
    |      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
    |                  |                      |        ECC|                       |
    |==================+======================+===========+=======================|
    |  0    1   0   0  |      0MiB / 20096MiB | 56      0 |  4   0    2    0    0 |
    |                  |      0MiB / 32767MiB |           |                       |
    +------------------+----------------------+-----------+-----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
  5. Use the MIG instance. You can use the instance directly via its UUID, for example, by assigning it to a container or CUDA process.

    You can also create a vGPU on top of it and assign it to a VM guest. The procedure is the same as for the vGPU with SR-IOV support. Refer to Section 3.2, “Create a vGPU device with support for SR-IOV”.

    > sudo nvidia-smi -L
    GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-ee14e29d-dd5b-2e8e-eeaf-9d3debd10788)
     MIG 4g.20gb     Device  0: (UUID: MIG-fed03f85-fd95-581b-837f-d582496d0260)

4 Assign the vGPU device to a VM Guest

4.1 Assign by libvirt

  1. Create a libvirt-based virtual machine (VM) with UEFI support and a normal VGA display.
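
    For example, a minimal virt-install sketch; the VM name, sizes, installation medium and OS variant are placeholders:

    > sudo virt-install --name sles15-vgpu --memory 8192 --vcpus 4 \
       --disk size=40 --cdrom /path/to/SLE-15-installation.iso \
       --boot uefi --video vga --os-variant sle15sp3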

  2. Edit the VM's configuration by running virsh edit VM-NAME.

  3. Add a new mdev device to the <devices/> section, using the unique ID that you generated when creating the vGPU device.

    Note

    If you are using a Q-series vGPU, use display='on' instead.

    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
      <source>
        <address uuid='4f3b6e47-0baa-4900-b0b1-284c1ecc192f'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>

4.2 Assign by QEMU

Add the following device to the QEMU command line. Use the unique ID that you used when creating the vGPU device:

-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/4f3b6e47-0baa-4900-b0b1-284c1ecc192f
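
For context, this is a minimal sketch of how the option might fit into a full QEMU invocation; the disk image, memory size and other options are placeholders, and UEFI firmware options are omitted:

> sudo qemu-system-x86_64 -machine q35,accel=kvm -m 8192 -smp 4 \
   -drive file=/var/lib/libvirt/images/sles15.qcow2,if=virtio \
   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/4f3b6e47-0baa-4900-b0b1-284c1ecc192f \
   -vnc :0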

5 Configuring vGPU in VM Guest

5.1 Prepare the VM Guest

  • During VM Guest installation, disable secure boot, enable the SSH service, and select wicked for networking.

  • Disable the nouveau video driver. Edit the file /etc/modprobe.d/50-blacklist.conf and add the following line to the top of the file:

    blacklist nouveau
    Important

    Disabling nouveau takes effect only after you regenerate the initrd image with dracut and reboot the VM Guest.
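
    For example, to regenerate the initrd for the currently running kernel and reboot:

    > sudo dracut --force
    > sudo systemctl reboot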

5.2 Install the vGPU driver in the VM Guest

  1. Install the following packages and their dependencies:

    > sudo zypper install kernel-default-devel libglvnd-devel
  2. Download the vGPU software from the NVIDIA portal. Make the NVIDIA vGPU driver executable and run it:

    > chmod +x NVIDIA-Linux-x86_64-470.63.01-grid.run
    > sudo ./NVIDIA-Linux-x86_64-470.63.01-grid.run
    Tip

    To enable dynamic kernel module support, and thus have the module rebuilt automatically when a new kernel is installed, add the --dkms option:

    > sudo ./NVIDIA-Linux-x86_64-470.63.01-grid.run --dkms
  3. During the driver installation, choose to run the nvidia-xconfig utility.

  4. Verify the driver installation by checking the output of the nvidia-smi command:

    > sudo nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GRID A100-10C       On   | 00000000:07:00.0 Off |                    0 |
    | N/A   N/A    P0    N/A /  N/A |    930MiB / 10235MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

6 Licensing vGPU in the VM Guest

  1. Create the configuration file /etc/nvidia/gridd.conf based on /etc/nvidia/gridd.conf.template.

    1. For licenses that are served from the NVIDIA License System, update the following options:

      FeatureType

      For GPU passthrough, set FeatureType to 4 for compute workloads or to 2 for graphics workloads. For a virtual GPU, the vGPU type created via mdev determines the feature set enabled in the VM Guest.

      ClientConfigTokenPath

      Optional: If you want to store the client configuration token in a custom location, add the ClientConfigTokenPath configuration parameter on a new line as ClientConfigTokenPath="PATH_TO_TOKEN". By default, the client searches for the client configuration token in the /etc/nvidia/ClientConfigToken/ directory.

      Copy the client configuration token to the directory in which you want to store it.

    2. For licenses that are served from the legacy NVIDIA vGPU software license server, update the following options:

      ServerAddress

      Add your license server IP address.

      ServerPort

      Use the default "7070" or the port configured during the server setup.

      FeatureType

      For GPU passthrough, set FeatureType to 4 for compute workloads or to 2 for graphics workloads. For a virtual GPU, the vGPU type created via mdev determines the feature set enabled in the VM Guest.
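
      For illustration, the relevant lines in /etc/nvidia/gridd.conf might then look similar to the following; the server address is an example placeholder:

      ServerAddress=192.0.2.10
      ServerPort=7070
      FeatureType=4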

  2. Restart the nvidia-gridd service:

    > sudo systemctl restart nvidia-gridd.service
  3. Inspect the log file for possible errors:

    > sudo grep gridd /var/log/messages
    [...]
    Aug 5 15:40:06 localhost nvidia-gridd: Started (4293)
    Aug 5 15:40:24 localhost nvidia-gridd: License acquired successfully.

7 Configuring a graphics mode

7.1 Create or update the /etc/X11/xorg.conf file

  1. If there is no /etc/X11/xorg.conf on the VM Guest, run the nvidia-xconfig utility.

  2. Query the GPU device for detailed information:

    > nvidia-xconfig --query-gpu-info
    Number of GPUs: 1
    
    GPU #0:
    Name      : GRID V100-16Q
    UUID      : GPU-089f39ad-01cb-11ec-89dc-da10f5778138
    PCI BusID : PCI:0:10:0
    
    Number of Display Devices: 0
  3. Add the GPU's BusID to /etc/X11/xorg.conf, for example:

    Section "Device"
        Identifier  "Device0"
        Driver      "nvidia"
        BusID       "PCI:0:10:0"
        VendorName  "NVIDIA Corporation"
    EndSection

7.2 Verify the graphics mode

Verify the following:

  • The graphical desktop boots correctly.

  • The X process of the running X server is using the GPU:

    > nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GRID V100-4C        On   | 00000000:00:0A.0 Off |                  N/A |
    | N/A   N/A    P0    N/A /  N/A |    468MiB /  4096MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1921      G   /usr/bin/X                         76MiB |
    |    0   N/A  N/A      1957      G   /usr/bin/gnome-shell               87MiB |
    +-----------------------------------------------------------------------------+

7.3 Remote display

Install and configure the x11vnc VNC server package inside the VM Guest, and start it with the following command:

> sudo x11vnc -display :0 -auth /run/user/1000/gdm/Xauthority -forever -shared -ncache -bg -usepw -geometry 1900x1080

You can use virt-manager or virt-viewer to display the graphical output of a VM Guest.
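
From a remote client, you can then connect to the VM Guest's VNC display with any VNC viewer, for example (the host name is a placeholder):

> vncviewer vm-guest.example.com:0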

Important

For a libvirt-based VM Guest, verify that its XML configuration includes display='on' as suggested in Section 4.1, “Assign by libvirt”.

8 Configuring compute mode

  1. Download and install the CUDA toolkit. You can find it at https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=SLES&target_version=15&target_type=runfile_local.

  2. Download CUDA samples from https://github.com/nvidia/cuda-samples.
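
    For example, you can clone the repository with Git:

    > git clone https://github.com/nvidia/cuda-samples.git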

  3. Build and run a CUDA sample, for example, clock:

    > cd YOUR_GIT_CLONE_LOCATION/cuda-samples/Samples/0_Introduction/clock
    > make
    /usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64    --threads 0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode
    [...]
    mkdir -p ../../bin/x86_64/linux/release
    cp clock ../../bin/x86_64/linux/release
    > ./clock
    CUDA Clock sample
    GPU Device 0: "Volta" with compute capability 7.0
    Average clocks/block = 2820.718750

9 Additional tasks

This section introduces additional procedures that may be helpful after you have configured your vGPU.

9.1 Disabling Frame Rate Limiter

Frame Rate Limiter (FRL) is enabled by default. It limits the vGPU to a fixed frame rate, for example, 60 FPS. If you experience poor graphics performance or display issues, you may need to disable FRL, for example:

> echo "frame_rate_limiter=0" | sudo tee /sys/bus/mdev/devices/86380ffb-8f13-4685-9c48-0e0f4e65fb87/nvidia/vgpu_params
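
To re-enable FRL later, you can write frame_rate_limiter=1 to the same vgpu_params file:

> echo "frame_rate_limiter=1" | sudo tee /sys/bus/mdev/devices/86380ffb-8f13-4685-9c48-0e0f4e65fb87/nvidia/vgpu_params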

9.2 Enabling/Disabling Error Correcting Code (ECC)

Starting with the NVIDIA Pascal architecture, NVIDIA GPU cards support ECC memory to improve data integrity. ECC has also been supported in software since NVIDIA vGPU 9.0.

To enable ECC:

> sudo nvidia-smi -e 1
> nvidia-smi -q
Ecc Mode
   Current                           : Enabled
   Pending                           : Enabled

To disable ECC:

> sudo nvidia-smi -e 0

9.3 Black screen in Virt-manager

If you see only a black screen in Virt-manager, press Alt+Ctrl+2 in the Virt-manager viewer. This should restore the display.

9.4 Black screen in VNC client when using a non-QEMU VNC server

Use the xvnc server.

9.5 Kernel panic occurs because the Nouveau and NVIDIA drivers compete for GPU resources

The boot messages look similar to the following:

[ 16.742439] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 16.742441] RIP: 0010:__pci_enable_msi_range+0x3a9/0x3f0
[ 16.742443] Code: 76 60 49 8d 56 50 48 89 df e8 73 f6 fc ff e9 3b fe ff ff 31 f6 48 89 df e8 64 73 fd ff e9 d6 fe ff ff 44 89 fd e9 1a ff ff ff <0f> 0b bd ea ff ff ff e9 0e ff ff ff bd ea ff ff ff e9 04 ff f
f ff
[ 16.742444] RSP: 0018:ffffb04bc052fb28 EFLAGS: 00010202
[ 16.742445] RAX: 0000000000000010 RBX: ffff9e93a85bc000 RCX: 0000000000000001
[ 16.742457] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e93a85bc000
[ 16.742458] RBP: ffff9e93a2550800 R08: 0000000000000002 R09: ffffb04bc052fb1c
[ 16.742459] R10: 0000000000000050 R11: 0000000000000020 R12: ffff9e93a2550800
[ 16.742459] R13: 0000000000000001 R14: ffff9e93a2550ac8 R15: 0000000000000001
[ 16.742460] FS: 00007f9f26889740(0000) GS:ffff9e93bfdc0000(0000) knlGS:0000000000000000
[ 16.742461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.742462] CR2: 00000000008aeb90 CR3: 0000000286470003 CR4: 0000000000170ee0
[ 16.742465] Call Trace:
[ 16.742503] ? __pci_find_next_cap_ttl+0x93/0xd0
[ 16.742505] pci_enable_msi+0x16/0x30
[ 16.743039] nv_init_msi+0x1a/0xf0 [nvidia]
[ 16.743154] nv_open_device+0x81b/0x890 [nvidia]
[ 16.743248] nvidia_open+0x2f7/0x4d0 [nvidia]
[ 16.743256] ? kobj_lookup+0x113/0x160
[ 16.743354] nvidia_frontend_open+0x53/0x90 [nvidia]
[ 16.743361] chrdev_open+0xc4/0x1a0
[ 16.743370] ? cdev_put.part.2+0x20/0x20
[ 16.743374] do_dentry_open+0x204/0x3a0
[ 16.743378] path_openat+0x2fc/0x1520
[ 16.743382] ? unlazy_walk+0x32/0xa0
[ 16.743383] ? terminate_walk+0x8c/0x100
[ 16.743385] do_filp_open+0x9b/0x110
[ 16.743387] ? chown_common+0xf7/0x1c0
[ 16.743390] ? kmem_cache_alloc+0x18a/0x270
[ 16.743392] ? do_sys_open+0x1bd/0x260
[ 16.743394] do_sys_open+0x1bd/0x260
[ 16.743400] do_syscall_64+0x5b/0x1e0
[ 16.743409] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 16.743418] RIP: 0033:0x7f9f2593961d
[ 16.743420] Code: f0 25 00 00 41 00 3d 00 00 41 00 74 48 64 8b 04 25 18 00 00 00 85 c0 75 64 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 97 00 00 00 48 8b 4c 24 28 64 48 33 0
c 25
[ 16.743420] RSP: 002b:00007ffcfa214930 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 16.743422] RAX: ffffffffffffffda RBX: 00007ffcfa214c30 RCX: 00007f9f2593961d
[ 16.743422] RDX: 0000000000080002 RSI: 00007ffcfa2149b0 RDI: 00000000ffffff9c
[ 16.743423] RBP: 00007ffcfa2149b0 R08: 0000000000000000 R09: 0000000000000000
[ 16.743424] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 16.743424] R13: 00007ffcfa214abc R14: 0000000000925ae0 R15: 0000000000000000
[ 16.743426] ---[ end trace 8bf4d15315659a3e ]---
[ 16.743431] NVRM: GPU 0000:00:0a.0: Failed to enable MSI; falling back to PCIe virtual-wire interrupts.

Make sure to regenerate the initrd image, for example with dracut, and reboot after disabling the Nouveau driver. Refer to Section 5.1, “Prepare the VM Guest”.

9.6 Filing an NVIDIA vGPU bug

When filing an NVIDIA vGPU-related bug report, attach the vGPU configuration data nvidia-bug-report.log.gz collected by the nvidia-bug-report.sh utility. Make sure you cover both the VM Host Server and the VM Guest.
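
For example, run the utility on both systems and attach the resulting archive:

> sudo nvidia-bug-report.sh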

9.7 Configuring a License Server

Refer to https://docs.nvidia.com/grid/ls/latest/grid-license-server-user-guide/index.html.

10 For more information

NVIDIA has extensive documentation on vGPU. Refer to https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html for details.

11 NVIDIA virtual GPU background

11.1 NVIDIA GPU architectures

There are two types of vGPU architectures:

Time-sliced vGPU architecture

All GPU cards support time-sliced vGPU. To do so, Ampere GPU cards use the Single Root I/O Virtualization (SR-IOV) mechanism, while Volta and earlier GPU cards use the mediated device mechanism. These two mechanisms are transparent to a VM, but they require different configuration on the host side.

Multi-Instance GPU (MIG) vGPU architecture

Introduced with GPUs based on the NVIDIA Ampere GPU architecture. Only Ampere GPU cards can support MIG-backed vGPU.

11.2 vGPU types

Each physical GPU can support several different types of vGPU. Each vGPU type has a fixed amount of frame buffer, a fixed number of supported display heads, and a maximum resolution. NVIDIA offers four series of vGPU types: A, B, C, and Q. SUSE currently supports the Q and C-series.

Table 1: vGPU types

  • Q-series: Virtual workstations for creative and technical professionals who require the performance and features of the NVIDIA Quadro technology.

  • C-series: Compute-intensive server workloads, for example, artificial intelligence (AI), deep learning, or high-performance computing (HPC).

  • B-series: Virtual desktops for business professionals and knowledge workers.

  • A-series: Application streaming or session-based solutions for virtual application users.

11.3 Valid vGPU configurations on a single GPU

11.3.1 Time-sliced vGPU configurations

For time-sliced vGPUs, all vGPUs on a single GPU must be of the same type:

Figure 3: Example time-sliced vGPU configurations on NVIDIA Tesla M60 (source: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html)

11.3.2 MIG-backed vGPU configurations

For MIG-backed vGPUs, the vGPUs on a single GPU can be either homogeneous or of mixed types:

Figure 4: Example MIG-backed vGPU configurations on NVIDIA A100 PCIe 40GB (source: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html)