Deploying and Installing SUSE AI
- WHAT?
This document provides a comprehensive, step-by-step guide for deploying SUSE AI.
- WHY?
To help users successfully complete the deployment process.
- GOAL
To learn how to deploy SUSE AI in both testing and production environments.
- EFFORT
Less than one hour of reading. Advanced knowledge of Linux deployment is assumed.
SUSE AI is a versatile product consisting of multiple software layers and components. This document outlines the complete workflow for deployment and installation of all SUSE AI dependencies, as well as SUSE AI itself. You can also find references to recommended hardware and software requirements, as well as steps to take after the product installation.
For hardware, software and application-specific requirements, refer to SUSE AI requirements.
1 Installation overview #
The following chart illustrates the installation process of SUSE AI. It outlines the following possible scenarios:
You have clean cluster nodes prepared without a supported Linux operating system installed.
You have a supported Linux operating system and Kubernetes distribution installed on cluster nodes.
You have SUSE Rancher Prime and all supportive components installed on the Kubernetes cluster and are prepared to install the required applications from the AI Library.
2 Installing the Linux and Kubernetes distribution #
This procedure includes the steps to install the base Linux operating system and a Kubernetes distribution for users who start deploying on cluster nodes from scratch. If you already have a Kubernetes cluster installed and running, you can skip this procedure and continue with Section 4.1, “Installation procedure”.
Install and register a supported Linux operating system on each cluster node. We recommend using one of the following operating systems:
SUSE Linux Enterprise Server 15 SP6 for a traditional non-transactional operating system. For more information, see Section 2.1, “Installing SUSE Linux Enterprise Server”.
SUSE Linux Micro 6.1 for an immutable transactional operating system. For more information, see SUSE Linux Micro 6.1 documentation.
For a list of supported operating systems, refer to https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/.
Install the NVIDIA GPU driver on cluster nodes with GPUs. Refer to Section 2.2, “Installing NVIDIA GPU drivers” for details.
Install Kubernetes on cluster nodes. We recommend using the supported SUSE Rancher Prime: RKE2 distribution. Refer to Section 2.3, “Installing SUSE Rancher Prime: RKE2” for details. For a list of supported Kubernetes platforms, refer to https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/.
2.1 Installing SUSE Linux Enterprise Server #
Use the following procedures to install SLES on all supported hardware platforms. They assume you have successfully booted into the installation system. For more detailed installation instructions and deployment strategies, refer to SUSE Linux Enterprise Server Deployment Guide.
2.1.1 The Unified Installer #
Starting with SLES 15, the installation medium consists only of the Unified Installer, a minimal system for installing, updating and registering all SUSE Linux Enterprise base products. During the installation, you can add functionality by selecting modules and extensions to be installed on top of the Unified Installer.
2.1.2 Installing offline or without registration #
The default installation medium SLE-15-SP6-Online-ARCH-GM-media1.iso is optimized for size and does not contain any modules and extensions. Therefore, the installation requires network access to register your product and retrieve repository data for the modules and extensions.
For installation without registering the system, use the SLE-15-SP6-Full-ARCH-GM-media1.iso image from https://www.suse.com/download/sles/ and refer to Installing without registration.
Use the following command to copy the contents of the installation image to a removable flash disk.
> sudo dd if=IMAGE of=FLASH_DISK bs=4M && sync
Replace IMAGE with the path to the SLE-15-SP6-Online-ARCH-GM-media1.iso or SLE-15-SP6-Full-ARCH-GM-media1.iso image file. Replace FLASH_DISK with the flash device. To identify the device, insert it and run:
# grep -Ff <(hwinfo --disk --short) <(hwinfo --usb --short)
disk:
  /dev/sdc  General USB Flash Disk
Make sure the size of the device is sufficient for the desired image. You can check the size of the device with:
# fdisk -l /dev/sdc | grep -e "^/dev"
/dev/sdc1 * 2048 31490047 31488000 15G 83 Linux
In this example, the device has a capacity of 15 GB. The command to use for the SLE-15-SP6-Full-ARCH-GM-media1.iso would be:
> sudo dd if=SLE-15-SP6-Full-ARCH-GM-media1.iso of=/dev/sdc bs=4M && sync
The device must not be mounted when running the dd command. Note that all data on the device will be erased.
2.1.3 The installation procedure #
To install SLES, boot or IPL into the installer from the Unified Installer medium and start the installation.
2.1.3.1 Language, keyboard and product selection #
The Language and Keyboard Layout settings are initialized with the language you chose on the boot screen. If you do not change the default, it remains English (US). Change the settings here, if necessary. Use the Keyboard Test text box to test the layout.
Select SUSE Linux Enterprise Server 15 SP6 for installation. You need to have a registration code for the product. Proceed with Next.
If you have difficulty reading the labels in the installer, you can change the widget colors and theme.
Press Shift–F3 to open a theme selection dialog. Select a theme from the list and close the dialog.
Shift–F4 switches to the color scheme for vision-impaired users. Press the buttons again to switch back to the default scheme.
2.1.3.2 License agreement #
Read the License Agreement. It is presented in the language you have chosen on the boot screen. Translations are available via the drop-down list. You need to accept the agreement by checking I Agree to the License Terms to install SLES. Proceed with Next.
2.1.3.3 Network settings #
A system analysis is performed, where the installer probes for storage devices and tries to find other installed systems. If the network was automatically configured via DHCP during the start of the installation, you are presented with the registration step.
If the network is not yet configured, the Network Settings dialog opens. Choose a network interface from the list and configure it with Edit. Alternatively, Add an interface manually. See the sections on installer network settings and configuring a network connection with YaST for more information. If you prefer to do an installation without network access, skip this step without making any changes and proceed with Next.
2.1.3.4 Registration #
To get technical support and product updates, you need to register and activate SLES with the SUSE Customer Center or a local registration server. Registering your product at this stage also grants you immediate access to the update repository. This enables you to install the system with the latest updates and patches available.
When registering, repositories and dependencies for modules and extensions are loaded from the registration server.
To register at the SUSE Customer Center, enter the e-mail address associated with your SUSE Customer Center account and the registration code for SLES. Proceed with Next.
If your organization provides a local registration server, you may alternatively register there. Activate Register System via local RMT Server and either choose a URL from the drop-down list or type in an address. Proceed with Next.
If you are offline or want to skip registration, activate Skip Registration. Accept the warning with OK and proceed with Next.
Important: Skipping the registration
Your system and extensions need to be registered to retrieve updates and to be eligible for support. Skipping the registration is only possible when installing from the SLE-15-SP6-Full-ARCH-GM-media1.iso image. If you do not register during the installation, you can do so at any time later from the running system. To do so, run YaST › Product Registration or the command-line tool SUSEConnect.
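For example, to register later from the command line, run SUSEConnect with the registration code for your product; the code and e-mail address below are placeholders:
# SUSEConnect -r REGISTRATION_CODE -e EMAIL_ADDRESS
To register against a local registration server instead, additionally pass its URL via the --url option.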
After SLES has been successfully registered, you are asked whether to install the latest available online updates during the installation. If choosing Yes, the system will be installed with the most current packages without having to apply the updates after installation. Activating this option is recommended.
By default, the firewall on SUSE Linux Enterprise Server only blocks incoming connections. If your system is behind another firewall that blocks outgoing traffic, make sure to allow connections to https://scc.suse.com/ and https://updates.suse.com on ports 80 and 443 to receive updates.
2.1.3.5 Extension and module selection #
After the system is successfully registered, the installer lists modules and extensions that are available for SLES. Modules are components that allow you to customize the product according to your needs. They are included in your SLES subscription. Extensions add functionality to your product. They must be purchased separately.
The availability of certain modules or extensions depends on the product selected in the first step of the installation. For a description of the modules and their lifecycles, select a module to see the accompanying text. More detailed information is available in the Modules and Extensions Quick Start.
The selection of modules indirectly affects the scope of the installation, because it defines which software sources (repositories) are available for installation and in the running system.
The following modules and extensions are available for SUSE Linux Enterprise Server:
- Basesystem Module
This module adds a basic system on top of the Unified Installer. It is required by all other modules and extensions. The scope of an installation that only contains the base system is comparable to the installation pattern minimal system of previous SLES versions. This module is selected for installation by default and should not be deselected.
Dependencies: None
- Certifications Module
Contains the FIPS certification packages.
Dependencies: Server Applications
- Confidential Computing Technical Preview
Contains packages related to confidential computing.
Dependencies: Basesystem
- Containers Module
Contains support and tools for containers.
Dependencies: Basesystem
- Desktop Applications Module
Adds a graphical user interface and essential desktop applications to the system.
Dependencies: Basesystem
- Development Tools Module
Contains the compilers (including gcc) and libraries required for compiling and debugging applications. Replaces the former Software Development Kit (SDK).
Dependencies: Basesystem, Desktop Applications
- Legacy Module
Helps you with migrating applications from earlier versions of SLES and other systems to SLES 15 SP6 by providing packages which are discontinued on SUSE Linux Enterprise. Packages in this module are selected based on the requirements for migration and the level of complexity of configuration.
This module is recommended when migrating from a previous product version.
Dependencies: Basesystem, Server Applications
- NVIDIA Compute Module
Contains the NVIDIA CUDA (Compute Unified Device Architecture) drivers.
The software in this module is provided by NVIDIA under the CUDA End User License Agreement and is not supported by SUSE.
Dependencies: Basesystem
- Public Cloud Module
Contains all tools required to create images for deploying SLES in cloud environments such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, or OpenStack.
Dependencies: Basesystem, Server Applications
- Python 3 Module
This module contains the most recent versions of the selected Python 3 packages.
Dependencies: Basesystem
- SAP Business One Server
This module contains packages and system configurations specific to SAP Business One Server. It is maintained and supported under the SUSE Linux Enterprise Server product subscription.
Dependencies: Basesystem, Server Applications, Desktop Applications, Development Tools
- Server Applications Module
Adds server functionality by providing network services such as DHCP server, name server, or Web server.
Dependencies: Basesystem
- SUSE Linux Enterprise High Availability
Adds clustering support for mission-critical setups to SLES. This extension requires a separate license key.
Dependencies: Basesystem, Server Applications
- SUSE Linux Enterprise Live Patching
Adds support for performing critical patching without having to shut down the system. This extension requires a separate license key.
Dependencies: Basesystem, Server Applications
- SUSE Linux Enterprise Workstation Extension
Extends the functionality of SLES with packages from SUSE Linux Enterprise Desktop, like additional desktop applications (office suite, e-mail client, graphical editor, etc.) and libraries. It allows combining both products to create a fully featured workstation. This extension requires a separate license key.
Dependencies: Basesystem, Desktop Applications
- SUSE Package Hub
Provides access to packages for SLES maintained by the openSUSE community. These packages are delivered without L3 support and do not interfere with the supportability of SLES. For more information, refer to https://packagehub.suse.com/.
Dependencies: Basesystem
- Transactional Server Module
Adds support for transactional updates. Updates are either applied to the system as a single transaction or not applied at all. This happens without influencing the running system. If an update fails, or if the successful update is deemed to be incompatible or otherwise incorrect, it can be discarded to immediately return the system to its previous functioning state.
Dependencies: Basesystem
- Web and Scripting Module
Contains packages intended for a running Web server.
Dependencies: Basesystem, Server Applications
Certain modules depend on the installation of other modules. Therefore, when selecting a module, other modules may be selected automatically to fulfill dependencies.
Depending on the product, the registration server can mark modules and extensions as recommended. Recommended modules and extensions are preselected for registration and installation. To avoid installing these recommendations, deselect them manually.
Select the modules and extensions you want to install and proceed with Next. In case you have chosen one or more extensions, you will be prompted to provide the respective registration codes. Depending on your choice, it may also be necessary to accept additional license agreements.
When performing an offline installation from the SLE-15-SP6-Full-ARCH-GM-media1.iso, only the Basesystem Module is selected by default. To install the complete default package set of SUSE Linux Enterprise Server, additionally select the Server Applications Module and the Python 3 Module.
2.1.3.6 Add-on product #
The Add On Product dialog allows you to add additional software sources (called “repositories”) to SLES that are not provided by the SUSE Customer Center. Add-on products may include third-party products and drivers as well as additional software for your system.
You can also add driver update repositories via the dialog. Driver updates for SUSE Linux Enterprise are provided at https://drivers.suse.com/. These drivers have been created through the SUSE SolidDriver Program.
To skip this step, proceed with Next. Otherwise, activate I would like to install an additional Add On Product. Specify a media type, a local path, or a network resource hosting the repository and follow the on-screen instructions.
Check Download repository description files to download the files describing the repository now. If deactivated, they will be downloaded after the installation has started. Proceed with Next and insert a medium if required. Depending on the content of the product, it may be necessary to accept additional license agreements. Proceed with Next. If you have chosen an add-on product requiring a registration key, you will be asked to enter it before proceeding to the next step.
2.1.3.7 System role #
The availability of system roles depends on your selection of modules and extensions. System roles define, for example, the set of software patterns that are preselected for the installation. Refer to the description on the screen to make your choice. Select a role and proceed with Next. If from the enabled modules only one role or no role is suitable for the respective base product, the dialog is omitted.
From this point on, the Release Notes can be viewed from any screen during the installation process by selecting Release Notes.
2.1.3.8 Suggested partitioning #
Review the partition setup proposed by the system. If necessary, change it. You have the following options:
- Guided Setup
Starts a wizard that lets you refine the partitioning proposal. The options available here depend on your system setup. If it contains more than a single hard disk, you can choose which disk or disks to use and where to place the root partition. If the disks already contain partitions, decide whether to remove or resize them.
In subsequent steps, you may also add LVM support and disk encryption. You can change the file system for the root partition and decide whether or not to have a separate home partition.
- Expert Partitioner
Opens the Expert Partitioner. This gives you full control over the partitioning setup and lets you create a custom setup. This option is intended for experts. For details, see the Expert Partitioner chapter.
For partitioning purposes, disk space is measured in binary units rather than in decimal units. For example, if you enter sizes of 1GB, 1GiB or 1G, they all signify 1 GiB (Gibibyte), as opposed to 1 GB (Gigabyte).
- Binary
1 GiB = 1 073 741 824 bytes.
- Decimal
1 GB = 1 000 000 000 bytes.
- Difference
1 GiB ≈ 1.07 GB.
To accept the proposed setup without any changes, choose Next to proceed.
2.1.3.9 Clock and time zone #
Select the clock and time zone to use in your system. To manually adjust the time or to configure an NTP server for time synchronization, choose Other Settings. See the section on Clock and Time Zone for detailed information. Proceed with Next.
2.1.3.10 Local user #
To create a local user, type the first and last name in the User's Full Name field, the login name in the Username field, and the password in the Password field.
The password should be at least eight characters long and should contain both uppercase and lowercase letters and numbers. The maximum length for passwords is 72 characters, and passwords are case-sensitive.
For security reasons, it is also strongly recommended not to enable Automatic Login. You should also not Use this password for system administrator, but provide a separate root password in the next installation step.
If you install on a system where a previous Linux installation was found, you may Import User Data from a Previous Installation. Click Choose Users for a list of available user accounts. Select one or more users.
In an environment where users are centrally managed (for example, by NIS or LDAP), you can skip the creation of local users. Select Skip User Creation in this case.
Proceed with Next.
2.1.3.11 Authentication for the system administrator “root” #
Type a password for the system administrator (called the root user) or provide a public SSH key. If you want, you can use both. Because the root user is equipped with extensive permissions, the password should be chosen carefully. You should never forget the root password. After you enter it here, the password cannot be retrieved.
It is recommended to use only US ASCII characters. In the event of a system error or when you need to start your system in rescue mode, the keyboard may not be localized.
To access the system remotely via SSH using a public key, import a key from removable media or an existing partition. See the section on Authentication for the system administrator root for more information.
Proceed with Next.
2.1.3.12 Installation settings #
Use the Installation Settings screen to review and, if necessary, change several proposed installation settings. The current configuration is listed for each setting. To change it, click the headline. Certain settings, such as firewall or SSH, can be changed directly by clicking the respective links.
Changes you can make here can also be made later at any time from the installed system. However, if you need remote access right after the installation, you may need to open the SSH port in the settings.
The scope of the installation is defined by the modules and extensions you have chosen for this installation. However, depending on your selection, not all packages available in a module are selected for installation.
Clicking Software opens the Software Selection and System Tasks screen, where you can change the software selection by selecting or deselecting patterns. Each pattern contains several software packages needed for specific functions. For a more detailed selection based on software packages to install, select Details to switch to the YaST Software Manager. See Installing or removing software for more information.
This section shows the boot loader configuration. Changing the defaults is recommended only if really needed. Refer to The boot loader GRUB 2 for details.
The CPU Mitigations refer to kernel boot command-line parameters for software mitigations that have been deployed to prevent CPU side-channel attacks. Click the selected entry to choose a different option. For details, see the section on CPU Mitigations.
By default, the firewall is enabled on all configured network interfaces. To disable firewalld, click disable (not recommended). Refer to the Masquerading and Firewalls chapter for configuration details.
Note: Firewall settings for receiving updates
By default, the firewall on SUSE Linux Enterprise Server only blocks incoming connections. If your system is behind another firewall that blocks outgoing traffic, make sure to allow connections to https://scc.suse.com/ and https://updates.suse.com on ports 80 and 443 to receive updates.
The SSH service is enabled by default, but its port (22) is closed in the firewall. Click the respective link to open the port or to disable the service. If SSH is disabled, remote logins will not be possible. Refer to Securing network operations with OpenSSH for more information.
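If you leave the port closed during installation, you can open it later from the running system. A minimal sketch, assuming the default firewalld backend:
# firewall-cmd --permanent --add-service=ssh
# firewall-cmd --reload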
The default security module is AppArmor. To disable it, select None as the module in the Security settings.
To apply the Defense Information Systems Agency STIG security policy, click the respective entry. If any installation settings are incompatible with the policy, you will be prompted to modify them accordingly. Certain settings can be adjusted automatically while others require user input.
Enabling a security profile enables a full SCAP remediation on first boot. You can also choose to only scan the system and manually remediate it later with OpenSCAP. For more information, refer to the section on Security Profiles.
Displays the current network configuration. By default, wicked is used for server installations and NetworkManager for desktop workloads. Click the entry to change the settings. For details, see the section on Configuring a network connection with YaST.
Important: Support for NetworkManager
SUSE only supports NetworkManager for desktop workloads with SLED or the Workstation extension. All server certifications are done with wicked as the network configuration tool, and using NetworkManager may invalidate them. NetworkManager is not supported by SUSE for server workloads.
Kdump saves the memory image (“core dump”) to the file system in case the kernel crashes. This enables you to find the cause of the crash by debugging the dump file. Kdump is preconfigured and enabled by default. See the Basic Kdump configuration for more information.
If you have installed the desktop applications module, the system boots into the graphical target, with network, multi-user and display manager support. Switch to multi-user if you do not need to log in via a display manager.
View detailed hardware information by clicking System Information. In the resulting screen, you can also change the kernel settings. See the section on System Information for more information.
2.1.3.13 Start the installation #
After you have finalized the system configuration on the Installation Settings screen, click Install. Depending on your software selection, you may need to agree to license agreements before the installation confirmation screen pops up. Up to this point, no changes have been made to your system. After you click Install a second time, the installation process starts.
2.1.3.14 The installation process #
During the installation, the progress is shown. After the installation routine has finished, the computer is rebooted into the installed system.
2.2 Installing NVIDIA GPU drivers #
This section describes how to implement host-level NVIDIA GPU support via the open-driver. The open-driver is part of the core package repositories. Therefore, there is no need to compile it or download executable packages. This driver is built into the operating system rather than dynamically loaded by the NVIDIA GPU Operator. This configuration is desirable for customers who want to pre-build all artifacts required for deployment into the image, and where the dynamic selection of the driver version via Kubernetes is not a requirement.
2.2.1 Installing NVIDIA GPU drivers on SUSE Linux Enterprise Server #
2.2.1.1 Requirements #
This guide assumes that you have the following already available:
At least one host with SLES 15 SP6 installed, physical or virtual.
Your hosts are attached to a subscription as this is required for package access.
A compatible NVIDIA GPU installed or fully passed through to the virtual machine in which SLES is running.
Access to the root user. These instructions assume you are the root user, not escalating your privileges via sudo.
2.2.1.2 Considerations before the installation #
2.2.1.2.1 Select the driver generation #
You must verify the driver generation for the NVIDIA GPU that your system has. For modern GPUs, the G06 driver is the most common choice. Find more details in the support database.
This section details the installation of the G06 generation of the driver.
2.2.1.2.2 Additional NVIDIA components #
Besides the NVIDIA open-driver provided by SUSE as part of SLES, you might also need additional NVIDIA components. These could include OpenGL libraries, CUDA toolkits, command-line utilities such as nvidia-smi, and container-integration components such as nvidia-container-toolkit. Many of these components are not shipped by SUSE as they are proprietary NVIDIA software. This section describes how to configure additional repositories that give you access to these components and provides examples of using these tools to achieve a fully functional system.
2.2.1.3 The installation procedure #
Add a package repository from NVIDIA. This allows pulling in additional utilities, for example, nvidia-smi.
For the AMD64/Intel 64 architecture, run:
# zypper ar \
  https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
  cuda-sle15
# zypper --gpg-auto-import-keys refresh
For the Arm AArch64 architecture, run:
# zypper ar \
  https://developer.download.nvidia.com/compute/cuda/repos/sles15/sbsa/ \
  cuda-sle15
# zypper --gpg-auto-import-keys refresh
Install the Open Kernel driver KMP and detect the driver version.
# zypper install -y --auto-agree-with-licenses \
  nv-prefer-signed-open-driver
# version=$(rpm -qa --queryformat '%{VERSION}\n' \
  nv-prefer-signed-open-driver | cut -d "_" -f1 | sort -u | tail -n 1)
You can then install the appropriate packages for additional utilities that are useful for testing purposes.
# zypper install -y --auto-agree-with-licenses \
  nvidia-compute-utils-G06=${version} \
  nvidia-persistenced=${version}
Reboot the host to make the changes effective.
# reboot
Log back in and use the nvidia-smi tool to verify that the driver is loaded successfully and that it can both access and enumerate your GPUs.
# nvidia-smi
The output of this command should show you something similar to the following output. In the example below, the system has one GPU.
Fri Aug  1 15:32:10 2025
+------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07        Driver Version: 580.82.07      CUDA Version: 13.0 |
|------------------------------+------------------------+----------------------+
| GPU  Name     Persistence-M  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf Pwr:Usage/Cap|           Memory-Usage | GPU-Util  Compute M. |
|                              |                        |               MIG M. |
|==============================+========================+======================|
|   0  Tesla T4            On  | 00000000:00:1E.0   Off |                    0 |
| N/A   33C    P8    13W / 70W |      0MiB / 15360MiB   |      0%      Default |
|                              |                        |                  N/A |
+------------------------------+------------------------+----------------------+

+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                        Usage |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+
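Besides nvidia-smi, you can optionally confirm at the kernel level that the driver module is loaded; these are standard Linux commands, not specific to this procedure:
# lsmod | grep nvidia
# modinfo nvidia | grep ^version
The reported module version should match the driver version detected during the installation.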
2.2.1.4 Validation of the driver installation #
Running the nvidia-smi command has verified that, at the host level, the NVIDIA device can be accessed and that the drivers are loading successfully. To validate that it is functioning, you need to validate that the GPU can take instructions from a user-space application, ideally via a container and through the CUDA library, as that is typically what a real workload would use. For this, we can make a further modification to the host OS by installing nvidia-container-toolkit.
Install the nvidia-container-toolkit package from the NVIDIA Container Toolkit repository.
# zypper ar \
  "https://nvidia.github.io/libnvidia-container/stable/rpm/" \
  nvidia-container-toolkit.repo
# zypper --gpg-auto-import-keys install -y nvidia-container-toolkit
The nvidia-container-toolkit.repo file contains a stable repository nvidia-container-toolkit and an experimental repository nvidia-container-toolkit-experimental. Use the stable repository for production use. The experimental repository is disabled by default.
Verify that the system can successfully enumerate the devices using the NVIDIA Container Toolkit. The output should be verbose, with INFO and WARN messages, but no ERROR messages.
# nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
This ensures that any container started on the machine can employ discovered NVIDIA GPU devices.
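Optionally, you can list the generated CDI device names to see what container runtimes can reference:
# nvidia-ctk cdi list
The output should include entries such as nvidia.com/gpu=0 and nvidia.com/gpu=all.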
You can then run a Podman-based container. Doing this via podman gives you a good way of validating access to the NVIDIA device from within a container, which should give confidence for doing the same with Kubernetes at a later stage.
Give Podman access to the labeled NVIDIA devices that were taken care of by the previous command and simply run the bash command.
# podman run --rm --device nvidia.com/gpu=all \
  --security-opt=label=disable \
  -it registry.suse.com/bci/bci-base:latest bash
You can now execute commands from within a temporary Podman container. It does not have access to your underlying system and is ephemeral: whatever you change in the container does not persist. Also, you cannot break anything on the underlying host.
Inside the container, install the required CUDA libraries. Identify their version from the output of the nvidia-smi command. From the above example, we are installing CUDA version 13.0 with many examples, demos and development kits to fully validate the GPU.
# zypper ar \
  http://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
  cuda-sle15-sp6
# zypper --gpg-auto-import-keys refresh
# zypper install -y cuda-libraries-13-0 cuda-demo-suite-12-9
Inside the container, run the deviceQuery CUDA example of the same version, which comprehensively validates GPU access via CUDA and from within the container itself.
# /usr/local/cuda-12.9/extras/demo_suite/deviceQuery
/usr/local/cuda-12.9/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          13.0 / 13.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 14913 MBytes (15637086208 bytes)
  (40) Multiprocessors, ( 64) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1590 MHz (1.59 GHz)
  Memory Clock rate:                             5001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 30
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 1, Device0 = Tesla T4
Result = PASS
From inside the container, you can continue to run any other CUDA workload, such as compilers, to run further tests. When finished, you can exit the container.
# exit
Important
Changes you have made in the container and packages you have installed inside will be lost and will not impact the underlying operating system.
2.2.2 Installing NVIDIA GPU drivers on SUSE Linux Micro #
2.2.2.1 Requirements #
This guide assumes that you have the following already available:
At least one host with SUSE Linux Micro 6.1 installed, physical or virtual.
Your hosts are attached to a subscription as this is required for package access.
A compatible NVIDIA GPU installed or fully passed through to the virtual machine in which SUSE Linux Micro is running.
Access to the root user. These instructions assume you are the root user, not escalating your privileges via sudo.
2.2.2.2 Considerations before the installation #
2.2.2.2.1 Select the driver generation #
You must verify the driver generation for the NVIDIA GPU that your system has. For modern GPUs, the G06 driver is the most common choice. Find more details in the support database.
This section details the installation of the G06 generation of the driver.
2.2.2.2.2 Additional NVIDIA components #
Besides the NVIDIA open-driver provided by SUSE as part of SUSE Linux Micro, you might also need additional NVIDIA components. These could include OpenGL libraries, CUDA toolkits, command-line utilities such as nvidia-smi, and container-integration components such as nvidia-container-toolkit. Many of these components are not shipped by SUSE as they are proprietary NVIDIA software. This section describes how to configure additional repositories that give you access to these components and provides examples of using these tools to achieve a fully functional system.
2.2.2.3 The installation procedure #
On each GPU-enabled host, open up a transactional-update shell session to create a new read/write snapshot of the underlying operating system so that we can make changes to the immutable platform.
# transactional-update shell
When you are in the transactional-update shell session, add a package repository from NVIDIA. This allows pulling in additional utilities, for example, nvidia-smi.
For the AMD64/Intel 64 architecture, run:
transactional update # zypper ar \
  https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
  cuda-sle15
transactional update # zypper --gpg-auto-import-keys refresh
For the Arm AArch64 architecture, run:
transactional update # zypper ar \
  https://developer.download.nvidia.com/compute/cuda/repos/sles15/sbsa/ \
  cuda-sle15
transactional update # zypper --gpg-auto-import-keys refresh
Install the Open Kernel driver KMP and detect the driver version.
transactional update # zypper install -y --auto-agree-with-licenses \
  nvidia-open-driver-G06-signed-cuda-kmp-default
transactional update # version=$(rpm -qa --queryformat '%{VERSION}\n' \
  nvidia-open-driver-G06-signed-cuda-kmp-default \
  | cut -d "_" -f1 | sort -u | tail -n 1)
You can then install the appropriate packages for additional utilities that are useful for testing purposes.
transactional update # zypper install -y --auto-agree-with-licenses \
  nvidia-compute-utils-G06=${version} \
  nvidia-persistenced=${version}
Exit the transactional-update session and reboot to the new snapshot that contains the changes you have made.
transactional update # exit
# reboot
After the system has rebooted, log back in and use the nvidia-smi tool to verify that the driver is loaded successfully and that it can both access and enumerate your GPUs.
# nvidia-smi
The output of this command should show you something similar to the following output. In the example below, the system has one GPU.
Fri Aug  1 14:53:26 2025
+------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07        Driver Version: 580.82.07      CUDA Version: 13.0 |
|---------------------------------+---------------------+----------------------+
| GPU  Name        Persistence-M  | Bus-Id       Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap  |        Memory-Usage | GPU-Util  Compute M. |
|                                 |                     |               MIG M. |
|=================================+=====================+======================|
|   0  Tesla T4              On   | 00000000:00:1E.0 Off|                    0 |
| N/A   34C    P8     10W / 70W   |   0MiB / 15360MiB   |      0%      Default |
|                                 |                     |                  N/A |
+---------------------------------+---------------------+----------------------+

+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                        Usage |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+
2.2.2.4 Validation of the driver installation #
Running the nvidia-smi command has verified that, at the host level, the NVIDIA device can be accessed and that the drivers are loading successfully. To validate that it is functioning, you need to validate that the GPU can take instructions from a user-space application, ideally via a container and through the CUDA library, as that is typically what a real workload would use. For this, we can make a further modification to the host OS by installing nvidia-container-toolkit.
Open another transactional-update shell.
# transactional-update shell
Install the nvidia-container-toolkit package from the NVIDIA Container Toolkit repository.
transactional update # zypper ar \
  "https://nvidia.github.io/libnvidia-container/stable/rpm/" \
  nvidia-container-toolkit.repo
transactional update # zypper --gpg-auto-import-keys install -y nvidia-container-toolkit
The nvidia-container-toolkit.repo file contains a stable repository nvidia-container-toolkit and an experimental repository nvidia-container-toolkit-experimental. Use the stable repository for production use. The experimental repository is disabled by default.
Exit the transactional-update session and reboot to the new snapshot that contains the changes you have made.
transactional update # exit
# reboot
Verify that the system can successfully enumerate the devices using the NVIDIA Container Toolkit. The output should be verbose, with INFO and WARN messages, but no ERROR messages.
# nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
This ensures that any container started on the machine can employ discovered NVIDIA GPU devices.
You can then run a Podman-based container. Doing this via podman gives you a good way of validating access to the NVIDIA device from within a container, which should give confidence for doing the same with Kubernetes at a later stage.
Give Podman access to the labeled NVIDIA devices that were taken care of by the previous command and simply run the bash command.
# podman run --rm --device nvidia.com/gpu=all \
  --security-opt=label=disable \
  -it registry.suse.com/bci/bci-base:latest bash
You can now execute commands from within a temporary Podman container. It does not have access to your underlying system and is ephemeral: whatever you change in the container does not persist. Also, you cannot break anything on the underlying host.
Inside the container, install the required CUDA libraries. Identify their version from the output of the nvidia-smi command. From the above example, we are installing CUDA version 13.0 with many examples, demos and development kits to fully validate the GPU.
# zypper ar \
  http://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
  cuda-sle15-sp6
# zypper --gpg-auto-import-keys refresh
# zypper install -y cuda-libraries-13-0 cuda-demo-suite-12-9
Inside the container, run the deviceQuery CUDA example of the same version, which comprehensively validates GPU access via CUDA and from within the container itself.
# /usr/local/cuda-12.9/extras/demo_suite/deviceQuery
/usr/local/cuda-12.9/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          13.0 / 13.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 14914 MBytes (15638134784 bytes)
  (40) Multiprocessors, ( 64) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1590 MHz (1.59 GHz)
  Memory Clock rate:                             5001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 30
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 1, Device0 = Tesla T4
Result = PASS
From inside the container, you can continue to run any other CUDA workload, such as compilers, to run further tests. When finished, you can exit the container.
# exit
Important
Changes you have made in the container and packages you have installed inside will be lost and will not impact the underlying operating system.
2.3 Installing SUSE Rancher Prime: RKE2 #
This guide will help you quickly launch a cluster with default options.
New to Kubernetes? The official Kubernetes docs already have great tutorials outlining the basics.
You can use any RKE2 Prime version listed on the Prime Artifacts URL for the assets mentioned in these steps. To learn more about the Prime Artifacts URL, see our Prime-only documentation. Authentication is required. Use your SUSE Customer Center (SCC) credentials to log in.
2.3.1 Prerequisites #
Make sure your environment fulfills the requirements. If NetworkManager is installed and enabled on your hosts, ensure that it is configured to ignore CNI-managed interfaces.
If the host kernel supports AppArmor, the AppArmor tools (usually available via the apparmor-parser package) must also be present before installing RKE2; see the sketch below.
The RKE2 installation process must be run as the root user or through sudo.
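The following is a minimal sketch of both preparation steps on a SLES host. It assumes the Canal CNI with its default interface names (cni0, flannel.1); adjust the unmanaged-devices list to match your CNI, and skip the AppArmor step if the tools are already present.
# zypper install -y apparmor-parser
# cat > /etc/NetworkManager/conf.d/rke2-canal.conf <<'EOF'
[keyfile]
unmanaged-devices=interface-name:cni0;interface-name:flannel.1
EOF
# systemctl reload NetworkManager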
2.3.2 Server node installation #
SUSE Rancher Prime: RKE2 provides an installation script that is a convenient way to install it as a service on systemd-based systems. This script is available at https://get.rke2.io. To install RKE2 using this method, do the following:
Run the installer, where INSTALL_RKE2_ARTIFACT_URL is the Prime Artifacts URL and INSTALL_RKE2_CHANNEL is a release channel you can subscribe to, which defaults to stable. In this example, INSTALL_RKE2_CHANNEL="latest" gives you the latest version of RKE2.
> curl -sfL https://get.rke2.io/ | \
  sudo INSTALL_RKE2_ARTIFACT_URL=PRIME-ARTIFACTS-URL/rke2 \
  INSTALL_RKE2_CHANNEL="latest" sh -
To specify a version, set the INSTALL_RKE2_VERSION environment variable.
> curl -sfL https://get.rke2.io/ | \
  sudo INSTALL_RKE2_ARTIFACT_URL=PRIME-ARTIFACTS-URL/rke2 \
  INSTALL_RKE2_VERSION="VERSION" sh -
This will install the rke2-server service and the rke2 binary onto your machine. Due to its nature, it will fail unless it runs as the root user or through sudo.
Enable the rke2-server service.
> sudo systemctl enable rke2-server.service
To pull images from the Rancher Prime registry, set the following value in /etc/rancher/rke2/config.yaml:
system-default-registry: registry.rancher.com
This configuration tells RKE2 to use registry.rancher.com as the default location for all container images it needs to deploy within the cluster.
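A minimal sketch of putting this setting in place from the shell; any editor works equally well:
> sudo mkdir -p /etc/rancher/rke2
> echo 'system-default-registry: registry.rancher.com' | sudo tee -a /etc/rancher/rke2/config.yaml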
Start the service.
> sudo systemctl start rke2-server.service
Follow the logs with the following command:
> sudo journalctl -u rke2-server -f
After running this installation:
The rke2-server service will be installed. The rke2-server service will be configured to automatically restart after node reboots or if the process crashes or is killed.
Additional utilities will be installed at /var/lib/rancher/rke2/bin/. They include kubectl, crictl, and ctr. Note that these are not on your path by default.
Two cleanup scripts, rke2-killall.sh and rke2-uninstall.sh, will be installed to the path at:
/usr/local/bin for regular file systems
/opt/rke2/bin for read-only and Btrfs file systems
INSTALL_RKE2_TAR_PREFIX/bin if INSTALL_RKE2_TAR_PREFIX is set
A kubeconfig file will be written to /etc/rancher/rke2/rke2.yaml.
A token that can be used to register other server or agent nodes will be created at /var/lib/rancher/rke2/server/node-token.
If you are adding additional server nodes, you must have an odd number in total. An odd number is needed to maintain a quorum. See the High Availability documentation for more details.
2.3.3 Linux agent (worker) node installation #
The steps in this section require root-level access or sudo to work.
Run the installer.
> curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_TYPE="agent" sh -
This will install the rke2-agent service and the rke2 binary onto your machine. Due to its nature, it will fail unless it runs as the root user or through sudo.
Enable the rke2-agent service.
> sudo systemctl enable rke2-agent.service
Configure the rke2-agent service.
> sudo mkdir -p /etc/rancher/rke2/
> sudo vim /etc/rancher/rke2/config.yaml
Content for config.yaml:
server: https://SERVER_IP_OR_DNS:9345
token: TOKEN_FROM_SERVER_NODE
Note
The rke2 server process listens on port 9345 for new nodes to register. The Kubernetes API is still served on port 6443, as normal.
Start the service.
> sudo systemctl start rke2-agent.service
Follow the logs with the following command:
> sudo journalctl -u rke2-agent -f
Each machine must have a unique host name. If your machines do not have unique host names, set the node-name parameter in the config.yaml file and provide a value with a valid and unique host name for each node. To learn more about the config.yaml file, refer to the Configuration options documentation.
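For example, to append a unique node name to an agent configuration (the value worker-01 is illustrative):
> echo 'node-name: worker-01' | sudo tee -a /etc/rancher/rke2/config.yaml
Restart the rke2-agent service afterwards for the setting to take effect.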
2.3.4 Microsoft Windows agent (worker) node installation #
Windows Support works with Calico or Flannel as the CNI for the RKE2 cluster.
The Windows Server Containers feature needs to be enabled for the RKE2 agent to work.
Open a new PowerShell window with administrator privileges.
powershell -Command "Start-Process PowerShell -Verb RunAs"
In the new PowerShell window, run the following command to install the containers feature.
Enable-WindowsOptionalFeature -Online -FeatureName containers -All
This will require a reboot for the Containers feature to function properly.
Download the install script.
Invoke-WebRequest -Uri https://raw.githubusercontent.com/rancher/rke2/master/install.ps1 -Outfile install.ps1
This script will download the rke2.exe Windows binary onto your machine.
Configure the rke2-agent for Windows.
New-Item -Type Directory c:/etc/rancher/rke2 -Force
Set-Content -Path c:/etc/rancher/rke2/config.yaml -Value @"
server: https://SERVER_IP_OR_DNS:9345
token: TOKEN_FROM_SERVER_NODE
"@
To learn more about the config.yaml file, refer to the Configuration options documentation.
Configure the PATH.
$env:PATH+=";c:\var\lib\rancher\rke2\bin;c:\usr\local\bin"
[Environment]::SetEnvironmentVariable(
    "Path",
    [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine) + ";c:\var\lib\rancher\rke2\bin;c:\usr\local\bin",
    [EnvironmentVariableTarget]::Machine)
Run the installer.
./install.ps1
Start the Windows RKE2 service.
rke2.exe agent service --add
Each machine must have a unique host name.
Do not forget to start the RKE2 service with:
Start-Service rke2
If you would prefer to use CLI parameters only instead, run the binary with the desired parameters.
rke2.exe agent --token TOKEN --server SERVER_URL
3 Preparing the cluster for AI Library #
This procedure assumes that you already have the base operating system installed on cluster nodes as well as the SUSE Rancher Prime: RKE2 Kubernetes distribution installed and operational. If you are installing from scratch, refer to Section 2, “Installing the Linux and Kubernetes distribution” first.
Install SUSE Rancher Prime on the cluster.
Install the NVIDIA GPU Operator on the cluster as described in Section 3.2, “Installing the NVIDIA GPU Operator on the SUSE Rancher Prime: RKE2 cluster”.
Connect the Kubernetes cluster to SUSE Rancher Prime as described in Section 3.3, “Registering existing clusters”.
Configure the GPU-enabled nodes so that the SUSE AI containers are assigned to Pods that run on nodes equipped with NVIDIA GPU hardware. Find more details about assigning Pods to nodes in Section 3.4, “Assigning GPU nodes to applications”.
Install and configure SUSE Security to scan the nodes used for SUSE AI. Although this step is not required, we strongly encourage it to ensure security in production environments.
Install and configure SUSE Observability to observe the nodes used for the SUSE AI application. Refer to Section 3.5, “Setting up SUSE Observability for SUSE AI” for more details.
3.1 Installing SUSE Rancher Prime on a Kubernetes cluster #
In this section, you will learn how to deploy SUSE Rancher Prime on a Kubernetes cluster using the Helm CLI.
3.1.1 Prerequisites #
3.1.1.1 Kubernetes cluster #
Set up the SUSE Rancher Prime server's local Kubernetes cluster.
SUSE Rancher Prime can be installed on any Kubernetes cluster. This cluster can use upstream Kubernetes, or it can use one of SUSE Rancher Prime's Kubernetes distributions, or it can be a managed Kubernetes cluster from a provider such as Amazon EKS.
For help setting up a RKE2 cluster, refer to Section 2.3, “Installing SUSE Rancher Prime: RKE2”.
3.1.1.2 Ingress controller #
The SUSE Rancher Prime UI and API are exposed through an Ingress. This means the Kubernetes cluster that you install SUSE Rancher Prime in must contain an Ingress controller.
For SUSE Rancher Prime: RKE2 and K3s installations, you do not have to install the Ingress controller manually because one is installed by default.
3.1.1.3 CLI tools #
The following CLI tools are required for setting up the Kubernetes cluster.
Make sure these tools are installed and available in your $PATH.
kubectl - Kubernetes command-line tool.
helm - Package management for Kubernetes. Refer to the Helm version requirements to choose a version of Helm to install SUSE Rancher Prime. Refer to the instructions provided by the Helm project for your specific platform.
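To confirm that both tools are available and that Helm is version 3, you can query their versions; both commands are standard for the respective tools:
> kubectl version --client
> helm version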
3.1.2 Install the Rancher Helm chart #
SUSE Rancher Prime is installed using the Helm package manager for Kubernetes. Helm charts provide templating syntax for Kubernetes YAML manifest documents. With Helm, we can create configurable deployments instead of just using static files.
To choose a SUSE Rancher Prime version to install, refer to Choosing a SUSE Rancher Prime version.
To choose a version of Helm to install SUSE Rancher Prime with, refer to the Helm version requirements.
The installation instructions assume you are using Helm version 3.
To set up SUSE Rancher Prime, follow the steps in the sections below.
3.1.3 Add the Helm chart repository #
Use the helm repo add command to add the Helm chart repository that contains charts to install SUSE Rancher Prime.
> helm repo add rancher-prime helm-chart-repo-url
For information on the Helm chart repository URL, refer to the SUSE Rancher Prime documentation.
3.1.4 Create a namespace for Rancher #
Define a Kubernetes namespace, named cattle-system, where the resources created by the chart will be installed:
> kubectl create namespace cattle-system
3.1.5 Choose your SSL configuration #
The SUSE Rancher Prime management server is designed to be secure by default and requires SSL/TLS configuration.
To terminate SSL/TLS externally, see TLS termination on an External Load Balancer. As outlined on that page, this option does have additional requirements for TLS verification.
There are three recommended options for the source of the certificate used for TLS termination at the SUSE Rancher Prime server:
SUSE Rancher Prime-generated TLS certificate: In this case, you need to install cert-manager into the cluster. SUSE Rancher Prime utilizes cert-manager to issue and maintain its certificates. SUSE Rancher Prime generates a CA certificate of its own and signs a certificate using that CA. cert-manager is then responsible for managing that certificate. No extra action is needed when agent-tls-mode is set to strict. More information on this setting can be found in Agent TLS Enforcement.
Let's Encrypt: The Let's Encrypt option also uses cert-manager. However, in this case, cert-manager is combined with a special Issuer for Let's Encrypt that performs all actions (including request and validation) necessary for getting a Let's Encrypt issued certificate. This configuration uses HTTP validation (HTTP-01), so the load balancer must have a public DNS record and be accessible from the internet. When setting agent-tls-mode to strict, you must also specify --privateCA=true and upload the Let's Encrypt CA as described in Adding TLS Secrets. Find more information on this setting in Agent TLS Enforcement.
Bring your own certificate: This option allows you to bring your own public- or private-CA signed certificate. SUSE Rancher Prime will use that certificate to secure websocket and HTTPS traffic. In this case, you must upload this certificate (and associated key) as PEM-encoded files with the names tls.crt and tls.key. If you are using a private CA, you must also upload that certificate. This is because this private CA may not be trusted by your nodes. SUSE Rancher Prime will take that CA certificate and generate a checksum from it, which the various SUSE Rancher Prime components use to validate their connection to SUSE Rancher Prime. If agent-tls-mode is set to strict, the CA must be uploaded so that downstream clusters can successfully connect. Find more information in Agent TLS Enforcement.
3.1.6 Install cert-manager #
You should skip this step if you are bringing your own certificate files (option ingress.tls.source=secret), or if you use TLS termination on an external load balancer. This step is only required to use certificates issued by SUSE Rancher Prime's generated CA (ingress.tls.source=rancher) or to request Let's Encrypt issued certificates (ingress.tls.source=letsEncrypt).
Recent changes to cert-manager require an upgrade. If you are upgrading SUSE Rancher Prime and using a version of cert-manager older than v0.11.0, please see our upgrade documentation.
These instructions are adapted from the official cert-manager documentation.
To see options on how to customize the cert-manager install including for cases where your cluster uses PodSecurityPolicies, see the cert-manager docs.
# If you have installed the CRDs manually, instead of setting installCRDs
# or crds.enabled to 'true' in your Helm install command, you should
# upgrade your CRD resources before upgrading the Helm chart:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/VERSION/cert-manager.crds.yaml

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true
Once you have installed cert-manager, you can verify it is deployed correctly by checking the cert-manager namespace for running pods:
kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m
3.1.7 Install Rancher with Helm and your chosen certificate option #
The exact command to install SUSE Rancher Prime differs depending on the certificate configuration.
However, irrespective of the certificate configuration, the name of the
SUSE Rancher Prime installation in the cattle-system
namespace should always be rancher.
This final command to install SUSE Rancher Prime requires a domain name
that forwards traffic to SUSE Rancher Prime. If you are using the Helm
CLI to set up a proof-of-concept, you can use a fake domain name when
passing the hostname option. An example of a fake
domain name would be
IP_OF_LINUX_NODE.sslip.io,
which would expose SUSE Rancher Prime on an IP where it is running.
Production installs would require a real domain name.
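For instance, a proof-of-concept install on a single node with the hypothetical IP address 203.0.113.10 might pass a hostname such as in the following sketch (the rancher-prime/rancher repository name matches the install commands used later in this section):

> helm install rancher rancher-prime/rancher \
    --namespace cattle-system \
    --set hostname=203.0.113.10.sslip.io \
    --set bootstrapPassword=admin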
3.1.7.1 Rancher-generated certificates #
By default, SUSE Rancher Prime generates a CA and uses cert-manager to issue the certificate used to access the SUSE Rancher Prime server interface.
Because rancher is the default option for
ingress.tls.source, we are not specifying
ingress.tls.source when running the helm
install command.
Set the hostname to the DNS name you pointed at your load balancer.

Set the bootstrapPassword to something unique for the admin user.

To install a specific SUSE Rancher Prime version, use the --version flag, for example: --version 2.7.0
> helm install rancher rancher-prime/rancher \
--namespace cattle-system \
--set hostname=rancher.my.org \
    --set bootstrapPassword=admin

Wait for SUSE Rancher Prime to be rolled out:
> kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
deployment "rancher" successfully rolled out

3.1.7.2 Let's Encrypt #
This option uses cert-manager to automatically
request and renew Let's
Encrypt certificates. This is a free service that provides you
with a valid certificate as Let's Encrypt is a trusted CA.
You need to have port 80 open as the HTTP-01 challenge can only be done on port 80.
In the following command:

Set the hostname to the public DNS record.

Set the bootstrapPassword to something unique for the admin user.

Set ingress.tls.source to letsEncrypt.

Set letsEncrypt.email to the email address used for communication about your certificate (for example, expiry notices).

Set letsEncrypt.ingress.class to whatever your Ingress controller is, for example, traefik, nginx or haproxy.
When agent-tls-mode is set to
strict (the default value for new installs of
SUSE Rancher Prime starting from v2.9.0), you must supply the
privateCA=true chart value (for example, through
--set privateCA=true) and upload the Let's Encrypt
Certificate Authority as outlined in
Adding
TLS Secrets. Information on identifying the Let's Encrypt Root
CA can be found in the Let's Encrypt
docs.
If you do not upload the CA, then SUSE Rancher Prime may fail to connect
to new or existing downstream clusters.
> helm install rancher rancher-prime/rancher \
--namespace cattle-system \
--set hostname=rancher.my.org \
--set bootstrapPassword=admin \
--set ingress.tls.source=letsEncrypt \
--set letsEncrypt.email=me@example.org \
    --set letsEncrypt.ingress.class=nginx

Wait for SUSE Rancher Prime to be rolled out:
> kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
deployment "rancher" successfully rolled out

3.1.7.3 Certificates from files #
In this option, Kubernetes secrets are created from your own certificates for SUSE Rancher Prime to use.
When you run this command, the hostname option must
match the Common Name or a Subject
Alternative Names entry in the server certificate, or the Ingress controller will fail to configure correctly.
Although an entry in the Subject Alternative Names is
technically required, having a matching Common Name
maximizes compatibility with older browsers and applications.
To check if your certificates are correct, see How do I check Common Name and Subject Alternative Names in my server certificate?
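As a quick local check before uploading, you can inspect the certificate with openssl (a generic sketch; tls.crt is assumed to be your PEM-encoded server certificate, and the -ext option requires OpenSSL 1.1.1 or later):

> openssl x509 -in tls.crt -noout -subject
> openssl x509 -in tls.crt -noout -ext subjectAltName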
Set the hostname.

Set the bootstrapPassword to something unique for the admin user.

Set ingress.tls.source to secret.
> helm install rancher rancher-prime/rancher \
--namespace cattle-system \
--set hostname=rancher.my.org \
--set bootstrapPassword=admin \
--set ingress.tls.source=secret
If you are using a private CA-signed certificate, add --set privateCA=true to the command:
> helm install rancher rancher-prime/rancher \
--namespace cattle-system \
--set hostname=rancher.my.org \
--set bootstrapPassword=admin \
--set ingress.tls.source=secret \
    --set privateCA=true

Now that SUSE Rancher Prime is deployed, see Adding TLS Secrets to publish the certificate files so SUSE Rancher Prime and the Ingress controller can use them.
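For reference, the upload described in Adding TLS Secrets typically amounts to creating Kubernetes secrets similar to the following sketch; the secret names tls-rancher-ingress and tls-ca follow the Rancher documentation, but verify them against the version you are deploying:

> kubectl -n cattle-system create secret tls tls-rancher-ingress \
    --cert=tls.crt \
    --key=tls.key

# Only when using a private CA:
> kubectl -n cattle-system create secret generic tls-ca \
    --from-file=cacerts.pem=./cacerts.pem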
3.1.7.4 Advanced options #
The SUSE Rancher Prime chart configuration has many options for customizing the installation to suit your specific environment, including several common advanced scenarios.
See the Chart Options for the full list of options.
3.1.8 Verify that the Rancher server is successfully deployed #
After adding the secrets, check if SUSE Rancher Prime was rolled out successfully:
> kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
deployment "rancher" successfully rolled out
If you see the following error: error: deployment "rancher"
exceeded its progress deadline, you can check the status of the
deployment by running the following command:
> kubectl -n cattle-system get deploy rancher
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
rancher 3 3 3 3 3m
It should show the same count for DESIRED and
AVAILABLE.
3.1.9 Save your options #
Make sure you save the --set options you used. You will
need to use the same options when you upgrade SUSE Rancher Prime to new
versions with Helm.
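If you did not record them, you can usually recover the values currently applied to the release with Helm:

> helm get values rancher -n cattle-system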
3.1.10 Finishing up #
Now you should have a functional SUSE Rancher Prime server.
In a Web browser, go to the DNS name that forwards traffic to your load balancer. Then you should be greeted by the colorful login page.
3.2 Installing the NVIDIA GPU Operator on the SUSE Rancher Prime: RKE2 cluster #
The NVIDIA operator allows administrators of Kubernetes clusters to manage GPUs just like CPUs. It includes everything needed for pods to be able to operate GPUs.
3.2.1 Host OS requirements #
To expose the GPU to the pod correctly, the NVIDIA kernel drivers and
the libnvidia-ml library must be correctly installed
in the host OS. The NVIDIA Operator can automatically install drivers
and libraries on specific operating systems. Check the NVIDIA
documentation for information on
supported
operating system releases. Installation of the NVIDIA components
on your host OS is out of the scope of this document. Refer to the
NVIDIA documentation for instructions.
The following three commands should return correct output if the kernel driver is correctly installed.

lsmod | grep nvidia returns a list of NVIDIA kernel modules. For example:

nvidia_uvm       2129920  0
nvidia_drm        131072  0
nvidia_modeset   1572864  1 nvidia_drm
video              77824  1 nvidia_modeset
nvidia           9965568  2 nvidia_uvm,nvidia_modeset
ecc                45056  1 nvidia

cat /proc/driver/nvidia/version returns the NVRM and GCC versions of the driver. For example:

NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  555.42.06  Release Build  (abuild@host)  Thu Jul 11 12:00:00 UTC 2024
GCC version:  gcc version 7.5.0 (SUSE Linux)

find /usr/ -iname libnvidia-ml.so returns a path to the libnvidia-ml.so library. For example:

/usr/lib64/libnvidia-ml.so

This library is used by Kubernetes components to interact with the kernel driver.
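As an additional sanity check, and assuming the NVIDIA user-space utilities are installed on the host, nvidia-smi should list the GPUs and the loaded driver version:

> nvidia-smi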
3.2.2 Operator installation #
Once the OS is ready and RKE2 is running, adjust the RKE2 nodes:
On the agent nodes of RKE2, run the following command:
# echo PATH=$PATH:/usr/local/nvidia/toolkit >> /etc/default/rke2-agent

On the server nodes of RKE2, run the following command:

# echo PATH=$PATH:/usr/local/nvidia/toolkit >> /etc/default/rke2-server
Then, install the NVIDIA GPU Operator using the following YAML manifest.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator
  createNamespace: true
  valuesContent: |-
    toolkit:
      env:
      - name: CONTAINERD_SOCKET
        value: /run/k3s/containerd/containerd.sock

The NVIDIA operator restarts containerd with a hangup call, which restarts RKE2.
After approximately one minute, you can make the following checks to verify that everything works as expected.
Assuming the drivers and the libnvidia-ml.so library are installed, check if the operator detects them correctly.

> kubectl get node NODENAME \
    -o jsonpath='{.metadata.labels}' | grep "nvidia.com/gpu.deploy.driver"

You should see the value pre-installed. If you see true, the drivers are not installed correctly. If the requirements in Section 3.2.1, “Host OS requirements” are met, you may have forgotten to reboot the node after installing all packages.

You can also check other driver labels:

> kubectl get node NODENAME \
    -o jsonpath='{.metadata.labels}' | jq | grep "nvidia.com"

You should see labels specifying driver and GPU, for example, nvidia.com/gpu.machine or nvidia.com/cuda.driver.major.

Check if the GPU was added by nvidia-device-plugin-daemonset as an allocatable resource in the node.

> kubectl get node NODENAME \
    -o jsonpath='{.status.allocatable}' | jq

You should see "nvidia.com/gpu": followed by the number of GPUs in the node.

Check that the container runtime binary was installed by the operator (in particular, by the nvidia-container-toolkit-daemonset):

> ls /usr/local/nvidia/toolkit/nvidia-container-runtime

Verify whether the containerd configuration was updated to include the NVIDIA container runtime.

> grep nvidia /var/lib/rancher/rke2/agent/etc/containerd/config.toml

Run a pod to verify that the GPU resource can successfully be scheduled on a pod and the pod can detect it.

apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: compute,utility
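Assuming you saved the above manifest as nbody-gpu-benchmark.yaml (a hypothetical file name), you can apply it and inspect the logs to confirm the benchmark ran on the GPU:

> kubectl apply -f nbody-gpu-benchmark.yaml
> kubectl logs -f nbody-gpu-benchmark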
Available as of October 2024 releases: v1.28.15+rke2r1, v1.29.10+rke2r1, v1.30.6+rke2r1, v1.31.2+rke2r1.
RKE2 will now use PATH to find alternative container
runtimes, in addition to checking the default paths used by the container
runtime packages. To use this feature, you must modify the RKE2
service's PATH environment variable to add the directories
containing the container runtime binaries.
We recommend modifying one of these two environment files:
/etc/default/rke2-server     # or rke2-agent
/etc/sysconfig/rke2-server   # or rke2-agent
This example adds the PATH in
/etc/default/rke2-server:
> echo PATH=$PATH >> /etc/default/rke2-server
PATH changes should be done with care to avoid placing
untrusted binaries in the path of services that run as root.
3.3 Registering existing clusters #
In this section, you will learn how to register existing RKE2 clusters in SUSE Rancher Prime (Rancher).
The cluster registration feature replaced the feature for importing clusters.
The control that Rancher has to manage a registered cluster depends on the type of cluster. For details, see Section 3.3.3, “Management capabilities for registered clusters”.
3.3.1 Prerequisites #
3.3.1.1 Kubernetes node roles #
Registered RKE Kubernetes clusters must have all three node roles:
etcd, controlplane and
worker. A cluster with only
controlplane components cannot be registered in
Rancher.
For more information on RKE node roles, see the best practices.
3.3.1.2 Permissions #
To register a cluster in Rancher, you must have
cluster-admin privileges within that cluster. If you
do not, grant these privileges to your user by running:
> kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin \
    --user USER_ACCOUNT

3.3.2 Registering a cluster #
Click > .
On the page, click .
Choose the type of cluster.
Use to configure user authorization for the cluster. Click to add users who can access the cluster. Use the drop-down list to set permissions for each user.
If you are importing a generic Kubernetes cluster in Rancher, perform the following steps for setup:
Click under to set environment variables for the Rancher cluster agent. The environment variables can be set using key-value pairs. If the Rancher agent requires the use of a proxy to communicate with the Rancher server, the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables can be set as agent environment variables.

Enable to ensure the cluster supports Kubernetes NetworkPolicy resources. Users can select the option under the drop-down list to do so.

Configure the version management feature for imported RKE2 and K3s clusters.
Click .
The requirements for cluster-admin privileges are shown (see Section 3.3.1, “Prerequisites”), including an example command to fulfill them.

Copy the kubectl command to your clipboard and run it on a node where kubeconfig is configured to point to the cluster you want to import. If you are unsure whether it is configured correctly, run kubectl get nodes to verify before running the command shown in Rancher.

If you are using self-signed certificates, you will receive the message certificate signed by unknown authority. To work around this validation, copy the command starting with curl displayed in Rancher to your clipboard. Then run the command on a node where kubeconfig is configured to point to the cluster you want to import.

After you finish running the command(s) on your node, click .
The NO_PROXY environment variable is not standardized,
and the accepted format of the value can differ between applications.
When configuring the NO_PROXY variable for Rancher,
the value must adhere to the format expected by Golang.
Specifically, the value should be a comma-delimited string that contains
only IP addresses, CIDR notation, domain names or special DNS labels
(such as *). For a full description of the expected
value format, refer to the
upstream
Golang documentation.
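For illustration, a NO_PROXY value in the format Golang expects might look like the following sketch; the entries are examples only and must be adjusted to your own networks:

NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local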
Your cluster is registered and assigned a state of Pending. Rancher is deploying resources to manage your cluster.

You can access your cluster after its state is updated to Active. Active clusters are assigned two projects: Default (containing the namespace default) and System (containing the namespaces cattle-system, ingress-nginx, kube-public and kube-system, if present).
You cannot re-register a cluster that is currently active in a Rancher setup.
3.3.3 Management capabilities for registered clusters #
The control that Rancher has to manage a registered cluster depends on the type of cluster.
3.3.3.1 Features for all registered clusters #
After registering a cluster, the cluster owner can:
Manage cluster access through role-based access control
Enable logging
Enable Istio
Manage projects and workloads
3.3.3.2 Additional features for registered RKE2 and SUSE Rancher Prime: K3s clusters #
SUSE Rancher Prime: K3s is a lightweight, fully compliant Kubernetes distribution for edge installations.
RKE2 is Rancher's next-generation Kubernetes distribution for data center and cloud installations.
When an RKE2 or SUSE Rancher Prime: K3s cluster is registered in Rancher, Rancher will recognize it. The Rancher UI will expose the features described in Section 3.3.3.1, “Features for all registered clusters”, along with the following options for editing and upgrading the cluster:
Enable or disable version management.
Upgrade the Kubernetes version when version management is enabled.
Configure the upgrade strategy.
View a read-only version of the cluster’s configuration arguments and environment variables used to launch each node.
3.3.4 Configuring version management for RKE2 and SUSE Rancher Prime: K3s clusters #
When version management is enabled for an imported cluster, upgrading it outside of Rancher may lead to unexpected consequences.
The version management feature for imported RKE2 and SUSE Rancher Prime: K3s clusters can be configured using one of the following options:
Global default (default): Inherits behavior from the global imported-cluster-version-management setting.

True: Enables version management, allowing users to control the Kubernetes version and upgrade strategy of the cluster through Rancher.
False: Disables version management, enabling users to manage the cluster’s Kubernetes version independently, outside of Rancher.
You can define the default behavior for newly created clusters or existing
ones set to “Global default” by modifying the
imported-cluster-version-management setting.
Changes to the global
imported-cluster-version-management setting take effect
during the cluster’s next reconciliation cycle.
If version management is enabled for a cluster, Rancher will deploy
the system-upgrade-controller app, along with the
associated plans and other required Kubernetes resources, to the cluster. If
version management is disabled, Rancher will remove these components
from the cluster.
3.3.5 Configuring RKE2 and SUSE Rancher Prime: K3s cluster upgrades #
It is a Kubernetes best practice to back up the cluster before upgrading. When upgrading a high-availability SUSE Rancher Prime: K3s cluster with an external database, back up the database in whichever way is recommended by the relational database provider.
The concurrency is the maximum number of nodes that are permitted to be unavailable during an upgrade. If the number of unavailable nodes is larger than the concurrency, the upgrade will fail. If an upgrade fails, you may need to repair or remove failed nodes before the upgrade can succeed.
Control plane concurrency: the maximum number of server nodes to upgrade at a single time; also the maximum unavailable server nodes
Worker concurrency: the maximum number of worker nodes to upgrade at the same time; also the maximum unavailable worker nodes
In the RKE2 and SUSE Rancher Prime: K3s documentation, control plane nodes are called server nodes. These nodes run the Kubernetes master, which maintains the desired state of the cluster. By default, these control plane nodes can have workloads scheduled to them.
Also in the RKE2 and SUSE Rancher Prime: K3s documentation, nodes with the worker role are called agent nodes. Any workloads or pods that are deployed in the cluster can be scheduled to these nodes by default.
3.3.6 Debug logging and troubleshooting for registered RKE2 and SUSE Rancher Prime: K3s clusters #
Nodes are upgraded by the system upgrade controller running in the downstream cluster. Based on the cluster configuration, Rancher deploys two plans to upgrade nodes: one for control plane nodes and one for workers. The system upgrade controller follows the plans and upgrades the nodes.
To enable debug logging on the system upgrade controller deployment, edit
the
configmap
to set the debug environment variable to true. Then restart the
system-upgrade-controller pod.
Logs created by the
system-upgrade-controller can be
viewed by running this command:
> kubectl logs -n cattle-system system-upgrade-controller

The current status of the plans can be viewed with this command:
> kubectl get plans -A -o yaml
If the cluster becomes stuck during upgrading, restart the
system-upgrade-controller.
To prevent issues when upgrading, the Kubernetes upgrade best practices should be followed.
3.3.7 Authorized cluster endpoint support for RKE2 and SUSE Rancher Prime: K3s clusters #
Rancher supports Authorized Cluster Endpoints (ACE) for registered RKE2 and SUSE Rancher Prime: K3s clusters. This support includes manual steps you will perform on the downstream cluster to enable the ACE. For additional information on the authorized cluster endpoint, refer to How the Authorized Cluster Endpoint Works.
These steps only need to be performed on the control plane nodes of the downstream cluster. You must configure each control plane node individually.
The following steps will work on both RKE2 and SUSE Rancher Prime: K3s clusters registered in v2.6.x as well as those registered (or imported) from a previous version of Rancher with an upgrade to v2.6.x.
These steps will alter the configuration of the downstream RKE2 and SUSE Rancher Prime: K3s clusters and deploy the
kube-api-authn-webhook. If a future implementation of the ACE requires an update to the kube-api-authn-webhook, this would also have to be done manually. For more information on this webhook, see the Authentication webhook documentation.
Create a file at /var/lib/rancher/{rke2,k3s}/kube-api-authn-webhook.yaml with the following contents:

apiVersion: v1
kind: Config
clusters:
- name: Default
  cluster:
    insecure-skip-tls-verify: true
    server: http://127.0.0.1:6440/v1/authenticate
users:
- name: Default
  user:
    insecure-skip-tls-verify: true
current-context: webhook
contexts:
- name: webhook
  context:
    user: Default
    cluster: Default

Add the following to the configuration file (or create one if it does not exist). Note that the default location is /etc/rancher/{rke2,k3s}/config.yaml:

kube-apiserver-arg:
- authentication-token-webhook-config-file=/var/lib/rancher/{rke2,k3s}/kube-api-authn-webhook.yaml

Run the following commands:

> sudo systemctl stop {rke2,k3s}-server
> sudo systemctl start {rke2,k3s}-server

Finally, you must go back to the Rancher UI and edit the imported cluster there to complete the ACE enablement. Click on > , then click the tab under . Finally, click the button for . Once the ACE is enabled, you then have the option of entering a fully qualified domain name (FQDN) and certificate information.
The field is optional, and if one is entered, it should point to the downstream cluster. Certificate information is only needed if there is a load balancer in front of the downstream cluster that is using an untrusted certificate. If you have a valid certificate, then nothing needs to be added to the field.
3.3.8 Annotating registered clusters #
For all types of registered Kubernetes clusters except for RKE2 and SUSE Rancher Prime: K3s Kubernetes clusters, Rancher does not have any information about how the cluster is provisioned or configured.
Therefore, when Rancher registers a cluster, it assumes that several capabilities are disabled by default. Rancher assumes this to avoid exposing UI options to the user even when the capabilities are not enabled in the registered cluster.
However, if the cluster has a certain capability, a user of that cluster might still want to select the capability for the cluster in the Rancher UI. To do that, the user will need to manually indicate to Rancher that certain capabilities are enabled for the cluster.
By annotating a registered cluster, it is possible to indicate to Rancher that a cluster was given additional capabilities outside of Rancher.
The following annotation indicates Ingress capabilities. Note that the values of non-primitive objects need to be JSON-encoded, with quotations escaped.
"capabilities.cattle.io/ingressCapabilities": "[
{
"customDefaultBackend":true,
"ingressProvider":"asdf"
}
]"

These capabilities can be annotated for the cluster:

ingressCapabilities
loadBalancerCapabilities
nodePoolScalingSupported
nodePortRange
taintSupport
All the capabilities and their type definitions can be viewed in the
Rancher API view, at
RANCHER_SERVER_URL/v3/schemas/capabilities.
To annotate a registered cluster,
Click > .
On the page, go to the custom cluster you want to annotate and click > .
Expand the section.
Click .
Add an annotation to the cluster with the format capabilities/<capability>: <value>, where value is the cluster capability that will be overridden by the annotation. In this scenario, Rancher is not aware of any capabilities of the cluster until you add the annotation.

Click .
The annotation does not give the capabilities to the cluster, but it does indicate to Rancher that the cluster has those capabilities.
3.4 Assigning GPU nodes to applications #
When deploying a containerized application to Kubernetes, you need to ensure that containers requiring GPU resources are run on appropriate worker nodes. For example, Ollama, a core component of SUSE AI, benefits greatly from GPU acceleration. This topic describes how to satisfy this requirement by explicitly requesting GPU resources and labeling worker nodes so that a node selector can match them.

A Kubernetes cluster, such as SUSE Rancher Prime: RKE2, must be available and configured with more than one worker node, where some nodes have NVIDIA GPU resources and others do not.
This document assumes that any kind of deployment to the Kubernetes cluster is done using Helm charts.
3.4.1 Labeling GPU nodes #
To distinguish nodes with the GPU support from non-GPU nodes, Kubernetes uses
labels. Labels are used for relevant metadata and
should not be confused with annotations that provide simple information
about a resource. It is possible to manipulate labels with the
kubectl command, as well as by tweaking configuration
files from the nodes. If an IaC tool such as Terraform is used, labels can
be inserted in the node resource configuration files.
To label a single node, use the following command:
> kubectl label node GPU_NODE_NAME accelerator=nvidia-gpu
To achieve the same result by tweaking the node.yaml
node configuration, add the following content and apply the changes with
kubectl apply -f node.yaml:
apiVersion: v1
kind: Node
metadata:
  name: node-name
  labels:
    accelerator: nvidia-gpu

To label multiple nodes, use the following command:

> kubectl label node \
    GPU_NODE_NAME1 \
    GPU_NODE_NAME2 ... \
    accelerator=nvidia-gpu
If Terraform is being used as an IaC tool, you can add labels to a group
of nodes by editing the .tf files and adding the
following values to a resource:
resource "node_group" "example" {
  labels = {
    "accelerator" = "nvidia-gpu"
  }
}

To check if the labels are correctly applied, use the following command:

> kubectl get nodes --show-labels
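To list only the GPU nodes, you can filter on the label instead:

> kubectl get nodes -l accelerator=nvidia-gpu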
3.4.2 Assigning GPU nodes #
The matching between a container and a node is configured by the explicit resource allocation and the use of labels and node selectors. The use cases described below focus on NVIDIA GPUs.
3.4.2.1 Enable GPU passthrough #
Containers are isolated from the host environment by default. For the containers that rely on the allocation of GPU resources, their Helm charts must enable GPU passthrough so that the container can access and use the GPU resource. Without enabling the GPU passthrough, the container may still run, but it can only use the main CPU for all computations. Refer to Ollama Helm chart for an example of the configuration required for GPU acceleration.
3.4.2.2 Assignment by resource request #
After the NVIDIA GPU Operator is configured on a node, you can instantiate
applications requesting the resource nvidia.com/gpu
provided by the operator. Add the following content to your
values.yaml file. Specify the number of GPUs
according to your setup.
resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1

3.4.2.3 Assignment by labels and node selectors #
If affected cluster nodes are labeled with a label such as
accelerator=nvidia-gpu, you can configure the node
selector to check for the label. In this case, use the following values
in your values.yaml file.
nodeSelector:
  accelerator: nvidia-gpu
3.4.3 Verifying Ollama GPU assignment #
If the GPU is correctly detected, the Ollama container logs this event:
[...] source=routes.go:1172 msg="Listening on :11434 (version 0.0.0)"
[...] source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2502346830/runners
[...] source=payload.go:44 msg="Dynamic LLM libraries [cuda_v12 cpu cpu_avx cpu_avx2]"
[...] source=gpu.go:204 msg="looking for compatible GPUs"
[...] source=types.go:105 msg="inference compute" id=GPU-c9ad37d0-d304-5d2a-c2e6-d3788cd733a7 library=cuda compute
3.5 Setting up SUSE Observability for SUSE AI #
SUSE Observability provides comprehensive monitoring and insights into your infrastructure and applications. It enables efficient tracking of metrics, logs and traces, helping you maintain optimal performance and troubleshoot issues effectively. This procedure guides you through setting up SUSE Observability for the SUSE AI environment using the SUSE AI Observability Extension.
3.5.1 Deployment scenarios #
You can deploy SUSE Observability and SUSE AI in two different ways:
Single-Cluster setup: Both SUSE AI and SUSE Observability are installed in the same Kubernetes cluster. This is a simpler approach ideal for testing and proof-of-concept deployments. Communication between components can use internal cluster DNS.
Multi-Cluster setup: SUSE AI and SUSE Observability are installed on separate, dedicated Kubernetes clusters. This setup is recommended for production environments because it isolates workloads. Communication requires exposing the SUSE Observability endpoints externally, for example, via an Ingress.
This section provides instructions for both scenarios.
3.5.2 Requirements #
To set up SUSE Observability for SUSE AI, you need to meet the following requirements:
Have access to SUSE Application Collection
Have a valid SUSE AI subscription
Have a valid license for SUSE Observability in SUSE Customer Center
Instrument your applications for telemetry data acquisition with OpenTelemetry.
For details on how to collect traces and metrics from SUSE AI components and user-developed applications, refer to Monitoring SUSE AI with OpenTelemetry and SUSE Observability. It includes configurations that are essential for full observability.
Applications from the SUSE Application Collection are not instrumented by default. If you want to monitor your AI applications, you need to follow the instrumentation guidelines that we provide in the document Monitoring SUSE AI with OpenTelemetry and SUSE Observability.
3.5.3 Setup process overview #
The following chart shows the high-level steps for the setup procedure. You will first set up the SUSE Observability cluster, then configure the SUSE AI cluster, and finally instrument your applications. Execute the steps in each column from left to right and top to bottom.
Blue steps are related to Helm chart installations.
Gray steps represent another type of interaction, such as coding.
You can create and configure Kubernetes clusters for SUSE AI and SUSE Observability as you prefer. If you are using SUSE Rancher Prime, check its documentation. For testing purposes, you can even share one cluster for both deployments. You can skip instructions on setting up a specific cluster if you already have one configured.
The diagram below shows the result of the above steps. There are two clusters represented, one for the SUSE Observability workload and another for SUSE AI. You may use an identical setup or customize it for your environment.
You can install the SUSE AI Observability Extension alongside SUSE Observability, in which case you can use the internal Kubernetes DNS for communication between them.
SUSE Observability consists of several components, and the following two of them need to be accessible by the AI cluster:
The Collector endpoint. Refer to Exposing SUSE Observability outside of the cluster or Self-hosted SUSE Observability for details about exposing it.
The SUSE Observability API. Refer to Exposing SUSE Observability outside of the cluster for details about exposing it.
Milvus metrics and traces can be scraped by the OpenTelemetry Collector with simple configurations, provided below. The same is true for GPU metrics.
To get information from Open WebUI or Ollama, you must have specific instrumentation in place. It can be an application instrumented with the OpenLIT SDK or another form of instrumentation following the same patterns.
Important: Remember that in multi-cluster setups, it is critical to properly expose your endpoints. Configure TLS, be careful with the configuration, and make sure to provide the right keys and tokens. More details are provided in the respective instructions.
3.5.4 Setting up the SUSE Observability cluster #
This initial step is identical for both single-cluster and multi-cluster deployments.
Install SUSE Observability. You can follow the official SUSE Observability installation documentation for all installation instructions. Remember to expose your APIs and collector endpoints to your SUSE AI cluster.
Important: Multi-cluster setup: For multi-cluster setups, you must expose the SUSE Observability API and collector endpoints so that the SUSE AI cluster can reach them. Refer to the guide on exposing SUSE Observability outside of the cluster.
Install the SUSE Observability extension. Create a new Helm values file named genai_values.yaml. Before creating the file, review the placeholders below.

- SUSE_OBSERVABILITY_API_URL
The URL of the SUSE Observability API. For multi-cluster deployments, this is the external URL. For single-cluster deployments, this can be the internal service URL. Example: http://suse-observability-api.your-domain.com

- SUSE_OBSERVABILITY_API_KEY
The API key from the baseConfig_values.yaml file used during the SUSE Observability installation.

- SUSE_OBSERVABILITY_API_TOKEN_TYPE
Can be api for a token from the Web UI or service for a Service Token.

- SUSE_OBSERVABILITY_TOKEN
The API or Service token itself.

- OBSERVED_SERVER_NAME
The name of the cluster to observe. It must match the name used in the Kubernetes StackPack configuration. Example: suse-ai-cluster.

Create the genai_values.yaml file with the following content:

global:
  imagePullSecrets:
  - application-collection 1
serverUrl: SUSE_OBSERVABILITY_API_URL
apiKey: SUSE_OBSERVABILITY_API_KEY
tokenType: SUSE_OBSERVABILITY_API_TOKEN_TYPE
apiToken: SUSE_OBSERVABILITY_TOKEN
clusterName: OBSERVED_SERVER_NAME

1: Instructs Helm to use credentials from the SUSE Application Collection. For instructions on how to configure the image pull secrets for the SUSE Application Collection, refer to the official documentation.
Run the install command.
> helm upgrade --install ai-obs \
    oci://dp.apps.rancher.io/charts/suse-ai-observability-extension \
    -f genai_values.yaml --namespace so-extensions --create-namespace

Note: Self-signed certificates are not supported. Consider running the extension in the same cluster as SUSE Observability and then use the internal Kubernetes address.
After the installation is complete, a new menu called is added to the Web interface and also a Kubernetes cron job is created that synchronizes the topology view with the components found in the SUSE AI cluster.
Verify SUSE Observability extension. After the installation, you can verify that a new lateral menu appears:
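For example, you can check that the synchronization cron job and the extension pods exist (a sketch; exact resource names depend on the Helm release):

> kubectl get cronjobs,pods -n so-extensions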
3.5.5 Setting up the SUSE AI cluster #
Follow the instructions for your deployment scenario.
- Single-cluster deployment
In this setup, the SUSE AI components are installed in the same cluster as SUSE Observability and can communicate using internal service DNS.
- Multi-cluster deployment
In this setup, the SUSE AI cluster is separate. Communication relies on externally exposed endpoints of the SUSE Observability cluster.
The difference between deployment scenarios affects the OTEL Collector exporter configuration and the SUSE Observability Agent URL as described in the following list.
- SUSE_OBSERVABILITY_API_URL
The URL of the SUSE Observability API.
Single-cluster example: http://suse-observability-otel-collector.suse-observability.svc.cluster.local:4317
Multi-cluster example: https://suse-observability-api.your-domain.com
- SUSE_OBSERVABILITY_COLLECTOR_ENDPOINT
The endpoint of the SUSE Observability Collector.
Single-cluster example: http://suse-observability-router.suse-observability.svc.cluster.local:8080/receiver/stsAgent
Multi-cluster example: https://suse-observability-router.your-domain.com/receiver/stsAgent
Install NVIDIA GPU Operator. Follow the instructions in https://documentation.suse.com/cloudnative/rke2/latest/en/advanced.html#_deploy_nvidia_operator.
Install OpenTelemetry collector. Create a secret with your SUSE Observability API key in the namespace where you want to install the collector. Retrieve the API key using the Web UI or from the baseConfig_values.yaml file that you used during the SUSE Observability installation. If the namespace does not exist yet, create it.

> kubectl create namespace observability
> kubectl create secret generic open-telemetry-collector \
    --namespace observability \
    --from-literal=API_KEY='SUSE_OBSERVABILITY_API_KEY'
Create a new file named otel-values.yaml with the following content.

global:
  imagePullSecrets:
  - application-collection
extraEnvsFrom:
- secretRef:
    name: open-telemetry-collector
mode: deployment
ports:
  metrics:
    enabled: true
presets:
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
config:
  receivers:
    prometheus:
      config:
        scrape_configs:
        - job_name: 'gpu-metrics'
          scrape_interval: 10s
          scheme: http
          kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
              - gpu-operator
        - job_name: 'milvus'
          scrape_interval: 15s
          metrics_path: '/metrics'
          static_configs:
          - targets: ['MILVUS_SERVICE_NAME.SUSE_AI_NAMESPACE.svc.cluster.local:9091'] 1
  exporters:
    otlp:
      endpoint: https://OPEN_TELEMETRY_COLLECTOR_NAME.suse-observability.svc.cluster.local:4317 2
      headers:
        Authorization: "SUSEObservability ${env:API_KEY}"
      tls:
        insecure: true
  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
      - name: rate-limited-composite
        type: composite
        composite:
          max_total_spans_per_second: 500
          policy_order: [errors, slow-traces, rest]
          composite_sub_policy:
          - name: errors
            type: status_code
            status_code:
              status_codes: [ ERROR ]
          - name: slow-traces
            type: latency
            latency:
              threshold_ms: 1000
          - name: rest
            type: always_sample
          rate_allocation:
          - policy: errors
            percent: 33
          - policy: slow-traces
            percent: 33
          - policy: rest
            percent: 34
    resource:
      attributes:
      - key: k8s.cluster.name
        action: upsert
        value: CLUSTER_NAME 3
      - key: service.instance.id
        from_attribute: k8s.pod.uid
        action: insert
    filter/dropMissingK8sAttributes:
      error_mode: ignore
      traces:
        span:
        - resource.attributes["k8s.node.name"] == nil
        - resource.attributes["k8s.pod.uid"] == nil
        - resource.attributes["k8s.namespace.name"] == nil
        - resource.attributes["k8s.pod.name"] == nil
  connectors:
    spanmetrics:
      metrics_expiration: 5m
      namespace: otel_span
    routing/traces:
      error_mode: ignore
      table:
      - statement: route()
        pipelines: [traces/sampling, traces/spanmetrics]
  service:
    extensions:
    - health_check
    pipelines:
      traces:
        receivers: [otlp, jaeger]
        processors: [filter/dropMissingK8sAttributes, memory_limiter, resource]
        exporters: [routing/traces]
      traces/spanmetrics:
        receivers: [routing/traces]
        processors: []
        exporters: [spanmetrics]
      traces/sampling:
        receivers: [routing/traces]
        processors: [tail_sampling, batch]
        exporters: [debug, otlp]
      metrics:
        receivers: [otlp, spanmetrics, prometheus]
        processors: [memory_limiter, resource, batch]
        exporters: [debug, otlp]

1: Configure the Milvus service and namespace for the Prometheus scraper. Because Milvus will be installed in subsequent steps, you can return to this step and edit the endpoint if necessary.
2: Set the exporter to your exposed SUSE Observability collector. Remember that the value can be distinct, depending on the deployment pattern. For production usage, we recommend using TLS communication.
3: Replace CLUSTER_NAME with the cluster's name.
Finally, run the installation command.
> helm upgrade --install opentelemetry-collector \
    oci://dp.apps.rancher.io/charts/opentelemetry-collector \
    -f otel-values.yaml --namespace observability

Verify the installation by checking the existence of a new deployment and service in the observability namespace.
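For example (resource names derive from the Helm release name used above):

> kubectl get deployments,services -n observability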
The GPU metrics scraper that we configure in the OTEL Collector requires custom RBAC rules. Create a file named
otel-rbac.yaml with the following content:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: suse-observability-otel-scraper
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: suse-observability-otel-scraper
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: suse-observability-otel-scraper
subjects:
- kind: ServiceAccount
  name: opentelemetry-collector
  namespace: observability

Then apply the configuration by running the following command.

> kubectl apply -n gpu-operator -f otel-rbac.yaml

Install the SUSE Observability Agent.

> helm upgrade --install \
    --namespace suse-observability --create-namespace \
    --set-string 'stackstate.apiKey'='YOUR_API_KEY' \
    --set-string 'stackstate.cluster.name'='CLUSTER_NAME' \
    --set-string 'stackstate.url'='http://suse-observability-router.suse-observability.svc.cluster.local:8080/receiver/stsAgent' \
    --set 'nodeAgent.skipKubeletTLSVerify'=true \
    suse-observability-agent \
    suse-observability/suse-observability-agent

Install SUSE AI. Refer to Section 4, “Installing applications from AI Library” for the complete procedure.
3.5.6 Instrument applications #
Instrumentation is the act of configuring your applications for telemetry data acquisition. Our stack employs OpenTelemetry standards as a vendor-neutral and open base for our telemetry. For a comprehensive guide on how to set up your instrumentation, please refer to Monitoring SUSE AI with OpenTelemetry and SUSE Observability.
By following the instructions in the document referenced above, you will be able to retrieve all relevant telemetry data from Open WebUI, Ollama and Milvus by simply applying specific configuration to their Helm chart values. You can find links for advanced use cases (auto-instrumentation with OTEL Operator) at the end of the document.
4 Installing applications from AI Library #
SUSE AI is delivered as a set of components that you can combine to meet specific use cases. To enable the full integrated stack, you need to deploy multiple applications in sequence. Applications with the fewest dependencies must be installed first, followed by dependent applications once their required dependencies are in place within the cluster.
You can either install required AI Library components manually using their Helm charts, or use SUSE AI Deployer to include all the dependencies in one step.
4.1 Installation procedure #
This procedure includes steps to install AI Library applications.
Purchase the SUSE AI entitlement. It is a separate entitlement from SUSE Rancher Prime.
Access SUSE AI via the SUSE Application Collection at https://apps.rancher.io/ to perform the check for the SUSE AI entitlement.
If the entitlement check is successful, you are given access to the SUSE AI-related Helm charts and container images, and can deploy directly from the SUSE Application Collection.
Visit the SUSE Application Collection, sign in and get the user access token as described in https://docs.apps.rancher.io/get-started/authentication/.
Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.
> kubectl create namespace SUSE_AI_NAMESPACE

Create the SUSE Application Collection secret.

> kubectl create secret docker-registry application-collection \
    --docker-server=dp.apps.rancher.io \
    --docker-username=APPCO_USERNAME \
    --docker-password=APPCO_USER_TOKEN \
    -n SUSE_AI_NAMESPACE

Log in to the Helm registry.

> helm registry login dp.apps.rancher.io/charts \
    -u APPCO_USERNAME \
    -p APPCO_USER_TOKEN

Install cert-manager as described in Section 4.2, “Installing cert-manager”.
Install AI Library components. You can either install each component separately, or use the SUSE AI Deployer chart to install the components together as described in Section 4.7, “Installing AI Library components using SUSE AI Deployer”.
Install Milvus as described in Section 4.3, “Installing Milvus”.
(Optional) Install Ollama as described in Section 4.4, “Installing Ollama”.
Install Open WebUI as described in Section 4.5, “Installing Open WebUI”.
Install vLLM as described in Section 4.6, “Installing vLLM”.
4.2 Installing cert-manager #
cert-manager is an extensible X.509 certificate controller for Kubernetes workloads. It supports certificates from popular public issuers as well as private issuers. cert-manager ensures that the certificates are valid and up-to-date, and attempts to renew certificates at a configured time before expiry.
In previous releases, cert-manager was automatically installed together with Open WebUI. Currently, cert-manager is no longer part of the Open WebUI Helm chart and you need to install it separately.
4.2.1 Details about the cert-manager application #
Before deploying cert-manager, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/cert-manager
Alternatively, you can also refer to the cert-manager Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/cert-manager. It contains available versions and the link to pull the cert-manager container image.
4.2.2 cert-manager installation procedure #
Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.
Install the cert-manager chart.
> helm upgrade --install cert-manager \
    oci://dp.apps.rancher.io/charts/cert-manager \
    -n CERT_MANAGER_NAMESPACE \
    --set crds.enabled=true \
    --set 'global.imagePullSecrets[0].name'=application-collection
4.2.3 Upgrading cert-manager #
To upgrade cert-manager to a specific new version, run the following command:
> helm upgrade --install cert-manager \
    oci://dp.apps.rancher.io/charts/cert-manager \
    -n CERT_MANAGER_NAMESPACE \
    --version VERSION_NUMBER
To upgrade cert-manager to the latest version, run the following command:
> helm upgrade --install cert-manager \
    oci://dp.apps.rancher.io/charts/cert-manager \
    -n CERT_MANAGER_NAMESPACE
4.2.4 Uninstalling cert-manager #
To uninstall cert-manager, run the following command:
>helm uninstall cert-manager -n CERT_MANAGER_NAMESPACE
4.3 Installing Milvus #
Milvus is a scalable, high-performance vector database designed for AI applications. It enables efficient organization and searching of massive unstructured datasets, including text, images and multi-modal content. This procedure walks you through the installation of Milvus and its dependencies.
4.3.1 Details about the Milvus application #
Before deploying Milvus, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/milvus
Alternatively, you can also refer to the Milvus Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/milvus. It contains Milvus dependencies, available versions and the link to pull the Milvus container image.
4.3.2 Milvus installation procedure #
Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.
When installed as part of SUSE AI, Milvus depends on etcd, MinIO and Apache Kafka. Because the Milvus chart uses a non-default configuration, create an override file milvus_custom_overrides.yaml with the following content.

Tip: As a template, you can download the Milvus Helm chart that includes the values.yaml file with the default configuration by running the following command:

> helm pull oci://dp.apps.rancher.io/charts/milvus --version 4.2.2

global:
  imagePullSecrets:
  - application-collection
cluster:
  enabled: True
standalone:
  persistence:
    persistentVolumeClaim:
      storageClassName: "local-path"
etcd:
  replicaCount: 1
  persistence:
    storageClassName: "local-path"
minio:
  mode: distributed
  replicas: 4
  rootUser: "admin"
  rootPassword: "adminminio"
  persistence:
    storageClass: "local-path"
  resources:
    requests:
      memory: 1024Mi
kafka:
  enabled: true
  name: kafka
  replicaCount: 3
  broker:
    enabled: true
  cluster:
    listeners:
      client:
        protocol: 'PLAINTEXT'
      controller:
        protocol: 'PLAINTEXT'
  persistence:
    enabled: true
    annotations: {}
    labels: {}
    existingClaim: ""
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 8Gi
    storageClassName: "local-path"
extraConfigFiles: 1
  user.yaml: |+
    trace:
      exporter: jaeger
      sampleFraction: 1
      jaeger:
        url: "http://opentelemetry-collector.observability.svc.cluster.local:14268/api/traces" 2

1: The extraConfigFiles section is optional, required only to receive telemetry data from Open WebUI.
2: The URL of the OpenTelemetry Collector installed by the user.

Tip: The above example uses local storage. For production environments, we recommend using an enterprise-class storage solution such as SUSE Storage, in which case the storageClassName option must be set to longhorn.

Install the Milvus Helm chart using the milvus_custom_overrides.yaml override file.

> helm upgrade --install \
    milvus oci://dp.apps.rancher.io/charts/milvus \
    -n SUSE_AI_NAMESPACE \
    --version 4.2.2 -f milvus_custom_overrides.yaml
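You can then watch the Milvus pods and their dependencies (etcd, MinIO, Kafka) come up, for example:

> kubectl get pods -n SUSE_AI_NAMESPACE -w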
4.3.2.1 Using Apache Kafka with SUSE Storage #
When Milvus is deployed in cluster mode, it uses Apache Kafka as a message queue. If Apache Kafka uses SUSE Storage as a storage back-end, you need to create an XFS storage class and make it available for the Apache Kafka deployment. Otherwise deploying Apache Kafka with a storage class of an Ext4 file system fails with the following error:
"Found directory /mnt/kafka/logs/lost+found, 'lost+found' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion)"
To introduce the XFS storage class, follow these steps:
Create a file named longhorn-xfs.yaml with the following content:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-xfs
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  fromBackup: ""
  fsType: "xfs"
  dataLocality: "disabled"
  unmapMarkSnapChainRemoved: "ignored"

Create the new storage class using the kubectl command.

> kubectl apply -f longhorn-xfs.yaml

Update the Milvus overrides YAML file to reference the Apache Kafka storage class, as in the following example:

[...]
kafka:
  enabled: true
  persistence:
    storageClassName: longhorn-xfs
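You can confirm that the new storage class is available before updating the Milvus overrides:

> kubectl get storageclass longhorn-xfs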
4.3.3 Upgrading Milvus #
The Milvus chart receives application updates and updates of the Helm
chart templates. New versions may include changes that require manual
steps. These steps are listed in the corresponding
README file. All Milvus dependencies are updated
automatically during Milvus upgrade.
To upgrade Milvus, identify the new version number and run the following command:
> helm upgrade --install \
    milvus oci://dp.apps.rancher.io/charts/milvus \
    -n SUSE_AI_NAMESPACE \
    --version VERSION_NUMBER \
    -f milvus_custom_overrides.yaml
4.3.4 Uninstalling Milvus #
To uninstall Milvus, run the following command:
>helm uninstall milvus -n SUSE_AI_NAMESPACE
4.4 Installing Ollama #
Ollama is a tool for running and managing language models locally on your computer. It offers a simple interface to download, run and interact with models without relying on cloud resources.
When installing SUSE AI, Ollama is installed by the Open WebUI installation by default. If you decide to install Ollama separately, disable its installation during the installation of Open WebUI as outlined in Example 6, “Open WebUI override file with Ollama installed separately”.
4.4.1 Details about the Ollama application #
Before deploying Ollama, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/ollama
Alternatively, you can also refer to the Ollama Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/ollama. It contains the available versions and a link to pull the Ollama container image.
4.4.2 Ollama installation procedure #
Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.
Create the ollama_custom_overrides.yaml file to override the values of the parent Helm chart. Refer to Section 4.4.5, “Values for the Ollama Helm chart” for more details.

Install the Ollama Helm chart using the ollama_custom_overrides.yaml override file.

> helm upgrade \
    --install ollama oci://dp.apps.rancher.io/charts/ollama \
    -n SUSE_AI_NAMESPACE \
    -f ollama_custom_overrides.yaml

Tip: Hugging Face models. Models downloaded from Hugging Face need to be converted before they can be used by Ollama. Refer to https://github.com/ollama/ollama/blob/main/docs/import.md for more details.
4.4.3 Uninstalling Ollama #
To uninstall Ollama, run the following command:
>helm uninstall ollama -n SUSE_AI_NAMESPACE
4.4.4 Upgrading Ollama #
You can upgrade Ollama to a specific version by running the following command:
> helm upgrade ollama oci://dp.apps.rancher.io/charts/ollama \
    -n SUSE_AI_NAMESPACE \
    --version OLLAMA_VERSION_NUMBER -f ollama_custom_overrides.yaml
If you omit the --version option, Ollama gets upgraded
to the latest available version.
4.4.4.1 Upgrading from version 0.x.x to 1.x.x #
Version 1.x.x introduces the ability to load models into memory at startup. To reflect this, change ollama.models to ollama.models.pull in the Ollama Helm chart values before upgrading to avoid errors, for example:
[...]
ollama:
  models:
  - "gemma:2b"
  - "llama3.1"

[...]
ollama:
  models:
    pull:
    - "gemma:2b"
    - "llama3.1"

Without this change, you may experience the following error when trying to upgrade from 0.x.x to 1.x.x.
coalesce.go:286: warning: cannot overwrite table with non table for
ollama.ollama.models (map[pull:[] run:[]])
Error: UPGRADE FAILED: template: ollama/templates/deployment.yaml:145:27:
executing "ollama/templates/deployment.yaml" at <.Values.ollama.models.pull>:
can't evaluate field pull in type interface {}

4.4.5 Values for the Ollama Helm chart #
To override the default values during the Helm chart installation or update,
you can create an override YAML file with custom values. Then, apply these
values by specifying the path to the override file with the
-f option of the helm command.
Ollama can run optimized for NVIDIA GPUs if the following conditions are fulfilled:
The NVIDIA driver and NVIDIA GPU Operator are installed as described in Installing NVIDIA GPU Drivers on SLES or Installing NVIDIA GPU Drivers on SUSE Linux Micro.
The workloads are set to run on NVIDIA-enabled nodes as described in https://documentation.suse.com/suse-ai/1.0/html/AI-deployment-intro/index.html#ai-gpu-nodes-assigning.
If you do not want to use the NVIDIA GPU, remove the
gpu section from
ollama_custom_overrides.yaml or disable it.
ollama:
  [...]
  gpu:
    enabled: false
    type: 'nvidia'
    number: 1

The following example shows an override file with GPU support enabled:

global:
  imagePullSecrets:
  - application-collection
ingress:
  enabled: false
defaultModel: "gemma:2b"
runtimeClassName: nvidia
ollama:
  models:
    pull:
    - "gemma:2b"
    - "llama3.1"
    run:
    - "gemma:2b"
    - "llama3.1"
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
    nvidiaResource: "nvidia.com/gpu"
persistentVolume: 1
  enabled: true
  storageClass: local-path 2
1: Without the persistentVolume option enabled, models pulled at startup are lost on each pod restart.
2: Use the local-path storage class for testing purposes only. For production environments, use an enterprise-class storage solution, such as longhorn.
A further example shows an override file with persistent storage and an Ingress enabled:

ollama:
  models:
    pull:
    - llama2
    run:
    - llama2
persistentVolume:
  enabled: true
  storageClass: local-path 1
ingress:
  enabled: true
  hosts:
  - host: OLLAMA_API_URL
    paths:
    - path: /
      pathType: Prefix
1: Use the local-path storage class for testing purposes only. For production environments, use an enterprise-class storage solution, such as longhorn.
| Key | Type | Default | Description |
|---|---|---|---|
| affinity | object | {} | Affinity for pod assignment |
| autoscaling.enabled | bool | false | Enable autoscaling |
| autoscaling.maxReplicas | int | 100 | Number of maximum replicas |
| autoscaling.minReplicas | int | 1 | Number of minimum replicas |
| autoscaling.targetCPUUtilizationPercentage | int | 80 | Target CPU utilization percentage per replica |
| extraArgs | list | [] | Additional arguments on the output Deployment definition |
| extraEnv | list | [] | Additional environment variables on the output Deployment definition |
| fullnameOverride | string | "" | String to fully override the template |
| global.imagePullSecrets | list | [] | Global override for container image registry pull secrets |
| global.imageRegistry | string | "" | Global override for container image registry |
| hostIPC | bool | false | Use the host's IPC namespace |
| hostNetwork | bool | false | Use the host's network namespace |
| hostPID | bool | false | Use the host's PID namespace |
| image.pullPolicy | string | "IfNotPresent" | Image pull policy to use for the Ollama container |
| image.registry | string | "dp.apps.rancher.io" | Image registry to use for the Ollama container |
| image.repository | string | "containers/ollama" | Image repository to use for the Ollama container |
| image.tag | string | "0.3.6" | Image tag to use for the Ollama container |
| imagePullSecrets | list | [] | Docker registry secret names as an array |
| ingress.annotations | object | {} | Additional annotations for the Ingress resource |
| ingress.className | string | "" | IngressClass used to implement the Ingress (Kubernetes 1.18+) |
| ingress.enabled | bool | false | Enable Ingress controller resource |
| ingress.hosts[0].host | string | "ollama.local" | |
| ingress.hosts[0].paths[0].path | string | "/" | |
| ingress.hosts[0].paths[0].pathType | string | "Prefix" | |
| ingress.tls | list | [] | The TLS configuration for host names to be covered with this Ingress record |
| initContainers | list | [] | Init containers to add to the pod |
| knative.containerConcurrency | int | 0 | Knative service container concurrency |
| knative.enabled | bool | false | Enable Knative integration |
| knative.idleTimeoutSeconds | int | 300 | Knative service idle timeout in seconds |
| knative.responseStartTimeoutSeconds | int | 300 | Knative service response start timeout in seconds |
| knative.timeoutSeconds | int | 300 | Knative service timeout in seconds |
| livenessProbe.enabled | bool | true | Enable livenessProbe |
| livenessProbe.failureThreshold | int | 6 | Failure threshold for livenessProbe |
| livenessProbe.initialDelaySeconds | int | 60 | Initial delay in seconds for livenessProbe |
| livenessProbe.path | string | "/" | Request path for livenessProbe |
| livenessProbe.periodSeconds | int | 10 | Period in seconds for livenessProbe |
| livenessProbe.successThreshold | int | 1 | Success threshold for livenessProbe |
| livenessProbe.timeoutSeconds | int | 5 | Timeout in seconds for livenessProbe |
| nameOverride | string | "" | String to partially override the template (maintains the release name) |
| nodeSelector | object | {} | Node labels for pod assignment |
| ollama.gpu.enabled | bool | false | Enable GPU integration |
| ollama.gpu.number | int | 1 | Specify the number of GPUs |
| ollama.gpu.nvidiaResource | string | "nvidia.com/gpu" | Only for NVIDIA cards; change to […] |
| ollama.gpu.type | string | "nvidia" | GPU type: "nvidia" or "amd". If ollama.gpu.enabled is set, the default value is "nvidia". If set to "amd", the "rocm" suffix is added to the image tag unless image.tag is overridden, because AMD and CPU/CUDA use different images. |
| ollama.insecure | bool | false | Add the insecure flag for pulling at container startup |
| ollama.models | list | [] | List of models to pull at container startup. The more you add, the longer the container takes to start if the models are not already present. Example: models: [llama2, mistral] |
| ollama.mountPath | string | "" | Override the ollama-data volume mount path (default: "/root/.ollama") |
| persistentVolume.accessModes | list | ["ReadWriteOnce"] | Ollama server data Persistent Volume access modes. Must match those of the existing PV or dynamic provisioner, see https://kubernetes.io/docs/concepts/storage/persistent-volumes/. |
| persistentVolume.annotations | object | {} | Ollama server data Persistent Volume annotations |
| persistentVolume.enabled | bool | false | Enable persistence using a PVC |
| persistentVolume.existingClaim | string | "" | To bring your own PVC for persisting Ollama state, pass the name of the created and ready PVC here. If set, this chart does not create the default PVC. Requires […] |
| persistentVolume.size | string | "30Gi" | Ollama server data Persistent Volume size |
| persistentVolume.storageClass | string | "" | If set to either a dash ("-") or an empty string (""), dynamic provisioning is disabled. Otherwise, the storageClassName of the persistent volume claim is set to the given value. If absent, the default storage class is used for dynamic provisioning whenever possible. See https://kubernetes.io/docs/concepts/storage/storage-classes/ for more details. |
| persistentVolume.subPath | string | "" | Subdirectory of the Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty. |
| persistentVolume.volumeMode | string | "" | Ollama server data Persistent Volume binding mode. If empty (the default) or set to null, no volumeBindingMode is set and the default mode is used. |
| persistentVolume.volumeName | string | "" | Ollama server Persistent Volume name. Can be used to force-attach the created PVC to a specific PV. |
| podAnnotations | object | {} | Map of annotations to add to the pods |
| podLabels | object | {} | Map of labels to add to the pods |
| podSecurityContext | object | {} | Pod security context |
| readinessProbe.enabled | bool | true | Enable readinessProbe |
| readinessProbe.failureThreshold | int | 6 | Failure threshold for readinessProbe |
| readinessProbe.initialDelaySeconds | int | 30 | Initial delay in seconds for readinessProbe |
| readinessProbe.path | string | "/" | Request path for readinessProbe |
| readinessProbe.periodSeconds | int | 5 | Period in seconds for readinessProbe |
| readinessProbe.successThreshold | int | 1 | Success threshold for readinessProbe |
| readinessProbe.timeoutSeconds | int | 3 | Timeout in seconds for readinessProbe |
| replicaCount | int | 1 | Number of replicas |
| resources.limits | object | {} | Pod resource limits |
| resources.requests | object | {} | Pod resource requests |
| runtimeClassName | string | "" | Specify the runtime class |
| securityContext | object | {} | Container security context |
| service.annotations | object | {} | Annotations to add to the service |
| service.nodePort | int | 31434 | Service node port when the service type is "NodePort" |
| service.port | int | 11434 | Service port |
| service.type | string | "ClusterIP" | Service type |
| serviceAccount.annotations | object | {} | Annotations to add to the service account |
| serviceAccount.automount | bool | true | Whether to automatically mount a ServiceAccount's API credentials |
| serviceAccount.create | bool | true | Whether a service account should be created |
| serviceAccount.name | string | "" | The name of the service account to use. If not set and "create" is "true", a name is generated using the full name template. |
| tolerations | list | [] | Tolerations for pod assignment |
| topologySpreadConstraints | object | {} | Topology spread constraints for pod assignment |
| updateStrategy | object | {"type":""} | How to replace existing pods |
| updateStrategy.type | string | "" | Can be "Recreate" or "RollingUpdate"; the default is "RollingUpdate" |
| volumeMounts | list | [] | Additional volumeMounts on the output Deployment definition |
| volumes | list | [] | Additional volumes on the output Deployment definition |
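As a quick illustration of how these values compose, the following hypothetical override fragment uses only keys documented in the table above; the concrete numbers are placeholders, not recommendations.

# Hypothetical ollama_custom_overrides.yaml fragment; keys are from the
# table above, values are illustrative only
replicaCount: 1
resources:
  requests:
    cpu: "2"
    memory: "8Gi"
service:
  type: ClusterIP
  port: 11434
livenessProbe:
  enabled: true
  initialDelaySeconds: 60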
4.5 Installing Open WebUI #
Open WebUI is a Web-based user interface designed for interacting with AI models.
4.5.1 Details about the Open WebUI application #
Before deploying Open WebUI, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/open-webui
Alternatively, you can also refer to the Open WebUI Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/open-webui. It contains available versions and the link to pull the Open WebUI container image.
4.5.2 Open WebUI installation procedure #
Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.
To install Open WebUI, you need to have the following:
An installed cert-manager. If cert-manager is not already installed, for example from a previous Open WebUI release, install it by following the steps in Section 4.2, “Installing cert-manager”.
Create the owui_custom_overrides.yaml file to override the values of the parent Helm chart. The file contains URLs for Milvus and Ollama and specifies whether a stand-alone Ollama deployment is used or whether Ollama is installed as part of the Open WebUI installation. Find more details in Section 4.5.5, “Examples of Open WebUI Helm chart override files”. For a list of all installation options with examples, refer to Section 4.5.6, “Values for the Open WebUI Helm chart”.

Install the Open WebUI Helm chart using the owui_custom_overrides.yaml override file.

>helm upgrade --install \
  open-webui charts/open-webui-X.Y.Z.tgz \
  -n SUSE_AI_NAMESPACE \
  --version X.Y.Z -f owui_custom_overrides.yaml
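Before continuing, you can verify that the release came up. The following checks are generic; the service name open-webui and port 80 are assumptions based on the chart defaults listed in Section 4.5.6, “Values for the Open WebUI Helm chart”, and may differ in your setup.

# Watch the pods start
>kubectl get pods -n SUSE_AI_NAMESPACE
# Reach the Web interface locally (service name and port assume chart defaults)
>kubectl port-forward svc/open-webui -n SUSE_AI_NAMESPACE 8080:80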
4.5.3 Upgrading Open WebUI #
To upgrade Open WebUI to a specific new version, run the following command:
>helm upgrade --install open-webui \
  oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f owui_custom_overrides.yaml
To upgrade Open WebUI to the latest version, run the following command:
>helm upgrade --install open-webui \
  oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  -f owui_custom_overrides.yaml
4.5.4 Uninstalling Open WebUI #
To uninstall Open WebUI, run the following command:
>helm uninstall open-webui -n SUSE_AI_NAMESPACE
4.5.5 Examples of Open WebUI Helm chart override files #
To override the default values during the Helm chart installation or update,
you can create an override YAML file with custom values. Then, apply these
values by specifying the path to the override file with the
-f option of the helm command.
The following override file installs Ollama during the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.
global:
  imagePullSecrets:
    - application-collection
ollamaUrls:
  - http://open-webui-ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path 1
ollama:
  enabled: true
  ingress:
    enabled: false
  defaultModel: "gemma:2b"
  ollama:
    models: 2
      - "gemma:2b"
      - "llama3.1"
    gpu: 3
      enabled: true
      type: 'nvidia'
      number: 1
    persistentVolume: 4
      enabled: true
      storageClass: local-path 5
pipelines:
  enabled: False
  persistence:
    storageClass: local-path 6
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui 7
  tls: true
extraEnvVars:
  - name: DEFAULT_MODELS 8
    value: "gemma:2b"
  - name: DEFAULT_USER_ROLE
    value: "user"
  - name: WEBUI_NAME
    value: "SUSE AI"
  - name: GLOBAL_LOG_LEVEL
    value: INFO
  - name: RAG_EMBEDDING_MODEL
    value: "sentence-transformers/all-MiniLM-L6-v2"
  - name: VECTOR_DB
    value: "milvus"
  - name: MILVUS_URI
    value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530
  - name: INSTALL_NLTK_DATASETS 9
    value: "true"
1 5 6 Use […]
2 Specifies that two large language models (LLM) will be loaded in Ollama when the container starts.
3 Enables GPU support for Ollama. The […]
4 Without the […]
7 Specifies the host name for the Open WebUI Web UI.
8 Specifies the default LLM for Ollama.
9 Installs the natural language toolkit (NLTK) datasets for Ollama. Refer to https://www.nltk.org/index.html for licensing information.
The following override file installs Ollama separately from the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.
global:
  imagePullSecrets:
    - application-collection
ollamaUrls:
  - http://ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path 1
ollama:
  enabled: false
pipelines:
  enabled: False
  persistence:
    storageClass: local-path 2
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui
  tls: true
extraEnvVars:
  - name: DEFAULT_MODELS 3
    value: "gemma:2b"
  - name: DEFAULT_USER_ROLE
    value: "user"
  - name: WEBUI_NAME
    value: "SUSE AI"
  - name: GLOBAL_LOG_LEVEL
    value: INFO
  - name: RAG_EMBEDDING_MODEL
    value: "sentence-transformers/all-MiniLM-L6-v2"
  - name: VECTOR_DB
    value: "milvus"
  - name: MILVUS_URI
    value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530
  - name: ENABLE_OTEL 4
    value: "true"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT 5
    value: http://opentelemetry-collector.observability.svc.cluster.local:4317 6
1 2 Use […]
3 Specifies the default LLM for Ollama.
4 5 These values are optional, required only to receive telemetry data from Open WebUI.
6 The URL of the OpenTelemetry Collector installed by the user.
The following example shows how to extend the
extraEnvVars section of the Open WebUI override file to
connect to vLLM. Replace SUSE_AI_NAMESPACE
with your Kubernetes namespace.
extraEnvVars:
  [...]
  - name: OPENAI_API_BASE_URL
    value: "http://vllm-router-service.SUSE_AI_NAMESPACE.svc.cluster.local:80/v1"
  - name: OPENAI_API_KEY
    value: "dummy" 1

1 Open WebUI will require you to provide the OpenAI API key.
If the Open WebUI installation has pipelines enabled besides the vLLM
deployment, you can extend the extraEnvVars section as
follows.
extraEnvVars:
  [...]
  - name: OPENAI_API_BASE_URLS
    value: "http://open-webui-pipelines.SUSE_AI_NAMESPACE.svc.cluster.local:9099;http://vllm-router-service.SUSE_AI_NAMESPACE.svc.cluster.local:80/v1"
  - name: OPENAI_API_KEYS
    value: "0p3n-w3bu!;dummy"
4.5.6 Values for the Open WebUI Helm chart #
To override the default values during the Helm chart installation or update,
you can create an override YAML file with custom values. Then, apply these
values by specifying the path to the override file with the
-f option of the helm command.
| Key | Type | Default | Description |
|---|---|---|---|
| affinity | object | {} | Affinity for pod assignment |
| annotations | object | {} | |
| cert-manager.enabled | bool | true | |
| clusterDomain | string | "cluster.local" | Value of the cluster domain |
| containerSecurityContext | object | {} | Configure the container security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container. |
| extraEnvVars | list | [{"name":"OPENAI_API_KEY","value":"0p3n-w3bu!"}] | Environment variables added to the Open WebUI deployment. The most up-to-date environment variables can be found at https://docs.openwebui.com/getting-started/env-configuration/. |
| extraEnvVars[0] | object | {"name":"OPENAI_API_KEY","value":"0p3n-w3bu!"} | Default API key value for Pipelines. Update it in a production deployment, and change it to the required API key if not using Pipelines. |
| global.imagePullSecrets | list | [] | Global override for container image registry pull secrets |
| global.imageRegistry | string | "" | Global override for container image registry |
| global.tls.additionalTrustedCAs | bool | false | |
| global.tls.issuerName | string | "suse-private-ai" | |
| global.tls.letsEncrypt.email | string | "none@example.com" | |
| global.tls.letsEncrypt.environment | string | "staging" | |
| global.tls.letsEncrypt.ingress.class | string | "" | |
| global.tls.source | string | "suse-private-ai" | The source of Open WebUI TLS keys, see Section 4.5.6.1, “TLS sources”. |
| image.pullPolicy | string | "IfNotPresent" | Image pull policy to use for the Open WebUI container |
| image.registry | string | "dp.apps.rancher.io" | Image registry to use for the Open WebUI container |
| image.repository | string | "containers/open-webui" | Image repository to use for the Open WebUI container |
| image.tag | string | "0.3.32" | Image tag to use for the Open WebUI container |
| imagePullSecrets | list | [] | Configure imagePullSecrets to use a private registry, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry. |
| ingress.annotations | object | {"nginx.ingress.kubernetes.io/ssl-redirect":"true"} | Use appropriate annotations for your Ingress controller, such as […] |
| ingress.class | string | "" | |
| ingress.enabled | bool | true | |
| ingress.existingSecret | string | "" | |
| ingress.host | string | "" | |
| ingress.tls | bool | true | |
| nameOverride | string | "" | |
| nodeSelector | object | {} | Node labels for pod assignment |
| ollama.enabled | bool | true | Automatically install the Ollama Helm chart from https://otwld.github.io/ollama-helm/. Configure the following Helm values. |
| ollama.fullnameOverride | string | "open-webui-ollama" | If enabling embedded Ollama, update fullnameOverride to your desired Ollama name value; otherwise, the default ollama.name value from the Ollama chart is used. |
| ollamaUrls | list | [] | A list of Ollama API endpoints. These can be added instead of automatically installing the Ollama Helm chart, or in addition to it. |
| openaiBaseApiUrl | string | "" | OpenAI base API URL to use. Defaults to the Pipelines service endpoint when Pipelines are enabled, or to […] |
| persistence.accessModes | list | ["ReadWriteOnce"] | If using multiple replicas, you must update accessModes to ReadWriteMany. |
| persistence.annotations | object | {} | |
| persistence.enabled | bool | true | |
| persistence.existingClaim | string | "" | Use existingClaim to reuse an existing Open WebUI PVC instead of creating a new one. |
| persistence.selector | object | {} | |
| persistence.size | string | "2Gi" | |
| persistence.storageClass | string | "" | |
| pipelines.enabled | bool | false | Automatically install the Pipelines chart to extend Open WebUI functionality using Pipelines, see https://github.com/open-webui/pipelines. |
| pipelines.extraEnvVars | list | [] | Use this section to pass required environment variables to your pipelines (such as the Langfuse host name). |
| podAnnotations | object | {} | |
| podSecurityContext | object | {} | Configure the pod security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container. |
| replicaCount | int | 1 | |
| resources | object | {} | |
| service | object | {"annotations":{},"containerPort":8080,"labels":{},"loadBalancerClass":"","nodePort":"","port":80,"type":"ClusterIP"} | Service values to expose Open WebUI pods to the cluster |
| tolerations | list | [] | Tolerations for pod assignment |
| topologySpreadConstraints | list | [] | Topology spread constraints for pod assignment |
4.5.6.1 TLS sources #
There are three recommended ways for Open WebUI to obtain TLS certificates for secure communication.
- Self-Signed TLS certificate
This is the default method. You need to install cert-manager on the cluster to issue and maintain the certificates. This method generates a CA and signs the Open WebUI certificate using the CA. cert-manager then manages the signed certificate.

For this method, use the following Helm chart option:
global.tls.source=suse-private-ai
- Let's Encrypt
This method also uses cert-manager, but it is combined with a special issuer for Let's Encrypt that performs all actions, including request and validation, to get the Let's Encrypt certificate issued. This configuration uses HTTP validation (HTTP-01), and therefore the load balancer must have a public DNS record and be accessible from the Internet.

For this method, use the following Helm chart option:
global.tls.source=letsEncrypt
- Provide your own certificate
This method allows you to bring your own signed certificate to secure the HTTPS traffic. In this case, you must upload this certificate and the associated key as PEM-encoded files named tls.crt and tls.key.

For this method, use the following Helm chart option:
global.tls.source=secret
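As a sketch of this third option, the following command creates such a secret from existing PEM files; the secret name open-webui-tls is an example, and referencing it through the ingress.existingSecret chart value (see Section 4.5.6) is an assumption based on the values table, not a recipe from this guide.

# Create a TLS secret from PEM-encoded certificate and key files
>kubectl create secret tls open-webui-tls \
  -n SUSE_AI_NAMESPACE \
  --cert=tls.crt --key=tls.key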
4.6 Installing vLLM #
vLLM is an open-source high-performance inference and serving engine for large language models (LLMs). It is designed to maximize throughput and reduce latency by using an efficient memory management system that handles dynamic batching and streaming outputs. In short, vLLM makes running LLMs cheaper and faster in production.
Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using its Helm chart, which is part of AI Library. The Helm chart deploys the full vLLM production stack and enables you to run optimized LLM inference workloads on NVIDIA GPU in your Kubernetes cluster. It consists of the following components:
Serving Engine runs the model inference.
Router handles OpenAI-compatible API requests.
LMCache (optional) improves caching efficiency.
CacheServer (optional) is a distributed KV cache back-end.
4.6.1 Details about the vLLM application #
Before deploying vLLM, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/vllm
Alternatively, you can also refer to the vLLM Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/vllm. It contains vLLM dependencies, available versions and the link to pull the vLLM container image.
4.6.2 vLLM installation procedure #
Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.
NVIDIA GPUs must be available in your Kubernetes cluster to successfully deploy and run vLLM.
The current release of SUSE AI vLLM does not support Ray and LoraController.
Create a vllm_custom_overrides.yaml file to override the default values of the Helm chart. Find examples of override files in Section 4.6.6, “Examples of vLLM Helm chart override files”.

After saving the override file as vllm_custom_overrides.yaml, apply its configuration with the following command.

>helm upgrade --install \
  vllm oci://dp.apps.rancher.io/charts/vllm \
  -n SUSE_AI_NAMESPACE \
  -f vllm_custom_overrides.yaml
4.6.3 Integrating vLLM with Open WebUI #
You can integrate vLLM in Open WebUI either by using the Open WebUI Web user interface, or by updating the Open WebUI override file during Open WebUI deployment (see Example 7, “Open WebUI override file with a connection to vLLM”).
You must have Open WebUI administrator privileges to access configuration screens or settings mentioned in this section.
In the bottom left of the Open WebUI window, click your avatar icon to open the user menu and select Admin Panel.
Click the Settings tab and select Connections from the left menu.
In the OpenAI API section, add a new connection URL to the vLLM router service, for example:
http://vllm-router-service.SUSE_AI_NAMESPACE.svc.cluster.local:80/v1
Confirm with Save.
Figure 20: Adding a vLLM connection to Open WebUI #
4.6.4 Upgrading vLLM #
The vLLM chart receives application updates and updates of the Helm
chart templates. New versions may include changes that require manual
steps. These steps are listed in the corresponding
README file. All vLLM dependencies are updated
automatically during a vLLM upgrade.
To upgrade vLLM, identify the new version number and run the following command:
>helm upgrade --install \
  vllm oci://dp.apps.rancher.io/charts/vllm \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f vllm_custom_overrides.yaml
If you omit the --version option, vLLM gets upgraded
to the latest available version.
The helm upgrade command performs a rolling update on
Deployments or StatefulSets with the following conditions:
The old pod stays running until the new pod passes readiness checks.
If the cluster is already at GPU capacity, the new pod cannot start because there is no GPU left to schedule it. This requires patching the deployment using the Recreate update strategy. The following commands identify the vLLM deployment name and patch its deployment.
>kubectl get deployments -n SUSE_AI_NAMESPACE

>kubectl patch deployment VLLM_DEPLOYMENT_NAME \
  -n SUSE_AI_NAMESPACE \
  -p '{"spec": {"strategy": {"type": "Recreate", "rollingUpdate": null}}}'
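After patching, you can watch the replacement pods with a standard kubectl check; this command is generic and not specific to the vLLM chart.

>kubectl rollout status deployment VLLM_DEPLOYMENT_NAME -n SUSE_AI_NAMESPACE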
4.6.5 Uninstalling vLLM #
To uninstall vLLM, run the following command:
>helm uninstall vllm -n SUSE_AI_NAMESPACE
4.6.6 Examples of vLLM Helm chart override files #
To override the default values during the Helm chart installation or update,
you can create an override YAML file with custom values. Then, apply these
values by specifying the path to the override file with the
-f option of the helm command.
The following override file installs vLLM using a model that is publicly available.
global:
  imagePullSecrets:
    - application-collection
servingEngineSpec:
  modelSpec:
    - name: "phi3-mini-4k"
      registry: "dp.apps.rancher.io"
      repository: "containers/vllm-openai"
      tag: "0.9.1"
      imagePullPolicy: "IfNotPresent"
      modelURL: "microsoft/Phi-3-mini-4k-instruct"
      replicaCount: 1
      requestCPU: 6
      requestMemory: "16Gi"
      requestGPU: 1

Pulling the images can take a long time. You can monitor the status of the vLLM installation by running the following command:

>kubectl get pods -n SUSE_AI_NAMESPACE
NAME                                            READY   STATUS    RESTARTS   AGE
[...]
vllm-deployment-router-7588bf995c-5jbkf         1/1     Running   0          8m9s
vllm-phi3-mini-4k-deployment-vllm-79d6fdc-tx7   1/1     Running   0          8m9s

Pods for the vLLM deployment should transition to the Ready and Running states.

Expose the vllm-router-service port to the host machine:

>kubectl port-forward svc/vllm-router-service \
  -n SUSE_AI_NAMESPACE 30080:80

Query the OpenAI-compatible API to list the available models:

>curl -o- http://localhost:30080/v1/models

Send a query to the OpenAI /completions endpoint to generate a completion for a prompt:

>curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3-mini-4k-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'

# example output of generated completions
{
  "id": "cmpl-3dd11a3624654629a3828c37bac3edd2",
  "object": "text_completion",
  "created": 1757530703,
  "model": "microsoft/Phi-3-mini-4k-instruct",
  "choices": [
    {
      "index": 0,
      "text": " in a bustling city full of concrete and",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 15,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}
The following vLLM override file includes basic configuration options. Note the following:

- Access to a Hugging Face token (HF_TOKEN) is required.
- The model meta-llama/Llama-3.1-8B-Instruct from this example is a gated model that requires you to accept the agreement to access it. For more information, see https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct.
- The runtimeClassName specified here is nvidia.
- Update the storageClass: entry for each modelSpec.
# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
    - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
    - name: "llama3" 1
      registry: "dp.apps.rancher.io" 2
      repository: "containers/vllm-openai" 3
      tag: "0.9.1" 4
      imagePullPolicy: "IfNotPresent"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct" 5
      replicaCount: 1 6
      requestCPU: 10 7
      requestMemory: "16Gi" 8
      requestGPU: 1 9
      storageClass: STORAGE_CLASS
      pvcStorage: "50Gi" 10
      pvcAccessMode:
        - ReadWriteOnce
      vllmConfig:
        enableChunkedPrefill: false 11
        enablePrefixCaching: false 12
        maxModelLen: 4096 13
        dtype: "bfloat16" 14
        extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.8"] 15
      hf_token: HF_TOKEN 16
1 The unique identifier for your model deployment.
2 The Docker image registry containing the model's serving engine image.
3 The Docker image repository containing the model's serving engine image.
4 The version of the model image to use.
5 The URL pointing to the model on Hugging Face or another hosting service.
6 The number of replicas for the deployment, which allows scaling for load.
7 The amount of CPU resources requested per replica.
8 Memory allocation for the deployment. Sufficient memory is required to load the model.
9 The number of GPUs to allocate for the deployment.
10 The Persistent Volume Claim (PVC) size for model storage.
11 Optimizes performance by prefetching model chunks.
12 Enables caching of prompt prefixes to speed up inference for repeated prompts.
13 The maximum sequence length the model can handle.
14 The data type for model weights, such as bfloat16.
15 Additional command-line arguments for vLLM, such as disabling request logging or setting GPU memory utilization.
16 Your Hugging Face token for accessing gated models. Replace HF_TOKEN with your actual token.
Prefetching models to a Persistent Volume Claim (PVC) prevents repeated
downloads from Hugging Face during pod startup. The process involves
creating a PVC and a job to fetch the model. This PVC is mounted at
/models, where the prefetch job stores the model
weights. Subsequently, the vLLM modelURL is set to
this path, which ensures that the model is loaded locally instead of being
downloaded when the pod starts.
Define a PVC for model weights using the following YAML specification.
# pvc-models.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
  namespace: SUSE_AI_NAMESPACE
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi # Adjust size based on your model
  storageClassName: STORAGE_CLASS

Save it as pvc-models.yaml and apply it with kubectl apply -f pvc-models.yaml.

Create a secret resource for the Hugging Face token.
>kubectl create secret -n SUSE_AI_NAMESPACE \
  generic huggingface-credentials \
  --from-literal=HUGGING_FACE_HUB_TOKEN=HF_TOKEN

Create a YAML specification for prefetching the model and save it as job-prefetch-llama3.1-8b.yaml.

# job-prefetch-llama3.1-8b.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prefetch-llama3.1-8b
  namespace: SUSE_AI_NAMESPACE
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: hf-download
          image: python:3.10-slim
          env:
            - name: HF_TOKEN
              valueFrom: { secretKeyRef: { name: huggingface-credentials, key: HUGGING_FACE_HUB_TOKEN } }
            - name: HF_HUB_ENABLE_HF_TRANSFER
              value: "1"
            - name: HF_HUB_DOWNLOAD_TIMEOUT
              value: "60"
          command: ["bash","-lc"]
          args:
            - |
              set -e
              echo "Installing Hugging Face CLI..."
              pip install "huggingface_hub[cli]"
              pip install "hf_transfer"
              echo "Logging in..."
              hf auth login --token "${HF_TOKEN}"
              echo "Downloading Llama 3.1 8B Instruct to /models/llama-3.1-8b-it ..."
              hf download meta-llama/Llama-3.1-8B-Instruct --local-dir /models/llama-3.1-8b-it
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: models-pvc

Apply the specification with the following commands:
>kubectl apply -f job-prefetch-llama3.1-8b.yaml

>kubectl -n SUSE_AI_NAMESPACE \
  wait --for=condition=complete job/prefetch-llama3.1-8b

Update the custom vLLM override file with support for PVC.
# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
    - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
    - name: "llama3"
      registry: "dp.apps.rancher.io"
      repository: "containers/vllm-openai"
      tag: "0.9.1"
      imagePullPolicy: "IfNotPresent"
      modelURL: "/models/llama-3.1-8b-it"
      replicaCount: 1
      requestCPU: 10
      requestMemory: "16Gi"
      requestGPU: 1
      extraVolumes:
        - name: models-pvc
          persistentVolumeClaim:
            claimName: models-pvc 1
      extraVolumeMounts:
        - name: models-pvc
          mountPath: /models 2
      vllmConfig:
        maxModelLen: 4096
      hf_token: HF_TOKEN

1 Specify your PVC name.
2 The mount path must match the base directory of the servingEngineSpec.modelSpec.modelURL value specified above.

Save the file as vllm_custom_overrides.yaml and apply it with the helm upgrade --install command described in Section 4.6.2, “vLLM installation procedure”.

The following example lists mounted PVCs for a pod.

>kubectl exec -it vllm-llama3-deployment-vllm-858bd967bd-w26f7 \
  -n SUSE_AI_NAMESPACE -- ls -l /models
drwxr-xr-x 1 root root 608 Aug 22 16:29 llama-3.1-8b-it
This example shows how to configure multiple models to run on different
GPUs. Remember to update the entries hf_token and
storageClass.
Ray is currently not supported. Therefore, sharding a single large model across multiple GPUs is not supported.
# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
    - application-collection
servingEngineSpec:
  modelSpec:
    - name: "llama3"
      registry: "dp.apps.rancher.io"
      repository: "containers/vllm-openai"
      tag: "0.9.1"
      imagePullPolicy: "IfNotPresent"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct"
      replicaCount: 1
      requestCPU: 10
      requestMemory: "16Gi"
      requestGPU: 1
      pvcStorage: "50Gi"
      storageClass: STORAGE_CLASS
      vllmConfig:
        maxModelLen: 4096
      hf_token: HF_TOKEN_FOR_LLAMA_31
    - name: "mistral"
      registry: "dp.apps.rancher.io"
      repository: "containers/vllm-openai"
      tag: "0.9.1"
      imagePullPolicy: "IfNotPresent"
      modelURL: "mistralai/Mistral-7B-Instruct-v0.2"
      replicaCount: 1
      requestCPU: 10
      requestMemory: "16Gi"
      requestGPU: 1
      pvcStorage: "50Gi"
      storageClass: STORAGE_CLASS
      vllmConfig:
        maxModelLen: 4096
      hf_token: HF_TOKEN_FOR_MISTRAL
This example demonstrates how to enable KV cache offloading to the CPU
using LMCache in a vLLM deployment. You can enable LMCache and set
the CPU offloading buffer size using the lmcacheConfig
field. In the following example, the buffer is set to 20 GB, but you
can adjust this value based on your workload. Remember to update the
entries hf_token and storageClass.
Setting lmcacheConfig.enabled to
true implicitly enables the
LMCACHE_USE_EXPERIMENTAL flag for LMCache. These
experimental features are only supported on newer GPU generations. It is
not recommended to enable them without a compelling reason.
# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
    - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
    - name: "mistral"
      registry: "dp.apps.rancher.io"
      repository: "containers/lmcache-vllm-openai"
      tag: "0.3.2"
      imagePullPolicy: "IfNotPresent"
      modelURL: "mistralai/Mistral-7B-Instruct-v0.2"
      replicaCount: 1
      requestCPU: 10
      requestMemory: "40Gi"
      requestGPU: 1
      pvcStorage: "50Gi"
      storageClass: STORAGE_CLASS
      pvcAccessMode:
        - ReadWriteOnce
      vllmConfig:
        maxModelLen: 32000
      lmcacheConfig:
        enabled: true
        cpuOffloadingBufferSize: "20"
      hf_token: HF_TOKEN
This example shows how to enable remote KV cache storage using LMCache
in a vLLM deployment. The configuration defines a
cacheserverSpec and uses two replicas. Remember to
replace the placeholder values for hf_token and
storageClass before applying the configuration.
Setting lmcacheConfig.enabled to
true implicitly enables the
LMCACHE_USE_EXPERIMENTAL flag for LMCache. These
experimental features are only supported on newer GPU generations. It is
not recommended to enable them without a compelling reason.
# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
    - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
    - name: "mistral"
      registry: "dp.apps.rancher.io"
      repository: "containers/lmcache-vllm-openai"
      tag: "0.3.2"
      imagePullPolicy: "IfNotPresent"
      modelURL: "mistralai/Mistral-7B-Instruct-v0.2"
      replicaCount: 2
      requestCPU: 10
      requestMemory: "40Gi"
      requestGPU: 1
      pvcStorage: "50Gi"
      storageClass: STORAGE_CLASS
      vllmConfig:
        enablePrefixCaching: true
        maxModelLen: 16384
      lmcacheConfig:
        enabled: true
        cpuOffloadingBufferSize: "20"
      hf_token: HF_TOKEN
      initContainer:
        name: "wait-for-cache-server"
        image: "dp.apps.rancher.io/containers/lmcache-vllm-openai:0.3.2"
        command: ["/bin/sh", "-c"]
        args:
          - |
            timeout 60 bash -c '
            while true; do
              /opt/venv/bin/python3 /workspace/LMCache/examples/kubernetes/health_probe.py $(RELEASE_NAME)-cache-server-service $(LMCACHE_SERVER_SERVICE_PORT) && exit 0
              echo "Waiting for LMCache server..."
              sleep 2
            done'
cacheserverSpec:
  replicaCount: 1
  containerPort: 8080
  servicePort: 81
  serde: "naive"
  registry: "dp.apps.rancher.io"
  repository: "containers/lmcache-vllm-openai"
  tag: "0.3.2"
  resources:
    requests:
      cpu: "4"
      memory: "8G"
    limits:
      cpu: "4"
      memory: "10G"
  labels:
    environment: "cacheserver"
    release: "cacheserver"
routerSpec:
  resources:
    requests:
      cpu: "1"
      memory: "2G"
    limits:
      cpu: "1"
      memory: "2G"
  routingLogic: "session"
  sessionKey: "x-user-id"

4.7 Installing AI Library components using SUSE AI Deployer #
SUSE AI Deployer consists of a meta Helm chart that takes care of downloading and installing individual AI Library components required by SUSE AI on a Kubernetes cluster.
The following procedure describes how to customize and use the SUSE AI Deployer to install AI Library components. It assumes that you already completed steps described in Section 4.1, “Installation procedure” including the installation of cert-manager.
Pull the SUSE AI Deployer Helm chart with the relevant chart version and untar it. You can find the latest version of the chart on the SUSE Application Collection page at https://apps.rancher.io/applications/suse-ai-deployer.
>helm pull oci://dp.apps.rancher.io/charts/suse-ai-deployer \
  --version 1.0.0 --untar

>cd suse-ai-deployer

Inspect the downloaded chart and its default values.
>helm show chart .

>helm show values .

Tip: To see the default values for the charts of the individual components within the meta chart, run the following commands.

>helm show values charts/ollama/
>helm show values charts/open-webui/
>helm show values charts/milvus/
>helm show values charts/pytorch

Explore the downloaded example override files in the suse-ai-deployer/examples subdirectory. It typically includes the following files:
suse-ai-deployer/examplessubdirectory. It typically includes the following files:suse-gen-ai-minimal.yamlBasic configuration to get started with GenAI. It deploys Ollama without GPU support, Open WebUI, and Milvus in stand-alone mode using local storage. PyTorch is disabled.
suse-gen-ai.yamlConfiguration optimized for production usage. It deploys Ollama with GPU support, Open WebUI, and Milvus in cluster mode using Longhorn storage. PyTorch is disabled.
suse-ml-stack.yamlBasic configuration that enables deployment of PyTorch with no GPU support with Longhorn storage. It deploys PyTorch but disables Ollama, Open WebUI and Milvus.
Create a custom-overrides.yaml override file based on one of the above examples. The examples use self-signed certificates for TLS communication. To use another option (see Section 4.5.6.1, “TLS sources”), copy the global section from the values.yaml file into your custom-overrides.yaml and update its tls section as needed.

Install the SUSE AI Deployer Helm chart while overriding values from the custom-overrides.yaml file. Use the appropriate RELEASE_NAME and SUSE_AI_NAMESPACE based on the configuration in custom-overrides.yaml.

>helm upgrade --install \
  RELEASE_NAME \
  --namespace SUSE_AI_NAMESPACE \
  --create-namespace \
  --values ./custom-overrides.yaml \
  --version 1.0.0 \
  oci://dp.apps.rancher.io/charts/suse-ai-deployer
5 Steps after the installation is complete #
Once the SUSE AI installation is finished, perform the following tasks to complete the initial setup and configuration.
Log in to SUSE AI Open WebUI using the default credentials.
After you have logged in, update the administrator password for SUSE AI.
From the available language models, configure the one you prefer. Optionally, install a custom language model. Refer to the sections Setting base AI models and Setting the default AI model for more details.
Configure user management with role-based access control (RBAC) as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-managing-user-roles.
Integrate single sign-on authentication manager—such as Okta—with Open WebUI as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-authentication-via-okta.
Configure retrieval-augmented generation (RAG) to let the model process content relevant to the customer.
6 Legal Notice #
Copyright© 2006–2025 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.