documentation.suse.com / Deploying and Installing SUSE AI

Deploying and Installing SUSE AI

Publication Date: 10 Oct 2025
WHAT?

This document provides a comprehensive, step-by-step guide for deploying SUSE AI.

WHY?

To help users successfully complete the deployment process.

GOAL

To learn enough to deploy SUSE AI in both testing and production environments.

EFFORT

Less than one hour of reading, plus advanced knowledge of Linux deployment.

SUSE AI is a versatile product consisting of multiple software layers and components. This document outlines the complete workflow for deployment and installation of all SUSE AI dependencies, as well as SUSE AI itself. You can also find references to recommended hardware and software requirements, as well as steps to take after the product installation.

Tip
Tip: Hardware and software requirements

For hardware, software and application-specific requirements, refer to SUSE AI requirements.

1 Installation overview

The following chart illustrates the installation process of SUSE AI. It outlines the following possible scenarios:

  • You have clean cluster nodes prepared without a supported Linux operating system installed.

  • You have a supported Linux operating system and Kubernetes distribution installed on cluster nodes.

  • You have SUSE Rancher Prime and all supporting components installed on the Kubernetes cluster and are prepared to install the required applications from the AI Library.

SUSE AI installation process
Figure 1: SUSE AI installation process

2 Installing the Linux and Kubernetes distribution

This procedure includes the steps to install the base Linux operating system and a Kubernetes distribution for users who start deploying on cluster nodes from scratch. If you already have a Kubernetes cluster installed and running, you can skip this procedure and continue with Section 4.1, “Installation procedure”.

  1. Install and register a supported Linux operating system on each cluster node. We recommend using SUSE Linux Enterprise Server (SLES) or SUSE Linux Micro.

    For a list of supported operating systems, refer to https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/.

  2. Install the NVIDIA GPU driver on cluster nodes with GPUs. Refer to Section 2.2, “Installing NVIDIA GPU drivers” for details.

  3. Install Kubernetes on cluster nodes. We recommend using the supported SUSE Rancher Prime: RKE2 distribution. Refer to SUSE Rancher Prime: RKE2 Installation for details. For a list of supported Kubernetes platforms, refer to https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/.
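
    If you choose SUSE Rancher Prime: RKE2, the quick-start installation of a server (control-plane) node boils down to a few commands. The following is a minimal sketch based on the RKE2 quick start; refer to the linked RKE2 documentation for production options such as high availability and custom configuration.

    # curl -sfL https://get.rke2.io | sh -
    # systemctl enable --now rke2-server.service
    # export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
    # /var/lib/rancher/rke2/bin/kubectl get nodes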

2.1 Installing SUSE Linux Enterprise Server

Use the following procedures to install SLES on all supported hardware platforms. They assume you have successfully booted into the installation system. For more detailed installation instructions and deployment strategies, refer to SUSE Linux Enterprise Server Deployment Guide.

2.1.1 The Unified Installer

Starting with SLES 15, the installation medium consists only of the Unified Installer, a minimal system for installing, updating and registering all SUSE Linux Enterprise base products. During the installation, you can add functionality by selecting modules and extensions to be installed on top of the Unified Installer.

2.1.2 Installing offline or without registration

The default installation medium SLE-15-SP6-Online-ARCH-GM-media1.iso is optimized for size and does not contain any modules or extensions. Therefore, the installation requires network access to register your product and retrieve repository data for the modules and extensions.

For installation without registering the system, use the SLE-15-SP6-Full-ARCH-GM-media1.iso image from https://www.suse.com/download/sles/ and refer to Installing without registration.

Tip
Tip: Copying the installation media image to a removable flash disk

Use the following command to copy the contents of the installation image to a removable flash disk.

> sudo dd if=IMAGE of=FLASH_DISK bs=4M && sync

IMAGE needs to be replaced with the path to the SLE-15-SP6-Online-ARCH-GM-media1.iso or SLE-15-SP6-Full-ARCH-GM-media1.iso image file. FLASH_DISK needs to be replaced with the flash device. To identify the device, insert it and run:

# grep -Ff <(hwinfo --disk --short) <(hwinfo --usb --short)
disk:
  /dev/sdc             General USB Flash Disk

Make sure the size of the device is sufficient for the desired image. You can check the size of the device with:

# fdisk -l /dev/sdc | grep -e "^/dev"
     /dev/sdc1  *     2048 31490047 31488000  15G 83 Linux

In this example, the device has a capacity of 15 GB. The command to use for the SLE-15-SP6-Full-ARCH-GM-media1.iso would be:

> sudo dd if=SLE-15-SP6-Full-ARCH-GM-media1.iso of=/dev/sdc bs=4M && sync

The device must not be mounted when running the dd command. Note that all data on the device will be erased.

2.1.3 The installation procedure

To install SLES, boot or IPL into the installer from the Unified Installer medium and start the installation.

2.1.3.1 Language, keyboard and product selection
Language, keyboard and product selection screen
Figure 2: Language, keyboard and product selection

The Language and Keyboard Layout settings are initialized with the language you chose on the boot screen. If you do not change the default, it remains English (US). Change the settings here, if necessary. Use the Keyboard Test text box to test the layout.

Select SUSE Linux Enterprise Server 15 SP6 for installation. You need to have a registration code for the product. Proceed with Next.

Tip
Tip: Light and high-contrast themes

If you have difficulty reading the labels in the installer, you can change the widget colors and theme.

Click the Change the widget theme button or press Shift+F3 to open a theme selection dialog. Select a theme from the list and Close the dialog.

Shift+F4 switches to the color scheme for vision-impaired users. Press the keys again to switch back to the default scheme.

2.1.3.2 License agreement
SLES License Agreement screen
Figure 3: License agreement

Read the License Agreement. It is presented in the language you have chosen on the boot screen. Translations are available via the License Language drop-down list. You need to accept the agreement by checking I Agree to the License Terms to install SLES. Proceed with Next.

2.1.3.3 Network settings
Network Settings screen
Figure 4: Network settings

A system analysis is performed, where the installer probes for storage devices and tries to find other installed systems. If the network was automatically configured via DHCP during the start of the installation, you are presented with the registration step.

If the network is not yet configured, the Network Settings dialog opens. Choose a network interface from the list and configure it with Edit. Alternatively, Add an interface manually. See the sections on installer network settings and configuring a network connection with YaST for more information. If you prefer to do an installation without network access, skip this step without making any changes and proceed with Next.

2.1.3.4 Registration
Registration screen
Figure 5: Registration

To get technical support and product updates, you need to register and activate SLES with the SUSE Customer Center or a local registration server. Registering your product at this stage also grants you immediate access to the update repository. This enables you to install the system with the latest updates and patches available.

When registering, repositories and dependencies for modules and extensions are loaded from the registration server.

Register system at scc.suse.com

To register at the SUSE Customer Center, enter the E-mail Address associated with your SUSE Customer Center account and the Registration Code for SLES. Proceed with Next.

Register system via local RMT server

If your organization provides a local registration server, you may alternatively register to it. Activate Register System via local RMT Server and either choose a URL from the drop-down list or type in an address. Proceed with Next.

Skip registration

If you are offline or want to skip registration, activate Skip Registration. Accept the warning with OK and proceed with Next.

Important
Important: Skipping the registration

Your system and extensions need to be registered to retrieve updates and to be eligible for support. Skipping the registration is only possible when installing from the SLE-15-SP6-Full-ARCH-GM-media1.iso image.

If you do not register during the installation, you can do so at any time later from the running system. To do so, run YaST › Product Registration or the command-line tool SUSEConnect.
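
For example, you can register the installed system from the command line with SUSEConnect, replacing the placeholders with your registration code and the e-mail address associated with your SUSE Customer Center account:

# SUSEConnect --regcode REGISTRATION_CODE --email EMAIL_ADDRESS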

Tip
Tip: Installing product patches at installation time

After SLES has been successfully registered, you are asked whether to install the latest available online updates during the installation. If you choose Yes, the system is installed with the most current packages without having to apply the updates after installation. Activating this option is recommended.

Note
Note: Firewall settings for receiving updates

By default, the firewall on SLES only blocks incoming connections. If your system is behind another firewall that blocks outgoing traffic, make sure to allow connections to https://scc.suse.com/ and https://updates.suse.com on ports 80 and 443 to receive updates.

2.1.3.5 Extension and module selection
Extension and Module Selection screen
Figure 6: Extension and module selection

After the system is successfully registered, the installer lists modules and extensions that are available for SLES. Modules are components that allow you to customize the product according to your needs. They are included in your SLES subscription. Extensions add functionality to your product. They must be purchased separately.

The availability of certain modules or extensions depends on the product selected in the first step of the installation. For a description of the modules and their lifecycles, select a module to see the accompanying text. More detailed information is available in the Modules and Extensions Quick Start.

The selection of modules indirectly affects the scope of the installation, because it defines which software sources (repositories) are available for installation and in the running system.

The following modules and extensions are available for SUSE Linux Enterprise Server:

Basesystem Module

This module adds a basic system on top of the Unified Installer. It is required by all other modules and extensions. The scope of an installation that only contains the base system is comparable to the installation pattern minimal system of previous SLES versions. This module is selected for installation by default and should not be deselected.

Dependencies: None

Certifications Module

Contains the FIPS certification packages.

Dependencies: Server Applications

Confidential Computing Technical Preview

Contains packages related to confidential computing.

Dependencies: Basesystem

Containers Module

Contains support and tools for containers.

Dependencies: Basesystem

Desktop Applications Module

Adds a graphical user interface and essential desktop applications to the system.

Dependencies: Basesystem

Development Tools Module

Contains the compilers (including gcc) and libraries required for compiling and debugging applications. Replaces the former Software Development Kit (SDK).

Dependencies: Basesystem, Desktop Applications

Legacy Module

Helps you with migrating applications from earlier versions of SLES and other systems to SLES 15 SP6 by providing packages which are discontinued on SUSE Linux Enterprise. Packages in this module are selected based on the requirements for migration and the level of complexity of configuration.

This module is recommended when migrating from a previous product version.

Dependencies: Basesystem, Server Applications

NVIDIA Compute Module

Contains the NVIDIA CUDA (Compute Unified Device Architecture) drivers.

The software in this module is provided by NVIDIA under the CUDA End User License Agreement and is not supported by SUSE.

Dependencies: Basesystem

Public Cloud Module

Contains all tools required to create images for deploying SLES in cloud environments such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, or OpenStack.

Dependencies: Basesystem, Server Applications

Python 3 Module

This module contains the most recent versions of the selected Python 3 packages.

Dependencies: Basesystem

SAP Business One Server

This module contains packages and system configurations specific to SAP Business One Server. It is maintained and supported under the SUSE Linux Enterprise Server product subscription.

Dependencies: Basesystem, Server Applications, Desktop Applications, Development Tools

Server Applications Module

Adds server functionality by providing network services such as DHCP server, name server, or Web server.

Dependencies: Basesystem

SUSE Linux Enterprise High Availability

Adds clustering support for mission-critical setups to SLES. This extension requires a separate license key.

Dependencies: Basesystem, Server Applications

SUSE Linux Enterprise Live Patching

Adds support for performing critical patching without having to shut down the system. This extension requires a separate license key.

Dependencies: Basesystem, Server Applications

SUSE Linux Enterprise Workstation Extension

Extends the functionality of SLES with packages from SUSE Linux Enterprise Desktop, like additional desktop applications (office suite, e-mail client, graphical editor, etc.) and libraries. It allows combining both products to create a fully featured workstation. This extension requires a separate license key.

Dependencies: Basesystem, Desktop Applications

SUSE Package Hub

Provides access to packages for SLES maintained by the openSUSE community. These packages are delivered without L3 support and do not interfere with the supportability of SLES. For more information, refer to https://packagehub.suse.com/.

Dependencies: Basesystem

Transactional Server Module

Adds support for transactional updates. Updates are either applied to the system as a single transaction or not applied at all. This happens without influencing the running system. If an update fails, or if the successful update is deemed to be incompatible or otherwise incorrect, it can be discarded to immediately return the system to its previous functioning state.

Dependencies: Basesystem

Web and Scripting Module

Contains packages intended for a running Web server.

Dependencies: Basesystem, Server Applications

Certain modules depend on the installation of other modules. Therefore, when selecting a module, other modules may be selected automatically to fulfill dependencies.

Depending on the product, the registration server can mark modules and extensions as recommended. Recommended modules and extensions are preselected for registration and installation. To avoid installing these recommendations, deselect them manually.

Select the modules and extensions you want to install and proceed with Next. In case you have chosen one or more extensions, you will be prompted to provide the respective registration codes. Depending on your choice, it may also be necessary to accept additional license agreements.
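
If you need a module or extension later, you can also activate it from the registered, running system with SUSEConnect. A brief example follows; the product identifier is illustrative, so list the exact identifiers available for your system first.

# SUSEConnect --list-extensions
# SUSEConnect -p sle-module-containers/15.6/x86_64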

Important
Important: Default modules for offline installation

When performing an offline installation from the SLE-15-SP6-Full-ARCH-GM-media1.iso, only the Basesystem Module is selected by default. To install the complete default package set of SUSE Linux Enterprise Server, additionally select the Server Applications Module and the Python 3 Module.

2.1.3.6 Add-on product
Add On Product screen
Figure 7: Add-on product

The Add-On Product dialog allows you to add additional software sources (called repositories) to SLES that are not provided by the SUSE Customer Center. Add-on products may include third-party products and drivers as well as additional software for your system.

Tip
Tip: Adding drivers during the installation

You can also add driver update repositories via the Add-On Product dialog. Driver updates for SUSE Linux Enterprise are provided at https://drivers.suse.com/. These drivers have been created through the SUSE SolidDriver Program.

To skip this step, proceed with Next. Otherwise, activate I would like to install an additional Add On Product. Specify a media type, a local path, or a network resource hosting the repository and follow the on-screen instructions.

Check Download Repository Description Files to download the files describing the repository now. If deactivated, they will be downloaded after the installation has started. Proceed with Next and insert a medium if required. Depending on the content of the product, it may be necessary to accept additional license agreements. Proceed with Next. If you have chosen an add-on product requiring a registration key, you will be asked to enter it before proceeding to the next step.

2.1.3.7 System role
System Role screen
Figure 8: System role

The availability of system roles depends on your selection of modules and extensions. System roles define, for example, the set of software patterns that are preselected for the installation. Refer to the descriptions on the screen to make your choice. Select a role and proceed with Next. If, based on the enabled modules, only one role or no role is suitable for the respective base product, the System Role dialog is omitted.

Tip
Tip: Release notes

From this point on, the Release Notes can be viewed from any screen during the installation process by selecting Release Notes.

2.1.3.8 Suggested partitioning
Suggested Partitioning screen
Figure 9: Suggested partitioning

Review the partition setup proposed by the system. If necessary, change it. You have the following options:

Guided setup

Starts a wizard that lets you refine the partitioning proposal. The options available here depend on your system setup. If it contains more than a single hard disk, you can choose which disk or disks to use and where to place the root partition. If the disks already contain partitions, decide whether to remove or resize them.

In subsequent steps, you may also add LVM support and disk encryption. You can change the file system for the root partition and decide whether or not to have a separate home partition.

Expert partitioner

Opens the Expert Partitioner. This gives you full control over the partitioning setup and lets you create a custom setup. This option is intended for experts. For details, see the Expert Partitioner chapter.

Warning
Warning: Disk space units

For partitioning purposes, disk space is measured in binary units rather than in decimal units. For example, if you enter sizes of 1GB, 1GiB or 1G, they all signify 1 GiB (Gibibyte), as opposed to 1 GB (Gigabyte).

Binary

1 GiB = 1 073 741 824 bytes.

Decimal

1 GB = 1 000 000 000 bytes.

Difference

1 GiB ≈ 1.07 GB.

To accept the proposed setup without any changes, choose Next to proceed.

2.1.3.9 Clock and time zone
Clock and Time Zone screen
Figure 10: Clock and time zone

Select the clock and time zone to use in your system. To manually adjust the time or to configure an NTP server for time synchronization, choose Other Settings. See the section on Clock and Time Zone for detailed information. Proceed with Next.

2.1.3.10 Local user
Local User screen
Figure 11: Local user creation

To create a local user, type the first and last name in the User’s Full Name field, the login name in the Username field, and the password in the Password field.

The password should be at least eight characters long and should contain both uppercase and lowercase letters and numbers. The maximum length for passwords is 72 characters, and passwords are case-sensitive.

For security reasons, it is also strongly recommended not to enable Automatic Login. You should also not Use this Password for the System Administrator but provide a separate root password in the next installation step.

If you install on a system where a previous Linux installation was found, you may Import User Data from a Previous Installation. Click Choose User for a list of available user accounts. Select one or more users.

In an environment where users are centrally managed (for example, by NIS or LDAP), you can skip the creation of local users. Select Skip User Creation in this case.

Proceed with Next.

2.1.3.11 Authentication for the system administrator root
Authentication for the system administrator “root” screen
Figure 12: Password for the system administrator root

Type a password for the system administrator (called the root user) or provide a public SSH key. If you want, you can use both.

Because the root user is equipped with extensive permissions, the password should be chosen carefully. You should never forget the root password. After you have entered it here, the password cannot be retrieved.

Tip
Tip: Passwords and keyboard layout

It is recommended to use only US ASCII characters. In the event of a system error or when you need to start your system in rescue mode, the keyboard may not be localized.

To access the system remotely via SSH using a public key, import a key from removable media or an existing partition. See the section on Authentication for the system administrator root for more information.

Proceed with Next.

2.1.3.12 Installation settings
Installation Settings screen
Figure 13: Installation settings

Use the Installation Settings screen to review and—if necessary—change several proposed installation settings. The current configuration is listed for each setting. To change it, click the headline. Certain settings, such as firewall or SSH, can be changed directly by clicking the respective links.

Important
Important: Remote access

Any changes you make here can also be made later at any time from the installed system. However, if you need remote access right after the installation, you may need to open the SSH port in the Security settings.

Software

The scope of the installation is defined by the modules and extensions you have chosen for this installation. However, depending on your selection, not all packages available in a module are selected for installation.

Clicking Software opens the Software Selection and System Tasks screen, where you can change the software selection by selecting or deselecting patterns. Each pattern contains several software packages needed for specific functions (for example, KVM Host Server). For a more detailed selection based on software packages to install, select Details to switch to the YaST Software Manager. See Installing or removing software for more information.
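
Patterns can also be listed and installed later from the running system with zypper. For example, the following commands list the available patterns and install one of them (the pattern name is illustrative):

# zypper search -t pattern
# zypper install -t pattern kvm_server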

Booting

This section shows the boot loader configuration. Changing the defaults is recommended only if really needed. Refer to The boot loader GRUB 2 for details.

Security

The CPU Mitigations refer to kernel boot command-line parameters for software mitigations that have been deployed to prevent CPU side-channel attacks. Click the selected entry to choose a different option. For details, see the section on CPU Mitigations.

By default, the Firewall is enabled on all configured network interfaces. To disable firewalld, click disable (not recommended). Refer to the Masquerading and Firewalls chapter for configuration details.

Note
Note: Firewall settings for receiving updates

By default, the firewall on SLES only blocks incoming connections. If your system is behind another firewall that blocks outgoing traffic, make sure to allow connections to https://scc.suse.com/ and https://updates.suse.com on ports 80 and 443 to receive updates.

The SSH service is enabled by default, but its port (22) is closed in the firewall. Click open to open the port or disable to disable the service. If SSH is disabled, remote logins will not be possible. Refer to Securing network operations with OpenSSH for more information.
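
If you keep the port closed now, you can open it later from the installed system, for example:

# firewall-cmd --permanent --add-service=ssh
# firewall-cmd --reload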

The default Major Linux Security Module is AppArmor. To disable it, select None as the module in the Security settings.

Security Policies

Click to enable the Defense Information Systems Agency STIG security policy. If any installation settings are incompatible with the policy, you will be prompted to modify them accordingly. Certain settings can be adjusted automatically while others require user input.

Enabling a security profile enables a full SCAP remediation on first boot. You can also perform a scan only or do nothing and manually remediate the system later with OpenSCAP. For more information, refer to the section on Security Profiles.

Network configuration

Displays the current network configuration. By default, wicked is used for server installations and NetworkManager for desktop workloads. Click Network Configuration to change the settings. For details, see the section on Configuring a network connection with YaST.

Important
Important: Support for NetworkManager

SUSE only supports NetworkManager for desktop workloads with SLED or the Workstation extension. All server certifications are done with wicked as the network configuration tool, and using NetworkManager may invalidate them. NetworkManager is not supported by SUSE for server workloads.

Kdump

Kdump saves the memory image (core dump) to the file system in case the kernel crashes. This enables you to find the cause of the crash by debugging the dump file. Kdump is preconfigured and enabled by default. See the Basic Kdump configuration for more information.

Default systemd target

If you have installed the desktop applications module, the system boots into the graphical target, with network, multi-user and display manager support. Switch to multi-user if you do not need to log in via a display manager.
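
The default target can also be changed later from the running system, for example:

# systemctl set-default multi-user.target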

System

View detailed hardware information by clicking System. In the resulting screen, you can also change Kernel Settings—see the section on System Information for more information.

2.1.3.13 Start the installation
Installation Settings screen with Confirm Installation dialog
Figure 14: Confirm installation

After you have finalized the system configuration on the Installation Settings screen, click Install. Depending on your software selection, you may need to agree to license agreements before the installation confirmation screen pops up. Up to this point, no changes have been made to your system. After you click Install a second time, the installation process starts.

2.1.3.14 The installation process
Performing Installation screen
Figure 15: Performing the installation

During the installation, the progress is shown. After the installation routine has finished, the computer is rebooted into the installed system.

2.2 Installing NVIDIA GPU drivers

This section demonstrates how to implement host-level NVIDIA GPU support via the open-driver. The open-driver is part of the core package repositories. Therefore, there is no need to compile it or download executable packages. This driver is built into the operating system rather than dynamically loaded by the NVIDIA GPU Operator. This configuration is desirable for customers who want to pre-build all artifacts required for deployment into the image, and where the dynamic selection of the driver version via Kubernetes is not a requirement.

2.2.1 Installing NVIDIA GPU drivers on SUSE Linux Enterprise Server

2.2.1.1 Requirements

This guide assumes that you have the following already available:

  • At least one host with SLES 15 SP6 installed, physical or virtual.

  • Your hosts are attached to a subscription as this is required for package access.

  • A compatible NVIDIA GPU installed or fully passed through to the virtual machine in which SLES is running.

  • Access to the root user—these instructions assume you are the root user, and not escalating your privileges via sudo.

2.2.1.2 Considerations before the installation
2.2.1.2.1 Select the driver generation

You must verify the driver generation for the NVIDIA GPU that your system has. For modern GPUs, the G06 driver is the most common choice. Find more details in the support database.

This section details the installation of the G06 generation of the driver.
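
If you are unsure which NVIDIA GPU is installed in your system, you can identify it before choosing the driver generation, for example:

# lspci | grep -i nvidia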

2.2.1.2.2 Additional NVIDIA components

Besides the NVIDIA open-driver provided by SUSE as part of SLES, you might also need additional NVIDIA components. These could include OpenGL libraries, CUDA toolkits, command-line utilities such as nvidia-smi, and container-integration components such as nvidia-container-toolkit. Many of these components are not shipped by SUSE as they are proprietary NVIDIA software. This section describes how to configure additional repositories that give you access to these components and provides examples of using these tools to achieve a fully functional system.

2.2.1.3 The installation procedure
  1. Add a package repository from NVIDIA. This allows pulling in additional utilities, for example, nvidia-smi.

    For the AMD64/Intel 64 architecture, run:

    # zypper ar \
      https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
      cuda-sle15
    # zypper --gpg-auto-import-keys refresh

    For the Arm AArch64 architecture, run:

    # zypper ar \
      https://developer.download.nvidia.com/compute/cuda/repos/sles15/sbsa/ \
      cuda-sle15
    # zypper --gpg-auto-import-keys refresh
  2. Install the Open Kernel driver KMP and detect the driver version.

    # zypper install -y --auto-agree-with-licenses \
      nv-prefer-signed-open-driver
    # version=$(rpm -qa --queryformat '%{VERSION}\n' \
      nv-prefer-signed-open-driver | cut -d "_" -f1 | sort -u | tail -n 1)
  3. You can then install the appropriate packages for additional utilities that are useful for testing purposes.

    # zypper install -y --auto-agree-with-licenses \
    nvidia-compute-utils-G06=${version} \
    nvidia-persistenced=${version}
  4. Reboot the host to make the changes effective.

    # reboot
  5. Log back in and use the nvidia-smi tool to verify that the driver is loaded successfully and that it can both access and enumerate your GPUs.

    # nvidia-smi

    The output of this command should be similar to the following. In the example below, the system has one GPU.

    Fri Aug  1 15:32:10 2025       
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
    | N/A   33C    P8             13W /   70W |       0MiB /  15360MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
                                                                                             
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
2.2.1.4 Validation of the driver installation

Running the nvidia-smi command has verified that, at the host level, the NVIDIA device can be accessed and that the drivers are loading successfully. To confirm that the GPU is functioning, verify that it can take instructions from a user-space application, ideally via a container and through the CUDA library, as that is typically what a real workload would use. For this, we can make a further modification to the host OS by installing nvidia-container-toolkit.

  1. Install the nvidia-container-toolkit package from the NVIDIA Container Toolkit repository.

    # zypper ar \
    "https://nvidia.github.io/libnvidia-container/stable/rpm/"\
    nvidia-container-toolkit.repo
    # zypper --gpg-auto-import-keys install \
      -y nvidia-container-toolkit

    The nvidia-container-toolkit.repo file contains a stable repository nvidia-container-toolkit and an experimental repository nvidia-container-toolkit-experimental. Use the stable repository for production use. The experimental repository is disabled by default.

  2. Verify that the system can successfully enumerate the devices using the NVIDIA Container Toolkit. The output should be verbose, with INFO and WARN messages, but no ERROR messages.

    # nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

    This ensures that any container started on the machine can employ discovered NVIDIA GPU devices.
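
    As an optional check, you can also list the device names contained in the generated CDI specification:

    # nvidia-ctk cdi list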

  3. You can then run a Podman-based container. Doing this via podman gives you a good way of validating access to the NVIDIA device from within a container, which should give confidence for doing the same with Kubernetes at a later stage.

    Give Podman access to the NVIDIA devices labeled by the previous command and run the bash command.

    # podman run --rm --device nvidia.com/gpu=all \
      --security-opt=label=disable \
      -it registry.suse.com/bci/bci-base:latest bash

    You can now execute commands from within a temporary Podman container. It does not have access to your underlying system and is ephemeral—whatever you change in the container does not persist. Also, you cannot break anything on the underlying host.

  4. Inside the container, install the required CUDA libraries. Identify their version from the output of the nvidia-smi command. In the example above, we install the CUDA 13.0 libraries together with examples, demos and development kits to fully validate the GPU.

    # zypper ar \
      http://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
      cuda-sle15-sp6
    # zypper --gpg-auto-import-keys refresh
    # zypper install -y cuda-libraries-13-0 cuda-demo-suite-12-9
  5. Inside the container, run the deviceQuery CUDA example of the same version, which comprehensively validates GPU access via CUDA and from within the container itself.

    # /usr/local/cuda-12.9/extras/demo_suite/deviceQuery
    /usr/local/cuda-12.9/extras/demo_suite/deviceQuery Starting...
    
     CUDA Device Query (Runtime API)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "Tesla T4"
      CUDA Driver Version / Runtime Version          13.0/ 13.0
      CUDA Capability Major/Minor version number:    7.5
      Total amount of global memory:                 14913 MBytes (15637086208 bytes)
      (40) Multiprocessors, ( 64) CUDA Cores/MP:     2560 CUDA Cores
      GPU Max Clock rate:                            1590 MHz (1.59 GHz)
      Memory Clock rate:                             5001 Mhz
      Memory Bus Width:                              256-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  1024
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Device supports Compute Preemption:            Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 30
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 1, Device0 = Tesla T4
    Result = PASS

    From inside the container, you can run any other CUDA workload, such as compilers, to perform further tests. When finished, you can exit the container.

    # exit
    Important
    Important

    Changes you have made in the container and packages you have installed inside will be lost and will not impact the underlying operating system.

2.2.2 Installing NVIDIA GPU drivers on SUSE Linux Micro

2.2.2.1 Requirements

This guide assumes that you have the following already available:

  • At least one host with SUSE Linux Micro 6.1 installed, physical or virtual.

  • Your hosts are attached to a subscription as this is required for package access.

  • A compatible NVIDIA GPU installed or fully passed through to the virtual machine in which SUSE Linux Micro is running.

  • Access to the root user—these instructions assume you are the root user, and not escalating your privileges via sudo.

2.2.2.2 Considerations before the installation
2.2.2.2.1 Select the driver generation

You must verify the driver generation for the NVIDIA GPU that your system has. For modern GPUs, the G06 driver is the most common choice. Find more details in the support database.

This section details the installation of the G06 generation of the driver.

2.2.2.2.2 Additional NVIDIA components

Besides the NVIDIA open-driver provided by SUSE as part of SUSE Linux Micro, you might also need additional NVIDIA components. These could include OpenGL libraries, CUDA toolkits, command-line utilities such as nvidia-smi, and container-integration components such as nvidia-container-toolkit. Many of these components are not shipped by SUSE as they are proprietary NVIDIA software. This section describes how to configure additional repositories that give you access to these components and provides examples of using these tools to achieve a fully functional system.

2.2.2.3 The installation procedure
  1. On each GPU-enabled host, open up a transactional-update shell session to create a new read/write snapshot of the underlying operating system so that we can make changes to the immutable platform.

    # transactional-update shell
  2. When you are in the transactional-update shell session, add a package repository from NVIDIA. This allows pulling in additional utilities, for example, nvidia-smi.

    For the AMD64/Intel 64 architecture, run:

    transactional update # zypper ar \
      https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
      cuda-sle15
    transactional update # zypper --gpg-auto-import-keys refresh

    For the Arm AArch64 architecture, run:

    transactional update # zypper ar \
      https://developer.download.nvidia.com/compute/cuda/repos/sles15/sbsa/ \
      cuda-sle15
    transactional update # zypper --gpg-auto-import-keys refresh
  3. Install the Open Kernel driver KMP and detect the driver version.

    transactional update # zypper install -y --auto-agree-with-licenses \
      nvidia-open-driver-G06-signed-cuda-kmp-default
    transactional update # version=$(rpm -qa --queryformat '%{VERSION}\n' \
      nvidia-open-driver-G06-signed-cuda-kmp-default \
      | cut -d "_" -f1 | sort -u | tail -n 1)
  4. You can then install the appropriate packages for additional utilities that are useful for testing purposes.

    transactional update # zypper install -y --auto-agree-with-licenses \
    nvidia-compute-utils-G06=${version} \
    nvidia-persistenced=${version}
  5. Exit the transactional-update session and reboot to the new snapshot that contains the changes you have made.

    transactional update # exit
    # reboot
  6. After the system has rebooted, log back in and use the nvidia-smi tool to verify that the driver is loaded successfully and that it can both access and enumerate your GPUs.

    # nvidia-smi

    The output of this command should be similar to the following. In the example below, the system has one GPU.

    Fri Aug  1 14:53:26 2025       
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
    | N/A   34C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
                                                                                             
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
2.2.2.4 Validation of the driver installation

Running the nvidia-smi command has verified that, at the host level, the NVIDIA device can be accessed and that the drivers are loading successfully. To confirm that the GPU is functioning, verify that it can take instructions from a user-space application, ideally via a container and through the CUDA library, as that is typically what a real workload would use. For this, we can make a further modification to the host OS by installing nvidia-container-toolkit.

  1. Open another transactional-update shell.

    # transactional-update shell
  2. Install the nvidia-container-toolkit package from the NVIDIA Container Toolkit repository.

    transactional update # zypper ar \
    "https://nvidia.github.io/libnvidia-container/stable/rpm/"\
    nvidia-container-toolkit.repo
    transactional update # zypper --gpg-auto-import-keys install \
      -y nvidia-container-toolkit

    The nvidia-container-toolkit.repo file contains a stable repository nvidia-container-toolkit and an experimental repository nvidia-container-toolkit-experimental. Use the stable repository for production use. The experimental repository is disabled by default.

  3. Exit the transactional-update session and reboot to the new snapshot that contains the changes you have made.

    transactional update # exit
    # reboot
  4. Verify that the system can successfully enumerate the devices using the NVIDIA Container Toolkit. The output should be verbose, with INFO and WARN messages, but no ERROR messages.

    # nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

    This ensures that any container started on the machine can employ discovered NVIDIA GPU devices.

  5. You can then run a Podman-based container. Doing this via podman gives you a good way of validating access to the NVIDIA device from within a container, which should give confidence for doing the same with Kubernetes at a later stage.

    Give Podman access to the NVIDIA devices labeled by the previous command and run the bash command.

    # podman run --rm --device nvidia.com/gpu=all \
      --security-opt=label=disable \
      -it registry.suse.com/bci/bci-base:latest bash

    You can now execute commands from within a temporary Podman container. It does not have access to your underlying system and is ephemeral—whatever you change in the container does not persist. Also, you cannot break anything on the underlying host.

  6. Inside the container, install the required CUDA libraries. Identify their version from the output of the nvidia-smi command. In the example above, we install the CUDA 13.0 libraries together with examples, demos and development kits to fully validate the GPU.

    # zypper ar \
      http://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/ \
      cuda-sle15-sp6
    # zypper --gpg-auto-import-keys refresh
    # zypper install -y cuda-libraries-13-0 cuda-demo-suite-12-9
  7. Inside the container, run the deviceQuery CUDA example of the same version, which comprehensively validates GPU access via CUDA and from within the container itself.

    # /usr/local/cuda-12.9/extras/demo_suite/deviceQuery
    /usr/local/cuda-12.9/extras/demo_suite/deviceQuery Starting...
    
     CUDA Device Query (Runtime API)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "Tesla T4"
      CUDA Driver Version / Runtime Version          13.0 / 13.0
      CUDA Capability Major/Minor version number:    7.5
      Total amount of global memory:                 14914 MBytes (15638134784 bytes)
      (40) Multiprocessors, ( 64) CUDA Cores/MP:     2560 CUDA Cores
      GPU Max Clock rate:                            1590 MHz (1.59 GHz)
      Memory Clock rate:                             5001 Mhz
      Memory Bus Width:                              256-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  1024
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Device supports Compute Preemption:            Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 30
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 1, Device0 = Tesla T4
    Result = PASS

    From inside the container, you can run any other CUDA workload, such as compilers, to perform further tests. When finished, you can exit the container.

    # exit
    Important
    Important

    Changes you have made in the container and packages you have installed inside will be lost and will not impact the underlying operating system.

3 Preparing the cluster for AI Library

This procedure assumes that you already have the base operating system installed on cluster nodes as well as the SUSE Rancher Prime: RKE2 Kubernetes distribution installed and operational. If you are installing from scratch, refer to Section 2, “Installing the Linux and Kubernetes distribution” first.

  1. Install SUSE Rancher Prime.

  2. Install the NVIDIA GPU Operator on the cluster.

    Tip
    Tip: Installing NVIDIA GPU Operator on SUSE Rancher Prime: RKE2

    If you run SUSE Rancher Prime: RKE2, follow these steps:

    1. On the agent nodes, run the following command:

      # echo PATH=$PATH:/usr/local/nvidia/toolkit >> /etc/default/rke2-agent
    2. On the server nodes, run the following command:

      # echo PATH=$PATH:/usr/local/nvidia/toolkit >> /etc/default/rke2-server
    3. Follow the steps described in https://documentation.suse.com/cloudnative/rke2/latest/en/advanced.html#_deploy_nvidia_operator.

  3. Connect the Kubernetes cluster to SUSE Rancher Prime. Refer to https://documentation.suse.com/cloudnative/rancher-manager/latest/en/cluster-deployment/register-existing-clusters.html for details.

  4. Configure the GPU-enabled nodes so that the SUSE AI containers are assigned to Pods that run on nodes equipped with NVIDIA GPU hardware. Find more details about assigning Pods to nodes in Section 3.1, “Assigning GPU nodes to applications”.

  5. Install and configure SUSE Security to scan the nodes used for SUSE AI. Although this step is not required, we strongly encourage it to ensure security in production environments.

  6. Install and configure SUSE Observability to observe the nodes used for the SUSE AI application. Refer to Section 3.2, “Setting up SUSE Observability for SUSE AI” for more details.

3.1 Assigning GPU nodes to applications

When deploying a containerized application to Kubernetes, you need to ensure that containers requiring GPU resources run on appropriate worker nodes. For example, Ollama, a core component of SUSE AI, benefits greatly from GPU acceleration. This topic describes how to satisfy this requirement by explicitly requesting GPU resources and by labeling worker nodes for use with a node selector.

Requirements
  • A Kubernetes cluster—such as SUSE Rancher Prime: RKE2—must be available and configured with more than one worker node, where certain nodes have NVIDIA GPU resources and others do not.

  • This document assumes that any kind of deployment to the Kubernetes cluster is done using Helm charts.

3.1.1 Labeling GPU nodes

To distinguish nodes with GPU support from non-GPU nodes, Kubernetes uses labels. Labels attach identifying metadata to resources and should not be confused with annotations, which only provide non-identifying information about a resource. You can manage labels with the kubectl command, as well as by editing the configuration files of the nodes. If an IaC tool such as Terraform is used, labels can be inserted into the node resource configuration files.

To label a single node, use the following command:

> kubectl label node GPU_NODE_NAME accelerator=nvidia-gpu

To achieve the same result by tweaking the node.yaml node configuration, add the following content and apply the changes with kubectl apply -f node.yaml:

apiVersion: v1
kind: Node
metadata:
  name: node-name
  labels:
    accelerator: nvidia-gpu
Tip
Tip: Labeling multiple nodes

To label multiple nodes, use the following command:

> kubectl label node \
  GPU_NODE_NAME1 \
  GPU_NODE_NAME2 ... \
  accelerator=nvidia-gpu
Tip
Tip

If Terraform is being used as an IaC tool, you can add labels to a group of nodes by editing the .tf files and adding the following values to a resource:

resource "node_group" "example" {
  labels = {
    "accelerator" = "nvidia-gpu"
  }
}

To check if the labels are correctly applied, use the following command:

> kubectl get nodes --show-labels
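
To list only the nodes that carry the GPU label, filter by the label selector:

> kubectl get nodes -l accelerator=nvidia-gpu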

3.1.2 Assigning GPU nodes

The matching between a container and a node is configured by explicit resource allocation and by the use of labels and node selectors. The use cases described below focus on NVIDIA GPUs.

3.1.2.1 Enable GPU passthrough

Containers are isolated from the host environment by default. For containers that rely on the allocation of GPU resources, their Helm charts must enable GPU passthrough so that the container can access and use the GPU. Without GPU passthrough, the container may still run, but it can only use the CPU for all computations. Refer to the Ollama Helm chart for an example of the configuration required for GPU acceleration.
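
As an illustration, on SUSE Rancher Prime: RKE2 with the NVIDIA GPU Operator installed, enabling passthrough typically means selecting the NVIDIA runtime class and requesting the GPU resource in the Pod specification rendered by the chart. The following is a minimal sketch of such a Pod-level configuration; the exact value keys depend on the specific Helm chart and are not the actual Ollama chart values.

spec:
  runtimeClassName: nvidia
  containers:
    - name: ollama
      image: OLLAMA_IMAGE
      resources:
        limits:
          nvidia.com/gpu: 1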

3.1.2.2 Assignment by resource request

After the NVIDIA GPU Operator is configured on a node, you can instantiate applications requesting the resource nvidia.com/gpu provided by the operator. Add the following content to your values.yaml file. Specify the number of GPUs according to your setup.

resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1
3.1.2.3 Assignment by labels and node selectors

If affected cluster nodes are labeled with a label such as accelerator=nvidia-gpu, you can configure the node selector to check for the label. In this case, use the following values in your values.yaml file.

nodeSelector:
  accelerator: nvidia-gpu
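
You can then pass the customized values file when installing or upgrading the application with Helm. The following example is a sketch only; the release name, chart reference and namespace are placeholders for your setup.

> helm upgrade --install ollama OLLAMA_CHART \
  --namespace NAMESPACE \
  -f values.yaml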

3.1.3 Verifying Ollama GPU assignment

If the GPU is correctly detected, the Ollama container logs this event:

[...] source=routes.go:1172 msg="Listening on :11434 (version 0.0.0)"
[...] source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2502346830/runners
[...] source=payload.go:44 msg="Dynamic LLM libraries [cuda_v12 cpu cpu_avx cpu_avx2]"
[...] source=gpu.go:204 msg="looking for compatible GPUs"
[...] source=types.go:105 msg="inference compute" id=GPU-c9ad37d0-d304-5d2a-c2e6-d3788cd733a7 library=cuda compute
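
To inspect these messages on a running cluster, you can view the logs of the Ollama workload, for example as follows. The namespace and deployment name are placeholders for your setup.

> kubectl logs -n NAMESPACE deployment/ollama | grep -i gpu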

3.2 Setting up SUSE Observability for SUSE AI

SUSE Observability provides comprehensive monitoring and insights into your infrastructure and applications. It enables efficient tracking of metrics, logs and traces, helping you maintain optimal performance and troubleshoot issues effectively. This procedure guides you through setting up SUSE Observability for the SUSE AI environment using the SUSE AI Observability Extension.

3.2.1 Deployment scenarios

You can deploy SUSE Observability and SUSE AI in two different ways:

  • Single-Cluster setup: Both SUSE AI and SUSE Observability are installed in the same Kubernetes cluster. This is a simpler approach ideal for testing and proof-of-concept deployments. Communication between components can use internal cluster DNS.

  • Multi-Cluster setup: SUSE AI and SUSE Observability are installed on separate, dedicated Kubernetes clusters. This setup is recommended for production environments because it isolates workloads. Communication requires exposing the SUSE Observability endpoints externally, for example, via an Ingress.

This section provides instructions for both scenarios.

3.2.2 Requirements

To set up SUSE Observability for SUSE AI, you need to meet the following requirements:

  • Have access to SUSE Application Collection

  • Have a valid SUSE AI subscription

  • Have a valid license for SUSE Observability in SUSE Customer Center

  • Instrument your applications for telemetry data acquisition with OpenTelemetry.

For details on how to collect traces and metrics from SUSE AI components and user-developed applications, refer to Monitoring SUSE AI with OpenTelemetry and SUSE Observability. It includes configurations that are essential for full observability.

Important
Important: SUSE Application Collection not instrumented by default

Applications from the SUSE Application Collection are not instrumented by default. If you want to monitor your AI applications, you need to follow the instrumentation guidelines that we provide in the document Monitoring SUSE AI with OpenTelemetry and SUSE Observability.

3.2.3 Setup process overview

The following chart shows the high-level steps for the setup procedure. You will first set up the SUSE Observability cluster, then configure the SUSE AI cluster, and finally instrument your applications. Execute the steps in each column from left to right and top to bottom.

  • Blue steps are related to Helm chart installations.

  • Gray steps represent another type of interaction, such as coding.

The chart showing a high-level overview of the SUSE Observability setup
Figure 16: High-level overview of the SUSE Observability setup
Tip
Tip: Setup clusters

You can create and configure Kubernetes clusters for SUSE AI and SUSE Observability as you prefer. If you are using SUSE Rancher Prime, check its documentation. For testing purposes, you can even share one cluster for both deployments. You can skip instructions on setting up a specific cluster if you already have one configured.

The diagram below shows the result of the above steps. Two clusters are represented, one for the SUSE Observability workload and one for SUSE AI. You may use an identical setup or customize it for your environment.

The chart showing setup of separate clusters for SUSE AI and SUSE Observability
Figure 17: Separate clusters for SUSE AI and SUSE Observability
Points to notice
  • You can install the SUSE AI Observability Extension alongside SUSE Observability. In that case, it can reach SUSE Observability through the internal Kubernetes DNS.

  • SUSE Observability consists of several components. Two of them, the SUSE Observability API and the collector endpoint, must be reachable from the SUSE AI cluster.

    Important
    Important

    Remember that in multi-cluster setups, it is critical to properly expose these endpoints: configure TLS and make sure to provide the correct keys and tokens. More details are provided in the respective instructions.

3.2.4 Setting up the SUSE Observability cluster

This initial step is identical for both single-cluster and multi-cluster deployments.

  1. Install SUSE Observability. You can follow the official SUSE Observability installation documentation for all installation instructions. Remember to expose your APIs and collector endpoints to your SUSE AI cluster.

    Important
    Important: Multi-cluster setup

    For multi-cluster setups, you must expose the SUSE Observability API and collector endpoints so that the SUSE AI cluster can reach them. Refer to the guide on exposing SUSE Observability outside of the cluster.

  2. Install the SUSE Observability extension. Create a new Helm values file named genai_values.yaml. Before creating the file, review the placeholders below.

    SUSE_OBSERVABILITY_API_URL

    The URL of the SUSE Observability API. For multi-cluster deployments, this is the external URL. For single-cluster deployments, this can be the internal service URL. Example: http://suse-observability-api.your-domain.com

    SUSE_OBSERVABILITY_API_KEY

    The API key from the baseConfig_values.yaml file used during the SUSE Observability installation.

    SUSE_OBSERVABILITY_API_TOKEN_TYPE

    Can be api for a token from the Web UI or service for a Service Token.

    SUSE_OBSERVABILITY_TOKEN

    The API or Service token itself.

    OBSERVED_SERVER_NAME

    The name of the cluster to observe. It must match the name used in the Kubernetes StackPack configuration. Example: suse-ai-cluster.

    Create the genai_values.yaml file with the following content:

    global:
      imagePullSecrets:
      - application-collection 1
      
    serverUrl: SUSE_OBSERVABILITY_API_URL
    apiKey: SUSE_OBSERVABILITY_API_KEY
    tokenType: SUSE_OBSERVABILITY_API_TOKEN_TYPE
    apiToken: SUSE_OBSERVABILITY_TOKEN
    clusterName: OBSERVED_SERVER_NAME

    1

    Instructs Helm to use credentials from the SUSE Application Collection. For instructions on how to configure the image pull secrets for the SUSE Application Collection, refer to the official documentation.

    Run the install command.

    > helm upgrade --install ai-obs \
      oci://dp.apps.rancher.io/charts/suse-ai-observability-extension \
      -f genai_values.yaml --namespace so-extensions --create-namespace
    Note
    Note: Self-signed certificates not supported

    Self-signed certificates are not supported. Consider running the extension in the same cluster as SUSE Observability and then use the internal Kubernetes address.

    After the installation is complete, a new menu called GenAI is added to the Web interface, and a Kubernetes cron job is created that synchronizes the topology view with the components found in the SUSE AI cluster.

  3. Verify the SUSE Observability extension. After the installation, you can verify that a new item appears in the left-hand menu:

    An image of a new left menu item GenAI Observability
    Figure 18: New GenAI Observability menu item
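
    You can also confirm that the synchronization cron job was created in the namespace used for the extension installation:

    > kubectl get cronjobs -n so-extensions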

3.2.5 Setting up the SUSE AI cluster

Follow the instructions for your deployment scenario.

Single-cluster deployment

In this setup, the SUSE AI components are installed in the same cluster as SUSE Observability and can communicate using internal service DNS.

Multi-cluster deployment

In this setup, the SUSE AI cluster is separate. Communication relies on externally exposed endpoints of the SUSE Observability cluster.

The difference between deployment scenarios affects the OTEL Collector exporter configuration and the SUSE Observability Agent URL as described in the following list.

SUSE_OBSERVABILITY_API_URL

The URL of the SUSE Observability API.

Single-cluster example: http://suse-observability-otel-collector.suse-observability.svc.cluster.local:4317

Multi-cluster example: https://suse-observability-api.your-domain.com

SUSE_OBSERVABILITY_COLLECTOR_ENDPOINT

The endpoint of the SUSE Observability Collector.

Single-cluster example: http://suse-observability-router.suse-observability.svc.cluster.local:8080/receiver/stsAgent

Multi-cluster example: https://suse-observability-router.your-domain.com/receiver/stsAgent

  1. Install NVIDIA GPU Operator. Follow the instructions in https://documentation.suse.com/cloudnative/rke2/latest/en/advanced.html#_deploy_nvidia_operator.

  2. Install OpenTelemetry collector. Create a secret with your SUSE Observability API key in the namespace where you want to install the collector. Retrieve the API key using the Web UI or from the baseConfig_values.yaml file that you used during the SUSE Observability installation. If the namespace does not exist yet, create it.

    kubectl create namespace observability
    kubectl create secret generic open-telemetry-collector \
      --namespace observability \
      --from-literal=API_KEY='SUSE_OBSERVABILITY_API_KEY'

    Create a new file named otel-values.yaml with the following content.

    global:
      imagePullSecrets:
      - application-collection
    
    extraEnvsFrom:
      - secretRef:
          name: open-telemetry-collector
    mode: deployment
    ports:
      metrics:
        enabled: true
    presets:
      kubernetesAttributes:
        enabled: true
        extractAllPodLabels: true
    config:
      receivers:
        prometheus:
          config:
            scrape_configs:
              - job_name: 'gpu-metrics'
                scrape_interval: 10s
                scheme: http
                kubernetes_sd_configs:
                  - role: endpoints
                    namespaces:
                      names:
                        - gpu-operator
              - job_name: 'milvus'
                scrape_interval: 15s
                metrics_path: '/metrics'
     
                static_configs:
                  - targets: ['MILVUS_SERVICE_NAME.SUSE_AI_NAMESPACE.svc.cluster.local:9091'] 1
      exporters:
        otlp:
          endpoint: https://OPEN_TELEMETRY_COLLECTOR_NAME.suse-observability.svc.cluster.local:4317 2
          headers:
            Authorization: "SUSEObservability ${env:API_KEY}"
          tls:
            insecure: true
      processors:
        tail_sampling:
          decision_wait: 10s
          policies:
          - name: rate-limited-composite
            type: composite
            composite:
              max_total_spans_per_second: 500
              policy_order: [errors, slow-traces, rest]
              composite_sub_policy:
              - name: errors
                type: status_code
                status_code:
                  status_codes: [ ERROR ]
              - name: slow-traces
                type: latency
                latency:
                  threshold_ms: 1000
              - name: rest
                type: always_sample
              rate_allocation:
              - policy: errors
                percent: 33
              - policy: slow-traces
                percent: 33
              - policy: rest
                percent: 34
        resource:
          attributes:
          - key: k8s.cluster.name
            action: upsert
            value: CLUSTER_NAME 3
          - key: service.instance.id
            from_attribute: k8s.pod.uid
            action: insert
        filter/dropMissingK8sAttributes:
          error_mode: ignore
          traces:
            span:
              - resource.attributes["k8s.node.name"] == nil
              - resource.attributes["k8s.pod.uid"] == nil
              - resource.attributes["k8s.namespace.name"] == nil
              - resource.attributes["k8s.pod.name"] == nil
      connectors:
        spanmetrics:
          metrics_expiration: 5m
          namespace: otel_span
        routing/traces:
          error_mode: ignore
          table:
          - statement: route()
            pipelines: [traces/sampling, traces/spanmetrics]
      service:
        extensions:
          - health_check
        pipelines:
          traces:
            receivers: [otlp, jaeger]
            processors: [filter/dropMissingK8sAttributes, memory_limiter, resource]
            exporters: [routing/traces]
          traces/spanmetrics:
            receivers: [routing/traces]
            processors: []
            exporters: [spanmetrics]
          traces/sampling:
            receivers: [routing/traces]
            processors: [tail_sampling, batch]
            exporters: [debug, otlp]
          metrics:
            receivers: [otlp, spanmetrics, prometheus]
            processors: [memory_limiter, resource, batch]
            exporters: [debug, otlp]

    1

    Configure the Milvus service and namespace for the Prometheus scraper. Because Milvus will be installed in subsequent steps, you can return to this step and edit the endpoint if necessary.

    2

    Set the exporter to your exposed SUSE Observability collector. Remember that the value can be distinct, depending on the deployment pattern. For production usage, we recommend using TLS communication.

    3

    Replace CLUSTER_NAME with the cluster's name.

    Finally, run the installation command.

    > helm upgrade --install opentelemetry-collector \
      oci://dp.apps.rancher.io/charts/opentelemetry-collector \
      -f otel-values.yaml --namespace observability

    Verify the installation by checking the existence of a new deployment and service in the observability namespace.
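
    For example, the following command lists both. The resource names are derived from the Helm release name, opentelemetry-collector in this procedure.

    > kubectl get deployments,services -n observability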

  3. The GPU metrics scraper that we configure in the OTEL Collector requires custom RBAC rules. Create a file named otel-rbac.yaml with the following content:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: suse-observability-otel-scraper
    rules:
      - apiGroups:
          - ""
        resources:
          - services
          - endpoints
        verbs:
          - list
          - watch
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: suse-observability-otel-scraper
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: suse-observability-otel-scraper
    subjects:
      - kind: ServiceAccount
        name: opentelemetry-collector
        namespace: observability

    Then apply the configuration by running the following command.

    > kubectl apply -n gpu-operator -f otel-rbac.yaml
  4. Install the SUSE Observability Agent.

    > helm upgrade --install \
      --namespace suse-observability --create-namespace \
      --set-string 'stackstate.apiKey'='YOUR_API_KEY' 1 \
      --set-string 'stackstate.cluster.name'='CLUSTER_NAME' 2 \
      --set-string 'stackstate.url'='http://suse-observability-router.suse-observability.svc.cluster.local:8080/receiver/stsAgent' 3 \
      --set 'nodeAgent.skipKubeletTLSVerify'=true suse-observability-agent \
      suse-observability/suse-observability-agent

    1

    Retrieve the API key using the Web UI or from the baseConfig_values.yaml file that you used during the SUSE Observability installation.

    2

    Replace CLUSTER_NAME with the cluster's name.

    3

    Replace with your SUSE Observability server URL.

  5. Install SUSE AI. Refer to Section 4, “Installing applications from AI Library” for the complete procedure.

3.2.6 Instrumenting applications

Instrumentation is the act of configuring your applications for telemetry data acquisition. Our stack employs OpenTelemetry standards as a vendor-neutral and open base for our telemetry. For a comprehensive guide on how to set up your instrumentation, please refer to Monitoring SUSE AI with OpenTelemetry and SUSE Observability.

By following the instructions in the document referenced above, you will be able to retrieve all relevant telemetry data from Open WebUI, Ollama and Milvus by simply applying specific configuration to their Helm chart values. You can find links for advanced use cases (auto-instrumentation with OTEL Operator) at the end of the document.

4 Installing applications from AI Library

SUSE AI is delivered as a set of components that you can combine to meet specific use cases. To enable the full integrated stack, you need to deploy multiple applications in sequence. Applications with the fewest dependencies must be installed first, followed by dependent applications once their required dependencies are in place within the cluster.

You can either install required AI Library components manually using their Helm charts, or use SUSE AI Deployer to include all the dependencies in one step.

4.1 Installation procedure

This procedure includes steps to install AI Library applications.

  1. Purchase the SUSE AI entitlement. It is a separate entitlement from SUSE Rancher Prime.

  2. Access the SUSE Application Collection at https://apps.rancher.io/, where the SUSE AI entitlement check is performed.

  3. If the entitlement check is successful, you are given access to the SUSE AI-related Helm charts and container images, and can deploy directly from the SUSE Application Collection.

  4. Visit the SUSE Application Collection, sign in and get the user access token as described in https://docs.apps.rancher.io/get-started/authentication/.

  5. Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace, referred to as SUSE_AI_NAMESPACE. Replace the name to match your environment.

    > kubectl create namespace SUSE_AI_NAMESPACE
  6. Create the SUSE Application Collection secret.

    > kubectl create secret docker-registry application-collection \
      --docker-server=dp.apps.rancher.io \
      --docker-username=APPCO_USERNAME \
      --docker-password=APPCO_USER_TOKEN \
      -n SUSE_AI_NAMESPACE
  7. Log in to the Helm registry.

    > helm registry login dp.apps.rancher.io/charts \
      -u APPCO_USERNAME \
      -p APPCO_USER_TOKEN
  8. Install cert-manager as described in Section 4.2, “Installing cert-manager”.

  9. Install AI Library components. You can either install each component separately, or use the SUSE AI Deployer chart to install the components together as described in Section 4.7, “Installing AI Library components using SUSE AI Deployer”.

    1. Install Milvus as described in Section 4.3, “Installing Milvus”.

    2. (Optional) Install Ollama as described in Section 4.4, “Installing Ollama”.

    3. Install Open WebUI as described in Section 4.5, “Installing Open WebUI”.

4.2 Installing cert-manager

cert-manager is an extensible X.509 certificate controller for Kubernetes workloads. It supports certificates from popular public issuers as well as private issuers. cert-manager ensures that the certificates are valid and up-to-date, and attempts to renew certificates at a configured time before expiry.

In previous releases, cert-manager was automatically installed together with Open WebUI. Currently, cert-manager is no longer part of the Open WebUI Helm chart and you need to install it separately.

4.2.1 Details about the cert-manager application

Before deploying cert-manager, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/cert-manager

Alternatively, you can also refer to the cert-manager Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/cert-manager. It contains available versions and the link to pull the cert-manager container image.

4.2.2 cert-manager installation procedure

Tip
Tip

Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.

  • Install the cert-manager chart.

    > helm upgrade --install cert-manager \
      oci://dp.apps.rancher.io/charts/cert-manager \
      -n CERT_MANAGER_NAMESPACE \
      --set crds.enabled=true \
      --set 'global.imagePullSecrets[0].name'=application-collection
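
    To verify the deployment, check that the cert-manager pods reach the Running state, for example:

    > kubectl get pods -n CERT_MANAGER_NAMESPACE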

4.2.3 Upgrading cert-manager

To upgrade cert-manager to a specific new version, run the following command:

> helm upgrade --install cert-manager \
  oci://dp.apps.rancher.io/charts/cert-manager \
  -n CERT_MANAGER_NAMESPACE \
  --version VERSION_NUMBER

To upgrade cert-manager to the latest version, run the following command:

> helm upgrade --install cert-manager \
  oci://dp.apps.rancher.io/charts/cert-manager \
  -n CERT_MANAGER_NAMESPACE

4.2.4 Uninstalling cert-manager

To uninstall cert-manager, run the following command:

> helm uninstall cert-manager -n CERT_MANAGER_NAMESPACE

4.3 Installing Milvus

Milvus is a scalable, high-performance vector database designed for AI applications. It enables efficient organization and searching of massive unstructured datasets, including text, images and multi-modal content. This procedure walks you through the installation of Milvus and its dependencies.

4.3.1 Details about the Milvus application

Before deploying Milvus, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/milvus

Alternatively, you can also refer to the Milvus Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/milvus. It contains Milvus dependencies, available versions and the link to pull the Milvus container image.

Milvus page in the SUSE Application Collection
Figure 19: Milvus page in the SUSE Application Collection

4.3.2 Milvus installation procedure

Tip
Tip

Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.

  1. When installed as part of SUSE AI, Milvus depends on etcd, MinIO and Apache Kafka. Because the Milvus chart uses a non-default configuration, create an override file milvus_custom_overrides.yaml with the following content.

    Tip
    Tip

    As a template, you can download the Milvus Helm chart that includes the values.yaml file with the default configuration by running the following command:

    > helm pull oci://dp.apps.rancher.io/charts/milvus --version 4.2.2
    global:
      imagePullSecrets:
      - application-collection
      
    cluster:
      enabled: True
    standalone:
      persistence:
        persistentVolumeClaim:
          storageClassName: "local-path"
    etcd:
      replicaCount: 1
      persistence:
        storageClassName: "local-path"
    minio:
      mode: distributed
      replicas: 4
      rootUser: "admin"
      rootPassword: "adminminio"
      persistence:
        storageClass: "local-path"
      resources:
        requests:
          memory: 1024Mi
    kafka:
      enabled: true
      name: kafka
      replicaCount: 3
      broker:
        enabled: true
      cluster:
        listeners:
          client:
            protocol: 'PLAINTEXT'
          controller:
            protocol: 'PLAINTEXT'
      persistence:
        enabled: true
        annotations: {}
        labels: {}
        existingClaim: ""
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
        storageClassName: "local-path"
    extraConfigFiles: 1
      user.yaml: |+
        trace:
          exporter: jaeger
          sampleFraction: 1
          jaeger:
            url: "http://opentelemetry-collector.observability.svc.cluster.local:14268/api/traces" 2

    1

    The extraConfigFiles section is optional, required only to receive telemetry data from Open WebUI.

    2

    The URL of the OpenTelemetry Collector installed by the user.

    Tip
    Tip

    The above example uses local storage. For production environments, we recommend using an enterprise-class storage solution such as SUSE Storage, in which case the storageClassName option must be set to longhorn.

  2. Install the Milvus Helm chart using the milvus_custom_overrides.yaml override file.

    > helm upgrade --install \
      milvus oci://dp.apps.rancher.io/charts/milvus \
      -n SUSE_AI_NAMESPACE \
      --version 4.2.2 -f milvus_custom_overrides.yaml
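
    The installation starts the Milvus components together with their dependencies etcd, MinIO and Apache Kafka. You can watch the pods until they are all in the Running state; the pod names depend on the release name, milvus in this procedure.

    > kubectl get pods -n SUSE_AI_NAMESPACE
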
4.3.2.1 Using Apache Kafka with SUSE Storage

When Milvus is deployed in cluster mode, it uses Apache Kafka as a message queue. If Apache Kafka uses SUSE Storage as a storage back-end, you need to create an XFS storage class and make it available to the Apache Kafka deployment. Otherwise, deploying Apache Kafka on a storage class backed by an Ext4 file system fails with the following error:

"Found directory /mnt/kafka/logs/lost+found, 'lost+found' is not
  in the form of topic-partition or topic-partition.uniqueId-delete
  (if marked for deletion)"

To introduce the XFS storage class, follow these steps:

  1. Create a file named longhorn-xfs.yaml with the following content:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: longhorn-xfs
    provisioner: driver.longhorn.io
    allowVolumeExpansion: true
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    parameters:
      numberOfReplicas: "3"
      staleReplicaTimeout: "30"
      fromBackup: ""
      fsType: "xfs"
      dataLocality: "disabled"
      unmapMarkSnapChainRemoved: "ignored"
  2. Create the new storage class using the kubectl command.

    > kubectl apply -f longhorn-xfs.yaml
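
    You can confirm that the new storage class is available before continuing:

    > kubectl get storageclass longhorn-xfs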
  3. Update the Milvus overrides YAML file to reference the Apache Kafka storage class, as in the following example:

    [...]
    kafka:
      enabled: true
      persistence:
        storageClassName: longhorn-xfs

4.3.3 Upgrading Milvus

The Milvus chart receives application updates and updates of the Helm chart templates. New versions may include changes that require manual steps. These steps are listed in the corresponding README file. All Milvus dependencies are updated automatically during Milvus upgrade.

To upgrade Milvus, identify the new version number and run the following command:

> helm upgrade --install \
  milvus oci://dp.apps.rancher.io/charts/milvus \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f milvus_custom_overrides.yaml

4.3.4 Uninstalling Milvus

To uninstall Milvus, run the following command:

> helm uninstall milvus -n SUSE_AI_NAMESPACE

4.4 Installing Ollama

Ollama is a tool for running and managing language models locally on your computer. It offers a simple interface to download, run and interact with models without relying on cloud resources.

Tip
Tip

When installing SUSE AI, Ollama is installed as part of the Open WebUI installation by default. If you decide to install Ollama separately, disable it during the Open WebUI installation as outlined in Example 6, “Open WebUI override file with Ollama installed separately”.

4.4.1 Details about the Ollama application

Before deploying Ollama, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/ollama

Alternatively, you can also refer to the Ollama Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/ollama. It contains the available versions and a link to pull the Ollama container image.

4.4.2 Ollama installation procedure

Tip
Tip

Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.

  1. Create the ollama_custom_overrides.yaml file to override the values of the parent Helm chart. Refer to Section 4.4.5, “Values for the Ollama Helm chart” for more details.

  2. Install the Ollama Helm chart using the ollama_custom_overrides.yaml override file.

    > helm upgrade \
      --install ollama oci://dp.apps.rancher.io/charts/ollama \
      -n SUSE_AI_NAMESPACE \
      -f ollama_custom_overrides.yaml
    Tip
    Tip: Hugging Face models

    Models downloaded from Hugging Face need to be converted before they can be used by Ollama. Refer to https://github.com/ollama/ollama/blob/main/docs/import.md for more details.

4.4.3 Uninstalling Ollama

To uninstall Ollama, run the following command:

> helm uninstall ollama -n SUSE_AI_NAMESPACE

4.4.4 Upgrading Ollama

You can upgrade Ollama to a specific version by running the following command:

> helm upgrade ollama oci://dp.apps.rancher.io/charts/ollama \
  -n SUSE_AI_NAMESPACE \
  --version OLLAMA_VERSION_NUMBER -f ollama_custom_overrides.yaml

If you omit the --version option, Ollama gets upgraded to the latest available version.

4.4.4.1 Upgrading from version 0.x.x to 1.x.x

Version 1.x.x introduces the ability to load models into memory at startup. To reflect this, change ollama.models to ollama.models.pull in your Ollama Helm chart values before upgrading to avoid errors, for example:

Example 1: Ollama Helm chart version 0.x.x
[...]
ollama:
  models:
    - "gemma:2b"
    - "llama3.1"
Example 2: Ollama Helm chart version 1.x.x
[...]
ollama:
  models:
    pull:
      - "gemma:2b"
      - "llama3.1"

Without this change you may experience the following error when trying to upgrade from 0.x.x to 1.x.x.

coalesce.go:286: warning: cannot overwrite table with non table for
ollama.ollama.models (map[pull:[] run:[]])
Error: UPGRADE FAILED: template: ollama/templates/deployment.yaml:145:27:
executing "ollama/templates/deployment.yaml" at <.Values.ollama.models.pull>:
can't evaluate field pull in type interface {}

4.4.5 Values for the Ollama Helm chart

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Important
Important: GPU section

Ollama can run optimized for NVIDIA GPUs if the cluster nodes provide NVIDIA GPUs with the NVIDIA GPU drivers and the NVIDIA GPU Operator installed (see Section 3.1.2, “Assigning GPU nodes”), and if the gpu section is enabled in ollama_custom_overrides.yaml.

If you do not want to use the NVIDIA GPU, remove the gpu section from ollama_custom_overrides.yaml or disable it as in the following example.

 ollama:
  [...]
  gpu:
    enabled: false
    type: 'nvidia'
    number: 1
Example 3: Basic override file with GPU and two models pulled at startup
global:
  imagePullSecrets:
  - application-collection
ingress:
  enabled: false
defaultModel: "gemma:2b"
runtimeClassName: nvidia
ollama:
  models:
    pull:
      - "gemma:2b"
      - "llama3.1"
    run:
      - "gemma:2b"
      - "llama3.1"
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
    nvidiaResource: "nvidia.com/gpu"
persistentVolume: 1
  enabled: true
  storageClass: local-path 2

1

Without the persistentVolume option enabled, changes made to Ollama, such as downloaded LLMs, are lost when the container is restarted.

2

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

Example 4: Basic override file with Ingress and no GPU
ollama:
  models:
    pull:
      - llama2
    run:
      - llama2
  persistentVolume:
    enabled: true
    storageClass: local-path 1
ingress:
  enabled: true
  hosts:
  - host: OLLAMA_API_URL
    paths:
      - path: /
        pathType: Prefix

1

Use local-path storage (requires installing the corresponding provisioner) only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

Table 1: Override file options for the Ollama Helm chart

Key

Type

Default

Description

affinity

object

{}

Affinity for pod assignment

autoscaling.enabled

bool

false

Enable autoscaling

autoscaling.maxReplicas

int

100

Number of maximum replicas

autoscaling.minReplicas

int

1

Number of minimum replicas

autoscaling.targetCPUUtilizationPercentage

int

80

CPU usage to target replica

extraArgs

list

[]

Additional arguments on the output Deployment definition.

extraEnv

list

[]

Additional environment variables on the output Deployment definition.

fullnameOverride

string

""

String to fully override template

global.imagePullSecrets

list

[]

Global override for container image registry pull secrets

global.imageRegistry

string

""

Global override for container image registry

hostIPC

bool

false

Use the host’s IPC namespace

hostNetwork

bool

false

Use the host's network namespace

hostPID

bool

false

Use the host's PID namespace.

image.pullPolicy

string

"IfNotPresent"

Image pull policy to use for the Ollama container

image.registry

string

"dp.apps.rancher.io"

Image registry to use for the Ollama container

image.repository

string

"containers/ollama"

Image repository to use for the Ollama container

image.tag

string

"0.3.6"

Image tag to use for the Ollama container

imagePullSecrets

list

[]

Docker registry secret names as an array

ingress.annotations

object

{}

Additional annotations for the Ingress resource

ingress.className

string

""

IngressClass that is used to implement the Ingress (Kubernetes 1.18+)

ingress.enabled

bool

false

Enable Ingress controller resource

ingress.hosts[0].host

string

"ollama.local"

ingress.hosts[0].paths[0].path

string

"/"

ingress.hosts[0].paths[0].pathType

string

"Prefix"

ingress.tls

list

[]

The TLS configuration for host names to be covered with this Ingress record

initContainers

list

[]

Init containers to add to the pod

knative.containerConcurrency

int

0

Knative service container concurrency

knative.enabled

bool

false

Enable Knative integration

knative.idleTimeoutSeconds

int

300

Knative service idle timeout seconds

knative.responseStartTimeoutSeconds

int

300

Knative service response start timeout seconds

knative.timeoutSeconds

int

300

Knative service timeout seconds

livenessProbe.enabled

bool

true

Enable livenessProbe

livenessProbe.failureThreshold

int

6

Failure threshold for livenessProbe

livenessProbe.initialDelaySeconds

int

60

Initial delay seconds for livenessProbe

livenessProbe.path

string

"/"

Request path for livenessProbe

livenessProbe.periodSeconds

int

10

Period seconds for livenessProbe

livenessProbe.successThreshold

int

1

Success threshold for livenessProbe

livenessProbe.timeoutSeconds

int

5

Timeout seconds for livenessProbe

nameOverride

string

""

String to partially override template (maintains the release name)

nodeSelector

object

{}

Node labels for pod assignment

ollama.gpu.enabled

bool

false

Enable GPU integration

ollama.gpu.number

int

1

Specify the number of GPUs

ollama.gpu.nvidiaResource

string

"nvidia.com/gpu"

Only for NVIDIA cards; change to nvidia.com/mig-1g.10gb to use MIG slice

ollama.gpu.type

string

"nvidia"

GPU type: nvidia or amd. If ollama.gpu.enabled is enabled, the default value is nvidia. If set to amd, this adds the rocm suffix to the image tag if image.tag is not overridden, because AMD and CPU/CUDA use different images.

ollama.insecure

bool

false

Add insecure flag for pulling at container startup

ollama.models

list

[]

List of models to pull at container startup. The more you add, the longer the container takes to start if the models are not already present. Example: models: [llama2, mistral]

ollama.mountPath

string

""

Override ollama-data volume mount path, default: "/root/.ollama"

persistentVolume.accessModes

list

["ReadWriteOnce"]

Ollama server data Persistent Volume access modes. Must match those of existing PV or dynamic provisioner, see https://kubernetes.io/docs/concepts/storage/persistent-volumes/.

persistentVolume.annotations

object

{}

Ollama server data Persistent Volume annotations

persistentVolume.enabled

bool

false

Enable persistence using PVC

persistentVolume.existingClaim

string

""

If you want to bring your own PVC for persisting the Ollama state, pass the name of an existing, ready PVC here. If set, this chart does not create the default PVC. Requires server.persistentVolume.enabled: true.

persistentVolume.size

string

"30Gi"

Ollama server data Persistent Volume size

persistentVolume.storageClass

string

""

If persistentVolume.storageClass is present and set to either a dash (-) or an empty string (""), dynamic provisioning is disabled. Otherwise, the storageClassName for the persistent volume claim is set to the value of persistentVolume.storageClass. If persistentVolume.storageClass is absent, the default storage class is used for dynamic provisioning whenever possible. See https://kubernetes.io/docs/concepts/storage/storage-classes/ for more details.

persistentVolume.subPath

string

""

Subdirectory of Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty.

persistentVolume.volumeMode

string

""

Ollama server data Persistent Volume Binding Mode. If empty (the default) or set to null, no volumeBindingMode specification is set, choosing the default mode.

persistentVolume.volumeName

string

""

Ollama server Persistent Volume name. It can be used to force-attach the created PVC to a specific PV.

podAnnotations

object

{}

Map of annotations to add to the pods

podLabels

object

{}

Map of labels to add to the pods

podSecurityContext

object

{}

Pod Security Context

readinessProbe.enabled

bool

true

Enable readinessProbe

readinessProbe.failureThreshold

int

6

Failure threshold for readinessProbe

readinessProbe.initialDelaySeconds

int

30

Initial delay seconds for readinessProbe

readinessProbe.path

string

"/"

Request path for readinessProbe

readinessProbe.periodSeconds

int

5

Period seconds for readinessProbe

readinessProbe.successThreshold

int

1

Success threshold for readinessProbe

readinessProbe.timeoutSeconds

int

3

Timeout seconds for readinessProbe

replicaCount

int

1

Number of replicas

resources.limits

object

{}

Pod limit

resources.requests

object

{}

Pod requests

runtimeClassName

string

""

Specify runtime class

securityContext

object

{}

Container Security Context

service.annotations

object

{}

Annotations to add to the service

service.nodePort

int

31434

Service node port when service type is NodePort

service.port

int

11434

Service port

service.type

string

"ClusterIP"

Service type

serviceAccount.annotations

object

{}

Annotations to add to the service account

serviceAccount.automount

bool

true

Whether to automatically mount a ServiceAccount's API credentials

serviceAccount.create

bool

true

Whether a service account should be created

serviceAccount.name

string

""

The name of the service account to use. If not set and create is true, a name is generated using the full name template.

tolerations

list

[]

Tolerations for pod assignment

topologySpreadConstraints

object

{}

Topology Spread Constraints for pod assignment

updateStrategy

object

{"type":""}

How to replace existing pods.

updateStrategy.type

string

""

Can be Recreate or RollingUpdate; default is RollingUpdate

volumeMounts

list

[]

Additional volumeMounts on the output Deployment definition

volumes

list

[]

Additional volumes on the output Deployment definition

4.5 Installing Open WebUI

Open WebUI is a Web-based user interface designed for interacting with AI models.

4.5.1 Details about the Open WebUI application

Before deploying Open WebUI, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/open-webui

Alternatively, you can also refer to the Open WebUI Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/open-webui. It contains available versions and the link to pull the Open WebUI container image.

4.5.2 Open WebUI installation procedure

Tip
Tip

Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.

Requirements

To install Open WebUI, you need to have the following:

  • cert-manager installed in the cluster as described in Section 4.2, “Installing cert-manager”.

  • Milvus installed in the cluster as described in Section 4.3, “Installing Milvus”, because the examples below use it as the vector database.

  • Optionally, a stand-alone Ollama installed as described in Section 4.4, “Installing Ollama”, if you do not let Open WebUI install Ollama for you.

  1. Create the owui_custom_overrides.yaml file to override the values of the parent Helm chart. The file contains URLs for Milvus and Ollama and specifies whether a stand-alone Ollama deployment is used or whether Ollama is installed as part of the Open WebUI installation. Find more details in Section 4.5.5, “Examples of Open WebUI Helm chart override files”. For a list of all installation options with examples, refer to Section 4.5.6, “Values for the Open WebUI Helm chart”.

  2. Install the Open WebUI Helm chart using the owui_custom_overrides.yaml override file.

    > helm upgrade --install \
      open-webui oci://dp.apps.rancher.io/charts/open-webui \
      -n SUSE_AI_NAMESPACE \
      --version X.Y.Z -f owui_custom_overrides.yaml
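
    After the installation, verify that the Open WebUI pod is running. Once it is ready, the Web interface is reachable at the host name configured in the override file, suse-ollama-webui in the examples in Section 4.5.5.

    > kubectl get pods -n SUSE_AI_NAMESPACE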

4.5.3 Upgrading Open WebUI

To upgrade Open WebUI to a specific new version, run the following command:

> helm upgrade --install open-webui \
  oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f owui_custom_overrides.yaml

To upgrade Open WebUI to the latest version, run the following command:

> helm upgrade --install open-webui \
  oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  -f owui_custom_overrides.yaml

4.5.4 Uninstalling Open WebUI

To uninstall Open WebUI, run the following command:

> helm uninstall open-webui -n SUSE_AI_NAMESPACE

4.5.5 Examples of Open WebUI Helm chart override files

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Example 5: Open WebUI override file with Ollama included

The following override file installs Ollama during the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.

global:
  imagePullSecrets:
  - application-collection
  
ollamaUrls:
- http://open-webui-ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path 1
ollama:
  enabled: true
  ingress:
    enabled: false
  defaultModel: "gemma:2b"
  ollama:
    models: 2
      - "gemma:2b"
      - "llama3.1"
    gpu: 3
      enabled: true
      type: 'nvidia'
      number: 1
    persistentVolume: 4
      enabled: true
      storageClass: local-path 5
pipelines:
  enabled: False
  persistence:
    storageClass: local-path 6
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui 7
  tls: true
extraEnvVars:
- name: DEFAULT_MODELS 8
  value: "gemma:2b"
- name: DEFAULT_USER_ROLE
  value: "user"
- name: WEBUI_NAME
  value: "SUSE AI"
- name: GLOBAL_LOG_LEVEL
  value: INFO
- name: RAG_EMBEDDING_MODEL
  value: "sentence-transformers/all-MiniLM-L6-v2"
- name: VECTOR_DB
  value: "milvus"
- name: MILVUS_URI
  value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530
- name: INSTALL_NLTK_DATASETS 9
  value: "true"

2

Specifies that two large language models (LLM) will be loaded in Ollama when the container starts.

3

Enables GPU support for Ollama. The type must be nvidia because NVIDIA GPUs are the only supported devices. number must be between 1 and the number of NVIDIA GPUs present on the system.

4

Without the persistentVolume option enabled, changes made to Ollama, such as downloaded LLMs, are lost when the container is restarted.

1 5 6

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

8

Specifies the default LLM for Ollama.

7

Specifies the host name for the Open WebUI Web UI.

9

Installs the natural language toolkit (NLTK) datasets for Ollama. Refer to https://www.nltk.org/index.html for licensing information.

Example 6: Open WebUI override file with Ollama installed separately

The following override file installs Ollama separately from the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.

global:
  imagePullSecrets:
  - application-collection
  
ollamaUrls:
- http://ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path 1
ollama:
  enabled: false
pipelines:
  enabled: False
  persistence:
    storageClass: local-path 2
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui
  tls: true
extraEnvVars:
- name: DEFAULT_MODELS 3
  value: "gemma:2b"
- name: DEFAULT_USER_ROLE
  value: "user"
- name: WEBUI_NAME
  value: "SUSE AI"
- name: GLOBAL_LOG_LEVEL
  value: INFO
- name: RAG_EMBEDDING_MODEL
  value: "sentence-transformers/all-MiniLM-L6-v2"
- name: VECTOR_DB
  value: "milvus"
- name: MILVUS_URI
  value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530
- name: ENABLE_OTEL 4
  value: "true"
- name: OTEL_EXPORTER_OTLP_ENDPOINT 5
  value: http://opentelemetry-collector.observability.svc.cluster.local:4317 6

1 2

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

3

Specifies the default LLM for Ollama.

4 5

These values are optional, required only to receive telemetry data from Open WebUI.

6

The URL of the OpenTelemetry Collector installed by the user.

Example 7: Open WebUI override file with a connection to vLLM

The following example shows how to extend the extraEnvVars section of the Open WebUI override file to connect to vLLM. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.

Tip
Tip

Find more details about installing vLLM in Section 4.6, “Installing vLLM”.

extraEnvVars:
[...]
- name: OPENAI_API_BASE_URL
  value: "http://vllm-router-service.SUSE_AI_NAMESPACE.svc.cluster.local:80/v1"
- name: OPENAI_API_KEY
  value: "dummy" 1

1

Open WebUI will require you to provide the OpenAI API key.

If the Open WebUI installation has pipelines enabled besides the vLLM deployment, you can extend the extraEnvVars section as follows.

extraEnvVars:
[...]
- name: OPENAI_API_BASE_URLS
  value: "http://open-webui-pipelines.SUSE_AI_NAMESPACE.svc.cluster.local:9099;http://vllm-router-service.SUSE_AI_NAMESPACE.svc.cluster.local:80/v1"
- name: OPENAI_API_KEYS
  value: "0p3n-w3bu!;dummy"

4.5.6 Values for the Open WebUI Helm chart

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Table 2: Available options for the Open WebUI Helm chart

Key

Type

Default

Description

affinity

object

{}

Affinity for pod assignment

annotations

object

{}

cert-manager.enabled

bool

true

clusterDomain

string

"cluster.local"

Value of cluster domain

containerSecurityContext

object

{}

Configure container security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container.

extraEnvVars

list

[{"name":"OPENAI_API_KEY", "value":"0p3n-w3bu!"}]

Environment variables added to the Open WebUI deployment. Most up-to-date environment variables can be found in https://docs.openwebui.com/getting-started/env-configuration/.

extraEnvVars[0]

object

{"name":"OPENAI_API_KEY","value":"0p3n-w3bu!"}

Default API key value for Pipelines. It should be updated in a production deployment and changed to the required API key if not using Pipelines.

global.imagePullSecrets

list

[]

Global override for container image registry pull secrets

global.imageRegistry

string

""

Global override for container image registry

global.tls.additionalTrustedCAs

bool

false

global.tls.issuerName

string

"suse-private-ai"

global.tls.letsEncrypt.email

string

"none@example.com"

global.tls.letsEncrypt.environment

string

"staging"

global.tls.letsEncrypt.ingress.class

string

""

global.tls.source

string

"suse-private-ai"

The source of Open WebUI TLS keys, see Section 4.5.6.1, “TLS sources”.

image.pullPolicy

string

"IfNotPresent"

Image pull policy to use for the Open WebUI container

image.registry

string

"dp.apps.rancher.io"

Image registry to use for the Open WebUI container

image.repository

string

"containers/open-webui"

Image repository to use for the Open WebUI container

image.tag

string

"0.3.32"

Image tag to use for the Open WebUI container

imagePullSecrets

list

[]

Configure imagePullSecrets to use private registry, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry.

ingress.annotations

object

{"nginx.ingress.kubernetes.io/ssl-redirect":"true"}

Use appropriate annotations for your Ingress controller, such as nginx.ingress.kubernetes.io/rewrite-target: / for NGINX.

ingress.class

string

""

ingress.enabled

bool

true

ingress.existingSecret

string

""

ingress.host

string

""

ingress.tls

bool

true

nameOverride

string

""

nodeSelector

object

{}

Node labels for pod assignment

ollama.enabled

bool

true

Automatically install Ollama Helm chart from https://otwld.github.io/ollama-helm/. Configure the following Helm values.

ollama.fullnameOverride

string

"open-webui-ollama"

If enabling embedded Ollama, update fullnameOverride to your desired Ollama name value, or else it will use the default ollama.name value from the Ollama chart.

ollamaUrls

list

[]

A list of Ollama API endpoints. These can be added instead of automatically installing the Ollama Helm chart, or in addition to it.

openaiBaseApiUrl

string

""

OpenAI base API URL to use. Defaults to the Pipelines service endpoint when Pipelines are enabled, or to https://api.openai.com/v1 if Pipelines are not enabled and this value is blank.

persistence.accessModes

list

["ReadWriteOnce"]

If using multiple replicas, you must update accessModes to ReadWriteMany.

persistence.annotations

object

{}

persistence.enabled

bool

true

persistence.existingClaim

string

""

Use existingClaim to reuse an existing Open WebUI PVC instead of creating a new one.

persistence.selector

object

{}

persistence.size

string

"2Gi"

persistence.storageClass

string

""

pipelines.enabled

bool

false

Automatically install Pipelines chart to extend Open WebUI functionality using Pipelines, see https://github.com/open-webui/pipelines.

pipelines.extraEnvVars

list

[]

This section can be used to pass the required environment variables to your pipelines (such as the Langfuse host name).

podAnnotations

object

{}

podSecurityContext

object

{}

Configure pod security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container.

replicaCount

int

1

resources

object

{}

service

object

{"annotations":{},"containerPort":8080, "labels":{},"loadBalancerClass":"", "nodePort":"","port":80,"type":"ClusterIP"}

Service values to expose Open WebUI pods to cluster

tolerations

list

[]

Tolerations for pod assignment

topologySpreadConstraints

list

[]

Topology Spread Constraints for pod assignment

4.5.6.1 TLS sources

There are three recommended ways for Open WebUI to obtain TLS certificates for secure communication.

Self-Signed TLS certificate

This is the default method. You need to install cert-manager on the cluster to issue and maintain the certificates. This method generates a CA and signs the Open WebUI certificate using the CA. cert-manager then manages the signed certificate.

For this method, use the following Helm chart option:

global.tls.source=suse-private-ai
Let's Encrypt

This method also uses cert-manager, but it is combined with a special issuer for Let's Encrypt that performs all actions—including request and validation—to get the Let's Encrypt certificate issued. This configuration uses HTTP validation (HTTP-01) and therefore the load balancer must have a public DNS record and be accessible from the Internet.

For this method, use the following Helm chart option:

global.tls.source=letsEncrypt
Provide your own certificate

This method allows you to bring your own signed certificate to secure the HTTPS traffic. In this case, you must upload this certificate and associated key as PEM-encoded files named tls.crt and tls.key.

For this method, use the following Helm chart option:

global.tls.source=secret
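
For example, assuming the certificate and key are available locally as tls.crt and tls.key, you can create the corresponding Kubernetes TLS secret with kubectl. The secret name below is a placeholder; it must match the secret referenced by the chart, for example, via the ingress.existingSecret value.

> kubectl create secret tls OPEN_WEBUI_TLS_SECRET \
  --cert=tls.crt --key=tls.key \
  -n SUSE_AI_NAMESPACE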

4.6 Installing vLLM

vLLM is an open-source high-performance inference and serving engine for large language models (LLMs). It is designed to maximize throughput and reduce latency by using an efficient memory management system that handles dynamic batching and streaming outputs. In short, vLLM makes running LLMs cheaper and faster in production.

Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using its Helm chart, which is part of AI Library. The Helm chart deploys the full vLLM production stack and enables you to run optimized LLM inference workloads on NVIDIA GPU in your Kubernetes cluster. It consists of the following components:

  • Serving Engine runs the model inference.

  • Router handles OpenAI-compatible API requests.

  • LMCache (optional) improves caching efficiency.

  • CacheServer (optional) is a distributed KV cache back-end.

4.6.1 Details about the vLLM application

Before deploying vLLM, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/vllm

Alternatively, you can also refer to the vLLM Helm chart page on the SUSE Application Collection site at https://apps.rancher.io/applications/vllm. It contains vLLM dependencies, available versions and the link to pull the vLLM container image.

4.6.2 vLLM installation procedure

Tip
Tip

Before the installation, you need to get user access to the SUSE Application Collection, create a Kubernetes namespace, and log in to the Helm registry as described in Section 4.1, “Installation procedure”.

Warning
Warning: NVIDIA GPUs required

NVIDIA GPUs must be available in your Kubernetes cluster to successfully deploy and run vLLM.

Important
Important: Limitation

The current release of SUSE AI vLLM does not support Ray and LoraController.

  1. Create a vllm_custom_overrides.yaml file to override the default values of the Helm chart. Find examples of override files in Section 4.6.6, “Examples of vLLM Helm chart override files”.

  2. After saving the override file as vllm_custom_overrides.yaml, apply its configuration with the following command.

    > helm upgrade --install \
      vllm oci://dp.apps.rancher.io/charts/vllm \
      -n SUSE_AI_NAMESPACE \
      -f vllm_custom_overrides.yaml

4.6.3 Integrating vLLM with Open WebUI

You can integrate vLLM with Open WebUI either by using the Open WebUI Web user interface, or by updating the Open WebUI override file during the Open WebUI deployment (see Example 7, “Open WebUI override file with a connection to vLLM”).

Procedure 1: Integrating vLLM with Open WebUI via the Web user interface
Requirements
  • You must have Open WebUI administrator privileges to access configuration screens or settings mentioned in this section.

  1. In the bottom left of the Open WebUI window, click your avatar icon to open the user menu and select Admin Panel.

  2. Click the Settings tab and select Connections from the left menu.

  3. In the Manage OpenAI API Connections section, add a new connection URL to the vLLM router service, for example:

    http://vllm-router-service.SUSE_AI_NAMESPACE.svc.cluster.local:80/v1

    Confirm with Save.

    A screenshot of the Open WebUI user interface for adding a new connection to vLLM
    Figure 20: Adding a vLLM connection to Open WebUI

4.6.4 Upgrading vLLM

The vLLM chart receives application updates and updates of the Helm chart templates. New versions may include changes that require manual steps. These steps are listed in the corresponding README file. All vLLM dependencies are updated automatically during a vLLM upgrade.

To upgrade vLLM, identify the new version number and run the following command:

> helm upgrade --install \
  vllm oci://dp.apps.rancher.io/charts/vllm \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f vllm_custom_overrides.yaml
Tip
Tip

If you omit the --version option, vLLM gets upgraded to the latest available version.

Note
Note: Rolling update

The helm upgrade command performs a rolling update on Deployments or StatefulSets with the following conditions:

  • The old pod stays running until the new pod passes readiness checks.

  • If the cluster is already at GPU capacity, the new pod cannot start because there is no GPU left to schedule it. This requires patching the deployment using the Recreate update strategy. The following commands identify the vLLM deployment name and patch its deployment.

    > kubectl get deployments -n SUSE_AI_NAMESPACE
    > kubectl patch deployment VLLM_DEPLOYMENT_NAME \
      -n SUSE_AI_NAMESPACE \
      -p '{"spec": {"strategy": {"type": "Recreate", "rollingUpdate": null}}}'

4.6.5 Uninstalling vLLM

To uninstall vLLM, run the following command:

> helm uninstall vllm -n SUSE_AI_NAMESPACE

4.6.6 Examples of vLLM Helm chart override files

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Example 8: Minimal configuration

The following override file installs vLLM using a model that is publicly available.

global:
  imagePullSecrets:
  - application-collection
servingEngineSpec:
  modelSpec:
  - name: "phi3-mini-4k"
    registry: "dp.apps.rancher.io"
    repository: "containers/vllm-openai"
    tag: "0.9.1"
    imagePullPolicy: "IfNotPresent"
    modelURL: "microsoft/Phi-3-mini-4k-instruct"
    replicaCount: 1
    requestCPU: 6
    requestMemory: "16Gi"
    requestGPU: 1
Procedure 2: Validating the installation
  • Pulling the images can take a long time. You can monitor the status of the vLLM installation by running the following command:

    > kubectl get pods -n SUSE_AI_NAMESPACE
      
    NAME                                           READY   STATUS    RESTARTS   AGE
    [...]
    vllm-deployment-router-7588bf995c-5jbkf        1/1     Running   0          8m9s
    vllm-phi3-mini-4k-deployment-vllm-79d6fdc-tx7  1/1     Running   0          8m9s

    Pods for the vLLM deployment should reach the Running status and report all containers as Ready.

Procedure 3: Validating the stack
  1. Expose the vllm-router-service port to the host machine:

    > kubectl port-forward svc/vllm-router-service \
      -n SUSE_AI_NAMESPACE 30080:80
  2. Query the OpenAI-compatible API to list the available models:

    > curl -o- http://localhost:30080/v1/models
  3. Send a query to the OpenAI /completion endpoint to generate a completion for a prompt:

    > curl -X POST http://localhost:30080/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "microsoft/Phi-3-mini-4k-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 10
      }'
    
    # example output of generated completions
    {
        "id": "cmpl-3dd11a3624654629a3828c37bac3edd2",
        "object": "text_completion",
        "created": 1757530703,
        "model": "microsoft/Phi-3-mini-4k-instruct",
        "choices": [
            {
                "index": 0,
                "text": " in a bustling city full of concrete and",
                "logprobs": null,
                "finish_reason": "length",
                "stop_reason": null,
                "prompt_logprobs": null
            }
        ],
        "usage": {
            "prompt_tokens": 5,
            "total_tokens": 15,
            "completion_tokens": 10,
            "prompt_tokens_details": null
        },
        "kv_transfer_params": null
    }
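  4. Optionally, you can also query the OpenAI-compatible /v1/chat/completions endpoint in a similar way, for example:

    > curl -X POST http://localhost:30080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "microsoft/Phi-3-mini-4k-instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 20
      }'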
Example 9: Basic configuration

The following vLLM override file includes basic configuration options.

Prerequisites
  • Access to a Hugging Face token (HF_TOKEN).

  • The model meta-llama/Llama-3.1-8B-Instruct from this example is a gated model that requires you to accept the agreement to access it. For more information, see https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct.

  • The runtimeClassName specified here is nvidia.

  • Update the storageClass: entry for each modelSpec.

# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
  - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
  - name: "llama3" 1
    registry: "dp.apps.rancher.io" 2
    repository: "containers/vllm-openai" 3
    tag: "0.9.1" 4
    imagePullPolicy: "IfNotPresent"
    modelURL: "meta-llama/Llama-3.1-8B-Instruct" 5
    replicaCount: 1 6
    requestCPU: 10 7
    requestMemory: "16Gi" 8
    requestGPU: 1 9
    storageClass: STORAGE_CLASS
    pvcStorage: "50Gi" 10
    pvcAccessMode:
      - ReadWriteOnce

    vllmConfig:
      enableChunkedPrefill: false 11
      enablePrefixCaching: false 12
      maxModelLen: 4096 13
      dtype: "bfloat16" 14
      extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.8"] 15

    hf_token: HF_TOKEN 16

1

The unique identifier for your model deployment.

2

The Docker image registry containing the model's serving engine image.

3

The Docker image repository containing the model's serving engine image.

4

The version of the model image to use.

5

The URL pointing to the model on Hugging Face or another hosting service.

6

The number of replicas for the deployment, which allows scaling for load.

7

The amount of CPU resources requested per replica.

8

Memory allocation for the deployment. Sufficient memory is required to load the model.

9

The number of GPUs to allocate for the deployment.

10

The Persistent Volume Claim (PVC) size for model storage.

11

Enables chunked prefill, which splits long prompt prefills into smaller chunks that can be batched together with decode requests to improve scheduling and latency.

12

Enables caching of prompt prefixes to speed up inference for repeated prompts.

13

The maximum sequence length the model can handle.

14

The data type for model weights, such as bfloat16 for mixed-precision inference and faster performance on modern GPUs.

15

Additional command-line arguments for vLLM, such as disabling request logging or setting GPU memory utilization.

16

Your Hugging Face token for accessing gated models. Replace HF_TOKEN with your actual token.
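
To apply this override file, pass it to the helm command with the -f option during installation or upgrade, for example:

> helm upgrade --install \
  vllm oci://dp.apps.rancher.io/charts/vllm \
  -n SUSE_AI_NAMESPACE \
  -f vllm_custom_overrides.yaml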

Example 10: Loading prefetched models from persistent storage

Prefetching models to a Persistent Volume Claim (PVC) prevents repeated downloads from Hugging Face during pod startup. The process involves creating a PVC and a job to fetch the model. This PVC is mounted at /models, where the prefetch job stores the model weights. Subsequently, the vLLM modelURL is set to this path, which ensures that the model is loaded locally instead of being downloaded when the pod starts.

  1. Define a PVC for model weights using the following YAML specification.

    # pvc-models.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: models-pvc
      namespace: SUSE_AI_NAMESPACE
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi # Adjust size based on your model
      storageClassName: STORAGE_CLASS

    Save it as pvc-models.yaml and apply with kubectl apply -f pvc-models.yaml.

  2. Create a secret resource for the Hugging Face token.

    > kubectl create secret -n SUSE_AI_NAMESPACE \
      generic huggingface-credentials \
      --from-literal=HUGGING_FACE_HUB_TOKEN=HF_TOKEN
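
    You can verify that the secret was created, for example, by running:

    > kubectl get secret huggingface-credentials -n SUSE_AI_NAMESPACE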
  3. Create a YAML specification for prefetching the model and save it as job-prefetch-llama3.1-8b.yaml.

    # job-prefetch-llama3.1-8b.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: prefetch-llama3.1-8b
      namespace: SUSE_AI_NAMESPACE
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: hf-download
            image: python:3.10-slim
            env:
            - name: HF_TOKEN
              valueFrom: { secretKeyRef: { name: huggingface-credentials, key: HUGGING_FACE_HUB_TOKEN } }
            - name: HF_HUB_ENABLE_HF_TRANSFER
              value: "1"
            - name: HF_HUB_DOWNLOAD_TIMEOUT
              value: "60"
            command: ["bash","-lc"]
            args:
            - |
              set -e
              echo "Installing Hugging Face CLI..."
              pip install "huggingface_hub[cli]"
              pip install "hf_transfer"
              echo "Logging in..."
              hf auth login --token "${HF_TOKEN}"
              echo "Downloading Llama 3.1 8B Instruct to /models/llama-3.1-8b-it ..."
              hf download meta-llama/Llama-3.1-8B-Instruct --local-dir /models/llama-3.1-8b-it
            volumeMounts:
            - name: models
              mountPath: /models
          volumes:
          - name: models
            persistentVolumeClaim:
              claimName: models-pvc

    Apply the specification with the following commands:

    > kubectl apply -f job-prefetch-llama3.1-8b.yaml
    > kubectl -n SUSE_AI_NAMESPACE \
      wait --for=condition=complete job/prefetch-llama3.1-8b
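
    If you want to monitor the download progress or troubleshoot a failed run, you can, for example, inspect the job logs:

    > kubectl logs -n SUSE_AI_NAMESPACE job/prefetch-llama3.1-8b -f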
  4. Update the custom vLLM override file with support for PVC.

    # vllm_custom_overrides.yaml
    global:
      imagePullSecrets:
      - application-collection
    servingEngineSpec:
      runtimeClassName: "nvidia"
      modelSpec:
      - name: "llama3"
        registry: "dp.apps.rancher.io"
        repository: "containers/vllm-openai"
        tag: "0.9.1"
        imagePullPolicy: "IfNotPresent"
        modelURL: "/models/llama-3.1-8b-it"
        replicaCount: 1
    
        requestCPU: 10
        requestMemory: "16Gi"
        requestGPU: 1
    
        extraVolumes:
          - name: models-pvc
            persistentVolumeClaim:
              claimName: models-pvc 1
    
        extraVolumeMounts:
          - name: models-pvc
            mountPath: /models 2
    
        vllmConfig:
          maxModelLen: 4096
    
        hf_token: HF_TOKEN

    1

    Specify your PVC name.

    2

    The mount path must match the base directory of the servingEngineSpec.modelSpec.modelURL value specified above.

    Save it as vllm_custom_overrides.yaml and apply it by running the helm upgrade command with the -f option, as described in Section 4.6.4, “Upgrading vLLM”.

  5. Verify that the prefetched model is available by listing the contents of the mounted PVC in the vLLM pod, for example:

    > kubectl exec -it vllm-llama3-deployment-vllm-858bd967bd-w26f7 \
      -n SUSE_AI_NAMESPACE -- ls -l /models
    drwxr-xr-x 1 root root 608 Aug 22 16:29 llama-3.1-8b-it
Example 11: Configuration with multiple models

This example shows how to configure multiple models to run on different GPUs. Remember to update the entries hf_token and storageClass.

Note
Note: Ray is not supported

Ray is currently not supported. Therefore, you cannot shard a single large model across multiple GPUs.

# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
  - application-collection
servingEngineSpec:
  modelSpec:
  - name: "llama3"
    registry: "dp.apps.rancher.io"
    repository: "containers/vllm-openai"
    tag: "0.9.1"
    imagePullPolicy: "IfNotPresent"
    modelURL: "meta-llama/Llama-3.1-8B-Instruct"
    replicaCount: 1
    requestCPU: 10
    requestMemory: "16Gi"
    requestGPU: 1
    pvcStorage: "50Gi"
    storageClass: STORAGE_CLASS
    vllmConfig:
      maxModelLen: 4096
    hf_token: HF_TOKEN_FOR_LLAMA_31

  - name: "mistral"
    registry: "dp.apps.rancher.io"
    repository: "containers/vllm-openai"
    tag: "0.9.1"
    imagePullPolicy: "IfNotPresent"
    modelURL: "mistralai/Mistral-7B-Instruct-v0.2"
    replicaCount: 1
    requestCPU: 10
    requestMemory: "16Gi"
    requestGPU: 1
    pvcStorage: "50Gi"
    storageClass: STORAGE_CLASS
    vllmConfig:
      maxModelLen: 4096
    hf_token: HF_TOKEN_FOR_MISTRAL
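
After both deployments are running, you can verify that the router exposes both models, for example, by port-forwarding the vllm-router-service as shown in Procedure 3, “Validating the stack” and listing the models:

> curl -o- http://localhost:30080/v1/models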
Example 12: CPU offloading

This example demonstrates how to enable KV cache offloading to the CPU using LMCache in a vLLM deployment. You enable LMCache and set the CPU offloading buffer size using the lmcacheConfig field. In the following override file, the buffer size is set to 20 GB, but you can adjust this value based on your workload. Note that lmcacheConfig.enabled is set to false in the example; set it to true to activate the offloading (see the warning below). Remember to update the hf_token and storageClass entries.

Warning
Warning: Experimental features

Setting lmcacheConfig.enabled to true implicitly enables the LMCACHE_USE_EXPERIMENTAL flag for LMCache. These experimental features are only supported on newer GPU generations. It is not recommended to enable them without a compelling reason.

# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
  - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
  - name: "mistral"
    registry: "dp.apps.rancher.io"
    repository: "containers/lmcache-vllm-openai"
    tag: "0.3.2"
    imagePullPolicy: "IfNotPresent"
    modelURL: "mistralai/Mistral-7B-Instruct-v0.2"
    replicaCount: 1
    requestCPU: 10
    requestMemory: "40Gi"
    requestGPU: 1
    pvcStorage: "50Gi"
    storageClass: STORAGE_CLASS
    pvcAccessMode:
      - ReadWriteOnce
    vllmConfig:
      maxModelLen: 32000

    lmcacheConfig:
      enabled: false
      cpuOffloadingBufferSize: "20"

    hf_token: HF_TOKEN
Example 13: Shared remote KV cache storage with LMCache

This example shows how to enable shared remote KV cache storage using LMCache in a vLLM deployment. The configuration defines a cacheserverSpec and runs two serving replicas. Note that lmcacheConfig.enabled is set to false in the example; set it to true to activate LMCache (see the warning below). Remember to replace the placeholder values for hf_token and storageClass before applying the configuration.

Warning
Warning: Experimental features

Setting lmcacheConfig.enabled to true implicitly enables the LMCACHE_USE_EXPERIMENTAL flag for LMCache. These experimental features are only supported on newer GPU generations. It is not recommended to enable them without a compelling reason.

# vllm_custom_overrides.yaml
global:
  imagePullSecrets:
  - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
  - name: "mistral"
    registry: "dp.apps.rancher.io"
    repository: "containers/lmcache-vllm-openai"
    tag: "0.3.2"
    imagePullPolicy: "IfNotPresent"
    modelURL: "mistralai/Mistral-7B-Instruct-v0.2"
    replicaCount: 2
    requestCPU: 10
    requestMemory: "40Gi"
    requestGPU: 1
    pvcStorage: "50Gi"
    storageClass: STORAGE_CLASS
    vllmConfig:
      enablePrefixCaching: true
      maxModelLen: 16384
    lmcacheConfig:
      enabled: false
      cpuOffloadingBufferSize: "20"
    hf_token: HF_TOKEN
    initContainer:
      name: "wait-for-cache-server"
      image: "dp.apps.rancher.io/containers/lmcache-vllm-openai:0.3.2"
      command: ["/bin/sh", "-c"]
      args:
        - |
          timeout 60 bash -c '
          while true; do
            /opt/venv/bin/python3 /workspace/LMCache/examples/kubernetes/health_probe.py $(RELEASE_NAME)-cache-server-service $(LMCACHE_SERVER_SERVICE_PORT) && exit 0
            echo "Waiting for LMCache server..."
            sleep 2
          done'
cacheserverSpec:
  replicaCount: 1
  containerPort: 8080
  servicePort: 81
  serde: "naive"
  registry: "dp.apps.rancher.io"
  repository: "containers/lmcache-vllm-openai"
  tag: "0.3.2"
  resources:
    requests:
      cpu: "4"
      memory: "8G"
    limits:
      cpu: "4"
      memory: "10G"
  labels:
    environment: "cacheserver"
    release: "cacheserver"
routerSpec:
  resources:
    requests:
      cpu: "1"
      memory: "2G"
    limits:
      cpu: "1"
      memory: "2G"
  routingLogic: "session"
  sessionKey: "x-user-id"
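
With routingLogic set to session, the router keeps requests that carry the same value of the configured sessionKey header on the same serving replica, so repeated requests from one user can reuse that replica's caches. A minimal sketch of such a request, assuming the router service is port-forwarded as in Procedure 3, “Validating the stack” and USER_123 is a placeholder user identifier:

> curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -H "x-user-id: USER_123" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'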

4.7 Installing AI Library components using SUSE AI Deployer

SUSE AI Deployer consists of a meta Helm chart that takes care of downloading and installing individual AI Library components required by SUSE AI on a Kubernetes cluster.

The following procedure describes how to customize and use the SUSE AI Deployer to install AI Library components. It assumes that you already completed steps described in Section 4.1, “Installation procedure” including the installation of cert-manager.

  1. Pull the SUSE AI Deployer Helm chart, specifying the relevant chart version, and untar it. You can find the latest version of the chart on the SUSE Application Collection page at https://apps.rancher.io/applications/suse-ai-deployer.

    > helm pull oci://dp.apps.rancher.io/charts/suse-ai-deployer \
      --version 1.0.0 --untar
    > cd suse-ai-deployer
  2. Inspect the downloaded chart and its default values.

    > helm show chart .
    > helm show values .
    Tip
    Tip

    To see default values for the charts of the individual components within the meta chart, run the following commands.

    > helm show values charts/ollama/
    > helm show values charts/open-webui/
    > helm show values charts/milvus/
    > helm show values charts/pytorch
  3. Explore the downloaded example override files in the suse-ai-deployer/examples subdirectory. It typically includes the following files:

    suse-gen-ai-minimal.yaml

    Basic configuration to get started with GenAI. It deploys Ollama without GPU support, Open WebUI, and Milvus in stand-alone mode using local storage. PyTorch is disabled.

    suse-gen-ai.yaml

    Configuration optimized for production usage. It deploys Ollama with GPU support, Open WebUI, and Milvus in cluster mode using Longhorn storage. PyTorch is disabled.

    suse-ml-stack.yaml

    Basic configuration that enables deployment of PyTorch without GPU support, using Longhorn storage. It deploys PyTorch but disables Ollama, Open WebUI and Milvus.

  4. Create a custom-overrides.yaml override file based on one of the above examples. The examples use self-signed certificates for TLS communication. To use another option (see Section 4.5.6.1, “TLS sources”), copy the global section from the values.yaml file into your custom-overrides.yaml and update its tls section as needed.

  5. Install the SUSE AI Deployer Helm chart while overriding values from the custom-overrides.yaml file. Use the appropriate RELEASE_NAME and SUSE_AI_NAMESPACE based on the configuration in custom-overrides.yaml.

    > helm upgrade --install \
      RELEASE_NAME \
      --namespace SUSE_AI_NAMESPACE \
      --create-namespace \
      --values ./custom-overrides.yaml \
      --version 1.0.0 \
      oci://dp.apps.rancher.io/charts/suse-ai-deployer
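
    To monitor the installation progress, you can, for example, check the Helm release status and watch the pods in the namespace:

    > helm status RELEASE_NAME -n SUSE_AI_NAMESPACE
    > kubectl get pods -n SUSE_AI_NAMESPACE --watch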

5 Steps after the installation is complete

Once the SUSE AI installation is finished, perform the following tasks to complete the initial setup and configuration.

  1. Log in to SUSE AI Open WebUI using the default credentials.

  2. After you have logged in, update the administrator password for SUSE AI.

  3. From the available language models, configure the one you prefer. Optionally, install a custom language model. Refer to the sections Setting base AI models and Setting the default AI model for more details.

  4. Configure user management with role-based access control (RBAC) as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-managing-user-roles.

  5. Integrate a single sign-on authentication manager, such as Okta, with Open WebUI as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-authentication-via-okta.

  6. Configure retrieval-augmented generation (RAG) to let the model process content relevant to the customer.
