An overview of SUSE AI Deployment

Publication Date: 19 Dec 2024
WHAT?

Basic information about SUSE AI deployment workflow.

WHY?

To better understand the SUSE AI deployment process.

EFFORT

Less than 30 minutes of reading and a basic knowledge of Linux deployment.

1 Deployment procedures

SUSE AI is a complex product consisting of multiple software layers and components. This document outlines the complete workflow for deploying and installing all SUSE AI dependencies, as well as SUSE AI itself. You can also find references to recommended hardware and software requirements, as well as steps to take after the product installation.

Tip
Tip: Hardware and software requirements

For hardware, software and application-specific requirements, refer to SUSE AI requirements.

1.1 Prerequisites for customers who are not yet running a Rancher cluster

  1. Purchase the SUSE Rancher Prime entitlement.

  2. Install SUSE Rancher Prime.

  3. Deploy and configure SUSE Security.

  4. Deploy and configure SUSE Observability.

1.2 Cluster preparation

  1. Install and register SUSE Linux Micro 6.0 or later on each SUSE Rancher Prime: RKE2 cluster node. Refer to https://documentation.suse.com/sle-micro/6.0/ for details.

  2. Install the NVIDIA GPU driver on cluster nodes with GPUs. Refer to https://documentation.suse.com/suse-ai/1.0/html/NVIDIA-GPU-driver-on-SL-Micro/index.html for details.

  3. Install SUSE Rancher Prime: RKE2 Kubernetes distribution on the cluster nodes. Refer to https://docs.rke2.io/ for details.

  4. Install the NVIDIA GPU Operator with the additional option --set driver.enabled=false (see the example command after this list). Refer to https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#rancher-kubernetes-engine-2 for details.

  5. Connect the SUSE Rancher Prime: RKE2 cluster to SUSE Rancher Prime. Refer to https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/register-existing-clusters for details.

  6. Configure the GPU-enabled nodes so that SUSE AI containers requiring GPU acceleration run on nodes equipped with NVIDIA GPU hardware. Find more details about assigning Pods to nodes in Section 2, “Assigning GPU nodes to applications”.

  7. Configure SUSE Security to scan the nodes used for SUSE AI. Although this step is not required, we strongly encourage it to ensure security in production environments.

  8. Configure SUSE Observability to observe the nodes used for the SUSE AI application.
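
For step 4 above, the following is a minimal sketch of installing the NVIDIA GPU Operator from the NVIDIA Helm repository, assuming a dedicated gpu-operator namespace. Refer to the NVIDIA documentation linked in that step for the authoritative, RKE2-specific values.

> helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
> helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace \
  --set driver.enabled=false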

1.3 SUSE AI installation

SUSE AI is delivered as a set of components that you can combine to meet specific use cases. This provides extraordinary flexibility, but it means that there is no single Helm chart that installs the whole stack, for example, for using the Open WebUI chatbot-style application. To enable the full integrated stack, you need to deploy multiple applications in sequence. Applications with the fewest dependencies must be installed first, followed by dependent applications once their required dependencies are in place within the cluster.

  1. Purchase the SUSE AI entitlement. It is a separate entitlement from SUSE Rancher Prime.

  2. Access SUSE AI via the SUSE Rancher Prime Application Collection at https://apps.rancher.io/ to perform the check for the SUSE AI entitlement.

  3. If the entitlement check is successful, you are given access to the SUSE AI-related Helm charts and container images, and can deploy directly from the SUSE Rancher Prime Application Collection.

    Tip
    Tip

    Any overrides of the default values in the Helm charts—such as Open WebUI password and URL customizations—occur at this step.

  4. Install Milvus as described in Section 3, “Installing Milvus”.

  5. (Optional) Install Ollama as described in Section 4, “Installing Ollama”.

  6. Install Open WebUI as described in Section 5, “Installing Open WebUI”.

1.3.1 Ollama and Open WebUI fail to deploy reporting a cert-manager error

By default, public endpoints such as Open WebUI are protected by self-signed SSL/TLS certificates. You also have the option to either have the certificate issued by a public Certification Authority (CA), or bring your own certificate (BYO). Except for the BYO case, cert-manager is required to facilitate the certificate management. Therefore, you must verify that the cert-manager custom resource definitions (CRDs) are installed using the kubectl utility, for example:

> kubectl get crds | grep cert-manager
certificaterequests.cert-manager.io                   2024-12-05T22:22:56Z
certificates.cert-manager.io                          2024-12-05T22:22:56Z
challenges.acme.cert-manager.io                       2024-12-05T22:22:57Z
clusterissuers.cert-manager.io                        2024-12-05T22:22:57Z
issuers.cert-manager.io                               2024-12-05T22:22:57Z
orders.acme.cert-manager.io                           2024-12-05T22:22:57Z

If the cert-manager CRDs are not installed, install them by running the following command:

kubectl apply -f \
https://github.com/cert-manager/cert-manager/releases/download/v1.6.2/cert-manager.crds.yaml
Tip
Tip: The version of cert-manager

In the above command, replace v1.6.2 with the version of the cert-manager dependency included in the Chart.lock file from the unpacked Open WebUI Helm chart.
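
For example, you can pull and unpack the Open WebUI Helm chart and inspect its Chart.lock file to find the pinned cert-manager version (a sketch that assumes the chart version used elsewhere in this document):

> helm pull oci://dp.apps.rancher.io/charts/open-webui --version 3.3.2 --untar
> grep -A 1 'name: cert-manager' open-webui/Chart.lock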

For BYO certificates, cert-manager is not required. Therefore, you may disable it by adding the following option when installing the SUSE AI components via Helm:

--set cert-manager.enabled=false
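
For example, when installing Open WebUI with a BYO certificate, the option can be appended to the installation command described later in this document (a sketch):

> helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  --set cert-manager.enabled=false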

1.4 Steps after the installation is complete

  1. Log in to SUSE AI Open WebUI using the default credentials.

  2. After you have logged in, update the administrator password for SUSE AI.

  3. From the available language models, configure the one you prefer. Optionally, install a custom language model (see the example after this list).

  4. Configure user management with role-based access control (RBAC) as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-managing-user-roles.

  5. Integrate a single sign-on authentication manager—such as Okta—with Open WebUI as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-authentication-via-okta.

  6. Configure retrieval-augmented generation (RAG) to let the model process content relevant to the customer.
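
Regarding step 3, the following is a minimal sketch of pulling an additional model into a stand-alone Ollama deployment by running the ollama command-line tool inside the container. It assumes that Ollama was deployed as described in Section 4, “Installing Ollama”, with the release name ollama.

> kubectl exec -n SUSE_AI_NAMESPACE deploy/ollama -- ollama pull llama3.1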

2 Assigning GPU nodes to applications

When deploying a containerized application to Kubernetes, you need to ensure that containers requiring GPU resources run on appropriate worker nodes. For example, Ollama, a core component of SUSE AI, benefits significantly from GPU acceleration. This topic describes how to satisfy this requirement by explicitly requesting GPU resources and by labeling worker nodes to configure the node selector.

Requirements
  • A Kubernetes cluster—such as SUSE Rancher Prime: RKE2—must be available and configured with more than one worker node, where certain nodes have NVIDIA GPU resources and others do not.

  • This document assumes that any kind of deployment to the Kubernetes cluster is done using Helm charts.

2.1 Labeling GPU nodes

To distinguish nodes with GPU support from non-GPU nodes, Kubernetes uses labels. Labels hold identifying metadata relevant for scheduling and should not be confused with annotations, which attach non-identifying information to a resource. You can manipulate labels with the kubectl command, as well as by editing the node configuration files. If an IaC tool such as Terraform is used, labels can be inserted in the node resource configuration files.

To label a single node, use the following command:

> kubectl label node GPU_NODE_NAME accelerator=nvidia-gpu

To achieve the same result by tweaking the node.yaml node configuration, add the following content and apply the changes with kubectl apply -f node.yaml:

apiVersion: v1
kind: Node
metadata:
  name: node-name
  labels:
    accelerator: nvidia-gpu
Tip
Tip: Labeling multiple nodes

To label multiple nodes, use the following command:

> kubectl label node \
  GPU_NODE_NAME1 \
  GPU_NODE_NAME2 ... \
  accelerator=nvidia-gpu
Tip
Tip

If Terraform is being used as an IaC tool, you can add labels to a group of nodes by editing the .tf files and adding the following values to a resource:

resource "node_group" "example" {
  labels = {
    "accelerator" = "nvidia-gpu"
  }
}

To check if the labels are correctly applied, use the following command:

> kubectl get nodes --show-labels
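
To list only the nodes that carry the GPU label, you can filter by the label, for example:

> kubectl get nodes -l accelerator=nvidia-gpu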

2.2 Assigning GPU nodes

Matching a container to a node is configured by explicit resource allocation and by the use of labels and node selectors. The use cases described below focus on NVIDIA GPUs.

2.2.1 Enable GPU passthrough

Containers are isolated from the host environment by default. Containers that rely on GPU resources therefore need their Helm charts to enable GPU passthrough so that the container can access and use the GPU. Without GPU passthrough, the container may still run, but it can only use the CPU for all computations. Refer to the Ollama Helm chart for an example of the configuration required for GPU acceleration, shown in the sketch below.
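
For illustration, the relevant part of an Ollama override file enabling GPU passthrough looks like the following minimal sketch. The keys are those of the Ollama Helm chart described in Section 4.4, “Values for the Ollama Helm chart”.

ollama:
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1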

2.2.2 Assignment by resource request

After the NVIDIA GPU Operator is configured on a node, you can instantiate applications requesting the resource nvidia.com/gpu provided by the operator. Add the following content to your values.yaml file. Specify the number of GPUs according to your setup.

resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1
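
To confirm that the operator advertises the nvidia.com/gpu resource on a node, you can inspect the node's capacity and allocatable resources, for example:

> kubectl describe node GPU_NODE_NAME | grep nvidia.com/gpu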

2.2.3 Assignment by labels and node selectors

If affected cluster nodes are labeled with a label such as accelerator=nvidia-gpu, you can configure the node selector to check for the label. In this case, use the following values in your values.yaml file.

nodeSelector:
  accelerator: nvidia-gpu

2.3 Verifying Ollama GPU assignment

If the GPU is correctly detected, the Ollama container logs this event:

[...] source=routes.go:1172 msg="Listening on :11434 (version 0.0.0)"
[...] source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2502346830/runners
[...] source=payload.go:44 msg="Dynamic LLM libraries [cuda_v12 cpu cpu_avx cpu_avx2]"
[...] source=gpu.go:204 msg="looking for compatible GPUs"
[...] source=types.go:105 msg="inference compute" id=GPU-c9ad37d0-d304-5d2a-c2e6-d3788cd733a7 library=cuda compute
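
You can retrieve these messages with kubectl, for example, assuming that Ollama runs in the SUSE_AI_NAMESPACE namespace under the release name ollama:

> kubectl logs -n SUSE_AI_NAMESPACE deploy/ollama | grep -i gpu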

3 Installing Milvus

Milvus is a scalable, high-performance vector database designed for AI applications. It enables efficient organization and searching of massive unstructured datasets, including text, images and multi-modal content. This procedure walks you through the installation of Milvus and its dependencies.

3.1 Details about the Milvus application

Before deploying Milvus, familiarize yourself with its supported configurations and related documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/milvus

Alternatively, you can also refer to the Milvus Helm chart page on the SUSE Rancher Prime Application Collection site at https://apps.rancher.io/applications/milvus. It contains Milvus dependencies, available versions and the link to pull the Milvus container image.

Figure 1: Milvus page in the SUSE Rancher Prime Application Collection

3.2 Milvus installation procedure

Requirements

To install Milvus, you need to have the following:

  • The helm command properly installed.

  • A running Kubernetes cluster, such as SUSE Rancher Prime: K3s.

  1. Visit the SUSE Rancher Prime Application Collection, sign in and get the user access token as described in https://docs.apps.rancher.io/get-started/authentication/.

  2. (Optional) Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.

    > kubectl create namespace SUSE_AI_NAMESPACE
  3. Create the SUSE Rancher Prime Application Collection secret.

    > kubectl create secret docker-registry application-collection \
      --docker-server=dp.apps.rancher.io \
      --docker-username=APPCO_USERNAME \
      --docker-password=APPCO_USER_TOKEN \
      -n SUSE_AI_NAMESPACE
  4. Log in to the Helm registry.

    > helm registry login dp.apps.rancher.io/charts \
      -u APPCO_USERNAME \
      -p APPCO_USER_TOKEN
  5. When installed as part of SUSE AI, Milvus depends on etcd, MinIO and Apache Kafka. Because the Milvus chart uses a non-default configuration, create an override file milvus_custom_overrides.yaml with the following content:

    global:
      imagePullSecrets:
      - application-collection
    cluster:
      enabled: True
    standalone:
      persistence:
        persistentVolumeClaim:
          storageClass: local-path
    etcd:
      replicaCount: 1
      persistence:
        storageClassName: local-path
    minio:
      mode: distributed
      replicas: 4
      rootUser: "admin"
      rootPassword: "adminminio"
      persistence:
        storageClass: local-path
      resources:
        requests:
          memory: 1024Mi
    kafka:
      enabled: true
      name: kafka
      replicaCount: 3
      broker:
        enabled: true
      cluster:
        listeners:
          client:
            protocol: 'PLAINTEXT'
          controller:
            protocol: 'PLAINTEXT'
      persistence:
        enabled: true
        annotations: {}
        labels: {}
        existingClaim: ""
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
        storageClassName: "local-path"
    Tip
    Tip

    The above example uses local storage. For production environments, we recommend using an enterprise-class storage solution such as SUSE Storage.

  6. Install the Milvus Helm chart using the milvus_custom_overrides.yaml override file.

    > helm upgrade --install milvus oci://dp.apps.rancher.io/charts/milvus \
      -n SUSE_AI_NAMESPACE \
      --version 4.2.2 -f milvus_custom_overrides.yaml

3.3 Upgrading Milvus

The Milvus chart receives application updates as well as updates of the Helm chart templates. New versions may include changes that require manual steps, which are listed in the corresponding README file. All Milvus dependencies are updated automatically during the Milvus upgrade.
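
You can look up the currently published chart version directly from the registry, for example:

> helm show chart oci://dp.apps.rancher.io/charts/milvus | grep '^version'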

To upgrade Milvus, identify the new version number and run the following command:

> helm upgrade --install milvus oci://dp.apps.rancher.io/charts/milvus \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f milvus_custom_overrides.yaml

3.4 Uninstalling Milvus

To uninstall Milvus, run the following command:

> helm uninstall milvus -n SUSE_AI_NAMESPACE

4 Installing Ollama

Ollama is a tool for running and managing language models locally on your computer. It offers a simple interface to download, run and interact with models without relying on cloud resources.

Tip
Tip

When installing SUSE AI, Ollama is installed as part of the Open WebUI installation by default. If you decide to install Ollama separately, disable it during the Open WebUI installation, as shown in the example below.
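
In that case, the Open WebUI override file disables the bundled Ollama, for example (see also Example 4, “Open WebUI override file with Ollama installed separately”):

ollama:
  enabled: false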

4.1 Details about the Ollama application

Before deploying Ollama, familiarize yourself with its supported configurations and related documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/ollama

Alternatively, you can also refer to the Ollama Helm chart page on the SUSE Rancher Prime Application Collection site at https://apps.rancher.io/applications/ollama. It contains the available versions and a link to pull the Ollama container image.

4.2 Ollama installation procedure

Requirements

To install Ollama, you need to have the following:

  • The helm command properly installed.

  • A running Kubernetes cluster, such as SUSE Rancher Prime: K3s.

  1. Visit the SUSE Rancher Prime Application Collection, sign in, and get the access token as described in https://docs.apps.rancher.io/get-started/authentication/.

  2. (Optional) Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.

    > kubectl create namespace SUSE_AI_NAMESPACE
  3. Create the SUSE Rancher Prime Application Collection secret.

    > kubectl create secret docker-registry application-collection \
      --docker-server=dp.apps.rancher.io \
      --docker-username=APPCO_USERNAME \
      --docker-password=APPCO_USER_TOKEN \
      -n SUSE_AI_NAMESPACE
  4. Log in to the Helm registry.

    > helm registry login dp.apps.rancher.io/charts \
      -u APPCO_USERNAME \
      -p APPCO_USER_TOKEN
  5. Create the ollama_custom_overrides.yaml file to override the values of the parent Helm chart. Refer to Section 4.4, “Values for the Ollama Helm chart” for more details.

  6. Install the Ollama Helm chart using the ollama_custom_overrides.yaml override file.

    > helm upgrade --install ollama oci://dp.apps.rancher.io/charts/ollama \
      -n SUSE_AI_NAMESPACE \
      --version 0.54.0 -f ollama_custom_overrides.yaml

4.3 Uninstalling Ollama

To uninstall Ollama, run the following command:

> helm uninstall ollama -n SUSE_AI_NAMESPACE

4.4 Values for the Ollama Helm chart

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Important
Important: GPU section

Ollama can run optimized for NVIDIA GPUs when GPU support is enabled in the override file, as shown in the following snippet.

If you do not want to use the NVIDIA GPU, remove the gpu section from ollama_custom_overrides.yaml.

ollama:
  [...]
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
Example 1: Basic override file with GPU and two models pulled at startup
global:
  imagePullSecrets:
  - APPCO_SECRET
ingress:
  enabled: false
defaultModel: "gemma:2b"
ollama:
  models:
    - "gemma:2b"
    - "llama3.1"
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
persistentVolume:  # 1
  enabled: true
  storageClass: local-path  # 2

1

Without the persistentVolume option enabled, changes made to Ollama—such as downloading additional LLMs—are lost when the container is restarted.

2

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

Example 2: Basic override file with Ingress and no GPU
ollama:
  models:
    - llama2
persistentVolume:
  enabled: true
  storageClass: local-path  # 1
ingress:
  enabled: true
  hosts:
    - host: OLLAMA_API_URL
      paths:
        - path: /
          pathType: Prefix

1

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

Table 1: Override file options for the Ollama Helm chart

Key | Type | Default | Description
--- | --- | --- | ---
affinity | object | {} | Affinity for pod assignment
autoscaling.enabled | bool | false | Enable autoscaling
autoscaling.maxReplicas | int | 100 | Number of maximum replicas
autoscaling.minReplicas | int | 1 | Number of minimum replicas
autoscaling.targetCPUUtilizationPercentage | int | 80 | CPU usage to target replica
extraArgs | list | [] | Additional arguments on the output Deployment definition
extraEnv | list | [] | Additional environment variables on the output Deployment definition
fullnameOverride | string | "" | String to fully override template
global.imagePullSecrets | list | [] | Global override for container image registry pull secrets
global.imageRegistry | string | "" | Global override for container image registry
hostIPC | bool | false | Use the host's IPC namespace
hostNetwork | bool | false | Use the host's network namespace
hostPID | bool | false | Use the host's PID namespace
image.pullPolicy | string | "IfNotPresent" | Image pull policy to use for the Ollama container
image.registry | string | "dp.apps.rancher.io" | Image registry to use for the Ollama container
image.repository | string | "containers/ollama" | Image repository to use for the Ollama container
image.tag | string | "0.3.6" | Image tag to use for the Ollama container
imagePullSecrets | list | [] | Docker registry secret names as an array
ingress.annotations | object | {} | Additional annotations for the Ingress resource
ingress.className | string | "" | IngressClass that is used to implement the Ingress (Kubernetes 1.18+)
ingress.enabled | bool | false | Enable Ingress controller resource
ingress.hosts[0].host | string | "ollama.local" |
ingress.hosts[0].paths[0].path | string | "/" |
ingress.hosts[0].paths[0].pathType | string | "Prefix" |
ingress.tls | list | [] | The TLS configuration for host names to be covered with this Ingress record
initContainers | list | [] | Init containers to add to the pod
knative.containerConcurrency | int | 0 | Knative service container concurrency
knative.enabled | bool | false | Enable Knative integration
knative.idleTimeoutSeconds | int | 300 | Knative service idle timeout seconds
knative.responseStartTimeoutSeconds | int | 300 | Knative service response start timeout seconds
knative.timeoutSeconds | int | 300 | Knative service timeout seconds
livenessProbe.enabled | bool | true | Enable livenessProbe
livenessProbe.failureThreshold | int | 6 | Failure threshold for livenessProbe
livenessProbe.initialDelaySeconds | int | 60 | Initial delay seconds for livenessProbe
livenessProbe.path | string | "/" | Request path for livenessProbe
livenessProbe.periodSeconds | int | 10 | Period seconds for livenessProbe
livenessProbe.successThreshold | int | 1 | Success threshold for livenessProbe
livenessProbe.timeoutSeconds | int | 5 | Timeout seconds for livenessProbe
nameOverride | string | "" | String to partially override template (maintains the release name)
nodeSelector | object | {} | Node labels for pod assignment
ollama.gpu.enabled | bool | false | Enable GPU integration
ollama.gpu.number | int | 1 | Specify the number of GPUs
ollama.gpu.nvidiaResource | string | "nvidia.com/gpu" | Only for NVIDIA cards; change to nvidia.com/mig-1g.10gb to use a MIG slice
ollama.gpu.type | string | "nvidia" | GPU type: nvidia or amd. If ollama.gpu.enabled is set, the default value is nvidia. If set to amd, this adds the rocm suffix to the image tag if image.tag is not overridden, because AMD and CPU/CUDA are different images.
ollama.insecure | bool | false | Add insecure flag for pulling at container startup
ollama.models | list | [] | List of models to pull at container startup. The more you add, the longer the container takes to start if the models are not already present. Example: models: [llama2, mistral]
ollama.mountPath | string | "" | Override ollama-data volume mount path (default: "/root/.ollama")
persistentVolume.accessModes | list | ["ReadWriteOnce"] | Ollama server data Persistent Volume access modes. Must match those of the existing PV or dynamic provisioner, see http://kubernetes.io/docs/user-guide/persistent-volumes/
persistentVolume.annotations | object | {} | Ollama server data Persistent Volume annotations
persistentVolume.enabled | bool | false | Enable persistence using PVC
persistentVolume.existingClaim | string | "" | To bring your own PVC for persisting Ollama state, pass the name of the created and ready PVC here. If set, this chart does not create the default PVC. Requires server.persistentVolume.enabled: true
persistentVolume.size | string | "30Gi" | Ollama server data Persistent Volume size
persistentVolume.storageClass | string | "" | If present and set to either a dash (-) or an empty string, dynamic provisioning is disabled. Otherwise, the storageClassName of the persistent volume claim is set to the given value. If absent, the default storage class is used for dynamic provisioning whenever possible. See https://kubernetes.io/docs/concepts/storage/storage-classes/ for more details.
persistentVolume.subPath | string | "" | Subdirectory of the Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty.
persistentVolume.volumeMode | string | "" | Ollama server data Persistent Volume binding mode. If empty (the default) or set to null, no volumeBindingMode specification is set, choosing the default mode.
persistentVolume.volumeName | string | "" | Ollama server Persistent Volume name. It can be used to force-attach the created PVC to a specific PV.
podAnnotations | object | {} | Map of annotations to add to the pods
podLabels | object | {} | Map of labels to add to the pods
podSecurityContext | object | {} | Pod Security Context
readinessProbe.enabled | bool | true | Enable readinessProbe
readinessProbe.failureThreshold | int | 6 | Failure threshold for readinessProbe
readinessProbe.initialDelaySeconds | int | 30 | Initial delay seconds for readinessProbe
readinessProbe.path | string | "/" | Request path for readinessProbe
readinessProbe.periodSeconds | int | 5 | Period seconds for readinessProbe
readinessProbe.successThreshold | int | 1 | Success threshold for readinessProbe
readinessProbe.timeoutSeconds | int | 3 | Timeout seconds for readinessProbe
replicaCount | int | 1 | Number of replicas
resources.limits | object | {} | Pod limit
resources.requests | object | {} | Pod requests
runtimeClassName | string | "" | Specify runtime class
securityContext | object | {} | Container Security Context
service.annotations | object | {} | Annotations to add to the service
service.nodePort | int | 31434 | Service node port when service type is NodePort
service.port | int | 11434 | Service port
service.type | string | "ClusterIP" | Service type
serviceAccount.annotations | object | {} | Annotations to add to the service account
serviceAccount.automount | bool | true | Whether to automatically mount a ServiceAccount's API credentials
serviceAccount.create | bool | true | Whether a service account should be created
serviceAccount.name | string | "" | The name of the service account to use. If not set and create is true, a name is generated using the full name template.
tolerations | list | [] | Tolerations for pod assignment
topologySpreadConstraints | object | {} | Topology Spread Constraints for pod assignment
updateStrategy | object | {"type":""} | How to replace existing pods
updateStrategy.type | string | "" | Can be Recreate or RollingUpdate; default is RollingUpdate
volumeMounts | list | [] | Additional volumeMounts on the output Deployment definition
volumes | list | [] | Additional volumes on the output Deployment definition

5 Installing Open WebUI

Open WebUI is a Web-based user interface designed for interacting with AI models.

5.1 Details about the Open WebUI application

Before deploying Open WebUI, familiarize yourself with its supported configurations and related documentation. The following command provides the corresponding details:

helm show values oci://dp.apps.rancher.io/charts/open-webui

Alternatively, you can also refer to the Open WebUI Helm chart page on the SUSE Rancher Prime Application Collection site at https://apps.rancher.io/applications/open-webui. It contains available versions and the link to pull the Open WebUI container image.

5.2 Open WebUI installation procedure

Requirements

To install Open WebUI, you need to have the following:

  • The helm command properly installed.

  • A running Kubernetes cluster, such as SUSE Rancher Prime: K3s.

  1. Visit the SUSE Rancher Prime Application Collection, sign in and get the user access token as described in https://docs.apps.rancher.io/get-started/authentication/.

  2. (Optional) Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.

    > kubectl create namespace SUSE_AI_NAMESPACE
  3. Create the SUSE Rancher Prime Application Collection secret.

    > kubectl create secret docker-registry application-collection \
      --docker-server=dp.apps.rancher.io \
      --docker-username=APPCO_USERNAME \
      --docker-password=APPCO_USER_TOKEN \
      -n SUSE_AI_NAMESPACE
  4. Log in to the Helm registry.

    > helm registry login dp.apps.rancher.io/charts \
      -u APPCO_USERNAME \
      -p APPCO_USER_TOKEN
  5. Create the owui_custom_overrides.yaml file to override the values of the parent Helm chart. The file contains URLs for Milvus and Ollama and specifies whether a stand-alone Ollama deployment is used or whether Ollama is installed as part of the Open WebUI installation. Find more details in Section 5.5, “Examples of Open WebUI Helm chart override files”. For a list of all installation options with examples, refer to Section 5.6, “Values for the Open WebUI Helm chart”.

  6. Install the Open WebUI Helm chart using the owui_custom_overrides.yaml override file.

    > helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
      -n SUSE_AI_NAMESPACE \
      --version 3.3.2 -f owui_custom_overrides.yaml

5.3 Upgrading Open WebUI

To upgrade Open WebUI to a specific new version, run the following command:

> helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f owui_custom_overrides.yaml

To upgrade Open WebUI to the latest version, run the following command:

> helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  -f owui_custom_overrides.yaml

5.4 Uninstalling Open WebUI

To uninstall Open WebUI, run the following command:

> helm uninstall open-webui -n SUSE_AI_NAMESPACE

5.5 Examples of Open WebUI Helm chart override files

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Example 3: Open WebUI override file with Ollama included

The following override file installs Ollama during the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.

global:
  imagePullSecrets:
  - application-collection
image:
  registry: dp.apps.rancher.io
  repository: containers/open-webui
  tag: 0.3.32
  pullPolicy: IfNotPresent
ollamaUrls:
- http://open-webui-ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path  # 1
ollama:
  enabled: true
  image:
    registry: dp.apps.rancher.io
    repository: containers/ollama
    tag: 0.3.6
    pullPolicy: IfNotPresent
  ingress:
    enabled: false
  defaultModel: "gemma:2b"
  ollama:
    models:  # 2
      - "gemma:2b"
      - "llama3.1"
    gpu:  # 3
      enabled: true
      type: 'nvidia'
      number: 1
    persistentVolume:  # 4
      enabled: true
      storageClass: local-path  # 5
pipelines:
  enabled: False
  persistence:
    storageClass: local-path  # 6
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui
  tls: true
extraEnvVars:
- name: DEFAULT_MODELS  # 7
  value: "gemma:2b"
- name: DEFAULT_USER_ROLE
  value: "user"
- name: WEBUI_NAME
  value: "SUSE AI"
- name: GLOBAL_LOG_LEVEL
  value: INFO
- name: RAG_EMBEDDING_MODEL
  value: "sentence-transformers/all-MiniLM-L6-v2"
- name: VECTOR_DB
  value: "milvus"
- name: MILVUS_URI
  value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530

2

Specifies that two large language models (LLMs) are loaded in Ollama when the container starts.

3

Enables GPU support for Ollama. The type must be nvidia because NVIDIA GPUs are the only supported devices. number must be between 1 and the number of NVIDIA GPUs present on the system.

4

Without the persistentVolume option enabled, changes made to Ollama—such as downloading additional LLMs—are lost when the container is restarted.

1 5 6

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

7

Specifies the default LLM for Ollama.

Example 4: Open WebUI override file with Ollama installed separately

The following override file installs Ollama separately from the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.

global:
  imagePullSecrets:
  - application-collection
image:
  registry: dp.apps.rancher.io
  repository: containers/open-webui
  tag: 0.3.32
  pullPolicy: IfNotPresent
ollamaUrls:
- http://ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path  # 1
ollama:
  enabled: false
pipelines:
  enabled: False
  persistence:
    storageClass: local-path  # 2
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui
  tls: true
extraEnvVars:
- name: DEFAULT_MODELS  # 3
  value: "gemma:2b"
- name: DEFAULT_USER_ROLE
  value: "user"
- name: WEBUI_NAME
  value: "SUSE AI"
- name: GLOBAL_LOG_LEVEL
  value: INFO
- name: RAG_EMBEDDING_MODEL
  value: "sentence-transformers/all-MiniLM-L6-v2"
- name: VECTOR_DB
  value: "milvus"
- name: MILVUS_URI
  value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530

1 2

Use local-path storage only for testing purposes. For production use, we recommend using a storage solution suitable for persistent storage, such as SUSE Storage.

3

Specifies the default LLM for Ollama.

5.6 Values for the Open WebUI Helm chart

To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.

Table 2: Available options for the Open WebUI Helm chart

Key | Type | Default | Description
--- | --- | --- | ---
affinity | object | {} | Affinity for pod assignment
annotations | object | {} |
cert-manager.enabled | bool | true |
clusterDomain | string | "cluster.local" | Value of cluster domain
containerSecurityContext | object | {} | Configure container security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-containe
extraEnvVars | list | [{"name":"OPENAI_API_KEY", "value":"0p3n-w3bu!"}] | Environment variables added to the Open WebUI deployment. The most up-to-date environment variables can be found at https://docs.openwebui.com/getting-started/env-configuration/
extraEnvVars[0] | object | {"name":"OPENAI_API_KEY","value":"0p3n-w3bu!"} | Default API key value for Pipelines. It should be updated in a production deployment and changed to the required API key if not using Pipelines.
global.imagePullSecrets | list | [] | Global override for container image registry pull secrets
global.imageRegistry | string | "" | Global override for container image registry
global.tls.additionalTrustedCAs | bool | false |
global.tls.issuerName | string | "suse-private-ai" |
global.tls.letsEncrypt.email | string | "none@example.com" |
global.tls.letsEncrypt.environment | string | "staging" |
global.tls.letsEncrypt.ingress.class | string | "" |
global.tls.source | string | "suse-private-ai" | The source of Open WebUI TLS keys, see Section 5.6.1, “TLS sources”
image.pullPolicy | string | "IfNotPresent" | Image pull policy to use for the Open WebUI container
image.registry | string | "dp.apps.rancher.io" | Image registry to use for the Open WebUI container
image.repository | string | "containers/open-webui" | Image repository to use for the Open WebUI container
image.tag | string | "0.3.32" | Image tag to use for the Open WebUI container
imagePullSecrets | list | [] | Configure imagePullSecrets to use a private registry, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry
ingress.annotations | object | {"nginx.ingress.kubernetes.io/ssl-redirect":"true"} | Use appropriate annotations for your Ingress controller, such as nginx.ingress.kubernetes.io/rewrite-target: / for NGINX
ingress.class | string | "" |
ingress.enabled | bool | true |
ingress.existingSecret | string | "" |
ingress.host | string | "" |
ingress.tls | bool | true |
nameOverride | string | "" |
nodeSelector | object | {} | Node labels for pod assignment
ollama.enabled | bool | true | Automatically install the Ollama Helm chart from https://otwld.github.io/ollama-helm/. Configure the following Helm values.
ollama.fullnameOverride | string | "open-webui-ollama" | If enabling embedded Ollama, update fullnameOverride to your desired Ollama name value, or else it will use the default ollama.name value from the Ollama chart.
ollamaUrls | list | [] | A list of Ollama API endpoints. These can be added instead of automatically installing the Ollama Helm chart, or in addition to it.
openaiBaseApiUrl | string | "" | OpenAI base API URL to use. Defaults to the Pipelines service endpoint when Pipelines are enabled, or to https://api.openai.com/v1 if Pipelines are not enabled and this value is blank.
persistence.accessModes | list | ["ReadWriteOnce"] | If using multiple replicas, you must update accessModes to ReadWriteMany.
persistence.annotations | object | {} |
persistence.enabled | bool | true |
persistence.existingClaim | string | "" | Use existingClaim to reuse an existing Open WebUI PVC instead of creating a new one.
persistence.selector | object | {} |
persistence.size | string | "2Gi" |
persistence.storageClass | string | "" |
pipelines.enabled | bool | false | Automatically install the Pipelines chart to extend Open WebUI functionality using Pipelines, see https://github.com/open-webui/pipelines.
pipelines.extraEnvVars | list | [] | This section can be used to pass the required environment variables to your pipelines (such as the Langfuse host name).
podAnnotations | object | {} |
podSecurityContext | object | {} | Configure pod security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-containe
replicaCount | int | 1 |
resources | object | {} |
service | object | {"annotations":{},"containerPort":8080, "labels":{},"loadBalancerClass":"", "nodePort":"","port":80,"type":"ClusterIP"} | Service values to expose Open WebUI pods to the cluster
tolerations | list | [] | Tolerations for pod assignment
topologySpreadConstraints | list | [] | Topology Spread Constraints for pod assignment

5.6.1 TLS sources

There are three recommended ways in which Open WebUI can obtain TLS certificates for secure communication.

Self-Signed TLS certificate

This is the default method. You need to install cert-manager on the cluster to issue and maintain the certificates. This method generates a CA and signs the Open WebUI certificate using the CA. cert-manager then manages the signed certificate.

For this method, use the following Helm chart option:

global.tls.source=suse-private-ai
Let's Encrypt

This method also uses cert-manager, but it is combined with a special issuer for Let's Encrypt that performs all actions—including request and validation—to get the Let's Encrypt certificate issued. This configuration uses HTTP validation (HTTP-01) and therefore the load balancer must have a public DNS record and be accessible from the Internet.

For this method, use the following Helm chart option:

global.tls.source=letsEncrypt
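
For example, the related Let's Encrypt values from Table 2 can be set together with the TLS source (a sketch; replace the e-mail address with your own and switch the environment from staging to production only after testing):

--set global.tls.source=letsEncrypt \
--set global.tls.letsEncrypt.email=admin@example.com \
--set global.tls.letsEncrypt.environment=production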
Provide your own certificate

This method allows you to bring your own signed certificate to secure the HTTPS traffic. In this case, you must upload this certificate and associated key as PEM-encoded files named tls.crt and tls.key.

For this method, use the following Helm chart option:

global.tls.source=secret
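
For example, the certificate and key can be uploaded as a Kubernetes TLS secret (a sketch; the secret name open-webui-tls is only an illustration and is then referenced through the ingress.existingSecret Helm value):

> kubectl create secret tls open-webui-tls \
  --cert=tls.crt --key=tls.key \
  -n SUSE_AI_NAMESPACE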