An overview of SUSE AI Deployment
- WHAT?
Basic information about the SUSE AI deployment workflow.
- WHY?
To better understand the SUSE AI deployment process.
- EFFORT
Less than 30 minutes of reading and basic knowledge of Linux deployment.
1 Deployment procedures
SUSE AI is a complex product consisting of multiple software layers and components. This document outlines the complete workflow for deploying and installing all SUSE AI dependencies, as well as SUSE AI itself. You can also find references to recommended hardware and software requirements, as well as steps to take after the product installation.
For hardware, software and application-specific requirements, refer to SUSE AI requirements.
1.1 Prerequisites for customers who are not yet running a Rancher cluster
Purchase the SUSE Rancher Prime entitlement.
Install SUSE Rancher Prime.
Deploy and configure SUSE Security.
Deploy and configure SUSE Observability.
1.2 Cluster preparation
Install and register SUSE Linux Micro 6.0 or later on each SUSE Rancher Prime: RKE2 cluster node. Refer to https://documentation.suse.com/sle-micro/6.0/ for details.
Install the NVIDIA GPU driver on cluster nodes with GPUs. Refer to https://documentation.suse.com/suse-ai/1.0/html/NVIDIA-GPU-driver-on-SL-Micro/index.html for details.
Install SUSE Rancher Prime: RKE2 Kubernetes distribution on the cluster nodes. Refer to https://docs.rke2.io/ for details.
Install the NVIDIA GPU Operator with the additional option --set driver.enabled=false, because the GPU driver is already installed on the nodes in the previous step. Refer to https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#rancher-kubernetes-engine-2 for details.
Connect the SUSE Rancher Prime: RKE2 cluster to SUSE Rancher Prime. Refer to https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/register-existing-clusters for details.
Configure the GPU-enabled nodes so that the SUSE AI containers run in Pods scheduled on nodes equipped with NVIDIA GPU hardware. For details about assigning Pods to nodes, see Section 2, “Assigning GPU nodes to applications”.
Configure SUSE Security to scan the nodes used for SUSE AI. This step is not required, but we strongly recommend it to secure the production environment.
Configure SUSE Observability to observe the nodes used for the SUSE AI application.
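You can pass the GPU Operator step from the list above as a Helm values file instead of individual --set options. Keeping the options in a file makes later upgrades reproducible. The following is a minimal sketch. The file name gpu-operator-values.yaml is hypothetical. The toolkit.env entries are an assumption based on the containerd settings NVIDIA documents for RKE2. Check them against the NVIDIA page linked above, because defaults can change between GPU Operator releases.

```yaml
# gpu-operator-values.yaml (hypothetical file name)
# Apply with, for example:
#   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
#   helm install gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace -f gpu-operator-values.yaml
driver:
  enabled: false   # the GPU driver is already installed on the SUSE Linux Micro hosts
toolkit:
  env:
    # containerd locations used by RKE2 (assumption: taken from NVIDIA's RKE2 instructions)
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "true"   # quoted because environment variable values are strings
```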
1.3 SUSE AI installation
SUSE AI is delivered as a set of components that you can combine to meet specific use cases. This provides great flexibility. However, no single Helm chart installs the whole stack, for example, for running the Open WebUI chatbot-style application. To enable the full integrated stack, deploy multiple applications in sequence. Install the applications with the fewest dependencies first. Install the dependent applications once their required dependencies are in place within the cluster.
Purchase the SUSE AI entitlement. It is a separate entitlement from SUSE Rancher Prime.
Access SUSE AI via the SUSE Rancher Prime Application Collection at https://apps.rancher.io/ to verify your SUSE AI entitlement.
If the entitlement check is successful, you are given access to the SUSE AI-related Helm charts and container images, and can deploy directly from the SUSE Rancher Prime Application Collection.
Tip: Any overrides of the default values in the Helm charts, such as the Open WebUI password and URL customizations, occur at this step.
Install Milvus as described in Section 3, “Installing Milvus”.
(Optional) Install Ollama as described in Section 4, “Installing Ollama”.
Install Open WebUI as described in Section 5, “Installing Open WebUI”.
1.3.1 Ollama and Open WebUI fail to deploy, reporting a cert-manager error
By default, public endpoints such as Open WebUI are protected by self-signed SSL/TLS certificates. You can also have the certificate issued by a public Certification Authority (CA), or bring your own certificate (BYO). Except in the BYO case, cert-manager is required to manage the certificates. Therefore, verify that the cert-manager custom resource definitions (CRDs) are installed, for example, using the kubectl utility:
> kubectl get crds | grep cert-manager
certificaterequests.cert-manager.io 2024-12-05T22:22:56Z
certificates.cert-manager.io 2024-12-05T22:22:56Z
challenges.acme.cert-manager.io 2024-12-05T22:22:57Z
clusterissuers.cert-manager.io 2024-12-05T22:22:57Z
issuers.cert-manager.io 2024-12-05T22:22:57Z
orders.acme.cert-manager.io 2024-12-05T22:22:57Z
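The grep check above relies on reading the output by eye. The following helper sketches a stricter check. It compares the kubectl output against the six CRD names in the example output and reports any that are missing. The function name is hypothetical. The sketch assumes these six CRDs are the complete set for your cert-manager version.

```shell
#!/usr/bin/env bash
# check_cert_manager_crds (hypothetical helper): reads the output of
# `kubectl get crds` on standard input, prints each expected cert-manager CRD
# that is missing, and returns 1 if any is missing, 0 otherwise.
check_cert_manager_crds() {
  local listing crd missing=0
  listing=$(cat)
  # The six CRD names from the example output above (assumed to be the full set).
  for crd in certificaterequests.cert-manager.io \
             certificates.cert-manager.io \
             challenges.acme.cert-manager.io \
             clusterissuers.cert-manager.io \
             issuers.cert-manager.io \
             orders.acme.cert-manager.io; do
    # Exact match on the first column, so "issuers" does not match "clusterissuers".
    if ! awk -v name="$crd" '$1 == name { found = 1 } END { exit !found }' <<<"$listing"; then
      echo "missing: $crd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (requires cluster access):
#   kubectl get crds | check_cert_manager_crds && echo "cert-manager CRDs are installed"
```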
If the cert-manager CRDs are not installed, install them by running the following command:
kubectl apply -f \
  https://github.com/cert-manager/cert-manager/releases/download/v1.6.2/cert-manager.crds.yaml
In the above command, replace v1.6.2 with the version of the cert-manager dependency included in the Chart.lock file from the unpacked Open WebUI Helm chart.
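You can script the version lookup. The following sketch reads the cert-manager entry from an unpacked chart's Chart.lock and prints the matching CRD manifest URL. The function names are hypothetical. It assumes the layout Helm normally writes to Chart.lock: each dependency entry starts with a "- name:" line, followed by its "version:" line. It also assumes the release URL pattern used in the command above.

```shell
#!/usr/bin/env bash
# cert_manager_version_from_lock (hypothetical helper): print the version of
# the cert-manager dependency recorded in the Chart.lock file given as $1.
# Assumes each entry starts with "- name: <chart>" followed by "version: <x.y.z>".
cert_manager_version_from_lock() {
  awk '
    /^[ \t]*- name:/ { in_cm = ($3 == "cert-manager") }
    in_cm && /^[ \t]*version:/ { print $2; exit }
  ' "$1"
}

# cert_manager_crds_url (hypothetical helper): build the CRD manifest URL
# used in the kubectl apply command above for that version.
cert_manager_crds_url() {
  local version
  version=$(cert_manager_version_from_lock "$1")
  if [ -z "$version" ]; then
    echo "no cert-manager dependency found in $1" >&2
    return 1
  fi
  # Release tags carry a leading "v" (for example, v1.6.2); add it if missing.
  case "$version" in
    v*) ;;
    *) version="v$version" ;;
  esac
  printf 'https://github.com/cert-manager/cert-manager/releases/download/%s/cert-manager.crds.yaml\n' "$version"
}

# Usage (requires registry login and cluster access):
#   helm pull oci://dp.apps.rancher.io/charts/open-webui --version 3.3.2 --untar
#   kubectl apply -f "$(cert_manager_crds_url open-webui/Chart.lock)"
```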
For BYO certificates, cert-manager is not required. Therefore, you may disable it by adding the following option when installing the SUSE AI components via Helm:
--set cert-manager.enabled=false
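You can also keep this setting in the override file instead of on the command line. The --set path maps to nested YAML keys, as in the following fragment. This matters for later upgrades. When helm upgrade runs with an override file, as in this document, it does not carry over values set earlier with --set unless you also pass --reuse-values. A one-off --set option therefore reverts to the chart default, which is true for cert-manager.enabled in the Open WebUI chart (see Section 5.6).

```yaml
# Fragment for an override file such as owui_custom_overrides.yaml.
# Equivalent to: --set cert-manager.enabled=false
cert-manager:
  enabled: false
```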
1.4 Steps after the installation is complete
Log in to SUSE AI Open WebUI using the default credentials.
After you have logged in, update the administrator password for SUSE AI.
From the available language models, configure the one you prefer. Optionally, install a custom language model.
Configure user management with role-based access control (RBAC) as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-managing-user-roles.
Integrate a single sign-on authentication manager, such as Okta, with Open WebUI as described in https://documentation.suse.com/suse-ai/1.0/html/openwebui-configuring/index.html#openwebui-authentication-via-okta.
Configure retrieval-augmented generation (RAG) to let the model process content relevant to the customer.
2 Assigning GPU nodes to applications
When deploying a containerized application to Kubernetes, you need to ensure that containers requiring GPU resources run on appropriate worker nodes. For example, Ollama, a core component of SUSE AI, benefits greatly from GPU acceleration. This topic describes how to satisfy this requirement by explicitly requesting GPU resources and by labeling worker nodes so that a node selector can target them.
A Kubernetes cluster, such as SUSE Rancher Prime: RKE2, must be available and configured with more than one worker node, where some nodes have NVIDIA GPU resources and others do not.
This document assumes that any kind of deployment to the Kubernetes cluster is done using Helm charts.
2.1 Labeling GPU nodes
To distinguish nodes with GPU support from non-GPU nodes, Kubernetes uses labels. Labels are identifying metadata that selectors can match. Do not confuse them with annotations, which attach non-identifying information to a resource. You can manipulate labels with the kubectl command or by editing the node configuration files. If you use an IaC tool such as Terraform, you can add labels to the node resource configuration files.
To label a single node, use the following command:
> kubectl label node GPU_NODE_NAME accelerator=nvidia-gpu
To achieve the same result by editing the node.yaml node configuration, add the following content and apply the changes with kubectl apply -f node.yaml:
apiVersion: v1
kind: Node
metadata:
  name: node-name
  labels:
    accelerator: nvidia-gpu
To label multiple nodes, use the following command:
> kubectl label node \
  GPU_NODE_NAME1 \
  GPU_NODE_NAME2 ... \
  accelerator=nvidia-gpu
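Once the NVIDIA GPU Operator is running, nodes with GPUs advertise the nvidia.com/gpu resource in their capacity. You can therefore derive the list of nodes to label instead of typing the names. The following sketch uses a hypothetical function name. It only filters kubectl output. The commented pipeline shows one way to feed it and apply the label, and assumes cluster access.

```shell
#!/usr/bin/env bash
# gpu_nodes_from_capacity (hypothetical helper): print the names of nodes whose
# nvidia.com/gpu capacity is a positive integer. Expects tab-separated
# "<node-name>\t<gpu-count>" lines on standard input; nodes that do not report
# the resource have an empty second column and are skipped.
gpu_nodes_from_capacity() {
  awk -F '\t' '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Example pipeline (requires cluster access and a running GPU Operator):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.nvidia\.com/gpu}{"\n"}{end}' \
#     | gpu_nodes_from_capacity \
#     | xargs -I{} kubectl label node {} accelerator=nvidia-gpu --overwrite
```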
If you use Terraform as an IaC tool, you can add labels to a group of nodes by editing the .tf files and adding the following values to a resource. The resource type shown is a generic placeholder; the actual type depends on your Terraform provider.
resource "node_group" "example" {
  labels = {
    "accelerator" = "nvidia-gpu"
  }
}
To check if the labels are correctly applied, use the following command:
> kubectl get nodes --show-labels
2.2 Assigning GPU nodes
A container is matched to a node through an explicit resource request, or through labels and node selectors. The use cases described below focus on NVIDIA GPUs.
2.2.1 Enable GPU passthrough
Containers are isolated from the host environment by default. For containers that rely on GPU resources, the Helm chart must enable GPU passthrough so that the container can access and use the GPU. Without GPU passthrough, the container may still run, but it performs all computations on the CPU. For an example of the configuration required for GPU acceleration, refer to the ollama.gpu values in Section 4.4, “Values for the Ollama Helm chart”.
2.2.2 Assignment by resource request
After the NVIDIA GPU Operator is configured on a node, you can instantiate applications that request the nvidia.com/gpu resource provided by the operator. Add the following content to your values.yaml file. Specify the number of GPUs according to your setup.
resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1
2.2.3 Assignment by labels and node selectors
If the GPU nodes carry a label such as accelerator=nvidia-gpu, you can configure the node selector to check for that label. In this case, use the following values in your values.yaml file:
nodeSelector:
  accelerator: nvidia-gpu
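You can combine the two methods so that a Pod both reserves a GPU and lands on a labeled node. The following is a sketch. It assumes a chart that exposes top-level resources and nodeSelector keys, which the Ollama chart does (see Section 4.4). For extended resources such as nvidia.com/gpu, Kubernetes lets you set only the limit, which then also serves as the request. If you set both, they must be equal.

```yaml
# values.yaml fragment (sketch): reserve one GPU and restrict scheduling
# to nodes labeled accelerator=nvidia-gpu.
resources:
  limits:
    nvidia.com/gpu: 1        # the request defaults to this limit
nodeSelector:
  accelerator: nvidia-gpu
```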
2.3 Verifying Ollama GPU assignment
If the GPU is correctly detected, the Ollama container logs messages similar to the following:
[...] source=routes.go:1172 msg="Listening on :11434 (version 0.0.0)"
[...] source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2502346830/runners
[...] source=payload.go:44 msg="Dynamic LLM libraries [cuda_v12 cpu cpu_avx cpu_avx2]"
[...] source=gpu.go:204 msg="looking for compatible GPUs"
[...] source=types.go:105 msg="inference compute" id=GPU-c9ad37d0-d304-5d2a-c2e6-d3788cd733a7 library=cuda compute
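To avoid reading the logs manually, you can filter them for the "inference compute" line in the sample log above. The following sketch uses a hypothetical function name and relies on that log format, which may change between Ollama versions. The Deployment name depends on how Ollama was installed. For the standalone release in Section 4.2 it is assumed to be ollama. When the Open WebUI chart installs Ollama, it is assumed to be open-webui-ollama (see ollama.fullnameOverride in Section 5.6).

```shell
#!/usr/bin/env bash
# ollama_log_shows_gpu (hypothetical helper): succeed if Ollama logs read from
# standard input contain an "inference compute" line that reports library=cuda,
# as in the sample log above; fail otherwise.
ollama_log_shows_gpu() {
  awk '/msg="inference compute"/ && /library=cuda/ { found = 1 } END { exit !found }'
}

# Usage (requires cluster access; the Deployment name is an assumption):
#   kubectl logs -n SUSE_AI_NAMESPACE deployment/ollama | ollama_log_shows_gpu \
#     && echo "Ollama detected a CUDA GPU"
```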
3 Installing Milvus
Milvus is a scalable, high-performance vector database designed for AI applications. It enables efficient organization and searching of massive unstructured datasets, including text, images and multi-modal content. This procedure walks you through the installation of Milvus and its dependencies.
3.1 Details about the Milvus application
Before deploying Milvus, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/milvus
Alternatively, you can also refer to the Milvus Helm chart page on the SUSE Rancher Prime Application Collection site at https://apps.rancher.io/applications/milvus. It contains Milvus dependencies, available versions and the link to pull the Milvus container image.
3.2 Milvus installation procedure
To install Milvus, you need the following:
The helm command, properly installed.
A running Kubernetes cluster, such as SUSE Rancher Prime: K3s.
Visit the SUSE Rancher Prime Application Collection, sign in and get the user access token as described in https://docs.apps.rancher.io/get-started/authentication/.
(Optional) Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.
> kubectl create namespace SUSE_AI_NAMESPACE
Create the SUSE Rancher Prime Application Collection secret.
> kubectl create secret docker-registry application-collection \
  --docker-server=dp.apps.rancher.io \
  --docker-username=APPCO_USERNAME \
  --docker-password=APPCO_USER_TOKEN \
  -n SUSE_AI_NAMESPACE
Log in to the Helm registry.
> helm registry login dp.apps.rancher.io/charts \
  -u APPCO_USERNAME \
  -p APPCO_USER_TOKEN
When installed as part of SUSE AI, Milvus depends on etcd, MinIO and Apache Kafka. Because this setup differs from the chart defaults, create an override file milvus_custom_overrides.yaml with the following content:
global:
  imagePullSecrets:
  - application-collection
cluster:
  enabled: True
standalone:
  persistence:
    persistentVolumeClaim:
      storageClass: local-path
etcd:
  replicaCount: 1
  persistence:
    storageClassName: local-path
minio:
  mode: distributed
  replicas: 4
  rootUser: "admin"
  rootPassword: "adminminio"
  persistence:
    storageClass: local-path
  resources:
    requests:
      memory: 1024Mi
kafka:
  enabled: true
  name: kafka
  replicaCount: 3
  broker:
    enabled: true
  cluster:
    listeners:
      client:
        protocol: 'PLAINTEXT'
      controller:
        protocol: 'PLAINTEXT'
  persistence:
    enabled: true
    annotations: {}
    labels: {}
    existingClaim: ""
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 8Gi
    storageClassName: "local-path"
Tip: The above example uses local storage. For production environments, we recommend an enterprise-class storage solution such as SUSE Storage.
Install the Milvus Helm chart using the milvus_custom_overrides.yaml override file.
> helm upgrade --install milvus oci://dp.apps.rancher.io/charts/milvus \
  -n SUSE_AI_NAMESPACE \
  --version 4.2.2 -f milvus_custom_overrides.yaml
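Milvus with etcd, MinIO and Kafka starts several Pods, and it can take a few minutes until all of them are ready. The following sketch lists the Pods that are not yet ready, based on the default kubectl get pods output. The function name is hypothetical. It assumes the default column layout NAME, READY, STATUS, RESTARTS, AGE.

```shell
#!/usr/bin/env bash
# not_ready_pods (hypothetical helper): read `kubectl get pods --no-headers`
# output on standard input and print "NAME READY STATUS" for every Pod that is
# not Running with all containers ready. Pods in the Completed state (finished
# Jobs) are skipped. Assumes the default columns NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")   # "1/2" -> ready[1]=1, ready[2]=2
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }
  '
}

# Usage (requires cluster access):
#   kubectl get pods -n SUSE_AI_NAMESPACE --no-headers | not_ready_pods
```

An empty result means every listed Pod is running and fully ready.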
3.3 Upgrading Milvus
The Milvus chart receives both application updates and updates of the Helm chart templates. New versions may include changes that require manual steps, which are listed in the corresponding README file. All Milvus dependencies are updated automatically during the Milvus upgrade.
To upgrade Milvus, identify the new version number and run the following command:
> helm upgrade --install milvus oci://dp.apps.rancher.io/charts/milvus \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f milvus_custom_overrides.yaml
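To identify the version number, you can read the version field of the chart metadata. The following sketch uses a hypothetical function name. It assumes that helm show chart without --version returns the metadata of the newest published chart, which requires a Helm release with OCI tag resolution. Otherwise, check the available versions on the Application Collection page mentioned in Section 3.1. helm list shows the chart version currently installed.

```shell
#!/usr/bin/env bash
# chart_version (hypothetical helper): print the "version:" field of chart
# metadata (Chart.yaml) read from standard input, without surrounding quotes.
chart_version() {
  awk '/^version:/ { gsub(/"/, "", $2); print $2; exit }'
}

# Usage (requires registry login and cluster access):
#   helm show chart oci://dp.apps.rancher.io/charts/milvus | chart_version   # newest published chart
#   helm list -n SUSE_AI_NAMESPACE                                          # installed chart version
```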
3.4 Uninstalling Milvus
To uninstall Milvus, run the following command:
> helm uninstall milvus -n SUSE_AI_NAMESPACE
4 Installing Ollama
Ollama is a tool for running and managing language models locally on your computer. It offers a simple interface to download, run and interact with models without relying on cloud resources.
By default, Ollama is installed as part of the Open WebUI installation. If you install Ollama separately, disable the bundled Ollama when installing Open WebUI.
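The following minimal sketch shows the relevant part of owui_custom_overrides.yaml when Ollama is installed separately. It is taken from the second example in Section 5.5. It assumes the Ollama release is named ollama, as in the helm upgrade --install ollama command in Section 4.2, so that its Service is reachable at the in-cluster DNS name shown.

```yaml
# Fragment of owui_custom_overrides.yaml for a separately installed Ollama.
ollama:
  enabled: false   # do not install the bundled Ollama chart
ollamaUrls:
- http://ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434   # Service of the standalone release
```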
4.1 Details about the Ollama application
Before deploying Ollama, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/ollama
Alternatively, you can also refer to the Ollama Helm chart page on the SUSE Rancher Prime Application Collection site at https://apps.rancher.io/applications/ollama. It contains the available versions and a link to pull the Ollama container image.
4.2 Ollama installation procedure
To install Ollama, you need the following:
The helm command, properly installed.
A running Kubernetes cluster, such as SUSE Rancher Prime: K3s.
Visit the SUSE Rancher Prime Application Collection, sign in, and get the access token as described in https://docs.apps.rancher.io/get-started/authentication/.
(Optional) Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.
> kubectl create namespace SUSE_AI_NAMESPACE
Create the SUSE Rancher Prime Application Collection secret.
> kubectl create secret docker-registry application-collection \
  --docker-server=dp.apps.rancher.io \
  --docker-username=APPCO_USERNAME \
  --docker-password=APPCO_USER_TOKEN \
  -n SUSE_AI_NAMESPACE
Log in to the Helm registry.
> helm registry login dp.apps.rancher.io/charts \
  -u APPCO_USERNAME \
  -p APPCO_USER_TOKEN
Create the ollama_custom_overrides.yaml file to override the default values of the Ollama Helm chart. Refer to Section 4.4, “Values for the Ollama Helm chart” for more details.
Install the Ollama Helm chart using the ollama_custom_overrides.yaml override file.
> helm upgrade --install ollama oci://dp.apps.rancher.io/charts/ollama \
  -n SUSE_AI_NAMESPACE \
  --version 0.54.0 -f ollama_custom_overrides.yaml
4.3 Uninstalling Ollama
To uninstall Ollama, run the following command:
> helm uninstall ollama -n SUSE_AI_NAMESPACE
4.4 Values for the Ollama Helm chart
To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.
Ollama can run optimized for NVIDIA GPUs if the following conditions are fulfilled:
The NVIDIA driver and NVIDIA GPU Operator are installed as described in https://documentation.suse.com/suse-ai/1.0/html/NVIDIA-GPU-driver-on-SL-Micro/index.html.
The workloads are set to run on NVIDIA-enabled nodes as described in https://documentation.suse.com/suse-ai/1.0/html/AI-deployment-intro/index.html#ai-gpu-nodes-assigning.
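If you do not want to use the NVIDIA GPU, remove the gpu section from ollama_custom_overrides.yaml:
ollama:
  [...]
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
The following override file enables the NVIDIA GPU and pulls two models when the container starts:
global:
  imagePullSecrets:
  - application-collection
ingress:
  enabled: false
defaultModel: "gemma:2b"
ollama:
  models:
    - "gemma:2b"
    - "llama3.1"
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
persistentVolume:
  enabled: true
  storageClass: local-path
If the persistentVolume option is not enabled, Ollama data such as downloaded models is not persisted and is lost when the Ollama Pod restarts.
The following override file pulls the llama2 model at startup, does not use a GPU, and exposes the Ollama API through an Ingress:
ollama:
  models:
    - llama2
persistentVolume:
  enabled: true
  storageClass: local-path
ingress:
  enabled: true
  hosts:
  - host: OLLAMA_API_URL
    paths:
      - path: /
        pathType: Prefix
In both examples, local-path storage is suitable only for testing. For production environments, we recommend an enterprise-class storage solution such as SUSE Storage.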
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} | Affinity for pod assignment |
autoscaling.enabled | bool | false | Enable autoscaling |
autoscaling.maxReplicas | int | 100 | Number of maximum replicas |
autoscaling.minReplicas | int | 1 | Number of minimum replicas |
autoscaling.targetCPUUtilizationPercentage | int | 80 | Target average CPU utilization (percent) used for autoscaling |
extraArgs | list | [] | Additional arguments on the output Deployment definition. |
extraEnv | list | [] | Additional environment variables on the output Deployment definition. |
fullnameOverride | string | "" | String to fully override template |
global.imagePullSecrets | list | [] | Global override for container image registry pull secrets |
global.imageRegistry | string | "" | Global override for container image registry |
hostIPC | bool | false | Use the host's IPC namespace |
hostNetwork | bool | false | Use the host's network namespace |
hostPID | bool | false | Use the host's PID namespace. |
image.pullPolicy | string | "IfNotPresent" | Image pull policy to use for the Ollama container |
image.registry | string | "dp.apps.rancher.io" | Image registry to use for the Ollama container |
image.repository | string | "containers/ollama" | Image repository to use for the Ollama container |
image.tag | string | "0.3.6" | Image tag to use for the Ollama container |
imagePullSecrets | list | [] | Docker registry secret names as an array |
ingress.annotations | object | {} | Additional annotations for the Ingress resource |
ingress.className | string | "" | IngressClass that is used to implement the Ingress (Kubernetes 1.18+) |
ingress.enabled | bool | false | Enable Ingress controller resource |
ingress.hosts[0].host | string | "ollama.local" | |
ingress.hosts[0].paths[0].path | string | "/" | |
ingress.hosts[0].paths[0].pathType | string | "Prefix" | |
ingress.tls | list | [] | The TLS configuration for host names to be covered with this Ingress record |
initContainers | list | [] | Init containers to add to the pod |
knative.containerConcurrency | int | 0 | Knative service container concurrency |
knative.enabled | bool | false | Enable Knative integration |
knative.idleTimeoutSeconds | int | 300 | Knative service idle timeout seconds |
knative.responseStartTimeoutSeconds | int | 300 | Knative service response start timeout seconds |
knative.timeoutSeconds | int | 300 | Knative service timeout seconds |
livenessProbe.enabled | bool | true | Enable livenessProbe |
livenessProbe.failureThreshold | int | 6 | Failure threshold for livenessProbe |
livenessProbe.initialDelaySeconds | int | 60 | Initial delay seconds for livenessProbe |
livenessProbe.path | string | "/" | Request path for livenessProbe |
livenessProbe.periodSeconds | int | 10 | Period seconds for livenessProbe |
livenessProbe.successThreshold | int | 1 | Success threshold for livenessProbe |
livenessProbe.timeoutSeconds | int | 5 | Timeout seconds for livenessProbe |
nameOverride | string | "" | String to partially override template (maintains the release name) |
nodeSelector | object | {} | Node labels for pod assignment |
ollama.gpu.enabled | bool | false | Enable GPU integration |
ollama.gpu.number | int | 1 | Specify the number of GPUs |
ollama.gpu.nvidiaResource | string | "nvidia.com/gpu" | Only for NVIDIA cards; change to |
ollama.gpu.type | string | "nvidia" | GPU type: “nvidia” or “amd.” If “ollama.gpu.enabled” is enabled, the default value is “nvidia.” If set to “amd,” this adds the “rocm” suffix to the image tag if “image.tag” is not overridden. This is because AMD and CPU/CUDA are different images. |
ollama.insecure | bool | false | Add insecure flag for pulling at container startup |
ollama.models | list | [] | List of models to pull at container startup. The more you add, the longer the container takes to start if models are not present. For example: models: [llama2, mistral] |
ollama.mountPath | string | "" | Override ollama-data volume mount path, default: "/root/.ollama" |
persistentVolume.accessModes | list | ["ReadWriteOnce"] | Ollama server data Persistent Volume access modes. Must match those of existing PV or dynamic provisioner, see http://kubernetes.io/docs/user-guide/persistent-volumes/ |
persistentVolume.annotations | object | {} | Ollama server data Persistent Volume annotations |
persistentVolume.enabled | bool | false | Enable persistence using PVC |
persistentVolume.existingClaim | string | "" | If you want to bring your own PVC for persisting Ollama state, pass the name of the created + ready PVC here. If set, this Chart does not create the default PVC. Requires |
persistentVolume.size | string | "30Gi" | Ollama server data Persistent Volume size |
persistentVolume.storageClass | string | "" | If persistentVolume.storageClass is present, and is set to either a dash (“-”) or empty string (“”), dynamic provisioning is disabled. Otherwise, the storageClassName for persistent volume claim is set to the given value specified by persistentVolume.storageClass. If persistentVolume.storageClass is absent, the default storage class is used for dynamic provisioning whenever possible. See https://kubernetes.io/docs/concepts/storage/storage-classes/ for more details. |
persistentVolume.subPath | string | "" | Subdirectory of Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty. |
persistentVolume.volumeMode | string | "" | Ollama server data Persistent Volume Binding Mode. If empty (the default) or set to null, no volumeBindingMode specification is set, choosing the default mode. |
persistentVolume.volumeName | string | "" | Ollama server Persistent Volume name. It can be used to force-attach the created PVC to a specific PV. |
podAnnotations | object | {} | Map of annotations to add to the pods |
podLabels | object | {} | Map of labels to add to the pods |
podSecurityContext | object | {} | Pod Security Context |
readinessProbe.enabled | bool | true | Enable readinessProbe |
readinessProbe.failureThreshold | int | 6 | Failure threshold for readinessProbe |
readinessProbe.initialDelaySeconds | int | 30 | Initial delay seconds for readinessProbe |
readinessProbe.path | string | "/" | Request path for readinessProbe |
readinessProbe.periodSeconds | int | 5 | Period seconds for readinessProbe |
readinessProbe.successThreshold | int | 1 | Success threshold for readinessProbe |
readinessProbe.timeoutSeconds | int | 3 | Timeout seconds for readinessProbe |
replicaCount | int | 1 | Number of replicas |
resources.limits | object | {} | Pod limit |
resources.requests | object | {} | Pod requests |
runtimeClassName | string | "" | Specify runtime class |
securityContext | object | {} | Container Security Context |
service.annotations | object | {} | Annotations to add to the service |
service.nodePort | int | 31434 | Service node port when service type is “NodePort” |
service.port | int | 11434 | Service port |
service.type | string | "ClusterIP" | Service type |
serviceAccount.annotations | object | {} | Annotations to add to the service account |
serviceAccount.automount | bool | true | Whether to automatically mount a ServiceAccount's API credentials |
serviceAccount.create | bool | true | Whether a service account should be created |
serviceAccount.name | string | "" | The name of the service account to use. If not set and “create” is “true”, a name is generated using the full name template. |
tolerations | list | [] | Tolerations for pod assignment |
topologySpreadConstraints | object | {} | Topology Spread Constraints for pod assignment |
updateStrategy | object | {"type":""} | How to replace existing pods. |
updateStrategy.type | string | "" | Can be “Recreate” or “RollingUpdate”; default is “RollingUpdate” |
volumeMounts | list | [] | Additional volumeMounts on the output Deployment definition |
volumes | list | [] | Additional volumes on the output Deployment definition |
5 Installing Open WebUI
Open WebUI is a Web-based user interface designed for interacting with AI models.
5.1 Details about the Open WebUI application
Before deploying Open WebUI, it is important to know more about the supported configurations and documentation. The following command provides the corresponding details:
helm show values oci://dp.apps.rancher.io/charts/open-webui
Alternatively, you can also refer to the Open WebUI Helm chart page on the SUSE Rancher Prime Application Collection site at https://apps.rancher.io/applications/open-webui. It contains available versions and the link to pull the Open WebUI container image.
5.2 Open WebUI installation procedure
To install Open WebUI, you need the following:
The helm command, properly installed.
A running Kubernetes cluster, such as SUSE Rancher Prime: K3s.
Visit the SUSE Rancher Prime Application Collection, sign in and get the user access token as described in https://docs.apps.rancher.io/get-started/authentication/.
(Optional) Create a Kubernetes namespace if it does not already exist. The steps in this procedure assume that all containers are deployed into the same namespace referred to as SUSE_AI_NAMESPACE. Replace its name to match your preferences.
> kubectl create namespace SUSE_AI_NAMESPACE
Create the SUSE Rancher Prime Application Collection secret.
> kubectl create secret docker-registry application-collection \
  --docker-server=dp.apps.rancher.io \
  --docker-username=APPCO_USERNAME \
  --docker-password=APPCO_USER_TOKEN \
  -n SUSE_AI_NAMESPACE
Log in to the Helm registry.
> helm registry login dp.apps.rancher.io/charts \
  -u APPCO_USERNAME \
  -p APPCO_USER_TOKEN
Create the owui_custom_overrides.yaml file to override the values of the parent Helm chart. The file contains the URLs for Milvus and Ollama. It also specifies whether a stand-alone Ollama deployment is used or whether Ollama is installed as part of the Open WebUI installation. Find more details in Section 5.5, “Examples of Open WebUI Helm chart override files”. For a list of all installation options with examples, refer to Section 5.6, “Values for the Open WebUI Helm chart”.
Install the Open WebUI Helm chart using the owui_custom_overrides.yaml override file.
> helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  --version 3.3.2 -f owui_custom_overrides.yaml
5.3 Upgrading Open WebUI
To upgrade Open WebUI to a specific new version, run the following command:
> helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  --version VERSION_NUMBER \
  -f owui_custom_overrides.yaml
To upgrade Open WebUI to the latest version, run the following command:
> helm upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n SUSE_AI_NAMESPACE \
  -f owui_custom_overrides.yaml
5.4 Uninstalling Open WebUI
To uninstall Open WebUI, run the following command:
> helm uninstall open-webui -n SUSE_AI_NAMESPACE
5.5 Examples of Open WebUI Helm chart override files
To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.
The following override file installs Ollama during the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.
global:
  imagePullSecrets:
  - application-collection
image:
  registry: dp.apps.rancher.io
  repository: containers/open-webui
  tag: 0.3.32
  pullPolicy: IfNotPresent
ollamaUrls:
- http://open-webui-ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path
ollama:
  enabled: true
  image:
    registry: dp.apps.rancher.io
    repository: containers/ollama
    tag: 0.3.6
    pullPolicy: IfNotPresent
  ingress:
    enabled: false
  defaultModel: "gemma:2b"
  ollama:
    models:
      - "gemma:2b"
      - "llama3.1"
    gpu:
      enabled: true
      type: 'nvidia'
      number: 1
  persistentVolume:
    enabled: true
    storageClass: local-path
pipelines:
  enabled: False
  persistence:
    storageClass: local-path
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui
  tls: true
extraEnvVars:
- name: DEFAULT_MODELS
  value: "gemma:2b"
- name: DEFAULT_USER_ROLE
  value: "user"
- name: WEBUI_NAME
  value: "SUSE AI"
- name: GLOBAL_LOG_LEVEL
  value: INFO
- name: RAG_EMBEDDING_MODEL
  value: "sentence-transformers/all-MiniLM-L6-v2"
- name: VECTOR_DB
  value: "milvus"
- name: MILVUS_URI
  value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530
The ollama.ollama.models list specifies two large language models (LLMs) that are loaded in Ollama when the container starts.
The ollama.ollama.gpu section enables GPU support for Ollama.
If the ollama.persistentVolume option is not enabled, data such as downloaded models is lost when the Ollama Pod restarts.
The persistence, ollama.persistentVolume and pipelines.persistence sections use local-path storage, which is suitable only for testing. For production environments, we recommend an enterprise-class storage solution such as SUSE Storage.
The DEFAULT_MODELS environment variable specifies the default LLM for Ollama.
The following override file installs Ollama separately from the Open WebUI installation. Replace SUSE_AI_NAMESPACE with your Kubernetes namespace.
global:
  imagePullSecrets:
  - application-collection
image:
  registry: dp.apps.rancher.io
  repository: containers/open-webui
  tag: 0.3.32
  pullPolicy: IfNotPresent
ollamaUrls:
- http://ollama.SUSE_AI_NAMESPACE.svc.cluster.local:11434
persistence:
  enabled: true
  storageClass: local-path
ollama:
  enabled: false
pipelines:
  enabled: False
  persistence:
    storageClass: local-path
ingress:
  enabled: true
  class: ""
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  host: suse-ollama-webui
  tls: true
extraEnvVars:
- name: DEFAULT_MODELS
  value: "gemma:2b"
- name: DEFAULT_USER_ROLE
  value: "user"
- name: WEBUI_NAME
  value: "SUSE AI"
- name: GLOBAL_LOG_LEVEL
  value: INFO
- name: RAG_EMBEDDING_MODEL
  value: "sentence-transformers/all-MiniLM-L6-v2"
- name: VECTOR_DB
  value: "milvus"
- name: MILVUS_URI
  value: http://milvus.SUSE_AI_NAMESPACE.svc.cluster.local:19530
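Both override files contain the SUSE_AI_NAMESPACE placeholder. If you keep them as templates, a small helper such as the following can substitute the namespace. The function name render_overrides is hypothetical. The helper rejects names that Kubernetes would not accept as a namespace. A valid namespace is a DNS-1123 label: lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, with at most 63 characters.

```shell
#!/usr/bin/env bash
# render_overrides (hypothetical helper): copy an override file template from
# standard input to standard output, replacing every SUSE_AI_NAMESPACE
# placeholder with the namespace given as $1. The namespace is first checked
# against the DNS-1123 label rules Kubernetes applies to namespace names.
render_overrides() {
  local ns=$1
  if [ "${#ns}" -gt 63 ] || ! [[ $ns =~ ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ ]]; then
    echo "invalid namespace: $ns" >&2
    return 1
  fi
  # Safe to use in sed: the check above allows only [a-z0-9-].
  sed "s/SUSE_AI_NAMESPACE/$ns/g"
}

# Usage (the template file name is hypothetical):
#   render_overrides suse-ai < owui_template.yaml > owui_custom_overrides.yaml
```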
5.6 Values for the Open WebUI Helm chart
To override the default values during the Helm chart installation or update, you can create an override YAML file with custom values. Then, apply these values by specifying the path to the override file with the -f option of the helm command.
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} | Affinity for pod assignment |
annotations | object | {} | |
cert-manager.enabled | bool | true | |
clusterDomain | string | "cluster.local" | Value of cluster domain |
containerSecurityContext | object | {} | Configure container security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-containe. |
extraEnvVars | list | [{"name":"OPENAI_API_KEY","value":"0p3n-w3bu!"}] | Environment variables added to the Open WebUI deployment. The most up-to-date list of environment variables is at https://docs.openwebui.com/getting-started/env-configuration/. |
extraEnvVars[0] | object | {"name":"OPENAI_API_KEY","value":"0p3n-w3bu!"} | Default API key value for Pipelines. Change it in a production deployment, and set it to the required API key if not using Pipelines. |
global.imagePullSecrets | list | [] | Global override for container image registry pull secrets |
global.imageRegistry | string | "" | Global override for container image registry |
global.tls.additionalTrustedCAs | bool | false | |
global.tls.issuerName | string | "suse-private-ai" | |
global.tls.letsEncrypt.email | string | "none@example.com" | |
global.tls.letsEncrypt.environment | string | "staging" | |
global.tls.letsEncrypt.ingress.class | string | "" | |
global.tls.source | string | "suse-private-ai" | The source of Open WebUI TLS keys, see Section 5.6.1, “TLS sources”. |
image.pullPolicy | string | "IfNotPresent" | Image pull policy to use for the Open WebUI container |
image.registry | string | "dp.apps.rancher.io" | Image registry to use for the Open WebUI container |
image.repository | string | "containers/open-webui" | Image repository to use for the Open WebUI container |
image.tag | string | "0.3.32" | Image tag to use for the Open WebUI container |
imagePullSecrets | list | [] | Configure imagePullSecrets to use a private registry, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry. |
ingress.annotations | object | {"nginx.ingress.kubernetes.io/ssl-redirect":"true"} | Use appropriate annotations for your Ingress controller, such as |
ingress.class | string | "" | |
ingress.enabled | bool | true | |
ingress.existingSecret | string | "" | |
ingress.host | string | "" | |
ingress.tls | bool | true | |
nameOverride | string | "" | |
nodeSelector | object | {} | Node labels for pod assignment |
ollama.enabled | bool | true | Automatically install the Ollama Helm chart from https://otwld.github.io/ollama-helm/. Configure the following Helm values. |
ollama.fullnameOverride | string | "open-webui-ollama" | If enabling embedded Ollama, update fullnameOverride to your desired Ollama name value; otherwise the default ollama.name value from the Ollama chart is used. |
ollamaUrls | list | [] | A list of Ollama API endpoints. These can be added instead of automatically installing the Ollama Helm chart, or in addition to it. |
openaiBaseApiUrl | string | "" | OpenAI base API URL to use. Defaults to the Pipelines service endpoint when Pipelines are enabled, or to |
persistence.accessModes | list | ["ReadWriteOnce"] | If using multiple replicas, you must update accessModes to ReadWriteMany. |
persistence.annotations | object | {} | |
persistence.enabled | bool | true | |
persistence.existingClaim | string | "" | Use existingClaim to reuse an existing Open WebUI PVC instead of creating a new one. |
persistence.selector | object | {} | |
persistence.size | string | "2Gi" | |
persistence.storageClass | string | "" | |
pipelines.enabled | bool | false | Automatically install the Pipelines chart to extend Open WebUI functionality using Pipelines, see https://github.com/open-webui/pipelines. |
pipelines.extraEnvVars | list | [] | Passes the required environment variables to your pipelines (such as the Langfuse host name). |
podAnnotations | object | {} | |
podSecurityContext | object | {} | Configure pod security context, see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-containe. |
replicaCount | int | 1 | |
resources | object | {} | |
service | object | {"annotations":{},"containerPort":8080,"labels":{},"loadBalancerClass":"","nodePort":"","port":80,"type":"ClusterIP"} | Service values to expose Open WebUI pods to the cluster |
tolerations | list | [] | Tolerations for pod assignment |
topologySpreadConstraints | list | [] | Topology Spread Constraints for pod assignment |
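The table notes that the default OPENAI_API_KEY should be changed in production and that running several replicas requires the ReadWriteMany access mode. A minimal sketch of an override combining both, assuming a StorageClass that supports ReadWriteMany is available in your cluster; the key value is a hypothetical placeholder:

```yaml
# Hypothetical override: two Open WebUI replicas sharing one volume.
replicaCount: 2
persistence:
  enabled: true
  # Several replicas mount the same volume, so ReadWriteMany is required.
  # The StorageClass in use must support this access mode (assumption).
  accessModes:
    - ReadWriteMany
extraEnvVars:
  # Replaces the placeholder API key from the chart default.
  - name: OPENAI_API_KEY
    value: "REPLACE_WITH_YOUR_KEY"   # hypothetical placeholder
  - name: WEBUI_NAME
    value: "SUSE AI"
```

Because extraEnvVars is a list, Helm replaces the default list entirely instead of merging with it. If you set extraEnvVars without an OPENAI_API_KEY entry, as in the override file for a separately installed Ollama, the default key is not set at all.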
5.6.1 TLS sources
Open WebUI can obtain TLS certificates for secure communication from one of three recommended sources.
- Self-signed TLS certificate
This is the default method. You need to install cert-manager on the cluster to issue and maintain the certificates. This method generates a CA and signs the Open WebUI certificate using the CA. cert-manager then manages the signed certificate. For this method, use the following Helm chart option:
global.tls.source=suse-private-ai
- Let's Encrypt
This method also uses cert-manager, but combines it with a special issuer for Let's Encrypt that performs all actions, including request and validation, needed to get the Let's Encrypt certificate issued. This configuration uses HTTP validation (HTTP-01), so the load balancer must have a public DNS record and be accessible from the Internet. For this method, use the following Helm chart option:
global.tls.source=letsEncrypt
- Provide your own certificate
This method lets you bring your own signed certificate to secure the HTTPS traffic. In this case, you must upload the certificate and its associated key as PEM-encoded files named tls.crt and tls.key. For this method, use the following Helm chart option:
global.tls.source=secret
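For the Let's Encrypt option, the related global.tls.letsEncrypt keys from the values table can be set together in one override file. A minimal sketch, assuming the host name resolves publicly to the load balancer. The e-mail address and host name are hypothetical, and the meaning of global.tls.letsEncrypt.ingress.class (the Ingress class used to answer the HTTP-01 challenge) is an assumption because the table does not describe it:

```yaml
# Hypothetical override: issue the Open WebUI certificate from Let's Encrypt.
global:
  tls:
    source: letsEncrypt
    letsEncrypt:
      email: admin@example.com      # hypothetical contact address
      environment: staging          # chart default; see the note below
      ingress:
        class: ""                   # assumed: Ingress class for HTTP-01 challenges
ingress:
  enabled: true
  host: webui.example.com           # hypothetical public DNS name
  tls: true
```

Keeping environment at the default staging value avoids Let's Encrypt production rate limits while you verify that HTTP-01 validation works. Browsers do not trust certificates from the staging environment, so you would switch to the production environment afterward; the exact value the chart expects for that (presumably production) is an assumption.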
6 Legal Notice
Copyright© 2006–2024 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.