Monitoring SUSE AI with OpenTelemetry and SUSE Observability
- WHAT?
This document focuses on techniques for gathering telemetry data from all SUSE AI components, including metrics, logs and traces.
- WHY?
To observe, analyze and maintain the behavior, performance and health of your SUSE AI environment, and to troubleshoot issues effectively.
- EFFORT
Setting up the recommended monitoring configurations with SUSE Observability is straightforward. Advanced setups for more granular control may require additional time for specialized analysis and fine-tuning.
- GOAL
To visualize the complete topology of your services and operations, providing deep insights and clarity into your SUSE AI environment.
1 Introduction #
This document focuses on techniques for gathering telemetry data from all SUSE AI components.
For most of the components, it presents two distinct paths:
- Recommended settings
Straightforward configurations designed to utilize the SUSE Observability Extension, providing a quick solution for your environment.
- Advanced configuration
For users who require deeper and more granular control. Advanced options unlock additional observability signals that are relevant for specialized analysis and fine-tuning.
Several setups are specific to the product, while others—particularly for scraping metrics—are configured directly within the OpenTelemetry Collector. By implementing the recommended settings, you can visualize the complete topology of your services and operations, bringing clarity to your SUSE AI environment.
1.1 What is SUSE AI monitoring? #
Monitoring SUSE AI involves observing and analyzing the behavior, performance and health of its components. In a complex, distributed system like SUSE AI, this is achieved by collecting and interpreting telemetry data. This data is typically categorized into the three pillars of observability:
- Metrics
Numerical data representing system performance, such as CPU usage, memory consumption or request latency.
- Logs
Time-stamped text records of events that occurred within the system, useful for debugging and auditing.
- Traces
A representation of the path of a request as it travels through all the different services in the system. Traces are essential for understanding performance bottlenecks and errors in the system architecture.
1.2 How monitoring works #
SUSE AI uses OpenTelemetry, an open-source observability framework, for instrumenting applications. Instrumentation is the process of adding code to an application to generate telemetry data. By using OpenTelemetry, SUSE AI ensures a standardized, vendor-neutral approach to data collection.
The collected data is then sent to SUSE Observability, which provides a comprehensive platform for visualizing, analyzing and alerting on the telemetry data. This allows administrators and developers to gain deep insights into the system, maintain optimal performance, and troubleshoot issues effectively.
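As an illustration of this data flow, the following is a minimal sketch of an OpenTelemetry Collector exporter section that forwards telemetry to SUSE Observability over OTLP. The endpoint and authentication header are placeholders; use the values provided by your SUSE Observability instance.
exporters:
  otlp:
    # Placeholder endpoint; point this at the OTLP ingestion endpoint of your SUSE Observability instance.
    endpoint: OTLP_ENDPOINT:4317
    headers:
      # Placeholder value; supply the API key in the form expected by your SUSE Observability instance.
      Authorization: "API_KEY"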
2 Monitoring GPU usage #
To effectively monitor the performance and utilization of your GPUs, configure the OpenTelemetry Collector to scrape metrics from the NVIDIA DCGM Exporter, which is deployed as part of the NVIDIA GPU Operator.
Grant permissions (RBAC). The OpenTelemetry Collector requires specific permissions to discover the GPU metrics endpoint within the cluster.
Create a file named otel-rbac.yaml with the following content. It defines a Role with permissions to list and watch services and endpoints, and a RoleBinding to grant these permissions to the OpenTelemetry Collector's service account.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: suse-observability-otel-scraper
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: suse-observability-otel-scraper
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: suse-observability-otel-scraper
subjects:
- kind: ServiceAccount
  name: OPENTELEMETRY-COLLECTOR
  namespace: OBSERVABILITY
---
Important: Verify that the ServiceAccount name and namespace in the RoleBinding match your OpenTelemetry Collector's deployment.
Apply this configuration to the gpu-operator namespace.
namespace.>
kubectl apply -n gpu-operator -f otel-rbac.yaml
Configure the OpenTelemetry Collector. Add the following Prometheus receiver configuration to your OpenTelemetry Collector's values file. It tells the collector to scrape metrics from any endpoint in the gpu-operator namespace every 10 seconds.
config:
  receivers:
    prometheus:
      config:
        scrape_configs:
        - job_name: 'gpu-metrics'
          scrape_interval: 10s
          scheme: http
          kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
              - gpu-operator
3 Monitoring Open WebUI #
The preferred way of retrieving relevant telemetry data from Open WebUI is to use the SUSE AI Filter. It requires enabling and configuring Open WebUI Pipelines.
Verify that the Open WebUI installation override file owui_custom_overrides.yaml includes the following content.
pipelines:
  enabled: true
  persistence:
    storageClass: longhorn 1
  extraEnvVars: 2
  - name: PIPELINES_URLS 3
    value: "https://raw.githubusercontent.com/SUSE/suse-ai-observability-extension/refs/heads/main/integrations/oi-filter/suse_ai_filter.py"
  - name: OTEL_SERVICE_NAME 4
    value: "Open WebUI"
  - name: OTEL_EXPORTER_HTTP_OTLP_ENDPOINT 5
    value: "http://opentelemetry-collector.suse-observability.svc.cluster.local:4318"
  - name: PRICING_JSON 6
    value: "https://raw.githubusercontent.com/SUSE/suse-ai-observability-extension/refs/heads/main/integrations/oi-filter/pricing.json"
extraEnvVars:
- name: OPENAI_API_KEY 7
  value: "0p3n-w3bu!"
Note: In the above example, there are two extraEnvVars blocks: one at the root level and another inside the pipelines configuration. The root-level extraEnvVars is fed into Open WebUI to configure the communication between Open WebUI and Open WebUI Pipelines. The extraEnvVars inside the pipelines configuration is injected into the container that acts as a runtime for the pipelines.
1. The storage class for the pipeline's persistent storage, for example, longhorn or local-path.
2. The environment variables that you are making available for the pipeline's runtime container.
3. A list of pipeline URLs to be downloaded and installed by default. Individual URLs are separated by a semicolon (;).
4. The service name that appears in traces and topological representations in SUSE Observability.
5. The endpoint of the OpenTelemetry Collector. Make sure to use the HTTP port of your collector.
6. A file with model multipliers used for cost estimation. You can customize it experimentally to match your actual infrastructure.
7. The API key shared between Open WebUI and Open WebUI Pipelines. The default value is "0p3n-w3bu!".
After you fill in the override file with the correct values, install or update Open WebUI.
>
helm upgrade \
  --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n suse-private-ai \
  --create-namespace \
  --version 7.2.0 \
  -f owui_custom_overrides.yaml
Tip: Make sure to set the version, namespace and other options to the proper values.
After the installation is successful, you can access tracing data in SUSE Observability for each chat.
Tip: You can verify that a new connection was created with the correct credentials in the Open WebUI administration settings.
Figure 1: New connection added for the pipeline #
If you already have a running instance of Open WebUI with the pipelines enabled and configured, you can set up the SUSE AI Filter in its Web user interface.
You must have Open WebUI administrator privileges to access configuration screens or settings mentioned in this section.
In the bottom left of the Open WebUI window, click your avatar icon to open the user menu and select Admin Panel.
Click the Settings tab and select Pipelines from the left menu.
In the pipeline URL field, enter https://raw.githubusercontent.com/SUSE/suse-ai-observability-extension/refs/heads/main/integrations/oi-filter/suse_ai_filter.py and click the upload button on the right to upload the pipeline from the URL.
After the upload is finished, you can review the configuration of the pipeline. Confirm with Save.
Figure 2: Adding SUSE AI filter pipeline #
Open WebUI also offers a built-in OpenTelemetry integration for traces and metrics. These signals relate to API consumption but do not provide GenAI-specific details. That is why you need to configure the SUSE AI Filter as described in Procedure 2, “Configuring pipeline filter during Open WebUI installation (recommended)”.
Append the following environment variables to your extraEnvVars section in the owui_custom_overrides.yaml file mentioned in Procedure 2, “Configuring pipeline filter during Open WebUI installation (recommended)”.
[...]
extraEnvVars:
- name: OPENAI_API_KEY
  value: "0p3n-w3bu!"
- name: ENABLE_OTEL
  value: "true"
- name: ENABLE_OTEL_METRICS
  value: "true"
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "false" 1
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: CUSTOM_OTEL_ENDPOINT 2
- name: OTEL_SERVICE_NAME
  value: CUSTOM_OTEL_IDENTIFIER 3
1. Set to "true" for testing or controlled environments, and "false" for production deployments with TLS communication.
2. Enter your custom OpenTelemetry collector endpoint URL, such as "http://opentelemetry-collector.suse-observability.svc.cluster.local:4318".
3. Specify a custom identifier for the OpenTelemetry service, such as "OI Core".
.Save the enhanced override file and update Open WebUI:
>
helm upgrade \
  --install open-webui oci://dp.apps.rancher.io/charts/open-webui \
  -n suse-private-ai \
  --create-namespace \
  --version 7.2.0 \
  -f owui_custom_overrides.yaml
4 Monitoring Milvus #
Milvus is monitored by scraping its Prometheus-compatible metrics endpoint. The SUSE Observability Extension uses these metrics to visualize Milvus's status and activity.
4.1 Scraping the metrics (recommended) #
Add the following job to the scrape_configs section of your OpenTelemetry Collector's configuration. It instructs the collector to scrape the /metrics endpoint of the Milvus service every 15 seconds.
config:
receivers:
prometheus:
config:
scrape_configs:
- job_name: 'milvus'
scrape_interval: 15s
metrics_path: '/metrics'
static_configs:
- targets: ['milvus.suse-private-ai.svc.cluster.local:9091'] 1
Your Milvus service metrics endpoint. The example assumes that Milvus is installed in the suse-private-ai namespace and exposes metrics on port 9091.
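If no metrics arrive from this endpoint, check that metrics are enabled in your Milvus Helm chart values. The following is a minimal sketch based on the upstream Milvus chart, where the metrics endpoint listens on port 9091 by default; verify the exact keys against the chart version you use.
# Sketch of Milvus Helm chart values keeping the Prometheus metrics endpoint enabled.
metrics:
  enabled: true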
4.2 Tracing (advanced) #
Milvus can also export detailed tracing data.
Enabling tracing in Milvus can generate a large amount of data. We recommend configuring sampling at the collector level to avoid performance issues and high storage costs.
To enable tracing, configure the following settings in your Milvus Helm chart values:
extraConfigFiles:
user.yaml: |+
trace:
exporter: jaeger
sampleFraction: 1
      jaeger:
        url: "http://opentelemetry-collector.observability.svc.cluster.local:14268/api/traces" 1
The URL of the OpenTelemetry Collector installed by the user.
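Following the sampling recommendation above, trace volume can be reduced in the OpenTelemetry Collector itself, for example with the probabilistic sampler processor. The snippet below is a minimal sketch; the receiver and exporter names are assumptions and must match your actual collector configuration.
config:
  processors:
    probabilistic_sampler:
      # Keep roughly 10% of traces; tune the percentage to your storage budget.
      sampling_percentage: 10
  service:
    pipelines:
      traces:
        # "jaeger" is an assumed receiver name matching the Milvus Jaeger exporter above.
        receivers: [jaeger]
        processors: [probabilistic_sampler]
        # "otlp" is an assumed exporter name pointing at SUSE Observability.
        exporters: [otlp]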
5 Monitoring user-managed applications #
To monitor other applications, you can utilize OpenTelemetry SDKs or any other instrumentation provider compatible with OpenTelemetry's semantics, for example, OpenLIT SDK. For more details, refer to Appendix B, Instrument applications with OpenLIT SDK.
OpenTelemetry offers several instrumentation techniques for different deployment scenarios and applications. You can instrument applications either manually, with more detailed control, or automatically for an easier starting point.
One of the most straightforward ways of getting started with OpenTelemetry is using the OpenTelemetry Operator for Kubernetes, which is available in the SUSE Application Collection. Find more information in this extensive guide on how to use this operator for instrumenting your applications.
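As an illustration of the operator-based approach, the following is a minimal sketch of an Instrumentation resource consumed by the OpenTelemetry Operator. The collector endpoint and the resource name are assumptions that depend on your deployment.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
spec:
  exporter:
    # Assumed in-cluster OpenTelemetry Collector OTLP/HTTP endpoint.
    endpoint: http://opentelemetry-collector.suse-observability.svc.cluster.local:4318
A workload then opts in to auto-instrumentation with a pod template annotation such as instrumentation.opentelemetry.io/inject-python: "true", where the language suffix depends on the application runtime.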
5.1 Ensuring that the telemetry data is properly captured by the SUSE Observability Extension #
For the SUSE Observability Extension to acknowledge an application as a GenAI application, it needs to have a meter configured. It must provide at least the RequestsTotal metric with the following attributes:
TelemetrySDKLanguage
ServiceName
ServiceInstanceId
ServiceNamespace
GenAIEnvironment
GenAiApplicationName
GenAiSystem
GenAiOperationName
GenAiRequestModel
Both the meter and the tracer must contain the following resource attributes:
- service.name
The logical name of the service. Defaults to "My App".
- service.version
The version of the service. Defaults to "1.0".
- deployment.environment
The name of the deployment environment, such as "production" or "staging". Defaults to "default".
- telemetry.sdk.name
The value must be "openlit".
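When you instrument an application yourself, the standard OpenTelemetry SDK environment variables can supply most of these resource attributes; telemetry.sdk.name is set by the SDK itself, which is why the OpenLIT SDK is expected. The following container environment is a minimal sketch with example values only.
# Example container environment for a user-managed GenAI application (values are illustrative).
env:
- name: OTEL_SERVICE_NAME
  value: "My App"
- name: OTEL_RESOURCE_ATTRIBUTES
  value: "service.version=1.0,deployment.environment=production"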
The following metrics are utilized in the graphs of the SUSE Observability Extension:
- gen_ai.client.token.usage
Measures the number of used input and output tokens.
Type: histogram
Unit: token
- gen_ai.total.requests
Number of requests.
Type: counter
Unit: integer
- gen_ai.usage.cost
The distribution of GenAI request costs.
Type: histogram
Unit: USD
- gen_ai.usage.input_tokens
Number of prompt tokens processed.
Type: counter
Unit: integer
- gen_ai.usage.output_tokens
Number of completion tokens processed.
Type: counter
Unit: integer
- gen_ai.client.token.usage
Number of tokens processed.
Type: counter
Unit: integer
5.2 Troubleshooting #
- 1. No metrics received from any components.
Verify the OpenTelemetry Collector deployment.
Check that the exporter is properly set to the SUSE Observability collector, with the correct API key and endpoint specified.
- 2. No metrics received from the GPU.
Verify if the RBAC rules were applied.
Verify if the metrics receiver scraper is configured.
Check the NVIDIA DCGM Exporter for errors.
- 3. No metrics received from Milvus.
Verify that the Milvus chart configuration exposes the metrics endpoint.
Verify if the metrics receiver scraper is configured.
For usage metrics, confirm that requests were actually made to Milvus.
- 4. No tracing data received from any components.
Verify the OpenTelemetry Collector deployment.
Check if the exporter is properly set to the SUSE Observability collector, with the right API key and endpoint set.
- 5. No tracing data received from Open WebUI.
Verify if the SUSE AI Observability Filter was installed and configured properly.
Verify if chat requests actually happened.
- 6. Cost estimation is far from real values.
Recalculate the multipliers for the PRICING_JSON in the SUSE AI Observability Filter.
- 7. There is high demand for storage volume.
Verify if sampling is being applied in the OpenTelemetry Collector.