Monitoring SUSE AI with OpenTelemetry and SUSE Observability
Applies to SUSE AI 1.0

8 Monitoring with the OpenTelemetry Operator

This section describes how to instrument Kubernetes applications using the OpenTelemetry Operator for Kubernetes, and how to forward telemetry (metrics and traces) to SUSE Observability. The Operator simplifies the deployment of OpenTelemetry components and enables automatic instrumentation without modifying application code.

The guidance is presented in two paths:

  • Path A: You already have an OpenTelemetry Collector deployed and configured.

  • Path B: You prefer to have the Operator manage the Collector using an OpenTelemetryCollector custom resource.

8.1 Prerequisites

Ensure the following are in place before proceeding:

  • A Kubernetes cluster managed with Rancher.

  • cert-manager installed (required for the Operator’s admission webhooks).

  • SUSE Observability installed and reachable from the cluster with a valid service token or API key.
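
The prerequisites can be checked from the command line. The commands below assume cert-manager is installed in its default cert-manager namespace; adjust the namespace if your installation differs.

```shell
> kubectl get pods -n cert-manager
> kubectl get crds | grep cert-manager
```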

8.2 Installing the OpenTelemetry Operator

Install the OpenTelemetry Operator into your cluster using the SUSE Application Collection Helm chart. The Operator manages OpenTelemetry Collector instances and supports automatic instrumentation of workloads.

Important

Pin the chart version explicitly to avoid unexpected changes during upgrades.

  1. Install the Operator into the namespace where you run your SUSE Observability resources.

    > helm install opentelemetry-operator oci://dp.apps.rancher.io/charts/opentelemetry-operator \
      --namespace <SUSE_OBSERVABILITY_NAMESPACE> \
      --version <CHART-VERSION> \
      --set manager.autoInstrumentation.go.enabled=true \
      --set global.imagePullSecrets={application-collection}
    Note

    If the previous command fails with your Helm version, use the following alternative.

    > helm install opentelemetry-operator oci://dp.apps.rancher.io/charts/opentelemetry-operator \
      --namespace <SUSE_OBSERVABILITY_NAMESPACE> \
      --version <CHART-VERSION> \
      --set manager.autoInstrumentation.go.enabled=true \
      --set global.imagePullSecrets[0].name=application-collection

    The Helm chart deploys the Operator controller, which manages the following custom resources (CRs):

    • OpenTelemetryCollector: Defines and deploys OpenTelemetry Collector instances managed by the Operator.

    • TargetAllocator: Distributes Prometheus scrape targets across Collector replicas.

    • OpAMPBridge: Optional component that reports and manages Collector state via the OpAMP protocol.

    • Instrumentation: Defines automatic instrumentation settings and exporter configurations for workloads.

  2. Verify that the CRDs are installed.

    # kubectl api-resources --api-group=opentelemetry.io
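
You can also confirm that the Operator controller pod is running. The label selector below is an assumption based on common Helm chart labeling conventions; verify it against the labels generated by your chart version.

```shell
> kubectl -n <SUSE_OBSERVABILITY_NAMESPACE> get pods \
  -l app.kubernetes.io/name=opentelemetry-operator
```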

8.3 Path A: Use an existing Collector

If you already have an OpenTelemetry Collector deployed (for example, installed via Helm and exporting telemetry to SUSE Observability), follow these steps. This enables auto-instrumentation without replacing the Collector.

8.3.1 Enable OTLP reception on the Collector

Ensure your Collector is configured to receive OTLP telemetry from instrumented workloads. Enable at least one OTLP protocol (gRPC or HTTP).

The following snippet shows a SUSE AI example otel-values.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
    metrics:
      receivers: [otlp, prometheus, spanmetrics]

After adding the protocols, update the Collector.

> helm upgrade --install opentelemetry-collector \
  oci://dp.apps.rancher.io/charts/opentelemetry-collector \
  -n <SUSE_OBSERVABILITY_NAMESPACE> \
  --version <CHART_VERSION> \
  -f otel-values.yaml

8.3.2 Create an Instrumentation custom resource

Create an Instrumentation custom resource that defines automatic instrumentation behavior and the OTLP export destination.

Important

Namespace rule: The Instrumentation resource must exist before the pod is created. The Operator resolves it either from the same namespace as the pod, or from another namespace when referenced as <namespace>/<name> in the annotation.

  1. Create a file named instrumentation.yaml with the following content.

    apiVersion: opentelemetry.io/v1alpha1
    kind: Instrumentation
    metadata:
      name: otel-instrumentation
    spec:
      exporter:
        endpoint: http://opentelemetry-collector.observability.svc.cluster.local:4317
      propagators:
        - tracecontext
        - baggage
      defaults:
        useLabelsForResourceAttributes: true
      python:
        env:
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: http://opentelemetry-collector.observability.svc.cluster.local:4318
      go:
        env:
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: http://opentelemetry-collector.observability.svc.cluster.local:4318
      sampler:
        type: parentbased_traceidratio
        argument: "1"
    Note

    Most auto-instrumentation SDKs (Python, Go, NodeJS) default to OTLP/HTTP (port 4318). If your Collector only exposes OTLP/gRPC (4317), explicitly configure the SDK endpoint.

  2. Apply the resource.

    > kubectl apply \
      --namespace <SUSE_OBSERVABILITY_NAMESPACE> \
      -f instrumentation.yaml
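
You can confirm that the resource was created. The Operator does not act on it until a pod carrying a matching injection annotation is created.

```shell
> kubectl get instrumentation otel-instrumentation \
  -n <SUSE_OBSERVABILITY_NAMESPACE>
```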

8.3.3 Enable auto-instrumentation for workloads

To instruct the Operator to auto-instrument application pods, add an annotation to the pod template. Use the annotation matching your workload language.

  • Java: instrumentation.opentelemetry.io/inject-java: <namespace>/otel-instrumentation

  • NodeJS: instrumentation.opentelemetry.io/inject-nodejs: <namespace>/otel-instrumentation

  • Python: instrumentation.opentelemetry.io/inject-python: <namespace>/otel-instrumentation

  • Go: instrumentation.opentelemetry.io/inject-go: <namespace>/otel-instrumentation

For Go workloads, an additional annotation is required:

  • instrumentation.opentelemetry.io/otel-go-auto-target-exe: <path-to-binary>
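
For example, a Go workload's pod template carries both annotations together. The binary path /usr/bin/ollama below is illustrative only; use the actual entrypoint binary of your container.

```yaml
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-go: <namespace>/otel-instrumentation
        instrumentation.opentelemetry.io/otel-go-auto-target-exe: /usr/bin/ollama
```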

You can also enable injection at the namespace level:

apiVersion: v1
kind: Namespace
metadata:
  name: <APP-NAMESPACE>
  annotations:
    instrumentation.opentelemetry.io/inject-python: "true"

Or per Deployment:

spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"

Annotation values may be:

  • "true" to use an Instrumentation resource in the same namespace.

  • "my-instrumentation" to use a named resource in the same namespace.

  • "other-namespace/my-instrumentation" for a cross-namespace reference.

  • "false" to disable injection.

When a pod with injection annotations is created, the Operator mutates it via an admission webhook:

  • An init container is injected to copy auto-instrumentation binaries.

  • The application container is modified to preload the instrumentation.

  • Environment variables are added to configure the SDK.
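
To inspect the result of the mutation, you can list the environment variable names of the rewritten container; on an injected pod, variables such as OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT should be present.

```shell
> kubectl -n <APP-NAMESPACE> get pod <POD_NAME> \
  -o jsonpath='{.spec.containers[0].env[*].name}'
```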

Example 8.1: Enable auto-instrumentation in SUSE AI

The following procedure shows how to inject instrumentation into the open-webui-mcpo workload.

  1. Edit the deployment.

    > kubectl edit deployment open-webui-mcpo -n suse-private-ai
  2. Add an injection annotation to spec.template.metadata.annotations.

    spec:
      template:
        metadata:
          annotations:
            instrumentation.opentelemetry.io/inject-python: <namespace>/otel-instrumentation
    Note

    For Go workloads, the binary being instrumented must provide the .gopclntab section. Binaries stripped of this section during or after compilation are not compatible. To check if your ollama binary has symbols, run nm /bin/ollama. If it returns no symbols, auto-instrumentation will not work with that build.

  3. Roll out the updated deployment.

    > kubectl rollout restart deployment open-webui-mcpo -n suse-private-ai

8.3.4 Verify the telemetry workflow

After injecting instrumentation, verify that an init container was injected automatically.

> kubectl -n suse-private-ai get pod <OPENWEBUI_MCPO_POD> \
  -o jsonpath="{.spec.initContainers[*]['name','image']}"

Example output:

opentelemetry-auto-instrumentation-python \
ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.59b0

8.3.5 Verify SUSE Observability UI

In the SUSE Observability UI, verify that application traces and metrics are visible in the appropriate dashboards. For example, check the OpenTelemetry Services and Traces views.

After completing the instrumentation steps, allow a short period of time for data to be collected. Ensure that the instrumented pods are receiving traffic. Once data is available, the application appears under its service name (for example, open-webui-mcpo) in OpenTelemetry Services and Service Instances.

Application traces are visible in the Trace Explorer. They are also visible in the Trace perspective for both the service and service instance components. Span metrics and language-specific metrics (when available) appear in the Metrics perspective for the corresponding components.

If the Kubernetes StackPack is installed, traces for the instrumented pods are also available directly in the Traces perspective.

From OpenTelemetry services:

Screenshot showing the `open-webui-mcpo` service in SUSE Observability UI
Figure 8.1: Verifying OpenTelemetry service view in SUSE Observability UI

From the traces perspective:

Screenshot showing the traces view in SUSE Observability UI with `open-webui-mcpo` traces
Figure 8.2: Verifying traces view on SUSE Observability UI

8.4 Path B: Use an Operator-managed Collector

If you prefer the Operator to manage the Collector deployment and configuration, use the OpenTelemetryCollector custom resource.

8.4.1 Configure image pulls from the Application Collection

To pull the Collector image from the Application Collection, create a ServiceAccount with imagePullSecrets and attach it to the Collector CR via the spec.serviceAccount attribute.

> kubectl -n <SUSE_OBSERVABILITY_NAMESPACE> create serviceaccount image-puller
> kubectl -n <SUSE_OBSERVABILITY_NAMESPACE> patch serviceaccount image-puller \
  --patch '{"imagePullSecrets":[{"name":"application-collection"}]}'

8.4.2 Create an OpenTelemetryCollector resource

An OpenTelemetryCollector resource encapsulates the desired Collector configuration, including receivers, processors, exporters, and routing logic.

  1. Create a file named opentelemetry-collector.yaml with the following content.

    apiVersion: opentelemetry.io/v1beta1
    kind: OpenTelemetryCollector
    metadata:
      name: opentelemetry
    spec:
      serviceAccount: image-puller
      mode: deployment
      envFrom:
        - secretRef:
            name: open-telemetry-collector
      config:
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: 0.0.0.0:4317
              http:
                endpoint: 0.0.0.0:4318
          prometheus:
            config:
              scrape_configs:
                - job_name: opentelemetry-collector
                  scrape_interval: 10s
                  static_configs:
                    - targets:
                        - 0.0.0.0:8888
    
        exporters:
          debug: {}
          nop: {}
          otlp:
            endpoint: http://suse-observability-otel-collector.suse-observability.svc.cluster.local:4317
            headers:
              Authorization: "SUSEObservability ${env:API_KEY}"
            tls:
              insecure: true
    
        processors:
          tail_sampling:
            decision_wait: 10s
            policies:
              - name: rate-limited-composite
                type: composite
                composite:
                  max_total_spans_per_second: 500
                  policy_order: [errors, slow-traces, rest]
                  composite_sub_policy:
                    - name: errors
                      type: status_code
                      status_code:
                        status_codes: [ERROR]
                    - name: slow-traces
                      type: latency
                      latency:
                        threshold_ms: 1000
                    - name: rest
                      type: always_sample
                rate_allocation:
                  - policy: errors
                    percent: 33
                  - policy: slow-traces
                    percent: 33
                  - policy: rest
                    percent: 34
    
          resource:
            attributes:
              - key: k8s.cluster.name
                action: upsert
                value: local
              - key: service.instance.id
                from_attribute: k8s.pod.uid
                action: insert
    
          filter/dropMissingK8sAttributes:
            error_mode: ignore
            traces:
              span:
                - resource.attributes["k8s.node.name"] == nil
                - resource.attributes["k8s.pod.uid"] == nil
                - resource.attributes["k8s.namespace.name"] == nil
                - resource.attributes["k8s.pod.name"] == nil
    
        connectors:
          spanmetrics:
            metrics_expiration: 5m
            namespace: otel_span
    
          routing/traces:
            error_mode: ignore
            table:
              - statement: route()
                pipelines: [traces/sampling, traces/spanmetrics]
    
        service:
          pipelines:
            traces:
              receivers: [otlp]
              processors: [filter/dropMissingK8sAttributes, resource]
              exporters: [routing/traces]
    
            traces/spanmetrics:
              receivers: [routing/traces]
              processors: []
              exporters: [spanmetrics]
    
            traces/sampling:
              receivers: [routing/traces]
              processors: [tail_sampling]
              exporters: [debug, otlp]
    
            metrics:
              receivers: [otlp, spanmetrics, prometheus]
              processors: [resource]
              exporters: [debug, otlp]
  2. Customize the configuration to include any scrape jobs, processors, or routing logic required.

  3. Apply the resource.

    > kubectl apply \
    --namespace <SUSE_OBSERVABILITY_NAMESPACE> \
    -f opentelemetry-collector.yaml
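
After applying the resource, the Operator creates a Deployment and Service for the Collector. The derived name below assumes the Operator's convention of suffixing the CR name with -collector; verify the exact names in your cluster.

```shell
> kubectl -n <SUSE_OBSERVABILITY_NAMESPACE> get opentelemetrycollector
> kubectl -n <SUSE_OBSERVABILITY_NAMESPACE> get deployment opentelemetry-collector
```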

8.4.3 Configure Instrumentation, annotation, and verification

These steps are the same as for Section 8.3, “Path A: Use an existing Collector”. Follow Section 8.3.2, “Create an Instrumentation custom resource” through Section 8.3.5, “Verify SUSE Observability UI”.

8.5 Common validation steps

  • Collector readiness: Ensure the Collector is running and listening on the configured OTLP endpoint.

  • Instrumentation injection: Pod annotations should result in injected init containers or sidecars.

  • Telemetry export: In SUSE Observability, confirm that traces and metrics from your applications appear alongside other monitored data.

  • Resource enrichment: Kubernetes attributes (for example, k8s.pod.name and k8s.namespace.name) help SUSE Observability correlate telemetry with topology.

8.6 Troubleshooting

Go auto-instrumentation silent or failing.
  • Go auto-instrumentation (eBPF-based) may require kernel support, shareProcessNamespace: true, and (depending on the Operator version) privileged containers.

  • Verify Operator version requirements and feature gates.

  • Ensure pod security settings allow eBPF.

  • If this is not possible, use manual SDK instrumentation.

No init container or injection not happening.
  • This may be caused by a typo in the annotation, the wrong language annotation (for example, inject-java vs inject-python), or the Instrumentation resource not being present in the namespace at pod startup.

  • Confirm that the annotation matches the intended language.

  • Ensure the Instrumentation resource exists in the pod namespace before pods are created.

  • If pods are already running, redeploy them after creating Instrumentation.

Telemetry not reaching the Collector (exporter pointing to localhost).
  • Instrumentation defaults to http://localhost:4317 if spec.exporter.endpoint is omitted. Telemetry is dropped or sent to a pod-local endpoint.

  • Set spec.exporter.endpoint to the Collector Service FQDN (for example, http://<collector-name>.<namespace>.svc.cluster.local:4318).

  • Verify OTEL_EXPORTER_OTLP_ENDPOINT in the pod environment.

Webhook or admission failed (TLS or cert errors).
  • The Operator webhook rejects resources, and error events reference webhook certificates.

  • Ensure cert-manager is installed.

  • Ensure the chart values enable certificates (for example, admissionWebhooks.certManager.enabled: true), or enable auto-generated certificates per chart values.

  • Check kubectl get validatingwebhookconfigurations and review the Operator logs.

Image pull or permission issues.
  • The init container fails to start due to image pull errors.

  • Run kubectl describe pod and look for ImagePullBackOff.

  • Fix the image pull secrets and registry access.

Late annotations (Operator did not inject).
  • The pod started before the Instrumentation resource existed.

  • Delete and recreate the pod after the Instrumentation resource exists.

  • Alternatively, add automation around re-initialization during rollout.

TLS to the Collector (secure OTLP).
  • Your environment requires Instrumentation.spec.exporter.tls (mTLS or a custom CA).

  • Create a ConfigMap containing the CA bundle.

  • Reference it from Instrumentation.spec.exporter.tls.configMapName.
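
A minimal sketch of such a configuration, assuming a ConfigMap named otel-ca-bundle containing a ca.crt key in the workload namespace:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: otel-instrumentation
spec:
  exporter:
    endpoint: https://opentelemetry-collector.observability.svc.cluster.local:4317
    tls:
      configMapName: otel-ca-bundle
      ca_file: ca.crt
```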