Monitoring SUSE AI with OpenTelemetry and SUSE Observability
Applies to SUSE AI 1.0

6 Monitoring vLLM

vLLM is monitored by scraping its Prometheus-compatible metrics endpoints. The SUSE Observability Extension uses these metrics to visualize vLLM’s status and activity.
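
To confirm that your vLLM deployment exposes these metrics, you can query its /metrics endpoint directly before configuring the collector. The following is a minimal check; the service name vllm, port 8000 and namespace suse-private-ai are assumptions, so substitute the values from your own deployment:

# Forward the vLLM service to your workstation (service name, port and
# namespace are examples; use your deployment's values)
kubectl port-forward svc/vllm 8000:8000 --namespace suse-private-ai

# In a second shell, fetch the metrics; vLLM metric names carry the
# "vllm:" prefix
curl -s http://localhost:8000/metrics | grep '^vllm:'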

6.1 Metrics Scraping (recommended)

Add the following job to the scrape_configs section of your OpenTelemetry Collector configuration. It instructs the collector to scrape the /metrics endpoint of every vLLM service every 10 seconds. The scrape_configs list can hold several jobs, so if you have already defined others, such as a job for Milvus, simply append this one. If you deployed vLLM with non-default values, adjust the service discovery rules below accordingly.

Tip

Before using the following example, replace the VLLM_NAMESPACE and VLLM_RELEASE_NAME placeholders with the actual values you used while deploying vLLM.

config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: 'vllm'
            scrape_interval: 10s
            scheme: http
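            # Discover scrape targets from Kubernetes Service objects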
            kubernetes_sd_configs:
              - role: service
            relabel_configs:
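              # Keep only services in the vLLM namespace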
              - source_labels: [__meta_kubernetes_namespace]
                action: keep
                regex: 'VLLM_NAMESPACE'

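              # Keep only services whose name contains the vLLM release name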
              - source_labels: [__meta_kubernetes_service_name]
                action: keep
                regex: '.*VLLM_RELEASE_NAME.*'
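
After extending the configuration, re-apply it to the collector. If you installed the collector through the OpenTelemetry Helm chart, the upgrade might look as follows; the release name, chart and namespace are assumptions, and otel-values.yaml is the values file that holds the configuration above:

helm upgrade opentelemetry-collector open-telemetry/opentelemetry-collector \
  --namespace observability \
  --values otel-values.yaml
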
Warning: SUSE Observability version 2.6.2 and above

A change in standard behavior introduced with SUSE Observability version 2.6.2 broke the vLLM monitoring performed by the extension. To fix it, update otel-values.yaml to include the additions below. No changes are required if you run SUSE Observability version 2.6.1 or earlier.

  • Add a transform processor that renames vLLM metrics:

    config:
      processors:
        ... # same as before
        transform:
          metric_statements:
            - context: metric
              statements:
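                # Rewrite the vllm: metric-name prefix to vllm_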
                - replace_pattern(name, "^vllm:", "vllm_")
  • Modify the metrics pipeline to perform the transformation defined above:

    config:
      service:
        pipelines:
          ... # same as before
          metrics:
            receivers: [otlp, spanmetrics, prometheus]
            processors: [transform, memory_limiter, resource, batch]
            exporters: [debug, otlp]
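
Together, these additions rename the metrics from vLLM's native vllm: prefix to the vllm_ form before export, so that the extension recognizes them again. After editing otel-values.yaml, re-apply it to the collector, for example with a helm upgrade as sketched earlier.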