6 Monitoring vLLM
vLLM is monitored by scraping its Prometheus-compatible metrics endpoints. The SUSE Observability Extension uses these metrics to visualize vLLM’s status and activity.
6.1 Metrics Scraping (recommended)
Add the following job to the scrape_configs section of your OpenTelemetry Collector’s configuration.
This configures the collector to scrape the /metrics endpoint from all vLLM services every 10 seconds.
You can define several scrape jobs. If you have already defined other jobs, such as one for Milvus, append the new job to the existing list.
If you deployed vLLM’s services with non-default values, adjust the service discovery rules in relabel_configs to match.
Before using the following example, replace the VLLM_NAMESPACE and VLLM_RELEASE_NAME placeholders with the actual values you used while deploying vLLM.
config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: 'vllm'
            scrape_interval: 10s
            scheme: http
            kubernetes_sd_configs:
              - role: service
            relabel_configs:
              - source_labels: [__meta_kubernetes_namespace]
                action: keep
                regex: 'VLLM_NAMESPACE'
              - source_labels: [__meta_kubernetes_service_name]
                action: keep
                regex: '.*VLLM_RELEASE_NAME.*'
With SUSE Observability version 2.6.2, a change of the standard behavior broke the vLLM monitoring performed by the extension.
To fix it, update otel-values.yaml to include the following additions.
No changes are required if you use SUSE Observability version 2.6.1 or earlier.
Add a new processor.
config:
  processors:
    ... # same as before
    transform:
      metric_statements:
        - context: metric
          statements:
            - replace_pattern(name, "^vllm:", "vllm_")
Modify the metrics pipeline to perform the transformation defined above:
config:
  service:
    pipelines:
      ... # same as before
      metrics:
        receivers: [otlp, spanmetrics, prometheus]
        processors: [transform, memory_limiter, resource, batch]
        exporters: [debug, otlp]
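To make the effect of the replace_pattern statement concrete, the following Python sketch applies the same regular expression and replacement to a few metric names. It is an illustration only: the collector performs this step in its transform processor, not in Python, and the rename_metric helper and the sample metric names are assumptions introduced here rather than part of the deployment.

```python
import re

# Same pattern and replacement as in the transform processor statement:
# replace_pattern(name, "^vllm:", "vllm_")
PREFIX_PATTERN = re.compile(r"^vllm:")
REPLACEMENT = "vllm_"


def rename_metric(name: str) -> str:
    """Hypothetical helper that mimics the rename applied to each metric name."""
    return PREFIX_PATTERN.sub(REPLACEMENT, name)


if __name__ == "__main__":
    # Sample names for illustration only; check your own /metrics output
    # for the names your vLLM version actually exposes.
    samples = [
        "vllm:num_requests_running",
        "vllm:num_requests_waiting",
        "process_cpu_seconds_total",
    ]
    for name in samples:
        print(f"{name} -> {rename_metric(name)}")
```

Because the pattern is anchored with ^, only names that begin with vllm: are rewritten. All other metrics in the pipeline pass through unchanged.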