Metrics
RKE2 provides metrics for monitoring the health and performance of the cluster.
Individual components provide most metrics. See the following component-specific documentation for more information:
Other components may provide additional metrics. Consult the upstream project documentation for any components not listed above.
Supervisor Metrics
When you start RKE2 with supervisor-metrics: true, the RKE2 supervisor exposes metrics. You can access these metrics through the /metrics endpoint on each node at port 9345:
kubectl get --server https://NODENAME:9345 --raw /metrics
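The output of this command is plain text in the Prometheus exposition format, so it can be filtered down to the RKE2-specific series with a few lines of code. The following is a minimal sketch, not part of RKE2: the function name parse_rke2_samples and the sample text (including its label values) are made up for illustration, and the parser ignores timestamps and does not unescape label values.

```python
import re

# One sample line of the Prometheus text format looks like:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_rke2_samples(text):
    """Return (name, labels, value) tuples for every rke2_ sample in text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        if not name.startswith("rke2_"):
            continue
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical excerpt of /metrics output; the values are invented.
SAMPLE = """\
# TYPE rke2_certificate_expiration_seconds gauge
rke2_certificate_expiration_seconds{subject="CN=example",usage="ServerAuth"} 86400
go_goroutines 42
"""

for name, labels, value in parse_rke2_samples(SAMPLE):
    print(name, labels, value)
```

In practice the text would come from the kubectl command above, for example by piping its output to a script that reads standard input.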
The RKE2 supervisor process exposes the following metrics:
If the RKE2 embedded registry is enabled, the RKE2 supervisor process also exposes the following metrics:
RKE2 Cluster Management Metrics
rke2_certificate_expiration_seconds
Remaining lifetime in seconds of the certificate, labeled by certificate subject and usages.
- Type: Gauge
- Labels: subject, usage
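Because the gauge reports remaining lifetime in seconds, a check for certificates that are close to expiring is a simple comparison against a threshold. The sketch below assumes the samples have already been read into (subject, usage, seconds) tuples; the function name, the 30-day threshold, and the sample values are illustrative choices, not RKE2 defaults.

```python
SECONDS_PER_DAY = 86400


def certificates_expiring_within(samples, days):
    """Return (subject, usage, days_remaining) for samples below the threshold.

    samples is an iterable of (subject, usage, seconds_remaining) tuples,
    one per rke2_certificate_expiration_seconds series.
    """
    limit = days * SECONDS_PER_DAY
    return [
        (subject, usage, seconds / SECONDS_PER_DAY)
        for subject, usage, seconds in samples
        if seconds < limit
    ]


# Hypothetical readings: one certificate with 10 days left, one with 200.
readings = [
    ("CN=example-a", "ServerAuth", 10 * SECONDS_PER_DAY),
    ("CN=example-b", "ClientAuth", 200 * SECONDS_PER_DAY),
]
print(certificates_expiring_within(readings, days=30))
```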
rke2_loadbalancer_server_connections
Count of current connections to the load balancer server, labeled by the load balancer name and server address.
- Type: Gauge
- Labels: name, server
rke2_loadbalancer_server_health
Current health state of the load balancer backend servers, labeled by the load balancer name and server address.
The state is an enum with the following values: 0=INVALID, 1=FAILED, 2=STANDBY, 3=UNCHECKED, 4=RECOVERING, 5=HEALTHY, 6=PREFERRED, 7=ACTIVE.
- Type: Gauge
- Labels: name, server
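Since the gauge value is a number, dashboards and scripts usually need to map it back to the state name. The following sketch encodes the enum listed above; the class name ServerHealth and the helper describe_health are hypothetical, and the fallback for unrecognized values is an assumption rather than documented behavior.

```python
from enum import IntEnum


class ServerHealth(IntEnum):
    """State values of rke2_loadbalancer_server_health, as listed above."""
    INVALID = 0
    FAILED = 1
    STANDBY = 2
    UNCHECKED = 3
    RECOVERING = 4
    HEALTHY = 5
    PREFERRED = 6
    ACTIVE = 7


def describe_health(value):
    """Translate a gauge reading (int or float) into its state name."""
    try:
        return ServerHealth(int(value)).name
    except ValueError:
        # Values outside 0-7 are not described in the list above.
        return f"UNKNOWN({value})"


print(describe_health(5.0))
```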
rke2_loadbalancer_dial_duration_seconds
Time in seconds taken to dial a connection to a backend server, labeled by the load balancer name and success/failure status.
- Type: Histogram
- Labels: name, status
rke2_etcd_snapshot_save_duration_seconds
Total time in seconds taken to complete the etcd snapshot process, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_save_local_duration_seconds
Total time in seconds taken to save a local snapshot file, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_save_s3_duration_seconds
Total time in seconds taken to upload a snapshot file to S3, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_reconcile_duration_seconds
Total time in seconds taken to sync the list of etcd snapshots, labeled by success/failure status.
- Type: Histogram
- Labels: status
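Prometheus histograms are exposed as three families of series: <name>_bucket, <name>_sum, and <name>_count. Dividing the sum by the count gives the mean duration, which is often enough for a quick look at the etcd snapshot metrics. The sketch below assumes samples have been parsed into (name, labels, value) tuples; the function name and the status value "success" are illustrative assumptions, since the exact label values are not listed here.

```python
def mean_duration_by_status(samples, metric):
    """Return {status: mean seconds} for one histogram metric.

    samples is an iterable of (name, labels, value) tuples and metric is
    the base name, for example rke2_etcd_snapshot_save_duration_seconds.
    """
    sums, counts = {}, {}
    for name, labels, value in samples:
        status = labels.get("status", "")
        if name == metric + "_sum":
            sums[status] = value
        elif name == metric + "_count":
            counts[status] = value
    # Skip statuses with no observations to avoid dividing by zero.
    return {s: sums[s] / counts[s] for s in sums if counts.get(s)}


# Hypothetical readings: four successful snapshots taking 12 seconds in total.
BASE = "rke2_etcd_snapshot_save_duration_seconds"
readings = [
    (BASE + "_bucket", {"status": "success", "le": "+Inf"}, 4.0),
    (BASE + "_sum", {"status": "success"}, 12.0),
    (BASE + "_count", {"status": "success"}, 4.0),
]
print(mean_duration_by_status(readings, BASE))
```

This keys results by the status label only, so it fits the snapshot metrics; rke2_loadbalancer_dial_duration_seconds also carries a name label that would need to be part of the key.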