CNCF AI Conformance

The CNCF Kubernetes AI Conformance defines a set of additional capabilities, APIs and configurations that a Kubernetes cluster MUST offer, on top of standard CNCF Kubernetes Conformance, to reliably and efficiently run AI/ML workloads.

This page shows how to meet these requirements using RKE2 v1.35.5+rke2r2 and NVIDIA GPU Operator v26.3.2. The operator is installed following the GPU Operators documentation.

Support Dynamic Resource Allocation (DRA)

DRA is an API that enables more flexible, fine-grained resource requests than simple counts. It has been generally available (GA) since Kubernetes v1.34.

Verify that all resource.k8s.io/v1 DRA API resources are enabled by running:

kubectl api-resources --api-group=resource.k8s.io

Expected Output:

NAME                     SHORTNAMES   APIVERSION           NAMESPACED   KIND
deviceclasses                         resource.k8s.io/v1   false        DeviceClass
resourceclaims                        resource.k8s.io/v1   true         ResourceClaim
resourceclaimtemplates                resource.k8s.io/v1   true         ResourceClaimTemplate
resourceslices                        resource.k8s.io/v1   false        ResourceSlice
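The same check can be scripted so that it fails when a kind is missing. The following is a minimal sketch: `check_dra_resources` is a hypothetical helper name, and it assumes the column layout shown above, where KIND is the last column and APIVERSION is the third column from the end.

```shell
# Hypothetical helper: reads `kubectl api-resources` output on stdin and
# reports every DRA kind that is not served at resource.k8s.io/v1.
check_dra_resources() {
  input=$(cat)
  missing=0
  for kind in DeviceClass ResourceClaim ResourceClaimTemplate ResourceSlice; do
    # KIND is the last column; APIVERSION is the third column from the end.
    if ! printf '%s\n' "$input" | awk -v k="$kind" \
        'NF >= 3 && $NF == k && $(NF-2) == "resource.k8s.io/v1" { found = 1 } END { exit !found }'; then
      echo "missing: $kind"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible invocation is `kubectl api-resources --api-group=resource.k8s.io | check_dra_resources && echo "DRA API complete"`.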

Driver Runtime Management

DRA is the ideal API for verifying that GPU drivers and container runtime configurations are correctly installed and maintained on nodes with accelerators. However, because DRA support is still maturing across the Kubernetes ecosystem, platforms currently rely on Node Feature Discovery (NFD) and GPU Feature Discovery (GFD) to fulfill this requirement through node metadata (labels).

1. Verifying Base Container Runtime Version

To confirm the base engine handling the container lifecycle, the platform exposes the underlying containerd runtime via the core Node API:

kubectl get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[*].status.nodeInfo.containerRuntimeVersion}'

Expected output:

containerd://2.2.3-k3s1

2. Verifying Accelerator Drivers and Runtime Configurations

By querying the node labels, administrators can programmatically verify both the driver installation and the specialized container runtime configuration patch applied to containerd:

kubectl get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[*].metadata.labels}'

Expected output (relevant labels only):

nvidia.com/cuda.driver-version.full: "595.71.05"
nvidia.com/gpu.deploy.container-toolkit: "true"
nvidia.com/gpu-driver-upgrade-state: "upgrade-done"

gpu.deploy.container-toolkit: "true" explicitly confirms that the container runtime configuration (the toolkit patch to containerd) is successfully injected and active.
gpu-driver-upgrade-state: "upgrade-done" confirms the operator is actively maintaining the lifecycle of these components.
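The three labels can also be checked in a script. The sketch below is illustrative only: `check_gpu_node_labels` is a hypothetical helper, and it assumes its input is one `key=value` label per line, which is the format `kubectl label node <name> --list` is expected to print.

```shell
# Hypothetical helper: reads node labels (one key=value per line) on stdin and
# checks the driver version, container toolkit, and upgrade-state labels.
check_gpu_node_labels() {
  awk -F= '
    $1 == "nvidia.com/cuda.driver-version.full"     { driver  = $2 }
    $1 == "nvidia.com/gpu.deploy.container-toolkit" { toolkit = $2 }
    $1 == "nvidia.com/gpu-driver-upgrade-state"     { upgrade = $2 }
    END {
      failed = 0
      if (driver == "") {
        print "FAIL: driver version label not found"; failed = 1
      } else {
        print "OK: driver version " driver
      }
      if (toolkit != "true") {
        print "FAIL: container toolkit not deployed"; failed = 1
      } else {
        print "OK: container toolkit deployed"
      }
      if (upgrade != "upgrade-done") {
        print "FAIL: driver upgrade state is not upgrade-done"; failed = 1
      } else {
        print "OK: driver upgrade state " upgrade
      }
      exit failed
    }'
}
```

For example: `kubectl label node <gpu-node-name> --list | check_gpu_node_labels`. The function exits non-zero if any of the three checks fails.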

GPU Sharing support

There are several strategies for sharing a GPU. This page uses time-slicing because it is implemented purely in software by the NVIDIA driver and container runtime, which makes it compatible with almost every GPU.

Please refer to the NVIDIA documentation for details about how to apply time-slicing.

Once time-slicing is applied, verify the result by running the following command, where GPU_NODE_NAME holds the name of the GPU node:

kubectl describe node "$GPU_NODE_NAME"

Expected output (assuming 4 replicas were configured):

Labels:
        nvidia.com/gpu.product=Tesla-T4-SHARED
        nvidia.com/gpu.replicas=4
        nvidia.com/gpu.sharing-strategy=time-slicing

Capacity:
  nvidia.com/gpu:     4

Allocatable:
  nvidia.com/gpu:     4
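For orientation, a time-slicing configuration that would produce four replicas might look like the sketch below. It is an assumption based on the format described in the NVIDIA time-slicing documentation, not a copy of it: the helper name `write_time_slicing_config`, the ConfigMap name `time-slicing-config-all`, and the `any` key are illustrative, and the layout should be checked against the documentation for the installed operator version.

```shell
# Hypothetical helper: writes a time-slicing ConfigMap manifest.
# $1 = output path, $2 = replicas advertised per physical GPU
write_time_slicing_config() {
  cat > "$1" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all   # illustrative name
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: $2
EOF
}

write_time_slicing_config time-slicing-config.yaml 4
```

The manifest would then be applied with `kubectl apply -f time-slicing-config.yaml`, and the operator's ClusterPolicy pointed at the ConfigMap as described in the NVIDIA documentation.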

Maintain API consistency when using vGPUs and GPUs

Maintaining consistency with the API and request mechanisms of physical fractional GPUs ensures that workloads, scheduling, policies, and user expectations remain the same across both physical and virtual backends, enabling portability and avoiding fragmented, vendor-specific resource models.

The NVIDIA GPU Operator supports vGPUs and maintains the required API consistency. Please refer to the NVIDIA documentation for details about how to configure the NVIDIA GPU Operator to support vGPU.

Support the Gateway API

Gateway API represents the next generation of Kubernetes ingress, load balancing and service mesh APIs.

To enable the Gateway API in RKE2, the cluster must be deployed with Traefik enabled and its KubernetesGateway provider configured, as explained in the Ingress Controller docs.

Verify that the gateway.networking.k8s.io Gateway API resources are enabled by running the following command. The --api-group flag takes the group name only, without a version:

kubectl api-resources --api-group=gateway.networking.k8s.io

Expected Output:

NAME              SHORTNAMES   APIVERSION                          NAMESPACED   KIND
gatewayclasses    gc           gateway.networking.k8s.io/v1        false        GatewayClass
gateways          gtw          gateway.networking.k8s.io/v1        true         Gateway
grpcroutes                     gateway.networking.k8s.io/v1        true         GRPCRoute
httproutes                     gateway.networking.k8s.io/v1        true         HTTPRoute
referencegrants   refgrant     gateway.networking.k8s.io/v1beta1   true         ReferenceGrant

To verify Traefik is consuming Gateway API resources:

Create a GatewayClass:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: traefik
spec:
  controllerName: traefik.io/gateway-controller

Check the status:

kubectl get gatewayclass traefik -o jsonpath='{.status}'

Expected Output (excerpt of the Accepted condition):

"message":"Handled by Traefik controller","observedGeneration":1,"reason":"Handled","status":"True","type":"Accepted"

Gang Scheduling

A gang scheduling solution (e.g., Kueue or Volcano) must be available for installation to ensure all-or-nothing scheduling for distributed AI workloads.

We will use Volcano in RKE2 for this verification test.

helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace

The installation creates three deployments in the volcano-system namespace:

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/volcano-admission     1/1     1            1           130m
deployment.apps/volcano-controllers   1/1     1            1           130m
deployment.apps/volcano-scheduler     1/1     1            1           130m

This satisfies the requirement. As an additional functional test, the following manifest creates a gang job with two tasks (each requiring an NVIDIA GPU) on a two-GPU cluster:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: gpu-nbody-gang-job
  namespace: default
spec:
  minAvailable: 2
  schedulerName: volcano

  tasks:
    - name: nbody-task-1
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          runtimeClassName: nvidia
          containers:
            - name: cuda-container-1
              image: nvcr.io/nvidia/k8s/cuda-sample:nbody
              command: ["/bin/bash", "-c"]
              args:
                - "while true; do sleep 5 && cuda-samples/nbody -gpu -benchmark; done"
              resources:
                limits:
                  nvidia.com/gpu: 1

    - name: nbody-task-2
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          runtimeClassName: nvidia
          containers:
            - name: cuda-container-2
              image: nvcr.io/nvidia/k8s/cuda-sample:nbody
              command: ["/bin/bash", "-c"]
              args:
                - "while true; do sleep 5 && cuda-samples/nbody -gpu -benchmark; done"
              resources:
                limits:
                  nvidia.com/gpu: 1

Both pods should be running after a few seconds.

To test gang-scheduling failure, modify the manifest to use minAvailable: 3 and add a third task. Re-submit the job:

    - name: nbody-task-3
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          runtimeClassName: nvidia
          containers:
            - name: cuda-container-3
              image: nvcr.io/nvidia/k8s/cuda-sample:nbody
              command: ["/bin/bash", "-c"]
              args:
                - "while true; do sleep 5 && cuda-samples/nbody -gpu -benchmark; done"
              resources:
                limits:
                  nvidia.com/gpu: 1

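Observe that all three pods remain in Pending status. Only two GPUs are available, so the gang of three cannot be satisfied and Volcano schedules none of the pods rather than a partial set. This demonstrates that gang scheduling is working as expected.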

default          gpu-nbody-gang-job-nbody-task-1-0                             0/1     Pending     0          50s
default          gpu-nbody-gang-job-nbody-task-2-0                             0/1     Pending     0          50s
default          gpu-nbody-gang-job-nbody-task-3-0                             0/1     Pending     0          50s
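The all-or-nothing behaviour can be checked with a small script. This is a sketch under stated assumptions: `gang_state` is a hypothetical helper, and it expects one `<pod-name> <phase>` pair per line, for example as produced by `kubectl get pods` with custom columns.

```shell
# Hypothetical helper: reads "<pod-name> <phase>" lines on stdin and checks the
# all-or-nothing invariant for a gang of the given size ($1 = minAvailable).
gang_state() {
  awk -v min="$1" '
    $2 == "Running" { running++ }
    $2 == "Pending" { pending++ }
    END {
      if (running >= min) {
        print "gang scheduled: " running " running"
      } else if (running == 0) {
        print "gang waiting: " (pending + 0) " pending, none running"
      } else {
        print "violation: only " running " of " min " running"
        exit 1
      }
    }'
}
```

One way to feed it: `kubectl get pods -n default --no-headers -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | grep gpu-nbody-gang-job | gang_state 3`. On the two-GPU cluster used here, the three-task job should report that the gang is waiting.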

Cluster autoscaler

If the platform provides a cluster autoscaler or an equivalent mechanism, it must be capable of scaling accelerator-specific node groups based on pending pods. Since RKE2 is a Kubernetes distribution, it does not provide an integrated cluster autoscaler.

For reference, this section explains how to use the upstream cluster autoscaler, with Azure as an example.

Create a Virtual Machine Scale Set (VMSS) with GPU-equipped VMs.

Deploy RKE2 with the following options:

disable-cloud-controller: true # Only in rke2-server
kubelet-arg: # On both rke2-server and rke2-agent
- --cloud-provider=external

Install the Azure CCM:

helm install --repo https://raw.githubusercontent.com/kubernetes-sigs/cloud-provider-azure/master/helm/repo cloud-provider-azure --generate-name --set cloudControllerManager.imageRepository=mcr.microsoft.com/oss/kubernetes --set cloudControllerManager.imageName=azure-cloud-controller-manager --set cloudNodeManager.imageRepository=mcr.microsoft.com/oss/kubernetes --set cloudNodeManager.imageName=azure-cloud-node-manager --set cloudControllerManager.configureCloudRoutes=false --set cloudControllerManager.allocateNodeCidrs=false

Create the azure.json file and save it to /etc/kubernetes/azure.json. Ensure it contains the following two options:
```
  "useManagedIdentityExtension": false,
  "useInstanceMetadata": true
```
The deployed nodes should include a ProviderID. Verify this with:
```
kubectl get nodes -o yaml | grep -i providerID
```
The ProviderID is retrieved from the instance metadata. Check this from the node with:
```
curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
```
Install the upstream autoscaler:
1. Create a values.yaml configuration file that specifies the VMSS created earlier and the other necessary Azure details.
2. Run the following Helm commands:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler -f values.yaml

When correctly deployed, the autoscaler watches for pending pods that request a GPU resource. If the cluster cannot satisfy the request, the autoscaler contacts Azure to automatically provision a new GPU node and add it to the cluster.
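The values.yaml file from step 1 might look like the following sketch. The key names are assumptions based on the upstream cluster-autoscaler chart and should be verified against the chart's values reference; `write_autoscaler_values` is a hypothetical helper, `gpu-vmss` is an illustrative scale set name, and every value in angle brackets is a placeholder to fill in.

```shell
# Hypothetical helper: writes a values.yaml sketch for the cluster-autoscaler chart.
# $1 = output path, $2 = VMSS name, $3 = minimum nodes, $4 = maximum nodes
write_autoscaler_values() {
  cat > "$1" <<EOF
cloudProvider: azure
azureVMType: vmss
azureTenantID: "<tenant-id>"
azureSubscriptionID: "<subscription-id>"
azureResourceGroup: "<resource-group>"
azureClientID: "<client-id>"
azureClientSecret: "<client-secret>"
autoscalingGroups:
  - name: $2
    minSize: $3
    maxSize: $4
EOF
}

write_autoscaler_values values.yaml gpu-vmss 1 3
```

Keeping the scale set name and size limits as parameters makes it easy to generate one file per accelerator node group.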

Horizontal pod autoscaler

The ability to scale Pods based on custom metrics relevant to AI/ML workloads is achieved using the HorizontalPodAutoscaler (HPA), which is included by default in Kubernetes.

To demonstrate this requirement, install an Ollama deployment in RKE2. The following manifest is then used for verification:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ollama-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ollama
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: v1
        kind: Namespace
        name: suse-private-ai
      metric:
        name: gpu_utilization
      target:
        type: AverageValue
        averageValue: "70"

When increased load on Ollama pushes the gpu_utilization metric above the target of 70, the HPA deploys additional Ollama pods, up to the maxReplicas limit of 3.
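To make the scaling decision concrete, the sketch below approximates how the HPA derives a replica count for an Object metric with an AverageValue target: the reported metric value is divided by the target and rounded up, then clamped to the configured bounds. It is a simplification that uses integer arithmetic and ignores the controller's tolerance and stabilization behaviour. `hpa_desired_replicas` is a hypothetical helper, and the sample metric values assume that the metrics adapter reports a namespace-wide total.

```shell
# Sketch of the replica calculation for an Object metric with an AverageValue
# target. Integer arithmetic only; tolerance and stabilization are ignored.
# $1 = metric value, $2 = averageValue target, $3 = minReplicas, $4 = maxReplicas
hpa_desired_replicas() {
  desired=$(( ($1 + $2 - 1) / $2 ))   # ceil(metric / target)
  if [ "$desired" -lt "$3" ]; then desired=$3; fi
  if [ "$desired" -gt "$4" ]; then desired=$4; fi
  echo "$desired"
}

hpa_desired_replicas 60 70 1 3    # below the target: stays at minReplicas
hpa_desired_replicas 150 70 1 3   # ceil(150 / 70) = 3
hpa_desired_replicas 400 70 1 3   # capped at maxReplicas
```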

Accelerator Performance Metrics

This requirement mandates a functional accelerator metrics solution that exposes fine-grained performance metrics via a standardized, machine-readable metrics endpoint. This solution must include a core set of metrics for per-accelerator utilization and memory usage.

When the NVIDIA GPU Operator is installed (as described in the GPU Operators documentation), an nvidia-dcgm-exporter DaemonSet and Service are deployed. Query this service to collect the required GPU metrics, such as accelerator utilization, memory usage, temperature, and power usage.

For example, run the following commands from a cluster node (for instance, over SSH) to print the metrics, which are exposed in the OpenMetrics text format. The following section details how to deploy Prometheus and Grafana to consume them.

# Get the clusterIP
svcIP=$(kubectl get svc nvidia-dcgm-exporter -n gpu-operator -o jsonpath='{.spec.clusterIP}')
# Get the port
svcPort=$(kubectl get svc nvidia-dcgm-exporter -n gpu-operator -o jsonpath='{.spec.ports[0].port}')
# Output the metrics
curl -sL http://${svcIP}:${svcPort}/metrics
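The full metrics output is long. To show only the core per-accelerator metrics, it can be filtered as in the sketch below. `gpu_core_metrics` is a hypothetical helper; it assumes that each sample carries a `gpu="<index>"` label and that DCGM_FI_DEV_FB_USED (framebuffer memory used) is among the exported metrics.

```shell
# Hypothetical helper: reads exporter output on stdin and prints the GPU index,
# metric name, and value for utilization and framebuffer memory usage.
gpu_core_metrics() {
  awk '
    /^DCGM_FI_DEV_(GPU_UTIL|FB_USED)[{]/ {
      name = $0
      sub(/[{].*/, "", name)              # keep only the metric name
      gpu = "?"
      if (match($0, /gpu="[^"]*"/)) {
        gpu = substr($0, RSTART + 5, RLENGTH - 6)
      }
      print "gpu=" gpu, name, $NF         # the sample value is the last field
    }'
}
```

For example: `curl -sL http://${svcIP}:${svcPort}/metrics | gpu_core_metrics`.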

AI Job & Inference Service Metrics

This requirement mandates a system capable of discovering and collecting metrics exposed by workloads in a standardized format.

Prometheus and Grafana fulfill this requirement. First, install them:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

Once installed, create a ServiceMonitor to scrape metrics from workloads. As an example, the following manifest configures Prometheus to collect DCGM metrics from the NVIDIA GPU Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nvidia-dcgm-monitor
  namespace: monitoring
  labels:
    release: prometheus-stack
spec:
  selector:
    matchLabels:
      app: nvidia-dcgm-exporter
  namespaceSelector:
    matchNames:
      - gpu-operator
  endpoints:
  - port: gpu-metrics
    path: /metrics
    interval: 15s

After a few minutes, the Grafana dashboard will show the DCGM metrics, such as DCGM_FI_DEV_GPU_UTIL.

Secure accelerator access

This requirement mandates that access to accelerators from within containers be properly isolated and mediated by Kubernetes.

The NVIDIA GPU Operator v26.3.2 automatically configures the stack with the proper isolation.

Verify the isolation requirement by running the following three Pods in a cluster with only one GPU:

apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark1
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    command: ["/bin/bash", "-c"]
    args:
      - "while true; do sleep 5 && cuda-samples/nbody -gpu -benchmark; done"
    resources:
      limits:
        nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark2
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container2
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    command: ["/bin/bash", "-c"]
    args:
      - "while true; do sleep 5 && cuda-samples/nbody -gpu -benchmark; done"
    resources:
      limits:
        nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark3
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container3
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    command: ["/bin/bash", "-c"]
    args:
      - "while true; do sleep 5 && cuda-samples/nbody -gpu -benchmark; done"

Expected Results (Isolation Confirmed):

Pod 1 runs successfully and consumes the GPU.
Pod 2 is not scheduled by Kubernetes because the only GPU available in the cluster is already being consumed by Pod 1.
Pod 3, which does not request a GPU, is scheduled and runs but fails to find an available GPU, as seen in its logs.

This outcome demonstrates that accelerator isolation is working correctly.

Robust CRD and Controller Operation

This requirement mandates the installation and reliable function of at least one complex AI Operator with CRDs. Verification requires confirming that CRDs are registered and that an Admission Webhook rejects invalid configurations.

To verify this requirement, install the Kubeflow Training Operator in RKE2. Since a Helm chart is unavailable, use the following kubectl command as a workaround:

kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.8.0"

Verify that CRDs are installed and the webhook is registered:

$> kubectl get crds | grep kubeflow
mpijobs.kubeflow.org                                       2025-10-24T13:04:27Z
mxjobs.kubeflow.org                                        2025-10-24T13:04:27Z
paddlejobs.kubeflow.org                                    2025-10-24T13:04:28Z
pytorchjobs.kubeflow.org                                   2025-10-24T13:04:28Z
tfjobs.kubeflow.org                                        2025-10-24T13:04:29Z
xgboostjobs.kubeflow.org                                   2025-10-24T13:04:29Z

$> kubectl get validatingwebhookconfigurations
validator.training-operator.kubeflow.org   5          10m

$> kubectl get pods -n kubeflow
NAME                                READY   STATUS    RESTARTS   AGE
training-operator-f7d4b59f6-vdnh9   1/1     Running   0          9m54s

Test the admission webhook’s rejection capability by attempting to apply the following invalid TFJob manifest (missing the required image field):

# saved as invalid-tfjob.yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: tfjob-invalid-test
spec:
  tfReplicaSpecs:
    Chief:
      replicas: 1
      template:
        spec:
          containers:
            - name: tensorflow
              # INTENTIONAL ERROR: Missing the 'image' field
              # image: tensorflow/tensorflow:latest
              # command: ["/bin/bash", "-c"]
              # args: ["echo 'Chief running'; sleep 10;"]

The Admission Webhook returns the expected error, confirming its function:

Error from server (Forbidden): error when creating "invalid-tfjob.yaml": admission webhook "validator.tfjob.training-operator.kubeflow.org" denied the request: spec.tfReplicaSpecs[Chief].template.spec.containers[0].image: Required value: must be required

Uncomment the image, command, and args lines in the manifest above and re-apply the job to see a successful deployment.
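The rejection check can also be scripted. The sketch below wraps any command and succeeds only when that command fails with an admission denial. `expect_webhook_rejection` is a hypothetical helper, and the match on the text "denied the request" is an assumption taken from the error message shown above.

```shell
# Hypothetical helper: runs the given command and succeeds only if the command
# fails AND its output mentions an admission denial.
expect_webhook_rejection() {
  if output=$("$@" 2>&1); then
    echo "UNEXPECTED: request was accepted"
    return 1
  fi
  case $output in
    *"denied the request"*)
      echo "OK: rejected by admission webhook"
      return 0 ;;
    *)
      echo "UNEXPECTED: failed for another reason"
      printf '%s\n' "$output"
      return 1 ;;
  esac
}
```

For example: `expect_webhook_rejection kubectl apply -f invalid-tfjob.yaml`. Adding `--dry-run=server` avoids creating the object if the manifest is unexpectedly accepted, assuming the operator's webhook supports dry-run requests.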