Troubleshooting
This section contains common troubleshooting steps for K3k virtual clusters.
Too many open files error
The k3k-kubelet or the virtual cluster server pods may run into the following issue:
E0604 13:14:53.369369 1 leaderelection.go:336] error initially creating leader election record: Post "https://k3k-http-proxy-k3kcluster-service/apis/coordination.k8s.io/v1/namespaces/kube-system/leases": context canceled
{"level":"fatal","timestamp":"2025-06-04T13:14:53.369Z","logger":"k3k-kubelet","msg":"virtual manager stopped","error":"too many open files"}
This typically indicates a low limit on inotify watchers or file descriptors on the host system.
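You can check the current limits on a node before raising them; for example, on a Linux host:
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances fs.inotify.max_queued_events
ulimit -n
The second command shows the open file descriptor limit for the current shell.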
To increase the inotify limits, connect to the host nodes and run:
sudo sysctl -w fs.inotify.max_user_watches=2099999999
sudo sysctl -w fs.inotify.max_user_instances=2099999999
sudo sysctl -w fs.inotify.max_queued_events=2099999999
You can persist these settings by adding them to /etc/sysctl.conf:
fs.inotify.max_user_watches=2099999999
fs.inotify.max_user_instances=2099999999
fs.inotify.max_queued_events=2099999999
Apply the changes:
sudo sysctl -p
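As an alternative to editing /etc/sysctl.conf directly, you can drop the settings into a dedicated file under /etc/sysctl.d/ and reload all configuration files (the file name below is only an example):
cat <<'EOF' | sudo tee /etc/sysctl.d/90-k3k-inotify.conf
fs.inotify.max_user_watches=2099999999
fs.inotify.max_user_instances=2099999999
fs.inotify.max_queued_events=2099999999
EOF
sudo sysctl --system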
You can find more details in this KB document.
Inspect Controller Logs for Failure Diagnosis
To view logs for a failed virtual cluster:
kubectl logs -n k3k-system -l app.kubernetes.io/name=k3k
This retrieves logs from K3k controller components.
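If you need more context, you can list the controller pods and follow their logs, or retrieve the logs of a previously crashed container (the namespace and label below match the command above):
kubectl get pods -n k3k-system -l app.kubernetes.io/name=k3k
kubectl logs -n k3k-system -l app.kubernetes.io/name=k3k --tail=200 -f
kubectl logs -n k3k-system <controller_pod_name> --previous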
Inspect Cluster Logs for Failure Diagnosis
To view logs for a failed virtual cluster:
kubectl logs -n <cluster_namespace> -l cluster=<cluster_name>
This retrieves logs from K3k cluster components (agents, server and virtual-kubelet).
You can also use kubectl describe cluster <cluster_name> to check for recent events and status conditions.
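For example, assuming the Cluster custom resource is registered under the k3k.io API group, you can list the virtual clusters and inspect a specific one:
kubectl get clusters.k3k.io -A
kubectl describe clusters.k3k.io <cluster_name> -n <cluster_namespace>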
Virtual Cluster Not Starting or Stuck in Pending
The most common causes are missing prerequisites or incorrect configuration.
Storage class not available
When creating a Virtual Cluster with dynamic persistence, a PVC is required. You can check whether the PVC was created but not bound with kubectl get pvc -n <cluster_namespace>. If you see a pending PVC, you probably don’t have a default storage class defined, or you have specified an invalid one.
Example with wrong storage class
The PVC is pending:
kubectl get pvc -n k3k-test-storage
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
varlibrancherk3s-k3k-test-storage-server-0 Pending not-available <unset> 4s
The server is pending:
kubectl get po -n k3k-test-storage
NAME READY STATUS RESTARTS AGE
k3k-test-storage-kubelet-j4zn5 1/1 Running 0 54s
k3k-test-storage-server-0 0/1 Pending 0 54s
To fix this, use a valid storage class. You can list the existing storage classes with:
kubectl get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path (default) rancher.io/local-path Delete WaitForFirstConsumer false 3d6h
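If a storage class exists but none is marked as default, you can either reference it explicitly in the virtual cluster's persistence configuration or mark it as the cluster default. For example, to make local-path the default storage class:
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'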
Wrong node selector
When creating a Virtual Cluster with defaultNodeSelector, if the selector is invalid or does not match any node, all pods will remain Pending.
Example
The server is pending:
kubectl get po
NAME READY STATUS RESTARTS AGE
k3k-k3kcluster-node-placed-server-0 0/1 Pending 0 58s
The pod description provides the reason:
kubectl describe po k3k-k3kcluster-node-placed-server-0
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 84s default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
To fix this, use a valid node selector or affinity.
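To find a selector that matches your nodes, list the node labels and, if needed, add the label the selector expects (the key and value below are only examples):
kubectl get nodes --show-labels
kubectl label node <node_name> example.com/k3k-node=true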
Image pull issues (airgapped setup)
When creating a Virtual Cluster in an air-gapped environment, the images must be available in the configured registry. You can check for an ImagePullBackOff status when listing the pods in the virtual cluster namespace.
Example
The server is failing:
kubectl get po -n k3k-test-registry
NAME READY STATUS RESTARTS AGE
k3k-test-registry-kubelet-r4zh5 1/1 Running 0 54s
k3k-test-registry-server-0 0/1 ImagePullBackOff 0 54s
To fix this, make sure the failing image is available in the configured registry. You can describe the failing pod to get more details.
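For example, to see which image and registry are failing and why:
kubectl describe pod k3k-test-registry-server-0 -n k3k-test-registry
kubectl get events -n k3k-test-registry --sort-by=.lastTimestamp
The Events section of the pod description reports the exact image reference and the pull error returned by the registry.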