Rolling Back K3s
You can roll back K3s to a previous Kubernetes version after a problematic upgrade by downgrading the K3s binary and restoring the datastore. This applies both to clusters backed by an external database and to clusters using embedded etcd.
Important Considerations
- Backups: Before upgrading, ensure you have a valid database or etcd snapshot taken while the cluster was running the older version of K3s. Without a backup, a rollback is impossible. (See the backup sketch after this list.)
- Potential Data Loss: The `k3s-killall.sh` script forcefully terminates K3s processes and may result in data loss if applications are not shut down cleanly.
- Version Specifics: Always verify the K3s and component versions before and after the rollback.
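As a concrete starting point, both datastore types can be backed up with standard tooling before the upgrade. A minimal sketch, using the same placeholder names as the steps below; the `pg_dump` invocation assumes a PostgreSQL external datastore:

```bash
# Embedded etcd: take an on-demand snapshot on a server node.
# Snapshots land in /var/lib/rancher/k3s/server/db/snapshots/ by default.
k3s etcd-snapshot save

# External PostgreSQL datastore: dump in custom format so the backup
# can be restored later with pg_restore.
pg_dump -U <DB-USER> -Fc -f <BACKUP-FILE> <DB-NAME>
```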
Rollback with External Database
This section applies to K3s clusters using an external database (e.g., PostgreSQL, MySQL).
- If the cluster is running and the Kubernetes API is available, gracefully stop workloads by draining all nodes (see the drain loop sketch after these steps):

  ```bash
  kubectl drain --ignore-daemonsets --delete-emptydir-data <NODE-ONE-NAME> <NODE-TWO-NAME> <NODE-THREE-NAME> ...
  ```

  This process may disrupt running applications.
- On each node, stop the K3s service and all running pod processes:

  ```bash
  k3s-killall.sh
  ```
- Restore a database snapshot taken before the upgrade and verify the integrity of the restored database (see the restore check sketch after these steps). For example, if you are using PostgreSQL, run:

  ```bash
  pg_restore -U <DB-USER> -d <DB-NAME> <BACKUP-FILE>
  ```
- On each node, roll back the K3s binary to the previous version.
  - Clusters with Internet Access:
    - Server nodes:

      ```bash
      curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="server" sh -
      ```

    - Agent nodes:

      ```bash
      curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="agent" sh -
      ```

  - Air-gapped Clusters: Download the artifacts and run the install script locally (see the air-gapped sketch after these steps). Verify the K3s version after the install with `k3s --version`, and reapply any custom configurations that were used before the upgrade.
- Start the K3s service on each node:

  ```bash
  systemctl start k3s
  ```

  On agent nodes, the installed service unit is named `k3s-agent`; start and check that unit instead.

- Verify the K3s service status with `systemctl status k3s`, and confirm node versions (see the version check sketch after these steps).
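The drain step above lists node names by hand; with more than a few nodes, a loop is easier. A minimal sketch, assuming `kubectl` is configured with admin access to the cluster:

```bash
# Drain every node in sequence. "kubectl get nodes -o name" emits
# node/<name>, a form that kubectl drain accepts directly.
for node in $(kubectl get nodes -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```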
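After `pg_restore` completes, a quick query can confirm the restored datastore is reachable and non-empty. A sketch assuming PostgreSQL and the `kine` table that K3s uses by default for external SQL datastores:

```bash
# The kine table holds the cluster state; a non-zero row count
# suggests the restore produced usable data.
psql -U <DB-USER> -d <DB-NAME> -c "SELECT count(*) FROM kine;"
```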
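For the air-gapped rollback, the install script can be pointed at locally staged artifacts instead of downloading them. A sketch assuming the downgraded `k3s` binary, the airgap images tarball, and `install.sh` have already been copied onto the node; file names and paths are illustrative:

```bash
# Stage the downgraded binary and the airgap images where K3s expects them.
cp ./k3s /usr/local/bin/k3s && chmod +x /usr/local/bin/k3s
mkdir -p /var/lib/rancher/k3s/agent/images/
cp ./k3s-airgap-images-amd64.tar /var/lib/rancher/k3s/agent/images/

# Run the install script against the local binary instead of downloading.
INSTALL_K3S_SKIP_DOWNLOAD=true INSTALL_K3S_EXEC="server" ./install.sh

# Confirm the rollback took effect.
k3s --version
```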
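Once the service is running everywhere, it is worth confirming that every node actually reports the downgraded version, not just the node you are logged into. A short check, assuming the Kubernetes API is reachable again:

```bash
# The VERSION column should show the rolled-back release for every node.
kubectl get nodes -o wide

# Per-node spot check of the installed binary.
k3s --version
```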
Rollback with Embedded etcd
This section applies to K3s clusters using embedded etcd.
- If the cluster is running and the Kubernetes API is available, gracefully stop workloads by draining all nodes:

  ```bash
  kubectl drain --ignore-daemonsets --delete-emptydir-data <NODE-ONE-NAME> <NODE-TWO-NAME> <NODE-THREE-NAME> ...
  ```
- On each node, stop the K3s service and all running pod processes:

  ```bash
  k3s-killall.sh
  ```
- On each node, roll back the K3s binary to the previous version, but do not start K3s.
  - Clusters with Internet Access:
    - Server nodes:

      ```bash
      curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="server" INSTALL_K3S_SKIP_START="true" sh -
      ```

    - Agent nodes:

      ```bash
      curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="agent" INSTALL_K3S_SKIP_START="true" sh -
      ```

  - Air-gapped Clusters: Download the artifacts and run the install script locally, adding the environment variable `INSTALL_K3S_SKIP_START="true"` so that K3s does not start.
- On the first server node, or on the node without a `server:` entry in its K3s config file, initiate the cluster restore (see the snapshot sketch after these steps). Refer to the Snapshot Restore Steps for more information:

  ```bash
  k3s server --cluster-reset --cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
  ```

  This overwrites all data in the etcd datastore. Verify the snapshot's integrity before restoring, and be aware that large snapshots can take a long time to restore.
- Start the K3s service on the first server node:

  ```bash
  systemctl start k3s
  ```
- On each of the other server nodes, remove the K3s database directory:

  ```bash
  rm -rf /var/lib/rancher/k3s/server/db
  ```
- Start the K3s service on the other server nodes (see the rejoin sketch after these steps):

  ```bash
  systemctl start k3s
  ```

- Start the K3s service on all agent nodes (the agent service unit is named `k3s-agent`):

  ```bash
  systemctl start k3s-agent
  ```
- Verify the K3s service status with `systemctl status k3s`.
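Before running the cluster reset above, confirm which snapshot you are restoring from. A sketch assuming snapshots in the default on-demand/scheduled location; the snapshot file name is illustrative:

```bash
# List local etcd snapshots (default snapshot directory).
ls -lh /var/lib/rancher/k3s/server/db/snapshots/

# Restore from a specific pre-upgrade snapshot; the name here is made up.
k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/on-demand-server-1-1700000000
```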
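The wipe-and-rejoin steps on the remaining servers can be run as one short sequence per node. A sketch assuming the binary has already been rolled back there and the first server is up again after the restore:

```bash
# Remove stale etcd data so this server rejoins the restored cluster
# rather than trying to reconcile its old state.
rm -rf /var/lib/rancher/k3s/server/db

# Start K3s; the server re-syncs from the restored etcd on the first node.
systemctl start k3s

# When all nodes are back, each should report the rolled-back version.
kubectl get nodes -o wide
```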