Rolling Back K3s

You can roll back a K3s Kubernetes version after a problematic upgrade by downgrading the K3s binary and restoring the datastore. This applies both to clusters with an external database and to clusters using embedded etcd.

Important Considerations

  • Backups: Before upgrading, ensure you have a valid database or etcd snapshot from your cluster running the older version of K3s. Without a backup, a rollback is impossible.

  • Potential Data Loss: The k3s-killall.sh script forcefully terminates K3s processes and may result in data loss if applications are not properly shut down.

  • Version Specifics: Always verify K3s and component versions before and after the rollback.
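
For example, a minimal pre-flight check (assuming kubectl is configured for the cluster) records the versions each node is running so they can be compared after the rollback:

```shell
# Version of the K3s binary on the local node
k3s --version

# Kubernetes version reported by each node's kubelet
kubectl get nodes -o wide
```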

Rollback with External Database

This section applies to K3s clusters using an external database (e.g., PostgreSQL, MySQL).

  1. If the cluster is running and the Kubernetes API is available, gracefully stop workloads by draining all nodes:

    kubectl drain --ignore-daemonsets --delete-emptydir-data <NODE-ONE-NAME> <NODE-TWO-NAME> <NODE-THREE-NAME> ...

    This process may disrupt running applications.
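
If the cluster has many nodes, the drain can be scripted rather than listing each node by hand; this is a sketch that assumes kubectl access and that every node may safely be drained:

```shell
# Drain each node in turn; the jsonpath expression yields bare node names
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain --ignore-daemonsets --delete-emptydir-data "$node"
done
```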

  2. On each node, stop the K3s service and all running pod processes:

    k3s-killall.sh
  3. Restore a database snapshot taken before upgrading K3s and verify the integrity of the database. For example, if you’re using PostgreSQL, run the following command:

    pg_restore -U <DB-USER> -d <DB-NAME> <BACKUP-FILE>
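
Note that pg_restore consumes archives produced by pg_dump in a non-plain format (for example, custom format), so the pre-upgrade snapshot would typically have been taken along these lines; if the database still contains post-upgrade objects, the --clean --if-exists flags drop them before restoring:

```shell
# Before the upgrade: take a custom-format dump that pg_restore understands
pg_dump -U <DB-USER> -Fc -f <BACKUP-FILE> <DB-NAME>

# During rollback: drop current objects, then restore the pre-upgrade state
pg_restore --clean --if-exists -U <DB-USER> -d <DB-NAME> <BACKUP-FILE>
```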
  4. On each node, roll back the K3s binary to the previous version.

    • Clusters with Internet Access:

      • Server nodes:

        curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="server" sh -
      • Agent nodes:

        curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="agent" sh -
    • Air-gapped Clusters:

      • Download the artifacts and run the install script locally. After the install, verify the K3s version with k3s --version and reapply any custom configurations that were in use before the upgrade.
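
As a sketch, an air-gapped rollback on one node might look like the following; the file names are placeholders, and INSTALL_K3S_SKIP_DOWNLOAD=true tells the install script to use the binary and images already present on the node:

```shell
# Copy the previous-version artifacts into place (paths per the K3s air-gap docs)
cp ./k3s /usr/local/bin/k3s
mkdir -p /var/lib/rancher/k3s/agent/images/
cp ./k3s-airgap-images-amd64.tar /var/lib/rancher/k3s/agent/images/

# Re-run the install script without downloading anything
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh

# Confirm the rollback took effect
k3s --version
```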

  5. Start the K3s service on each node:

    systemctl start k3s
  6. Verify the K3s service status with systemctl status k3s.

Rollback with Embedded etcd

This section applies to K3s clusters using an embedded etcd.

  1. If the cluster is running and the Kubernetes API is available, gracefully stop workloads by draining all nodes:

    kubectl drain --ignore-daemonsets --delete-emptydir-data <NODE-ONE-NAME> <NODE-TWO-NAME> <NODE-THREE-NAME> ...
  2. On each node, stop the K3s service and all running pod processes:

    k3s-killall.sh
  3. On each node, roll back the K3s binary to the previous version, but do not start K3s.

    • Clusters with Internet Access:

      • Server nodes:

        curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="server" INSTALL_K3S_SKIP_START="true" sh -
      • Agent nodes:

        curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z+k3s1 INSTALL_K3S_EXEC="agent" INSTALL_K3S_SKIP_START="true" sh -
    • Air-gapped Clusters:

      • Download the artifacts and run the install script locally. Add the environment variable INSTALL_K3S_SKIP_START="true" when running the install script to prevent K3s from starting.

  4. On the first server node (the server node whose K3s config file does not contain a server: entry), initiate the cluster restore. Refer to the Snapshot Restore Steps for more information:

    k3s server --cluster-reset --cluster-reset-restore-path=<PATH-TO-SNAPSHOT>

    This will overwrite all data in the etcd datastore. Verify the snapshot’s integrity before restoring. Be aware that large snapshots can take a long time to restore.
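
If you are unsure of the snapshot path, the snapshots known to the node can be listed first; by default, K3s stores them under /var/lib/rancher/k3s/server/db/snapshots:

```shell
# List etcd snapshots available on this node
k3s etcd-snapshot ls
```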

  5. Start the K3s service on the first server node:

    systemctl start k3s
  6. On the other server nodes, remove the K3s database directory:

    rm -rf /var/lib/rancher/k3s/server/db
  7. Start the K3s service on the other server nodes:

    systemctl start k3s
  8. Start the K3s service on all agent nodes:

    systemctl start k3s
  9. Verify the K3s service status with systemctl status k3s.

Verification

After the rollback, verify the following:

  • K3s version: k3s --version

  • Kubernetes cluster health: kubectl get nodes

  • Application functionality.

  • K3s logs for errors.
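
These checks can be combined into a quick post-rollback pass; the journalctl command assumes K3s runs as the systemd service k3s, as in the steps above:

```shell
# Binary version on the local node
k3s --version

# All nodes should be Ready and report the rolled-back version
kubectl get nodes -o wide

# Workloads should return to Running/Completed
kubectl get pods -A

# Scan recent service logs for errors
journalctl -u k3s --since "1 hour ago" | grep -i error
```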