Automated Upgrades
Overview
You can manage K3s cluster upgrades using Rancher’s system-upgrade-controller. This is a Kubernetes-native approach to cluster upgrades. It uses a Plan custom resource to declaratively describe which nodes to upgrade and to what version.
The plan defines upgrade policies and requirements. It also defines which nodes should be upgraded through a label selector. See below for plans with defaults appropriate for upgrading a K3s cluster. For more advanced plan configuration options, see the Plan custom resource documentation.
The System Upgrade controller schedules upgrades by monitoring plans and selecting nodes to run upgrade Jobs on. When a Job has run to completion successfully, the controller will label the node on which it ran accordingly.
| If the K3s cluster is managed by Rancher, you should use the Rancher UI to manage upgrades. | 
Using the System Upgrade Controller
To automate upgrades, you must do the following:
- Install the system-upgrade-controller into your cluster.
- Create plans describing which groups of nodes to upgrade, and how.
For more details on the design and architecture of the system-upgrade-controller or its integration with K3s, see the following Git repositories:
| When attempting to upgrade to a new version of K3s, the Kubernetes version skew policy applies. Ensure that your plan does not skip intermediate minor versions when upgrading. The system-upgrade-controller itself will not protect against unsupported changes to the Kubernetes version. | 
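One way to respect the skew policy while still using a channel is to step through minor versions explicitly. The K3s channel server also publishes a channel for each Kubernetes minor version (for example v1.33), which, to our understanding, resolves to the latest patch release in that minor version rather than necessarily a stable release. The fragment below is a minimal sketch, assuming a cluster currently running a v1.32 release: pointing both the server and agent plans at the v1.33 channel moves the cluster exactly one minor version. After that upgrade completes, you would advance the channel to the next minor version.

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
# ...
spec:
  # ...
  # Assumption: per-minor-version channel published by the K3s channel server.
  # It only resolves to v1.33.x releases, so the plan cannot jump past 1.33.
  # Advance it one minor version at a time once each upgrade has completed.
  channel: https://update.k3s.io/v1-release/channels/v1.33
```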
Installation
The system-upgrade-controller manifest installs a custom resource definition, deployment, service account, cluster role binding, and configmap. To install these components, run the following command:
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
The controller can be configured and customized through the previously mentioned configmap, but the controller pod must be deleted for the changes to be applied.
Configuration
Server nodes should always be upgraded before agent nodes. For this reason, it is recommended that you create at least two plans: a plan for upgrading server (control-plane) nodes, and a plan for upgrading agent nodes. You can create additional plans as needed to control the rollout of the upgrade across nodes. After the plans are created, the controller picks them up and begins to upgrade your cluster.
The following two example plans keep your cluster continuously upgraded to the current stable release by targeting the stable release channel:
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
---
# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: DoesNotExist
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
There are a few important things to call out regarding these plans:
- The plans must be created in the same namespace where the controller is deployed.
- The concurrency field indicates how many nodes can be upgraded at the same time.
- The server-plan targets server nodes by specifying a label selector that selects nodes with the node-role.kubernetes.io/control-plane label. The agent-plan targets agent nodes by specifying a label selector that selects nodes without that label.
- The prepare step in the agent-plan causes upgrade jobs for that plan to wait for the server-plan to complete before they execute. This logic is built into the image used for the prepare step, and is not part of system-upgrade-controller itself.
- Both plans set the channel field to the stable release channel URL. This causes the controller to monitor that URL and upgrade the cluster any time it resolves to a new release, so the cluster is always automatically upgraded to the newest stable release of K3s. Alternatively, you can omit the channel field and set the version field to a specific release of K3s:
apiVersion: upgrade.cattle.io/v1
kind: Plan
# ...
spec:
  # ...
  version: v1.33.4+k3s1
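Additional plans are how you control the rollout across groups of nodes. The plan below is a sketch of that idea, not a K3s default: it upgrades only agent nodes carrying a hypothetical example.com/upgrade-group=canary label, pins them to a specific release, and still waits for server-plan through the same prepare step as agent-plan. The plan name agent-canary-plan and the label key are illustrative assumptions.

```yaml
# Hypothetical canary plan: upgrades only agent nodes in the labeled group.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-canary-plan        # hypothetical name
  namespace: system-upgrade      # same namespace as the controller
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    # Agent nodes only, matching the agent-plan selector.
    - key: node-role.kubernetes.io/control-plane
      operator: DoesNotExist
    # Hypothetical label marking the canary group (expressions are ANDed).
    - key: example.com/upgrade-group
      operator: In
      values:
      - canary
  prepare:
    # Wait for server-plan to complete, exactly as agent-plan does.
    args:
    - prepare
    - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.33.4+k3s1
```

Nodes can be added to the group with kubectl label node NODE_NAME example.com/upgrade-group=canary. To keep those nodes out of the general rollout, agent-plan would also need an expression excluding them, for example the same key with operator: NotIn and the value canary. A NotIn expression also matches nodes that have no such label, so the remaining agents are still selected.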
The upgrade begins as soon as the controller detects that the target version for a plan has been resolved, either from the version field or by polling the channel server. Modifying a plan causes the controller to re-evaluate the plan and determine whether another upgrade is needed. If a channel has been configured, the URL is also polled periodically to check for new versions.
You can monitor the progress of an upgrade by viewing the plan and jobs via kubectl:
kubectl -n system-upgrade get plans -o wide
kubectl -n system-upgrade get jobs
Scheduling Upgrades
Plans can be restricted to occurring within a specific time window by setting the window field within the plan spec. The time window fields are compatible with and take the same format as kured schedule options. For example:
apiVersion: upgrade.cattle.io/v1
kind: Plan
# ...
spec:
  # ...
  window:
    days:
      - monday
      - tuesday
      - wednesday
      - thursday
      - friday
    startTime: 19:00
    endTime: 21:00
    timeZone: UTC
Jobs to execute upgrades for a plan are not created outside the time window. Once jobs have been created, they may continue running after the window has closed.
Downgrade Prevention
| Version Gate: Available starting with the 2023-07 releases (v1.27.4+k3s1, v1.26.7+k3s1, v1.25.12+k3s1, v1.24.16+k3s1) | 
Kubernetes does not support downgrades of control-plane components. The k3s-upgrade image used by upgrade plans will refuse to downgrade K3s, failing the plan. Nodes with cordon: true configured in their plan will stay cordoned following the failure.
Here is an example cluster, showing failed upgrade pods and cordoned nodes:
$ kubectl get pods -n system-upgrade
NAME                                                              READY   STATUS    RESTARTS   AGE
apply-k3s-server-on-ip-172-31-0-16-with-7af95590a5af8e8c3-2cdc6   0/1     Error     0          9m25s
apply-k3s-server-on-ip-172-31-10-23-with-7af95590a5af8e8c-9xvwg   0/1     Error     0          14m
apply-k3s-server-on-ip-172-31-13-213-with-7af95590a5af8e8-8j72v   0/1     Error     0          18m
system-upgrade-controller-7c4b84d5d9-kkzr6                        1/1     Running   0          20m
$ kubectl get nodes
NAME               STATUS                     ROLES                       AGE   VERSION
ip-172-31-0-16     Ready,SchedulingDisabled   control-plane,etcd,master   19h   v1.27.4+k3s1
ip-172-31-10-23    Ready,SchedulingDisabled   control-plane,etcd,master   19h   v1.27.4+k3s1
ip-172-31-13-213   Ready,SchedulingDisabled   control-plane,etcd,master   19h   v1.27.4+k3s1
ip-172-31-2-13     Ready                      <none>                      19h   v1.27.4+k3s1
You can return your cordoned nodes to service using either of the following methods:
- Change the version or channel on your plan to target a release that is the same as or newer than the one currently running on the cluster, so that the plan succeeds.
- Delete the plan and manually uncordon the nodes. Use kubectl get plan -n system-upgrade to find the plan name, then kubectl delete plan -n system-upgrade PLAN_NAME to delete it. Once the plan has been deleted, use kubectl uncordon NODE_NAME to uncordon each of the nodes.
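As a sketch of the first method, consider the example cluster above, where every node still reports v1.27.4+k3s1. Pinning the failed plan to that running release gives it a target that is not a downgrade. The fragment elides the plan name and other fields in the same way as the earlier examples, and it assumes the channel field has been omitted so that the version field determines the target.

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
# ...
spec:
  # ...
  # channel omitted: the target release comes from version instead.
  # Same release the example nodes already run, so the k3s-upgrade image
  # is not asked to downgrade K3s and the plan can succeed.
  version: v1.27.4+k3s1
```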