Attempting a cluster update without first updating the packages installed by the pattern on the management node can lead to an incomplete or failed update.
Before updating a SUSE CaaS Platform cluster, you must update the packages installed by the SUSE-CaaSP-Management pattern on the management workstation.
The cluster update depends on an updated skuba, but might also require new helm, Terraform, or other dependencies, all of which are updated with the refreshed pattern.
Run sudo zypper update on the management workstation before any attempt to update the cluster.
Updating Kubernetes and its components from one minor version to the next (for example from 1.16 to 1.17) is handled by skuba.
The reason for this is that minor updates require special plan and apply procedures.
These procedures differ from those for patch updates (for example 1.16.1 to 1.16.2), which are handled by skuba-update as described in Section 4.4, “Base OS Updates”.
Generally speaking: if you have other deployments that were not installed via Kubernetes or helm, update them last in the upgrade process.
However, if your applications/deployments in their current versions are incompatible with the Kubernetes version that you are upgrading to, you must update these applications/deployments to a compatible version before attempting a cluster upgrade.
Refer to the documentation of the individual application/deployment for its Kubernetes version requirements and dependencies.
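To get an overview of what is currently running in the cluster, you can for example list the workloads and the container images they use (standard kubectl commands; adapt the output to your needs):
kubectl get deployments,daemonsets,statefulsets --all-namespaces
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u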
The general procedure should look like this:
Check if all current versions of applications and deployments in the cluster will work on the new Kubernetes version you plan to install.
If an application/deployment is incompatible with the new Kubernetes version, update the application/deployment before performing any of the other upgrade steps.
Update the packages on your management workstation and reboot it to get all the latest changes to skuba, helm, and their dependencies.
Then run the following commands on the management workstation:
skuba addon refresh localconfig
skuba addon upgrade plan
skuba addon upgrade apply
Apply all the configuration files that you modified for addons; the upgrade will have reset the configurations to their defaults.
Check if there are addon upgrades available for the current cluster using skuba addon upgrade plan.
Check if all the deployments in the cluster are compatible with the Kubernetes release that will be installed (refer to your individual deployments' documentation). If a deployment is not compatible, you must update it to ensure it works with the updated Kubernetes.
Upgrade all master nodes by sequentially running skuba node upgrade plan and skuba node upgrade apply.
Make sure to wait until all Pods/deployments/DaemonSets are up and running as expected before moving to the next node (see the verification example after this list).
Upgrade all worker nodes by sequentially running skuba node upgrade plan and skuba node upgrade apply.
Make sure to wait until all Pods/deployments/DaemonSets are up and running as expected before moving to the next node.
Check if new addons are available for the new version using skuba addon upgrade plan.
Once all nodes are up to date, update helm and tiller (as needed) and subsequently the helm deployments.
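To verify that all workloads are healthy between node upgrades (as mentioned in the steps above), you can for example run:
kubectl get nodes
kubectl get pods --all-namespaces
kubectl get deployments,daemonsets --all-namespaces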
Run sudo zypper up on your management workstation to get the latest version of skuba and its dependencies.
Reboot the machine to make sure that all system changes are correctly applied.
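For example, on the management workstation:
sudo zypper up
sudo reboot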
In order to get an overview of the updates available, you can run:
skuba cluster upgrade plan
This will show you a list of updates (if available) for different components installed on the cluster. If the cluster is already running the latest available versions, the output should look like this:
Current Kubernetes cluster version: 1.16.2
Latest Kubernetes version: 1.16.2
Congratulations! You are already at the latest version available
If the cluster has a new patch-level or minor Kubernetes version available, the output should look like this:
Current Kubernetes cluster version: 1.15.2
Latest Kubernetes version: 1.16.2
Upgrade path to update from 1.15.2 to 1.16.2:
 - 1.15.2 -> 1.16.2
Similarly, you can also fetch this information on a per-node basis with the following command:
skuba node upgrade plan <NODE>
For example, if the cluster has a node named worker0 which is running the latest available versions, the output should look like this:
Current Kubernetes cluster version: 1.16.2
Latest Kubernetes version: 1.16.2
Node worker0 is up to date
On the other hand, if this same node has a new patch-level or minor Kubernetes version available, the output should look like this:
Current Kubernetes cluster version: 1.15.2
Latest Kubernetes version: 1.16.2
Current Node version: 1.15.2
Component versions in worker0
  - kubelet: 1.15.2 -> 1.16.2
  - cri-o: 1.15.0 -> 1.16.0
You will get a similar output if there is a new version available for a master node (named master0 in this example):
Current Kubernetes cluster version: 1.15.2
Latest Kubernetes version: 1.16.2
Current Node version: 1.15.2
Component versions in master0
  - apiserver: 1.15.2 -> 1.16.2
  - controller-manager: 1.15.2 -> 1.16.2
  - scheduler: 1.15.2 -> 1.16.2
  - etcd: 3.3.11 -> 3.3.15
  - kubelet: 1.15.2 -> 1.16.2
  - cri-o: 1.15.0 -> 1.16.0
It may happen that the Kubernetes version on the control plane is too outdated for the update to progress. In this case, you would get output similar to the following:
Current Kubernetes cluster version: 1.15.0
Latest Kubernetes version: 1.15.0
Unable to plan node upgrade: at least one control plane does not tolerate the current cluster version
The control plane consists of these components:
apiserver
controller-manager
scheduler
etcd
kubelet
cri-o
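If you want to inspect these components yourself, note that the first four run as static pods in the kube-system namespace, while the kubelet and cri-o versions are reported per node. For example:
kubectl get pods -n kube-system -o wide
kubectl get nodes -o wide   # shows kubelet version and container runtime per node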
Due to changes in the way skuba handles addons, some existing components might be shown as a new addon in the status output.
This is expected and no cause for concern. In any subsequent upgrade, the addon will be considered known and only available upgrades will be shown.
SUSE CaaS Platform 4.2.1 provides the update of Cilium from 1.5.3 to 1.6.6.
The important change in Cilium 1.6 is the use of Kubernetes CRDs instead of etcd.
skuba performs an automated migration of data from etcd to CRDs.
If that migration is not successful, skuba shows the following warning:
"Could not migrate data from etcd to CRD. Addons upgrade will be continued without it, which will result in temporary connection loss for currently existing pods and services."
This warning means that Cilium is going to regenerate all internal data on the first run after the upgrade. This can result in a temporary connection loss for pods and services, which might last a few minutes.
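If you want to watch Cilium recover after the upgrade, you can follow its pods in the kube-system namespace (the k8s-app=cilium label used here is the label Cilium commonly applies; verify it with kubectl get ds -n kube-system if in doubt):
kubectl -n kube-system get pods -l k8s-app=cilium -w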
Each Kubernetes cluster version comes with different addon base manifests. To bring the addon definitions in your local cluster configuration folder in sync with the current Kubernetes cluster version, run:
skuba addon refresh localconfig
In order to get an overview of the addon updates available, you can run:
skuba addon upgrade plan
This will show you a list of updates (if available) for different addons installed on the cluster:
Current Kubernetes cluster version: 1.17.4
Latest Kubernetes version: 1.17.4
Addon upgrades for 1.17.4:
  - cilium: 1.5.3 -> 1.6.6
  - dex: 2.16.0 (manifest version from 5 to 6)
  - gangway: 3.1.0-rev4 (manifest version from 4 to 5)
  - metrics-server: 0.3.6 (new addon)
If the cluster is already running the latest available versions, the output should look like this:
Current Kubernetes cluster version: 1.17.4
Latest Kubernetes version: 1.17.4
Congratulations! Addons are already at the latest version available
Before updating the nodes, you must apply the addon upgrades from your management workstation. Run:
skuba addon upgrade apply
It is recommended to use a load balancer with active health checks and pool management that will take care of adding/removing nodes to/from the pool during this process.
Updates have to be applied separately to each node, starting with the control plane all the way down to the worker nodes.
Note that the upgrade via skuba node upgrade apply will:
Upgrade the containerized control plane.
Upgrade the rest of the Kubernetes system stack (kubelet, cri-o).
Restart services.
During the upgrade to a newer version, the API server will be unavailable.
During the upgrade, all the pods on the worker node will be restarted, so it is recommended to drain the pods if your application requires high availability.
In most cases, the restart is handled by the ReplicaSet.
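If you want to drain a node yourself before running the upgrade, a minimal invocation could look like this (a more complete drain with grace periods and timeouts is shown in the reboot section further below):
kubectl drain --ignore-daemonsets <NODE_ID>
# once the upgraded node is back, make it schedulable again
kubectl uncordon <NODE_ID>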
Upgrade the master nodes:
skuba node upgrade apply --target <MASTER_NODE_IP> --user <USER> --sudo
When all master nodes are upgraded, upgrade the worker nodes as well:
skuba node upgrade apply --target <WORKER_NODE_IP> --user <USER> --sudo
Verify that your cluster nodes are upgraded by running:
skuba cluster upgrade plan
The upgrade via skuba node upgrade apply will:
Upgrade the containerized control plane.
Upgrade the rest of the Kubernetes system stack (kubelet, cri-o).
Temporarily drain/cordon the node before starting the whole process, and then undrain/uncordon the node after the upgrade has been successfully applied.
Restart services.
Once you have upgraded all nodes, run skuba cluster upgrade plan again.
This will show whether any upgrades are available that depend on the versions you just installed.
If there are upgrades available, repeat the procedure until no more new upgrades are shown.
Base operating system updates are handled by skuba-update, which works together with the kured reboot daemon.
Nodes added to a cluster have the service skuba-update.timer, which is responsible for running automatic updates, activated by default.
This service calls the skuba-update utility and it can be configured with the /etc/sysconfig/skuba-update file.
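For example, to check on a node whether the timer is active and when it will fire next:
systemctl status skuba-update.timer
systemctl list-timers skuba-update.timer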
skuba-update uses the flags --non-interactive and --non-interactive-include-reboot-patches.
The --non-interactive flag causes zypper to use default answers to questions rather than prompting the user.
In non-interactive mode, the --non-interactive-include-reboot-patches flag causes patches with the rebootSuggested flag not to be skipped.
Zypper does not perform the reboot directly. Instead, kured is used to safely schedule reboots as needed.
To disable the automatic updates on a node, simply ssh to it and then configure the skuba-update service by editing the /etc/sysconfig/skuba-update file with the following runtime options:
## Path           : System/Management
## Description    : Extra switches for skuba-update
## Type           : string
## Default        : ""
## ServiceRestart : skuba-update
#
SKUBA_UPDATE_OPTIONS="--annotate-only"
It is not required to reload or restart skuba-update.timer.
The --annotate-only flag makes the skuba-update utility only check if updates are available and annotate the node accordingly.
When this flag is activated, no updates are installed at all.
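To see the annotations that skuba-update adds in this mode (the exact annotation keys depend on the skuba-update version, so they are not listed here), you can dump a node's annotations:
kubectl get node <NODE_ID> -o jsonpath='{.metadata.annotations}'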
When OS updates are disabled, you will have to manage OS updates manually. In order to do so, you will have to call skuba-update manually on each node.
Do not use the zypper up or zypper patch commands, as these do not manage the Kubernetes annotations used by kured.
If you perform a manual update using these commands, you might render your cluster unusable.
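For example, on each cluster node (not on the management workstation):
# log in to the node (repeat for every node in the cluster)
ssh <USER>@<NODE_IP>
# run the update with the same zypper flags the automatic timer would use
sudo skuba-update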
After that, rebooting the node will depend on whether you have also disabled reboots. If you have disabled reboots for this node, follow the instructions in Section 4.4.2, “Completely Disabling Reboots”. Otherwise, wait until kured performs the reboot of the node.
If you would like to take care of reboots manually, either as a temporary measure or permanently, you can disable them by creating a lock:
kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}'
This command modifies an annotation (annotate) on the DaemonSet (ds) named kured.
When automatic reboots are disabled, you will have to manage reboots yourself.
In order to do this, you will have to follow some steps whenever you want to issue a reboot marker for a node.
First of all, you will have to cordon and drain the node:
kubectl cordon <NODE_ID>
kubectl drain --force=true \
    --ignore-daemonsets=true \
    --delete-local-data=false \
    --grace-period 600 \
    --timeout=900s \
    <NODE_ID>
--ignore-daemonsets=true: Core components like kured are deployed as DaemonSets; these must be ignored for the drain to proceed.
--delete-local-data=false: Continues even if there are pods using emptyDir (local data).
--grace-period 600: Running applications will be notified of termination and given 10 minutes (600 seconds) to shut down gracefully.
--timeout=900s: Draining of the node will fail after 15 minutes (900 seconds).
Depending on your deployed applications, you must adjust the values for --grace-period and --timeout to grant the applications enough time to safely shut down without losing data.
The values here are meant to represent a conservative default for an application like SUSE Cloud Application Platform.
If you do not set these values, applications might never finish and draining of the node will hang indefinitely.
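For example, for an application that shuts down quickly you might lower both values (the numbers here are purely illustrative):
kubectl drain --force=true \
    --ignore-daemonsets=true \
    --delete-local-data=false \
    --grace-period 120 \
    --timeout=300s \
    <NODE_ID>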
Only then will you be able to manually reboot the node safely.
Once the node is back, remember to uncordon it so it is schedulable again:
kubectl uncordon <NODE_ID>
Perform the above steps first on control plane nodes, and afterwards on worker nodes.
If the node that should be rebooted does not contain any workload you can skip the above steps and simply reboot the node.
In exceptional circumstances, such as a node experiencing a permanent failure whilst rebooting, manual intervention may be required to remove the cluster lock:
kubectl -n kube-system annotate ds kured weave.works/kured-node-lock-
This command modifies an annotation (annotate) on the DaemonSet (ds) named kured.
It explicitly performs an "unset" (-) of the value of the annotation named weave.works/kured-node-lock.