20 Upgrade Controller #
See the Upgrade Controller documentation.
A Kubernetes controller capable of performing infrastructure platform upgrades consisting of:
Operating System (SL Micro)
Kubernetes (K3s & RKE2)
Additional components (Rancher, Elemental, NeuVector, etc.)
20.1 How does SUSE Edge use Upgrade Controller? #
The Upgrade Controller is essential in automating the (formerly manual) Day 2 operations required to upgrade management clusters from one SUSE Edge release version to the next.
To achieve this automation, the Upgrade Controller utilizes tools such as the System Upgrade Controller (Chapter 19, System Upgrade Controller) and the Helm Controller.
For further details on how the Upgrade Controller works, see "How does the Upgrade Controller work?" (Section 20.3, “How does the Upgrade Controller work?”).
For known limitations that the Upgrade Controller has, see the Known Limitations (Section 20.6, “Known Limitations”) section.
20.2 Installing the Upgrade Controller #
20.2.1 Prerequisites #
System Upgrade Controller (Section 19.2, “Installing the System Upgrade Controller”)
A Kubernetes cluster; either K3s or RKE2
20.2.2 Steps #
Install the Upgrade Controller Helm chart on your management cluster:
helm install upgrade-controller oci://registry.suse.com/edge/3.1/upgrade-controller-chart --version 0.1.0 --create-namespace --namespace upgrade-controller-system
Validate the Upgrade Controller deployment:
kubectl get deployment -n upgrade-controller-system
Validate the Upgrade Controller pod:
kubectl get pods -n upgrade-controller-system
Validate the Upgrade Controller pod logs:
kubectl logs <pod_name> -n upgrade-controller-system
20.3 How does the Upgrade Controller work? #
In order to perform an Edge release upgrade, the Upgrade Controller introduces two new Kubernetes custom resources:
UpgradePlan (Section 20.4.1, “UpgradePlan”) - created by the user; holds configurations regarding an Edge release upgrade.
ReleaseManifest (Section 20.4.2, “ReleaseManifest”) - created by the Upgrade Controller; holds component versions specific to a particular Edge release version. Must not be edited by users.
The Upgrade Controller proceeds to create a ReleaseManifest resource that holds the component data for the Edge release version specified by the user under the releaseVersion property in the UpgradePlan resource.
Using the component data from the ReleaseManifest, the Upgrade Controller proceeds to upgrade the Edge release components in the following order:
Operating System (OS) (Section 20.3.1, “Operating System upgrade”).
Kubernetes (Section 20.3.2, “Kubernetes upgrade”).
Additional components (Section 20.3.3, “Additional components upgrades”).
During the upgrade process, the Upgrade Controller constantly outputs upgrade information to the created UpgradePlan. For more information on how to track the upgrade process, see Tracking the upgrade process (Section 20.5, “Tracking the upgrade process”).
20.3.1 Operating System upgrade #
To upgrade the OS component, the Upgrade Controller creates SUC (Chapter 19, System Upgrade Controller) Plans that have the following naming template:
For SUC Plans related to control-plane node OS upgrades - control-plane-<os-name>-<os-version>-<suffix>.
For SUC Plans related to worker node OS upgrades - workers-<os-name>-<os-version>-<suffix>.
Based on these plans, SUC proceeds to create workloads on each node of the cluster that perform the actual OS upgrade.
Depending on the ReleaseManifest, the OS upgrade may include:
Package only updates - for use-cases where the OS version does not change between Edge releases.
Full OS migration - for use-cases where the OS version changes between Edge releases.
The upgrade is executed one node at a time, starting with the control-plane nodes. The worker nodes are upgraded only after the control-plane node upgrade finishes.
The Upgrade Controller configures the OS SUC Plans to drain the cluster nodes if the cluster has more than one node of the specific type.
For example, in a cluster with more than one control-plane node and only one worker node, only the control-plane nodes are drained, and vice versa.
For information on how to disable node drains altogether, see the UpgradePlan (Section 20.4.1, “UpgradePlan”) section.
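The drain rule described above can be sketched as a small Python function. This is an illustration only, not the Upgrade Controller's actual implementation: the function name and parameters are hypothetical, and the two disable flags are assumed to mirror the disableDrain.controlPlane and disableDrain.worker settings of the UpgradePlan.

```python
def drain_settings(control_plane_nodes: int, worker_nodes: int,
                   disable_control_plane: bool = False,
                   disable_worker: bool = False) -> dict:
    """Return whether each node type would be drained during an upgrade.

    Hypothetical helper that models the documented rule; it does not
    query a cluster.
    """
    return {
        # A node type is drained only if the cluster has more than one
        # node of that type and drains were not disabled for it.
        "controlPlane": control_plane_nodes > 1 and not disable_control_plane,
        "worker": worker_nodes > 1 and not disable_worker,
    }


# Three control-plane nodes, one worker: only control-plane nodes are drained.
print(drain_settings(3, 1))
# One control-plane node, three workers: only worker nodes are drained.
print(drain_settings(1, 3))
# Drains disabled for control-plane nodes through the UpgradePlan.
print(drain_settings(3, 3, disable_control_plane=True))
```

The same rule applies to the Kubernetes SUC Plans described in the next section.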
20.3.2 Kubernetes upgrade #
To upgrade the Kubernetes distribution of a cluster, the Upgrade Controller creates SUC (Chapter 19, System Upgrade Controller) Plans that have the following naming template:
For SUC Plans related to control-plane node Kubernetes upgrades - control-plane-<k8s-version>-<suffix>.
For SUC Plans related to worker node Kubernetes upgrades - workers-<k8s-version>-<suffix>.
Based on these plans, SUC proceeds to create workloads on each node of the cluster that perform the actual Kubernetes upgrade.
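To make the SUC Plan naming templates easier to recognize when listing plans, the following Python sketch assembles names for both the OS and the Kubernetes templates. The helper names and all example values are hypothetical placeholders. In particular, how the Upgrade Controller formats the OS name and the version strings is not covered here, and reusing the sucNameSuffix value from the sample UpgradePlan status as the suffix is an assumption.

```python
# Name prefixes taken from the documented templates.
PREFIXES = {"control-plane": "control-plane", "worker": "workers"}


def os_plan_name(node_type: str, os_name: str, os_version: str, suffix: str) -> str:
    """Build a name following <prefix>-<os-name>-<os-version>-<suffix>."""
    return f"{PREFIXES[node_type]}-{os_name}-{os_version}-{suffix}"


def k8s_plan_name(node_type: str, k8s_version: str, suffix: str) -> str:
    """Build a name following <prefix>-<k8s-version>-<suffix>."""
    return f"{PREFIXES[node_type]}-{k8s_version}-{suffix}"


# Placeholder values, chosen only to show the shape of the resulting names.
print(os_plan_name("control-plane", "sl-micro", "6-0", "90315a2b6d"))
print(k8s_plan_name("worker", "v1-30-3-rke2r1", "90315a2b6d"))
```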
The Kubernetes upgrade is executed one node at a time, starting with the control-plane nodes. The worker nodes are upgraded only after the control-plane node upgrade finishes.
The Upgrade Controller configures the Kubernetes SUC Plans to drain the cluster nodes if the cluster has more than one node of the specific type.
For example, in a cluster with more than one control-plane node and only one worker node, only the control-plane nodes are drained, and vice versa.
For information on how to disable node drains altogether, see the UpgradePlan (Section 20.4.1, “UpgradePlan”) section.
20.3.3 Additional components upgrades #
Currently, all additional components are installed via Helm charts. For a full list of the components for a specific release, refer to the Release Notes (Section 47.1, “Abstract”).
For Helm charts deployed through EIB (Chapter 9, Edge Image Builder), the Upgrade Controller updates the existing HelmChart CR of each component.
For Helm charts deployed outside of EIB, the Upgrade Controller creates a HelmChart resource for each component.
After the creation/update of the HelmChart resource, the Upgrade Controller relies on the helm-controller to pick up this change and proceed with the actual component upgrade.
Charts will be upgraded sequentially based on their order in the ReleaseManifest. Additional values can also be passed through the UpgradePlan. For more information about this, refer to the UpgradePlan (Section 20.4.1, “UpgradePlan”) section.
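The per-chart handling can be summarized with the following Python sketch. It is a simplified model, not the controller's code: the function and its parameters are hypothetical, the two skip cases are inferred from the messages in the sample UpgradePlan status ("Chart longhorn is not installed", "Specified version of chart metallb is already installed"), and the order of the checks as well as the version numbers in the example calls are assumptions.

```python
from typing import Optional


def chart_action(installed_version: Optional[str], target_version: str,
                 has_helmchart_cr: bool) -> str:
    """Describe what would happen to a single chart during an upgrade.

    installed_version is None when the chart is not installed at all.
    has_helmchart_cr is True for charts deployed through EIB, which
    already have a HelmChart CR on the cluster.
    """
    if installed_version is None:
        return "skip: chart is not installed"
    if installed_version == target_version:
        return "skip: specified version is already installed"
    if has_helmchart_cr:
        return "update existing HelmChart"
    return "create HelmChart"


# Hypothetical version numbers, for illustration only.
print(chart_action(None, "1.2.0", has_helmchart_cr=False))
print(chart_action("1.2.0", "1.2.0", has_helmchart_cr=True))
print(chart_action("1.1.0", "1.2.0", has_helmchart_cr=True))
print(chart_action("1.1.0", "1.2.0", has_helmchart_cr=False))
```

In the last two cases, the helm-controller then picks up the HelmChart change and performs the actual upgrade.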
20.4 Kubernetes API extensions #
Extensions to the Kubernetes API introduced by the Upgrade Controller.
20.4.1 UpgradePlan #
The Upgrade Controller introduces a new Kubernetes custom resource called an UpgradePlan.
The UpgradePlan serves as an instruction mechanism for the Upgrade Controller and it supports the following configurations:
releaseVersion - Edge release version to which the cluster should be upgraded. The release version must follow semantic versioning and should be retrieved from the Release Notes (Section 47.1, “Abstract”).
disableDrain - Optional; instructs the Upgrade Controller on whether to disable node drains. Useful for when you have workloads with Disruption Budgets.
Example for control-plane node drain disablement:
spec:
  disableDrain:
    controlPlane: true
Example for control-plane and worker node drain disablement:
spec:
  disableDrain:
    controlPlane: true
    worker: true
helm - Optional; specifies additional values for components installed via Helm.
Warning: It is only advised to use this field for values that are critical for upgrades. Standard chart value updates should be performed after the respective charts have been upgraded to the next version.
Example:
spec:
  helm:
  - chart: foo
    values:
      bar: baz
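As a minimal sketch of how these fields fit together, the following Python snippet assembles a complete UpgradePlan manifest and prints it as JSON. The apiVersion, kind, name and namespace are taken from the sample resource shown in the tracking section. The helper function is hypothetical, and its version check is a simplified MAJOR.MINOR.PATCH pattern rather than a full semantic versioning validation.

```python
import json
import re
from typing import Optional

# Simplified check: accepts only plain MAJOR.MINOR.PATCH versions.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def build_upgrade_plan(name: str, release_version: str,
                       disable_drain: Optional[dict] = None,
                       helm: Optional[list] = None,
                       namespace: str = "upgrade-controller-system") -> dict:
    """Assemble an UpgradePlan manifest as a plain dictionary."""
    if not SEMVER.match(release_version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {release_version}")
    spec = {"releaseVersion": release_version}
    # Optional fields are added only when they are provided.
    if disable_drain:
        spec["disableDrain"] = disable_drain
    if helm:
        spec["helm"] = helm
    return {
        "apiVersion": "lifecycle.suse.com/v1alpha1",
        "kind": "UpgradePlan",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


plan = build_upgrade_plan(
    "upgrade-plan-mgmt-3-1-0",
    "3.1.0",
    disable_drain={"controlPlane": True},
    helm=[{"chart": "foo", "values": {"bar": "baz"}}],
)
print(json.dumps(plan, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be saved to a file and applied with kubectl apply -f <file>.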
20.4.2 ReleaseManifest #
The Upgrade Controller introduces a new Kubernetes custom resource called a ReleaseManifest.
The ReleaseManifest is created by the Upgrade Controller and holds component data for one specific Edge release version. This means that each Edge release version upgrade will be represented by a different ReleaseManifest resource.
The ReleaseManifest should always be created by the Upgrade Controller.
It is not advisable to manually create or edit the ReleaseManifest. Users who decide to do so do this at their own risk.
Component data that the ReleaseManifest ships include, but is not limited to:
For an example of how a ReleaseManifest can look, refer to the upstream documentation. Please note that this is just an example and it is not intended to be created as a valid ReleaseManifest resource.
20.5 Tracking the upgrade process #
This section describes how to track and debug the upgrade process that the Upgrade Controller initiates once the user creates an UpgradePlan.
20.5.1 General #
General information about the state of the upgrade process can be viewed in the UpgradePlan’s status conditions.
The UpgradePlan resource’s status can be viewed in the following way:
kubectl get upgradeplan <upgradeplan_name> -n upgrade-controller-system -o yaml
Running UpgradePlan example:
apiVersion: lifecycle.suse.com/v1alpha1
kind: UpgradePlan
metadata:
  name: upgrade-plan-mgmt-3-1-0
  namespace: upgrade-controller-system
spec:
  releaseVersion: 3.1.0
status:
  conditions:
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Control plane nodes are being upgraded
    reason: InProgress
    status: "False"
    type: OSUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Kubernetes upgrade is not yet started
    reason: Pending
    status: Unknown
    type: KubernetesUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Rancher upgrade is not yet started
    reason: Pending
    status: Unknown
    type: RancherUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Longhorn upgrade is not yet started
    reason: Pending
    status: Unknown
    type: LonghornUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: MetalLB upgrade is not yet started
    reason: Pending
    status: Unknown
    type: MetalLBUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: CDI upgrade is not yet started
    reason: Pending
    status: Unknown
    type: CDIUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: KubeVirt upgrade is not yet started
    reason: Pending
    status: Unknown
    type: KubeVirtUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: NeuVector upgrade is not yet started
    reason: Pending
    status: Unknown
    type: NeuVectorUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: EndpointCopierOperator upgrade is not yet started
    reason: Pending
    status: Unknown
    type: EndpointCopierOperatorUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Elemental upgrade is not yet started
    reason: Pending
    status: Unknown
    type: ElementalUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: SRIOV upgrade is not yet started
    reason: Pending
    status: Unknown
    type: SRIOVUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Akri upgrade is not yet started
    reason: Pending
    status: Unknown
    type: AkriUpgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: Metal3 upgrade is not yet started
    reason: Pending
    status: Unknown
    type: Metal3Upgraded
  - lastTransitionTime: "2024-10-01T06:26:27Z"
    message: RancherTurtles upgrade is not yet started
    reason: Pending
    status: Unknown
    type: RancherTurtlesUpgraded
  observedGeneration: 1
  sucNameSuffix: 90315a2b6d
Here you can view every component that the Upgrade Controller will try to schedule an upgrade for. Each condition follows the below template:
lastTransitionTime - the last time that this component condition has transitioned from one status to another.
message - message that indicates the current upgrade state of the specific component condition.
reason - the current upgrade state of the specific component condition. Possible reasons include:
  Succeeded - upgrade of the specific component is successful.
  Failed - upgrade of the specific component has failed.
  InProgress - upgrade of the specific component is currently in progress.
  Pending - upgrade of the specific component is not yet scheduled.
  Skipped - specific component is not found on the cluster, so its upgrade will be skipped.
  Error - specific component has encountered a transient error.
status - status of the current condition type, one of True, False, Unknown.
type - indicator for the currently upgraded component.
The Upgrade Controller creates SUC Plans for component conditions of type "OSUpgraded" and "KubernetesUpgraded". To further track the SUC Plans created for these components, refer to the Monitoring System Upgrade Controller Plans (Section 19.3, “Monitoring System Upgrade Controller Plans”) section.
All other component condition types can be further tracked by viewing the resources created for them by the helm-controller. For more information, see the Helm Controller (Section 20.5.2, “Helm Controller”) section.
An UpgradePlan scheduled by the Upgrade Controller can be marked as successful once:
There are no Pending or InProgress component conditions.
The lastSuccessfulReleaseVersion property points to the releaseVersion that is specified in the UpgradePlan’s configuration. This property is added to the UpgradePlan’s status by the Upgrade Controller once the upgrade process is successful.
Successful UpgradePlan example:
apiVersion: lifecycle.suse.com/v1alpha1
kind: UpgradePlan
metadata:
  name: upgrade-plan-mgmt-3-1-0
  namespace: upgrade-controller-system
spec:
  releaseVersion: 3.1.0
status:
  conditions:
  - lastTransitionTime: "2024-10-01T06:26:48Z"
    message: All cluster nodes are upgraded
    reason: Succeeded
    status: "True"
    type: OSUpgraded
  - lastTransitionTime: "2024-10-01T06:26:59Z"
    message: All cluster nodes are upgraded
    reason: Succeeded
    status: "True"
    type: KubernetesUpgraded
  - lastTransitionTime: "2024-10-01T06:27:13Z"
    message: Chart rancher upgrade succeeded
    reason: Succeeded
    status: "True"
    type: RancherUpgraded
  - lastTransitionTime: "2024-10-01T06:27:13Z"
    message: Chart longhorn is not installed
    reason: Skipped
    status: "False"
    type: LonghornUpgraded
  - lastTransitionTime: "2024-10-01T06:27:13Z"
    message: Specified version of chart metallb is already installed
    reason: Skipped
    status: "False"
    type: MetalLBUpgraded
  - lastTransitionTime: "2024-10-01T06:27:13Z"
    message: Chart cdi is not installed
    reason: Skipped
    status: "False"
    type: CDIUpgraded
  - lastTransitionTime: "2024-10-01T06:27:13Z"
    message: Chart kubevirt is not installed
    reason: Skipped
    status: "False"
    type: KubeVirtUpgraded
  - lastTransitionTime: "2024-10-01T06:27:13Z"
    message: Chart neuvector-crd is not installed
    reason: Skipped
    status: "False"
    type: NeuVectorUpgraded
  - lastTransitionTime: "2024-10-01T06:27:14Z"
    message: Specified version of chart endpoint-copier-operator is already installed
    reason: Skipped
    status: "False"
    type: EndpointCopierOperatorUpgraded
  - lastTransitionTime: "2024-10-01T06:27:14Z"
    message: Chart elemental-operator upgrade succeeded
    reason: Succeeded
    status: "True"
    type: ElementalUpgraded
  - lastTransitionTime: "2024-10-01T06:27:15Z"
    message: Chart sriov-crd is not installed
    reason: Skipped
    status: "False"
    type: SRIOVUpgraded
  - lastTransitionTime: "2024-10-01T06:27:16Z"
    message: Chart akri is not installed
    reason: Skipped
    status: "False"
    type: AkriUpgraded
  - lastTransitionTime: "2024-10-01T06:27:19Z"
    message: Chart metal3 is not installed
    reason: Skipped
    status: "False"
    type: Metal3Upgraded
  - lastTransitionTime: "2024-10-01T06:27:27Z"
    message: Chart rancher-turtles is not installed
    reason: Skipped
    status: "False"
    type: RancherTurtlesUpgraded
  lastSuccessfulReleaseVersion: 3.1.0
  observedGeneration: 1
  sucNameSuffix: 90315a2b6d
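The two success criteria described in this section can also be checked programmatically. The following Python sketch is one possible way to do so, assuming the UpgradePlan has been retrieved as JSON (for example with kubectl get upgradeplan <upgradeplan_name> -n upgrade-controller-system -o json) and parsed into a dictionary. The function name is hypothetical, and the two dictionaries are trimmed versions of the running and successful examples.

```python
def is_upgrade_successful(plan: dict) -> bool:
    """Apply the two documented success criteria to a parsed UpgradePlan."""
    release = plan.get("spec", {}).get("releaseVersion")
    status = plan.get("status", {})
    # Criterion 1: no component condition is still Pending or InProgress.
    unfinished = [
        c for c in status.get("conditions", [])
        if c.get("reason") in ("Pending", "InProgress")
    ]
    # Criterion 2: lastSuccessfulReleaseVersion matches the requested release.
    last_successful = status.get("lastSuccessfulReleaseVersion")
    return release is not None and not unfinished and last_successful == release


# Trimmed version of the running UpgradePlan example.
running = {
    "spec": {"releaseVersion": "3.1.0"},
    "status": {"conditions": [
        {"type": "OSUpgraded", "reason": "InProgress", "status": "False"},
        {"type": "KubernetesUpgraded", "reason": "Pending", "status": "Unknown"},
    ]},
}

# Trimmed version of the successful UpgradePlan example.
successful = {
    "spec": {"releaseVersion": "3.1.0"},
    "status": {
        "conditions": [
            {"type": "OSUpgraded", "reason": "Succeeded", "status": "True"},
            {"type": "LonghornUpgraded", "reason": "Skipped", "status": "False"},
        ],
        "lastSuccessfulReleaseVersion": "3.1.0",
    },
}

print(is_upgrade_successful(running))     # False
print(is_upgrade_successful(successful))  # True
```

Note that Skipped conditions report a status of "False" but do not prevent the plan from being successful, which is why the check looks at the reason field rather than the status field.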
20.5.2 Helm Controller #
This section covers how to track resources created by the helm-controller.
The steps below assume that kubectl has been configured to connect to the cluster where the Upgrade Controller has been deployed.
Locate the HelmChart resource for the specific component:
kubectl get helmcharts -n kube-system
Using the name of the HelmChart resource, locate the upgrade Pod that was created by the helm-controller:
kubectl get pods -l helmcharts.helm.cattle.io/chart=<helmchart_name> -n kube-system
# Example for Rancher
kubectl get pods -l helmcharts.helm.cattle.io/chart=rancher -n kube-system
NAME                         READY   STATUS      RESTARTS   AGE
helm-install-rancher-tv9wn   0/1     Completed   0          16m
View the logs of the component specific pod:
kubectl logs <pod_name> -n kube-system
20.6 Known Limitations #
Downstream cluster upgrades are not yet managed by the Upgrade Controller. For information on how to upgrade downstream clusters, refer to the Downstream clusters (Chapter 31, Downstream clusters) section.
The Upgrade Controller expects any additional SUSE Edge Helm charts that are deployed through EIB (Chapter 9, Edge Image Builder) to have their HelmChart CR deployed in the kube-system namespace. To do this, configure the installationNamespace property in your EIB definition file. For more information, see the upstream documentation.
Currently, the Upgrade Controller has no way to determine the Edge release version that is currently running on the management cluster. Make sure to provide an Edge release version that is greater than the one currently running on the cluster.
Currently, the Upgrade Controller supports non air-gapped environment upgrades only. Air-gapped upgrades are not yet possible.