# Upgrade from v1.3.1 to v1.3.2
## General information
An Upgrade button appears on the Dashboard screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see Start an upgrade.
For air-gapped environments, see Prepare an air-gapped upgrade.
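If you prefer to confirm the currently installed version from a terminal before starting, you can read it from the cluster. The following is a minimal sketch; it assumes the `server-version` setting exposed by Harvester and a kubeconfig that points at the cluster.

```shell
# Show the currently installed Harvester version (VALUE column).
kubectl get settings.harvesterhci.io server-version
```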
## Known issues
### 1. Two-node cluster upgrade stuck after the first node is pre-drained
:::info important
Shut down all workload VMs before upgrading two-node clusters to prevent data loss.
:::
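One way to double-check that no workload VMs are still running before you start the upgrade is to list the running virtual machine instances. This is only a sketch based on the KubeVirt `VirtualMachineInstance` resources that Harvester uses; VMs are normally stopped from the Harvester UI.

```shell
# Running VMs appear as VirtualMachineInstance objects.
# "No resources found" means all workload VMs are shut down.
kubectl get virtualmachineinstances -A
```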
The worker node can falsely transition to a not-ready state when RKE2 is upgraded on the management node. Consequently, the existing pods on the worker node are evicted and new pods cannot be scheduled on any node, which ultimately causes a cascading failure across the whole cluster and prevents the upgrade process from completing.
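To spot this situation early, you can watch the upgrade resource and node readiness while the upgrade runs. This is a minimal sketch that reuses the `harvester-system` namespace and the `harvesterhci.io/latestUpgrade=true` label shown in the checks below.

```shell
# Watch the in-progress Upgrade custom resource (Ctrl+C to stop).
kubectl -n harvester-system get upgrades.harvesterhci.io \
  -l harvesterhci.io/latestUpgrade=true -w

# In another terminal, watch node readiness; a worker that flips to NotReady
# while the management node shows Ready,SchedulingDisabled matches this issue.
kubectl get nodes -w
```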
Check the cluster status when the following occur:

- The upgrade process becomes stuck for some time.
- You are unable to access the Harvester UI and receive an HTTP 503 error (a quick check from the command line is sketched after this list).
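To confirm the HTTP 503 from a shell instead of a browser, a check like the following can help. `<harvester-vip>` is a placeholder for the address you normally use to reach the Harvester UI.

```shell
# Print only the HTTP status code returned by the Harvester UI endpoint.
curl -k -s -o /dev/null -w '%{http_code}\n' https://<harvester-vip>/
```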
1. Check the conditions and node statuses of the latest `Upgrade` custom resource. Proceed to the next step if the following conditions are met:

   - `SystemServicesUpgraded` is set to `True`, indicating that the system services upgrade is completed.
   - In `nodeStatuses`, the state of the management node is either `Pre-drained` or `Waiting Reboot`.
   - In `nodeStatuses`, the state of the worker node is `Images preloaded`.

   Example:

   ```
   # Find out the latest Upgrade custom resource
   $ kubectl -n harvester-system get upgrades.harvesterhci.io -l harvesterhci.io/latestUpgrade=true
   NAME                 AGE
   hvst-upgrade-szlg8   48m

   # Check the conditions and node statuses
   $ kubectl -n harvester-system get upgrades hvst-upgrade-szlg8 -o yaml
   apiVersion: harvesterhci.io/v1beta1
   kind: Upgrade
   metadata:
     ...
     labels:
       harvesterhci.io/latestUpgrade: "true"
       harvesterhci.io/upgradeState: UpgradingNodes
     name: hvst-upgrade-szlg8
     namespace: harvester-system
     ...
   spec:
     image: ""
     logEnabled: false
     version: v1.3.2-rc2
   status:
     conditions:
     - status: Unknown
       type: Completed
     - lastUpdateTime: "2024-09-02T11:57:04Z"
       message: Upgrade observability is administratively disabled
       reason: Disabled
       status: "False"
       type: LogReady
     - lastUpdateTime: "2024-09-02T11:58:01Z"
       status: "True"
       type: ImageReady
     - lastUpdateTime: "2024-09-02T12:02:31Z"
       status: "True"
       type: RepoReady
     - lastUpdateTime: "2024-09-02T12:18:44Z"
       status: "True"
       type: NodesPrepared
     - lastUpdateTime: "2024-09-02T12:31:25Z"
       status: "True"
       type: SystemServicesUpgraded
     - status: Unknown
       type: NodesUpgraded
     imageID: harvester-system/hvst-upgrade-szlg8
     nodeStatuses:
       harvester-c6phd:
         state: Pre-drained
       harvester-jkqhq:
         state: Images preloaded
     previousVersion: v1.3.1
     ...
   ```
2. Check the node status. Proceed to the next step if the following conditions are met:

   - The status of the worker node is `NotReady`.
   - The status of the management node is `Ready,SchedulingDisabled`.

   Example:

   ```
   $ kubectl get nodes
   NAME              STATUS                     ROLES                       AGE    VERSION
   harvester-c6phd   Ready,SchedulingDisabled   control-plane,etcd,master   174m   v1.28.12+rke2r1
   harvester-jkqhq   NotReady                   <none>                      166m   v1.27.13+rke2r1
   ```
3. Check the pods on the worker node. The issue exists in the cluster if the status of most pods is `Terminating`.

   Example:

   ```
   # Assume harvester-jkqhq is the worker node
   $ kubectl get pods -A --field-selector spec.nodeName=harvester-jkqhq
   NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE
   cattle-fleet-local-system  fleet-agent-6779fb5dd9-dkpjz  1/1  Terminating  0  18m
   cattle-fleet-system  fleet-agent-86db8d9954-qgcpq  1/1  Terminating  2 (18m ago)  61m
   cattle-fleet-system  fleet-controller-696d4b8878-ddctd  1/1  Terminating  1 (19m ago)  29m
   cattle-fleet-system  gitjob-694dd97686-s4z68  1/1  Terminating  1 (19m ago)  29m
   cattle-provisioning-capi-system  capi-controller-manager-6f497d5574-wkrnf  1/1  Terminating  0  20m
   cattle-system  cattle-cluster-agent-76db9cf9fc-5hhsx  1/1  Terminating  0  20m
   cattle-system  cattle-cluster-agent-76db9cf9fc-dnr6m  1/1  Terminating  0  20m
   cattle-system  harvester-cluster-repo-7458c7c69d-p982g  1/1  Terminating  0  27m
   cattle-system  rancher-7d65df9bd4-77n7w  1/1  Terminating  0  31m
   cattle-system  rancher-webhook-cfc66d5d7-fd6gm  1/1  Terminating  0  28m
   harvester-system  harvester-85ff674986-wxkl4  1/1  Terminating  0  26m
   harvester-system  harvester-load-balancer-54cd9754dc-cwtxg  1/1  Terminating  0  20m
   harvester-system  harvester-load-balancer-webhook-c8699b786-x6clw  1/1  Terminating  0  20m
   harvester-system  harvester-network-controller-manager-b69bf6b69-9f99x  1/1  Terminating  0  178m
   harvester-system  harvester-network-controller-vs4jg  1/1  Running  0  178m
   harvester-system  harvester-network-webhook-7b98f8cd98-gjl8b  1/1  Terminating  0  20m
   harvester-system  harvester-node-disk-manager-tbh4b  1/1  Running  0  26m
   harvester-system  harvester-node-manager-7pqcp  1/1  Running  0  178m
   harvester-system  harvester-node-manager-webhook-9cfccc84c-68tgp  1/1  Running  0  20m
   harvester-system  harvester-node-manager-webhook-9cfccc84c-6bbvg  1/1  Running  0  20m
   harvester-system  harvester-webhook-565dc698b6-np89r  1/1  Terminating  0  26m
   harvester-system  hvst-upgrade-szlg8-apply-manifests-4rmjw  0/1  Completed  0  33m
   harvester-system  virt-api-6fb7d97b68-cbc5m  1/1  Terminating  0  20m
   harvester-system  virt-api-6fb7d97b68-gqg5c  1/1  Terminating  0  23m
   harvester-system  virt-controller-67d8b4c75c-5qz9x  1/1  Terminating  0  24m
   harvester-system  virt-controller-67d8b4c75c-bdf8w  1/1  Terminating  2 (18m ago)  23m
   harvester-system  virt-handler-xw98h  1/1  Running  0  24m
   harvester-system  virt-operator-6c98db546-brgnx  1/1  Terminating  2 (18m ago)  26m
   kube-system  harvester-snapshot-validation-webhook-b75f94bcb-95zlb  1/1  Terminating  0  20m
   kube-system  harvester-snapshot-validation-webhook-b75f94bcb-xfrmf  1/1  Terminating  0  20m
   kube-system  harvester-whereabouts-tdr5g  1/1  Running  1 (178m ago)  178m
   kube-system  helm-install-rke2-ingress-nginx-4wt4j  0/1  Terminating  0  15m
   kube-system  helm-install-rke2-metrics-server-jn58m  0/1  Terminating  0  15m
   kube-system  kube-proxy-harvester-jkqhq  1/1  Running  0  178m
   kube-system  rke2-canal-wfpch  2/2  Running  0  178m
   kube-system  rke2-coredns-rke2-coredns-864fbd7785-t7k6t  1/1  Terminating  0  178m
   kube-system  rke2-coredns-rke2-coredns-autoscaler-6c87968579-rg6g4  1/1  Terminating  0  20m
   kube-system  rke2-ingress-nginx-controller-d4h25  1/1  Running  0  178m
   kube-system  rke2-metrics-server-7f745dbddf-2mp5j  1/1  Terminating  0  20m
   kube-system  rke2-multus-fsp94  1/1  Running  0  178m
   kube-system  snapshot-controller-65d5f465d9-5b2sb  1/1  Terminating  0  20m
   kube-system  snapshot-controller-65d5f465d9-c264r  1/1  Terminating  0  20m
   longhorn-system  backing-image-manager-c16a-7c90  1/1  Terminating  0  54m
   longhorn-system  csi-attacher-5fbd66cf8-674vc  1/1  Terminating  0  20m
   longhorn-system  csi-attacher-5fbd66cf8-725mn  1/1  Terminating  0  20m
   longhorn-system  csi-attacher-5fbd66cf8-85k5d  1/1  Terminating  0  20m
   longhorn-system  csi-provisioner-5b6ff8f4d4-97wsf  1/1  Terminating  0  20m
   longhorn-system  csi-provisioner-5b6ff8f4d4-cbpm9  1/1  Terminating  0  20m
   longhorn-system  csi-provisioner-5b6ff8f4d4-q7z58  1/1  Terminating  0  19m
   longhorn-system  csi-resizer-74c5555748-6rmbf  1/1  Terminating  0  20m
   longhorn-system  csi-resizer-74c5555748-fw2cw  1/1  Terminating  0  20m
   longhorn-system  csi-resizer-74c5555748-p4nph  1/1  Terminating  0  20m
   longhorn-system  csi-snapshotter-6bc4bcf4c5-6858b  1/1  Terminating  0  20m
   longhorn-system  csi-snapshotter-6bc4bcf4c5-cqkbw  1/1  Terminating  0  20m
   longhorn-system  csi-snapshotter-6bc4bcf4c5-mkqtg  1/1  Terminating  0  20m
   longhorn-system  engine-image-ei-b0369a5d-2t4k4  1/1  Running  0  178m
   longhorn-system  instance-manager-a5bd20597b82bcf3ba9d314620b7e670  1/1  Terminating  0  178m
   longhorn-system  longhorn-csi-plugin-x6bdg  3/3  Running  0  178m
   longhorn-system  longhorn-driver-deployer-85cf4b4849-5lc52  1/1  Terminating  0  20m
   longhorn-system  longhorn-loop-device-cleaner-hhvgv  1/1  Running  0  178m
   longhorn-system  longhorn-manager-5h2zw  1/1  Running  0  178m
   longhorn-system  longhorn-ui-6b677889f8-hrg8j  1/1  Terminating  0  20m
   longhorn-system  longhorn-ui-6b677889f8-w5hng  1/1  Terminating  0  20m
   ```
4. To resolve the issue, restart the `rke2-agent` service on the worker node.

   ```
   # On the worker node
   sudo systemctl restart rke2-agent.service
   ```

   The upgrade should resume after the `rke2-agent` service is fully restarted (a verification sketch follows these steps).
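After restarting the service, you can verify that the worker node rejoins the cluster and that the stuck pods are cleaned up. This is a minimal sketch; `harvester-jkqhq` is the worker node name from the examples above, so substitute your own.

```shell
# On the worker node: confirm the agent restarted cleanly and review recent logs.
sudo systemctl status rke2-agent.service
sudo journalctl -u rke2-agent.service --since "5 minutes ago"

# From a machine with cluster access: the worker should return to Ready,
# and the Terminating pods should gradually disappear (empty output is the goal).
kubectl get nodes
kubectl get pods -A --field-selector spec.nodeName=harvester-jkqhq | grep Terminating
```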
:::note
This issue occurs because the agent load balancer on the worker node is unable to connect to the API server on the management node after RKE2 is upgraded and the `rke2-server` service is restarted on that node.

To determine if the agent load balancer is functioning, run the following commands:

```
# On the management node, check if the `rke2-server` service is running.
sudo systemctl status rke2-server.service

# On the worker node, check if the agent load balancer is functioning.
sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig get nodes
```

If the kubectl command does not return a response, the kubelet is unable to access the API server via the agent load balancer. You must restart the `rke2-agent` service on the worker node to restore the connection.
:::
For more information, see [Issue #6432](https://github.com/harvester/harvester/issues/6432).