Upgrade from v1.3.1 to v1.3.2

General Information

An Upgrade button appears on the Dashboard screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see Start an upgrade.

For air-gapped environments, see Prepare an air-gapped upgrade.

Known Issues

1. Two-node cluster upgrade stuck after the first node is pre-drained

Shut down all workload VMs before upgrading two-node clusters to prevent data loss.

The worker node can falsely transition to a not-ready state when RKE2 is upgraded on the management node. Consequently, the existing pods on the worker node are evicted and new pods cannot be scheduled on any nodes. These ultimately cause a chained failure in the whole cluster and prevent completion of the upgrade process.

Check the cluster status when the following occur:

  • The upgrade process becomes stuck for some time.

  • You are unable to access the Harvester UI and receive an HTTP 503 error.

    1. Check the conditions and node statuses of the latest Upgrade custom resource.

      Proceed to the next step if the following conditions are met:

      • SystemServicesUpgraded is set to True, indicating that the system services upgrade is completed.

      • In nodeStatuses, the state of the management node is either Pre-drained or Waiting Reboot.

      • In nodeStatuses, the state of the worker node is Images preloaded.

        Example:

        # Find out the latest Upgrade custom resource
        $ kubectl -n harvester-system get upgrades.harvesterhci -l harvesterhci.io/latestUpgrade=true
        NAME                 AGE
        hvst-upgrade-szlg8   48m
        
        # Check the conditions and node statuses
        $ kubectl -n harvester-system get upgrades hvst-upgrade-szlg8 -o yaml
        apiVersion: harvesterhci.io/v1beta1
        kind: Upgrade
        metadata:
          ...
          labels:
            harvesterhci.io/latestUpgrade: "true"
            harvesterhci.io/upgradeState: UpgradingNodes
          name: hvst-upgrade-szlg8
          namespace: harvester-system
          ...
        spec:
          image: ""
          logEnabled: false
          version: v1.3.2-rc2
        status:
          conditions:
          - status: Unknown
            type: Completed
          - lastUpdateTime: "2024-09-02T11:57:04Z"
            message: Upgrade observability is administratively disabled
            reason: Disabled
            status: "False"
            type: LogReady
          - lastUpdateTime: "2024-09-02T11:58:01Z"
            status: "True"
            type: ImageReady
          - lastUpdateTime: "2024-09-02T12:02:31Z"
            status: "True"
            type: RepoReady
          - lastUpdateTime: "2024-09-02T12:18:44Z"
            status: "True"
            type: NodesPrepared
          - lastUpdateTime: "2024-09-02T12:31:25Z"
            status: "True"
            type: SystemServicesUpgraded
          - status: Unknown
            type: NodesUpgraded
          imageID: harvester-system/hvst-upgrade-szlg8
          nodeStatuses:
            harvester-c6phd:
              state: Pre-drained
            harvester-jkqhq:
              state: Images preloaded
          previousVersion: v1.3.1
          ...
    2. Check the node status.

      Proceed to the next step if the following conditions are met:

      • The status of the worker node is NotReady.

      • The status of the management node is Ready,SchedulingDisabled.

        Example:

         $ kubectl get nodes
         NAME              STATUS                     ROLES                       AGE    VERSION
         harvester-c6phd   Ready,SchedulingDisabled   control-plane,etcd,master   174m   v1.28.12+rke2r1
         harvester-jkqhq   NotReady                   <none>                      166m   v1.27.13+rke2r1
    3. Check the pods on the worker node.

      The issue exists in the cluster if the status of most pods is Terminating.

      Example:

      # Assume harvester-jkqhq is the worker node
      $ kubectl get pods -A --field-selector spec.nodeName=harvester-jkqhq
      NAMESPACE                         NAME                                                    READY   STATUS        RESTARTS       AGE
      cattle-fleet-local-system         fleet-agent-6779fb5dd9-dkpjz                            1/1     Terminating   0              18m
      cattle-fleet-system               fleet-agent-86db8d9954-qgcpq                            1/1     Terminating   2 (18m ago)    61m
      cattle-fleet-system               fleet-controller-696d4b8878-ddctd                       1/1     Terminating   1 (19m ago)    29m
      cattle-fleet-system               gitjob-694dd97686-s4z68                                 1/1     Terminating   1 (19m ago)    29m
      cattle-provisioning-capi-system   capi-controller-manager-6f497d5574-wkrnf                1/1     Terminating   0              20m
      cattle-system                     cattle-cluster-agent-76db9cf9fc-5hhsx                   1/1     Terminating   0              20m
      cattle-system                     cattle-cluster-agent-76db9cf9fc-dnr6m                   1/1     Terminating   0              20m
      cattle-system                     harvester-cluster-repo-7458c7c69d-p982g                 1/1     Terminating   0              27m
      cattle-system                     rancher-7d65df9bd4-77n7w                                1/1     Terminating   0              31m
      cattle-system                     rancher-webhook-cfc66d5d7-fd6gm                         1/1     Terminating   0              28m
      harvester-system                  harvester-85ff674986-wxkl4                              1/1     Terminating   0              26m
      harvester-system                  harvester-load-balancer-54cd9754dc-cwtxg                1/1     Terminating   0              20m
      harvester-system                  harvester-load-balancer-webhook-c8699b786-x6clw         1/1     Terminating   0              20m
      harvester-system                  harvester-network-controller-manager-b69bf6b69-9f99x    1/1     Terminating   0              178m
      harvester-system                  harvester-network-controller-vs4jg                      1/1     Running       0              178m
      harvester-system                  harvester-network-webhook-7b98f8cd98-gjl8b              1/1     Terminating   0              20m
      harvester-system                  harvester-node-disk-manager-tbh4b                       1/1     Running       0              26m
      harvester-system                  harvester-node-manager-7pqcp                            1/1     Running       0              178m
      harvester-system                  harvester-node-manager-webhook-9cfccc84c-68tgp          1/1     Running       0              20m
      harvester-system                  harvester-node-manager-webhook-9cfccc84c-6bbvg          1/1     Running       0              20m
      harvester-system                  harvester-webhook-565dc698b6-np89r                      1/1     Terminating   0              26m
      harvester-system                  hvst-upgrade-szlg8-apply-manifests-4rmjw                0/1     Completed     0              33m
      harvester-system                  virt-api-6fb7d97b68-cbc5m                               1/1     Terminating   0              20m
      harvester-system                  virt-api-6fb7d97b68-gqg5c                               1/1     Terminating   0              23m
      harvester-system                  virt-controller-67d8b4c75c-5qz9x                        1/1     Terminating   0              24m
      harvester-system                  virt-controller-67d8b4c75c-bdf8w                        1/1     Terminating   2 (18m ago)    23m
      harvester-system                  virt-handler-xw98h                                      1/1     Running       0              24m
      harvester-system                  virt-operator-6c98db546-brgnx                           1/1     Terminating   2 (18m ago)    26m
      kube-system                       harvester-snapshot-validation-webhook-b75f94bcb-95zlb   1/1     Terminating   0              20m
      kube-system                       harvester-snapshot-validation-webhook-b75f94bcb-xfrmf   1/1     Terminating   0              20m
      kube-system                       harvester-whereabouts-tdr5g                             1/1     Running       1 (178m ago)   178m
      kube-system                       helm-install-rke2-ingress-nginx-4wt4j                   0/1     Terminating   0              15m
      kube-system                       helm-install-rke2-metrics-server-jn58m                  0/1     Terminating   0              15m
      kube-system                       kube-proxy-harvester-jkqhq                              1/1     Running       0              178m
      kube-system                       rke2-canal-wfpch                                        2/2     Running       0              178m
      kube-system                       rke2-coredns-rke2-coredns-864fbd7785-t7k6t              1/1     Terminating   0              178m
      kube-system                       rke2-coredns-rke2-coredns-autoscaler-6c87968579-rg6g4   1/1     Terminating   0              20m
      kube-system                       rke2-ingress-nginx-controller-d4h25                     1/1     Running       0              178m
      kube-system                       rke2-metrics-server-7f745dbddf-2mp5j                    1/1     Terminating   0              20m
      kube-system                       rke2-multus-fsp94                                       1/1     Running       0              178m
      kube-system                       snapshot-controller-65d5f465d9-5b2sb                    1/1     Terminating   0              20m
      kube-system                       snapshot-controller-65d5f465d9-c264r                    1/1     Terminating   0              20m
      longhorn-system                   backing-image-manager-c16a-7c90                         1/1     Terminating   0              54m
      longhorn-system                   csi-attacher-5fbd66cf8-674vc                            1/1     Terminating   0              20m
      longhorn-system                   csi-attacher-5fbd66cf8-725mn                            1/1     Terminating   0              20m
      longhorn-system                   csi-attacher-5fbd66cf8-85k5d                            1/1     Terminating   0              20m
      longhorn-system                   csi-provisioner-5b6ff8f4d4-97wsf                        1/1     Terminating   0              20m
      longhorn-system                   csi-provisioner-5b6ff8f4d4-cbpm9                        1/1     Terminating   0              20m
      longhorn-system                   csi-provisioner-5b6ff8f4d4-q7z58                        1/1     Terminating   0              19m
      longhorn-system                   csi-resizer-74c5555748-6rmbf                            1/1     Terminating   0              20m
      longhorn-system                   csi-resizer-74c5555748-fw2cw                            1/1     Terminating   0              20m
      longhorn-system                   csi-resizer-74c5555748-p4nph                            1/1     Terminating   0              20m
      longhorn-system                   csi-snapshotter-6bc4bcf4c5-6858b                        1/1     Terminating   0              20m
      longhorn-system                   csi-snapshotter-6bc4bcf4c5-cqkbw                        1/1     Terminating   0              20m
      longhorn-system                   csi-snapshotter-6bc4bcf4c5-mkqtg                        1/1     Terminating   0              20m
      longhorn-system                   engine-image-ei-b0369a5d-2t4k4                          1/1     Running       0              178m
      longhorn-system                   instance-manager-a5bd20597b82bcf3ba9d314620b7e670       1/1     Terminating   0              178m
      longhorn-system                   longhorn-csi-plugin-x6bdg                               3/3     Running       0              178m
      longhorn-system                   longhorn-driver-deployer-85cf4b4849-5lc52               1/1     Terminating   0              20m
      longhorn-system                   longhorn-loop-device-cleaner-hhvgv                      1/1     Running       0              178m
      longhorn-system                   longhorn-manager-5h2zw                                  1/1     Running       0              178m
      longhorn-system                   longhorn-ui-6b677889f8-hrg8j                            1/1     Terminating   0              20m
      longhorn-system                   longhorn-ui-6b677889f8-w5hng                            1/1     Terminating   0              20m

To resolve the issue, you must restart the rke2-agent service on the worker node.

# On the worker node
sudo systemctl restart rke2-agent.service

The upgrade should resume after the rke2-agent service is fully restarted.

This issue occurs because the agent load balancer on the worker node is unable to connect to the API server on the management node after the rke2-server service is restarted. Because the rke2-server service can be restarted multiple times when nodes are upgraded, the upgrade process is likely to become stuck again. You may need to restart the rke2-agent service multiple times.

To determine if the agent load balancer is functioning, run the following commands:

# On the management node, check if the `rke2-server` service is running.
sudo systemctl status rke2-server.service

# On the worker node, check if the agent load balancer is functioning.
sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig get nodes

If the kubectl command does not return a response, the kubelet is unable to access the API server via the agent load balancer. You must restart the rke2-agent service.

For more information, see Issue #6432.


2. Automatic image cleanup is not functioning

Because the published Harvester ISO contains an incomplete image list, automatic image cleanup cannot be performed during an upgrade from v1.3.1 to v1.3.2. This issue does not block the upgrade, and you can use this script to manually clean up container images after the upgrade is completed. For more information, see Issue #6620.