Upgrade from v1.3.2 to v1.4.0

General information

An Upgrade button appears on the Dashboard screen whenever a new SUSE Virtualization version that you can upgrade to becomes available. For more information, see Start an upgrade.

For air-gapped environments, see Prepare an air-gapped upgrade.

Prevent corruption of virtual machine images during upgrades

Before starting the upgrade, ensure that the BackingImage CRD is updated to the version shipped with SUSE Storage v1.7.2.

Skipping the CRD update may lead to backing image corruption, as described in issue #10644.

Perform the following steps before starting the upgrade.

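Patch the SUSE Virtualization ManagedChart object so that Fleet ignores differences in the backingimages.longhorn.io CRD. This prevents errors and warnings after you manually update the CRD.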

kubectl patch managedchart harvester \
-n fleet-local \
--type='json' \
-p='[
  {
    "op":"add",
    "path":"/spec/diff/comparePatches/-",
    "value": {
      "apiVersion":"apiextensions.k8s.io/v1",
      "jsonPointers":["/spec","/metadata/annotations", "/metadata/labels", "/status"],
      "kind":"CustomResourceDefinition",
      "name":"backingimages.longhorn.io"
    }
  }
]'

Apply the SUSE Storage v1.7.2 BackingImage CRD. For example, save the following manifest to a file and run kubectl apply -f <file>.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.15.0
  labels:
    app.kubernetes.io/name: longhorn
    app.kubernetes.io/instance: longhorn
    app.kubernetes.io/version: v1.7.2
    longhorn-manager: ""
  name: backingimages.longhorn.io
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: longhorn-conversion-webhook
          namespace: longhorn-system
          path: /v1/webhook/conversion
          port: 9501
      conversionReviewVersions:
      - v1beta2
      - v1beta1
  group: longhorn.io
  names:
    kind: BackingImage
    listKind: BackingImageList
    plural: backingimages
    shortNames:
    - lhbi
    singular: backingimage
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - description: The backing image name
      jsonPath: .spec.image
      name: Image
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1beta1
    schema:
      openAPIV3Schema:
        description: BackingImage is where Longhorn stores backing image object.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            x-kubernetes-preserve-unknown-fields: true
          status:
            x-kubernetes-preserve-unknown-fields: true
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - additionalPrinterColumns:
    - description: The system generated UUID
      jsonPath: .status.uuid
      name: UUID
      type: string
    - description: The source of the backing image file data
      jsonPath: .spec.sourceType
      name: SourceType
      type: string
    - description: The backing image file size in each disk
      jsonPath: .status.size
      name: Size
      type: string
    - description: The virtual size of the image (may be larger than file size)
      jsonPath: .status.virtualSize
      name: VirtualSize
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1beta2
    schema:
      openAPIV3Schema:
        description: BackingImage is where Longhorn stores backing image object.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: BackingImageSpec defines the desired state of the Longhorn
              backing image
            properties:
              checksum:
                type: string
              diskFileSpecMap:
                additionalProperties:
                  properties:
                    evictionRequested:
                      type: boolean
                  type: object
                type: object
              diskSelector:
                items:
                  type: string
                type: array
              disks:
                additionalProperties:
                  type: string
                description: Deprecated. We are now using DiskFileSpecMap to assign
                  different spec to the file on different disks.
                type: object
              minNumberOfCopies:
                type: integer
              nodeSelector:
                items:
                  type: string
                type: array
              secret:
                type: string
              secretNamespace:
                type: string
              sourceParameters:
                additionalProperties:
                  type: string
                type: object
              sourceType:
                enum:
                - download
                - upload
                - export-from-volume
                - restore
                - clone
                type: string
            type: object
          status:
            description: BackingImageStatus defines the observed state of the Longhorn
              backing image status
            properties:
              checksum:
                type: string
              diskFileStatusMap:
                additionalProperties:
                  properties:
                    lastStateTransitionTime:
                      type: string
                    message:
                      type: string
                    progress:
                      type: integer
                    state:
                      type: string
                  type: object
                nullable: true
                type: object
              diskLastRefAtMap:
                additionalProperties:
                  type: string
                nullable: true
                type: object
              ownerID:
                type: string
              size:
                format: int64
                type: integer
              uuid:
                type: string
              virtualSize:
                description: Virtual size of image, which may be larger than physical
                  size. Will be zero until known (e.g. while a backing image is uploading)
                format: int64
                type: integer
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
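You can check both preparation steps from the command line before starting the upgrade. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that kubectl get -o jsonpath prints the comparePatches list as JSON and that the CRD's app.kubernetes.io/version label reflects the applied manifest (v1.7.2 in the manifest above). Checking for an existing entry first is useful because the JSON "add" operation in the patch command appends a new entry every time it runs.

```shell
#!/bin/bash
# Sketch of pre-upgrade checks (helper names are hypothetical).

# Returns 0 if the ManagedChart comparePatches JSON read on stdin already
# contains an entry for backingimages.longhorn.io. Re-running the JSON "add"
# patch would append a duplicate entry, so check before patching again.
has_backingimage_compare_patch() {
  grep -q '"backingimages.longhorn.io"'
}

# Returns 0 if the label value read on stdin equals the expected version.
# kubectl jsonpath output has no trailing newline, so read's EOF status is ignored.
crd_version_matches() {
  local expected=$1 actual
  read -r actual || true
  [ "$actual" = "$expected" ]
}

# Example usage (requires cluster access):
#   kubectl get managedchart harvester -n fleet-local \
#     -o jsonpath='{.spec.diff.comparePatches}' | has_backingimage_compare_patch
#   kubectl get crd backingimages.longhorn.io \
#     -o jsonpath='{.metadata.labels.app\.kubernetes\.io/version}' | crd_version_matches v1.7.2
```

Both checks should succeed once the two steps have been applied.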

Known Issues

1. Upgrade Stuck in "Pre-draining" State

A virtual machine with a container disk cannot be migrated because of a limitation of the live migration feature. This causes the upgrade process to become stuck in the "Pre-draining" state.

Manually stop the virtual machines to continue the upgrade process.

For more information, see Issue #7005.
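Before stopping virtual machines, you may want to identify which running instances use a container disk. The following sketch filters a per-VMI listing; the function name and the kubectl JSONPath template in the comment are assumptions for illustration, not part of the official procedure.

```shell
#!/bin/bash
# Sketch: list running VMIs that use at least one containerDisk volume.
# Expected input: one line per VMI, "<namespace>/<name>" followed by the image
# of each containerDisk volume (volumes of other types add no field).
# Example input producer (requires cluster access, template is an assumption):
#   kubectl get vmi -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {range .spec.volumes[*]}{.containerDisk.image} {end}{"\n"}{end}'
vmis_with_container_disks() {
  # A line with more than one field has at least one containerDisk image.
  awk 'NF > 1 { print $1 }'
}
```

Pipe the kubectl output into vmis_with_container_disks to get the namespace/name of each virtual machine to stop.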

2. Upgrade Stuck on Waiting for Bundle to Become Ready

This issue is caused by a race condition that occurs when the Fleet agent (fleet-agent) is redeployed. Bundle errors similar to the following indicate that your cluster is affected.

> kubectl get bundles -n fleet-local
NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
mcc-harvester                                 0/1                       ErrApplied(1) [Cluster fleet-local/local: encountered 2 deletion errors. First is: admission webhook "validator.harvesterhci.io" denied the request: Internal error occurred: no route match found for DELETE /v1, Kind=Secret harvester-system/sh.helm.release.v1.harvester.v2]
mcc-harvester-crd                             0/1                       ErrApplied(1) [Cluster fleet-local/local: admission webhook "validator.harvesterhci.io" denied the request: Internal error occurred: no route match found for DELETE /v1, Kind=Secret harvester-system/sh.helm.release.v1.harvester-crd.v1]

You can run the following script to fix the issue.

#!/bin/bash

# Force Fleet to redeploy a bundle by incrementing its forceSyncGeneration.
patch_fleet_bundle() {
  local bundleName=$1
  local generation new_generation patch_manifest
  generation=$(kubectl get -n fleet-local bundle "${bundleName}" -o jsonpath='{.spec.forceSyncGeneration}')
  # An unset forceSyncGeneration evaluates to 0 in bash arithmetic.
  new_generation=$((generation + 1))
  patch_manifest="$(mktemp)"
  cat > "$patch_manifest" <<EOF
{
  "spec": {
    "forceSyncGeneration": $new_generation
  }
}
EOF
  echo "patch bundle ${bundleName} to new generation: $new_generation"
  kubectl patch -n fleet-local bundle "${bundleName}" --type=merge --patch-file "$patch_manifest"
  rm -f "$patch_manifest"
}

# Remove the validating webhook so that it no longer rejects the Secret deletions shown in the bundle errors.
echo "removing harvester validating webhook"
kubectl delete validatingwebhookconfiguration harvester-validator

# Redeploy the CRD bundle first, then the main chart bundle.
for bundle in mcc-harvester-crd mcc-harvester
do
  patch_fleet_bundle "${bundle}"
done

# Delete these services if they exist; missing services are ignored.
echo "removing longhorn services"
kubectl delete svc longhorn-engine-manager -n longhorn-system --ignore-not-found=true
kubectl delete svc longhorn-replica-manager -n longhorn-system --ignore-not-found=true
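After the script finishes, you can check whether the bundles have recovered. The following sketch lists bundles whose ready count is incomplete; it assumes that the second column of the kubectl get bundles output is BUNDLEDEPLOYMENTS-READY in the x/y form shown above, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: print the names of Fleet bundles whose ready count is incomplete.
# Reads the table printed by "kubectl get bundles -n fleet-local" on stdin.
# Example usage (requires cluster access):
#   kubectl get bundles -n fleet-local | unready_bundles
unready_bundles() {
  # Skip the header row, split "ready/total" and report rows where they differ.
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

An empty result means that every bundle reports all of its deployments as ready.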

3. Upgrade Stuck on Waiting for Fleet

When upgrading from v1.3.2 to v1.4.0, the upgrade process may become stuck while waiting for Fleet to become ready. This issue is caused by a race condition that occurs when Rancher is redeployed.

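Check the manifest pod logs and the Fleet Helm release history for the following indicators:

The manifest pod is stuck waiting for the Fleet release to reach the deployed status.
The latest Fleet release revision is in the pending-upgrade status and uses the same chart version as the last deployed revision.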

Example:

> kubectl logs -n harvester-system -l harvesterhci.io/upgradeComponent=manifest
wait helm release cattle-fleet-system fleet fleet-104.0.2+up0.10.2 0.10.2 deployed

> helm history -n cattle-fleet-system fleet
REVISION	UPDATED                 	STATUS         	CHART                	APP VERSION	DESCRIPTION
26      	Tue Dec 10 03:09:13 2024	superseded     	fleet-103.1.5+up0.9.5	0.9.5      	Upgrade complete
27      	Sun Dec 15 09:26:54 2024	superseded     	fleet-103.1.5+up0.9.5	0.9.5      	Upgrade complete
28      	Sun Dec 15 09:27:03 2024	superseded     	fleet-103.1.5+up0.9.5	0.9.5      	Upgrade complete
29      	Mon Dec 16 05:57:03 2024	deployed       	fleet-103.1.5+up0.9.5	0.9.5      	Upgrade complete
30      	Mon Dec 16 05:57:13 2024	pending-upgrade	fleet-103.1.5+up0.9.5	0.9.5      	Preparing upgrade

To fix the issue, roll back the Fleet release to the last revision in the deployed status (revision 29 in the example above).

helm rollback fleet -n cattle-fleet-system <last-deployed-revision>
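To find the revision to roll back to, look for the latest revision in the deployed status. The following sketch extracts it from the default helm history table; the function name is hypothetical, and it assumes that the word deployed appears as a standalone field only in the STATUS column.

```shell
#!/bin/bash
# Sketch: print the most recent revision whose STATUS is "deployed".
# Reads the table printed by "helm history" on stdin; prints nothing if none is found.
last_deployed_revision() {
  # Only rows starting with a numeric REVISION are considered; the last match wins.
  awk '$1 ~ /^[0-9]+$/ { for (i = 2; i <= NF; i++) if ($i == "deployed") rev = $1 }
       END { if (rev != "") print rev }'
}

# Example usage (requires cluster access):
#   rev=$(helm history -n cattle-fleet-system fleet | last_deployed_revision)
#   helm rollback fleet -n cattle-fleet-system "$rev"
```

With the history shown above, the function prints 29.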

4. Upgrade Restarts Unexpectedly After Clicking "Dismiss it" Button

When you use Rancher to upgrade SUSE Virtualization, the Rancher UI displays a dialog with a button labeled "Dismiss it". Clicking this button may result in the following issues:

The status section of the harvesterhci.io/v1beta1/upgrade CR is cleared, causing the loss of all important information about the upgrade.
The upgrade process restarts unexpectedly.

This issue affects Rancher v2.10.x with Harvester UI Extension v1.0.2, v1.0.3, or v1.0.4; other UI versions are not affected. The issue is fixed in Harvester UI Extension v1.0.5 and v1.5.0.

To avoid this issue, perform either of the following actions:

Use the SUSE Virtualization UI for upgrades. Clicking the "Dismiss it" button on the SUSE Virtualization UI does not result in unexpected behavior.

Instead of clicking the button on the Rancher UI, run the following command against the cluster:

kubectl -n harvester-system label upgrades -l harvesterhci.io/latestUpgrade=true harvesterhci.io/read-message=true
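To confirm that the label was applied, you can list the latest upgrade objects whose harvesterhci.io/read-message label is not yet true. This is a sketch; the function name and the JSONPath template in the comment are assumptions.

```shell
#!/bin/bash
# Sketch: print upgrades that do not yet carry harvesterhci.io/read-message=true.
# Expected input: one line per upgrade, "<name> <label value>".
# Example input producer (requires cluster access):
#   kubectl -n harvester-system get upgrades -l harvesterhci.io/latestUpgrade=true \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.labels.harvesterhci\.io/read-message}{"\n"}{end}'
unread_upgrades() {
  # Ignore empty lines; report any upgrade whose label value is not "true".
  awk 'NF > 0 && $2 != "true" { print $1 }'
}
```

An empty result means the label is set on every matching upgrade.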

Related issue: #7791