Upgrade troubleshooting

For a high-level overview of the upgrade lifecycle and its components, see the related documentation.

Rancher side

In this example, the cluster nodes were upgraded with the following ManagedOSImage definition:

apiVersion: elemental.cattle.io/v1beta1
kind: ManagedOSImage
metadata:
  name: my-upgrade
  namespace: fleet-default
spec:
  # Set to the new SUSE® Rancher Prime: OS Manager version you would like to upgrade to, or track the latest tag
  osImage: "registry.suse.com/rancher/sle-micro/5.5:latest"
  clusterTargets:
    - clusterName: my-cluster

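Once the `ManagedOSImage` is applied, the `elemental-operator` validates it and generates a related `Bundle`. The `Bundle` name is prefixed with `mos`, followed by the `ManagedOSImage` name. In this example, it is `mos-my-upgrade`.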
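The naming rule above can be sketched as a small shell helper. This is only an illustration: `bundle_name_for` is a hypothetical function name, not part of `kubectl` or the operator, and it assumes the prefix and name are joined with a hyphen, as in `mos-my-upgrade`.

```shell
# Hypothetical helper: derive the Bundle name generated for a ManagedOSImage.
# Assumes the "mos-<ManagedOSImage name>" pattern described in the text.
bundle_name_for() {
  printf 'mos-%s\n' "$1"
}

# Example usage (requires access to the Rancher management cluster):
#   kubectl -n fleet-default get bundle "$(bundle_name_for my-upgrade)"
```

Deriving the name this way avoids having to list every Bundle in the namespace when you already know which ManagedOSImage you are troubleshooting.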

The `Bundle` definition contains the details of the upgrade Plan and the desired targets. For example:

kubectl -n fleet-default get bundle mos-my-upgrade -o yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: Bundle
metadata:
  creationTimestamp: "2023-06-16T09:01:47Z"
  generation: 1
  name: mos-my-upgrade
  namespace: fleet-default
  ownerReferences:
  - apiVersion: elemental.cattle.io/v1beta1
    controller: true
    kind: ManagedOSImage
    name: my-upgrade
    uid: e468ed21-23bb-487a-a022-dbc7ef753720
  resourceVersion: "1038645"
  uid: 35e83fc4-28c8-4b10-8059-cae6cdff2cda
spec:
  resources:
  - content: '{"kind":"ClusterRole","apiVersion":"rbac.authorization.k8s.io/v1","metadata":{"name":"os-upgrader-my-upgrade","creationTimestamp":null},"rules":[{"verbs":["update","get","list","watch","patch"],"apiGroups":[""],"resources":["nodes"]},{"verbs":["list"],"apiGroups":[""],"resources":["pods"]}]}'
    name: ClusterRole--os-upgrader-my-upgrade-296a3abf3451.yaml
  - content: '{"kind":"ClusterRoleBinding","apiVersion":"rbac.authorization.k8s.io/v1","metadata":{"name":"os-upgrader-my-upgrade","creationTimestamp":null},"subjects":[{"kind":"ServiceAccount","name":"os-upgrader-my-upgrade","namespace":"cattle-system"}],"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"os-upgrader-my-upgrade"}}'
    name: ClusterRoleBinding--os-upgrader-my-upgrade-f63eaecde935.yaml
  - content: '{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"os-upgrader-my-upgrade","namespace":"cattle-system","creationTimestamp":null}}'
    name: ServiceAccount-cattle-system-os-upgrader-my-upgrade-ce93d-01096.yaml
  - content: '{"kind":"Secret","apiVersion":"v1","metadata":{"name":"os-upgrader-my-upgrade","namespace":"cattle-system","creationTimestamp":null},"data":{"cloud-config":""}}'
    name: Secret-cattle-system-os-upgrader-my-upgrade-a997ee6a67ef.yaml
  - content: '{"kind":"Plan","apiVersion":"upgrade.cattle.io/v1","metadata":{"name":"os-upgrader-my-upgrade","namespace":"cattle-system","creationTimestamp":null},"spec":{"concurrency":1,"nodeSelector":{},"serviceAccountName":"os-upgrader-my-upgrade","version":"latest","secrets":[{"name":"os-upgrader-my-upgrade","path":"/run/data"}],"tolerations":[{"operator":"Exists"}],"cordon":true,"upgrade":{"image":"registry.suse.com/suse/sle-micro/5.5","command":["/usr/sbin/suc-upgrade"]}},"status":{}}'
    name: Plan-cattle-system-os-upgrader-my-upgrade-273c2c09afca.yaml
  targets:
  - clusterName: my-cluster
.
.
.
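Because the full Bundle is long, it can help to list only the names of the embedded resources and pick out the Plan. The following is a minimal sketch, assuming the Bundle layout shown above (`spec.resources[*].name`); `bundle_resource_names` and `only_plans` are hypothetical helper names.

```shell
# Hypothetical helper: print one embedded resource file name per line.
# Requires cluster access; assumes the Bundle lives in fleet-default.
bundle_resource_names() {
  kubectl -n fleet-default get bundle "$1" \
    -o jsonpath='{range .spec.resources[*]}{.name}{"\n"}{end}'
}

# Hypothetical helper: keep only names that start with "Plan-".
# Reads names on stdin; prints nothing if no Plan is embedded.
only_plans() {
  grep '^Plan-' || true
}

# Example usage:
#   bundle_resource_names mos-my-upgrade | only_plans
```

If no `Plan-` entry is printed, the Bundle was probably not generated as expected, and the `elemental-operator` logs are a reasonable next place to look.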

SUSE® Rancher Prime: OS Manager cluster side

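Any SUSE® Rancher Prime: OS Manager node that is correctly registered and belongs to the target cluster fetches the bundle and starts applying it.
This operation is performed by Rancher's `system-upgrade-controller`, which runs on the SUSE® Rancher Prime: OS Manager cluster.
To check that this controller is operating correctly, you can read its logs: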

kubectl -n cattle-system logs deployment/system-upgrade-controller

If everything is working, the `system-upgrade-controller` creates an upgrade Plan on the cluster:

kubectl -n cattle-system get plans

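For each Plan, the controller orchestrates the jobs that apply it on each targeted node.
To make them easy to find, job names combine the Plan name (`os-upgrader-my-upgrade`) and the target machine hostname (`my-host`).
For example: `apply-os-upgrader-my-upgrade-on-my-host-7a25e`
You can monitor these jobs with: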

kubectl -n cattle-system get jobs
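On a cluster with many nodes, it can be convenient to narrow the job list to a single Plan and host. The sketch below assumes the `apply-<plan>-on-<host>-<suffix>` pattern seen in the example name above; `job_prefix_for` and `filter_jobs` are hypothetical helper names, and a hostname that is itself a prefix of another hostname would also match.

```shell
# Hypothetical helper: build the expected job name prefix for a Plan and host.
job_prefix_for() {
  printf 'apply-%s-on-%s-\n' "$1" "$2"
}

# Hypothetical helper: read `kubectl get jobs` output on stdin and print
# every field that starts with the expected prefix (that is, the job names).
filter_jobs() {
  prefix=$(job_prefix_for "$1" "$2")
  awk -v p="$prefix" '{ for (i = 1; i <= NF; i++) if (index($i, p) == 1) print $i }'
}

# Example usage (requires cluster access):
#   kubectl -n cattle-system get jobs | filter_jobs os-upgrader-my-upgrade my-host
```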

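Each job runs a `privileged: true` container that uses the SLE Micro image specified in the `ManagedOSImage` definition. This container attempts to upgrade the system and then reboots it.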

If a job fails, you can investigate it by checking its logs:

kubectl -n cattle-system logs job.batch/apply-os-upgrader-my-upgrade-on-my-host-7a25e

Two-stage job processing

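Note that upgrade processing happens in two stages.
You will notice that the same job runs twice: the first run ends in the `Unknown` state and never completes.
This is expected, because SUSE® Rancher Prime: OS Manager relies on the job running again after the machine reboots to verify that the new version was installed correctly.
The second run of the job then completes successfully.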

kubectl -n cattle-system get jobs
NAMESPACE       NAME                                            COMPLETIONS   DURATION   AGE
cattle-system   apply-os-upgrader-my-upgrade-on-my-host-0b392   1/1           2m34s      6m23s
cattle-system   apply-os-upgrader-my-upgrade-on-my-host-7a25e   0/1           6m23s      6m23s
kubectl -n cattle-system get pods
NAME                                            READY   STATUS      RESTARTS      AGE
apply-os-upgrader-my-upgrade-on-my-host-zbkrh   0/1     Completed   0             9m40s
apply-os-upgrader-my-upgrade-on-my-host-zvrff   0/1     Unknown     0             12m
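To tell the two runs apart without reading the table by eye, the job listing can be filtered for completed jobs. This is a minimal sketch that only parses the tabular output shown above; `completed_jobs` is a hypothetical helper name, and it treats a job as completed when both sides of the `COMPLETIONS` column are equal (for example `1/1`).

```shell
# Hypothetical helper: read `kubectl get jobs` output on stdin and print the
# names of jobs whose COMPLETIONS value is fully satisfied (n/n).
# Column positions are taken from the header line, so the output may or may
# not include a NAMESPACE column.
completed_jobs() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME") name_col = i
        if ($i == "COMPLETIONS") comp_col = i
      }
      next
    }
    name_col && comp_col {
      split($comp_col, parts, "/")
      if (parts[1] == parts[2]) print $name_col
    }
  '
}

# Example usage (requires cluster access):
#   kubectl -n cattle-system get jobs | completed_jobs
```

With the listing above, only the second run would be printed, which matches the expected two-stage behavior: one run left incomplete by the reboot and one completed verification run.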

Recovering from failures

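ManagedOSImage upgrade processing can fail and leave one or more nodes in a failed state. For example, if the image to upgrade to is not found in the registry or is corrupted, the upgrade job running on the downstream cluster will not succeed.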

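In this case, the nodes that ran the failed upgrade job remain cordoned. You can update the ManagedOSImage with a working `osImage`, or delete it to stop any further upgrade attempts. Either way, the affected nodes must be uncordoned manually to recover them and to allow subsequent upgrades to be scheduled.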
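The manual uncordon step can be sketched as follows. This is an illustration, not a prescribed procedure: `cordoned_nodes` is a hypothetical helper name, and it assumes the default `kubectl get nodes` table, where a cordoned node shows `SchedulingDisabled` in its `STATUS` column.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# names of nodes whose STATUS column contains SchedulingDisabled (cordoned).
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Example usage on the downstream cluster (requires cluster access).
# Review the list first, since nodes may be cordoned for unrelated reasons:
#   kubectl get nodes | cordoned_nodes
#   kubectl uncordon my-host
```

Listing the cordoned nodes before uncordoning them keeps the recovery deliberate, which matters if some nodes were cordoned on purpose for maintenance.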