
29 Setting up the management cluster

29.1 Introduction

The management cluster is the part of ATIP that is used to manage the provision and lifecycle of the runtime stacks. From a technical point of view, the management cluster contains the following components:

  • SUSE Linux Enterprise Micro as the OS. Depending on the use case, some configurations like networking, storage, users and kernel arguments can be customized.

  • RKE2 as the Kubernetes cluster. Depending on the use case, it can be configured to use specific CNI plugins, such as Multus, Cilium, etc.

  • Rancher as the management platform to manage the lifecycle of the clusters.

  • Metal3 as the component to manage the lifecycle of the bare-metal nodes.

  • CAPI as the component to manage the lifecycle of the Kubernetes clusters (downstream clusters). With ATIP, also the RKE2 CAPI Provider is used to manage the lifecycle of the RKE2 clusters (downstream clusters).

With all components mentioned above, the management cluster can manage the lifecycle of downstream clusters, using a declarative approach to manage the infrastructure and applications.

Note

For more information about SUSE Linux Enterprise Micro, see: SLE Micro (Chapter 7, SLE Micro)

For more information about RKE2, see: RKE2 (Chapter 14, RKE2)

For more information about Rancher, see: Rancher (Chapter 4, Rancher)

For more information about Metal3, see: Metal3 (Chapter 8, Metal3)

29.2 Steps to set up the management cluster

The following are the main steps to set up the management cluster (using a single node) with a declarative approach:

  1. Image preparation for connected environments (Section 29.3, “Image preparation for connected environments”): The first step is to prepare the manifests and files with all the necessary configurations to be used in connected environments.

    • Directory structure for connected environments (Section 29.3.1, “Directory structure”): This step creates a directory structure to be used by Edge Image Builder to store the configuration files and the image itself.

    • Management cluster definition file (Section 29.3.2, “Management cluster definition file”): The mgmt-cluster.yaml file is the main definition file for the management cluster. It contains the following information about the image to be created:

      • Image Information: The information related to the image to be created using the base image.

      • Operating system: The operating system configurations to be used in the image.

      • Kubernetes: Helm charts and repositories, the Kubernetes version, network configuration, and the nodes to be used in the cluster.

    • Custom folder (Section 29.3.3, “Custom folder”): The custom folder contains the configuration files and scripts to be used by Edge Image Builder to deploy a fully functional management cluster.

      • Files: Contains the configuration files to be used by the management cluster.

      • Scripts: Contains the scripts to be used by the management cluster.

    • Kubernetes folder (Section 29.3.4, “Kubernetes folder”): The kubernetes folder contains the configuration files to be used by the management cluster.

      • Manifests: Contains the manifests to be used by the management cluster.

      • Helm: Contains the Helm charts to be used by the management cluster.

      • Config: Contains the configuration files to be used by the management cluster.

    • Network folder (Section 29.3.5, “Networking folder”): The network folder contains the network configuration files to be used by the management cluster nodes.

  2. Image preparation for air-gap environments (Section 29.4, “Image preparation for air-gap environments”): This step shows the differences when preparing the manifests and files to be used in an air-gap scenario.

    • Directory structure for air-gap environments (Section 29.4.1, “Directory structure for air-gap environments”): The directory structure must be modified to include the resources needed to run the management cluster in an air-gap environment.

    • Modifications in the definition file (Section 29.4.2, “Modifications in the definition file”): The mgmt-cluster.yaml file must be modified to include the embeddedArtifactRegistry section with the images field set to all container images to be included into the EIB output image.

    • Modifications in the custom folder (Section 29.4.3, “Modifications in the custom folder”): The custom folder must be modified to include the resources needed to run the management cluster in an air-gap environment.

      • Register script: The custom/scripts/99-register.sh script must be removed when you use an air-gap environment.

      • Air-gap resources: The custom/files/airgap-resources.tar.gz file must be included in the custom/files folder with all the resources needed to run the management cluster in an air-gap environment.

      • Scripts: The custom/scripts/99-mgmt-setup.sh script must be modified to extract and copy the airgap-resources.tar.gz file to the final location. The custom/files/metal3.sh script must be modified to use the local resources included in the airgap-resources.tar.gz file instead of downloading them from the internet.

  3. Image creation (Section 29.5, “Image creation”): This step covers the creation of the image using the Edge Image Builder tool (for both connected and air-gap scenarios). Check the prerequisites (Chapter 9, Edge Image Builder) to run the Edge Image Builder tool on your system.

  4. Management cluster provisioning (Section 29.6, “Provision the management cluster”): This step covers the provisioning of the management cluster using the image created in the previous step (for both connected and air-gap scenarios). This step can be done using a laptop, server, VM or any other x86_64 system with a USB port.

Note

For more information about Edge Image Builder, see Edge Image Builder (Chapter 9, Edge Image Builder) and Edge Image Builder Quick Start (Chapter 3, Standalone clusters with Edge Image Builder).

29.3 Image preparation for connected environments

Using Edge Image Builder to create the image for the management cluster, many configurations can be customized, but in this document we cover only the minimal configuration necessary to set up the management cluster. Edge Image Builder runs from inside a container, so a container runtime such as Podman or Rancher Desktop is required. For this guide, we assume a container runtime is already available.

Also, as a prerequisite to deploy a highly available management cluster, you need to reserve three IPs in your network:

  • apiVIP for the API VIP Address (used to access the Kubernetes API server).

  • ingressVIP for the Ingress VIP Address (consumed, for example, by the Rancher UI).

  • metal3VIP for the Metal3 VIP Address.
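
The files in this chapter use shell-style ${...} placeholders for values such as the VIPs, credentials and versions. As a minimal sketch of one possible way to fill them in before building (our suggestion, not part of the product tooling), you can export the values and render the templates with envsubst; the .tpl suffix below is a hypothetical naming convention:

# Hypothetical example values; adjust them to your environment
export API_VIP=192.168.100.10        # apiVIP, used to access the Kubernetes API server
export INGRESS_VIP=192.168.100.11    # ingressVIP, consumed for example by the Rancher UI
export METAL3_VIP=192.168.100.12     # metal3VIP, used by the Metal3 component
export API_HOST=api.mgmt-cluster.example.com

# Export every other placeholder used in the file (ROOT_PASSWORD, SCC_REGISTRATION_CODE, ...)
# before rendering, then generate the final file with envsubst (part of the gettext package)
envsubst < mgmt-cluster.yaml.tpl > mgmt-cluster.yaml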

29.3.1 Directory structure

When running EIB, a directory is mounted from the host, so the first thing to do is to create a directory structure to be used by EIB to store the configuration files and the image itself. This directory has the following structure:

eib
├── mgmt-cluster.yaml
├── network
│   └── mgmt-cluster-node1.yaml
├── kubernetes
│   ├── manifests
│   │   ├── rke2-ingress-config.yaml
│   │   ├── neuvector-namespace.yaml
│   │   ├── ingress-l2-adv.yaml
│   │   └── ingress-ippool.yaml
│   ├── helm
│   │   └── values
│   │       ├── rancher.yaml
│   │       ├── neuvector.yaml
│   │       ├── metal3.yaml
│   │       └── certmanager.yaml
│   └── config
│       └── server.yaml
├── custom
│   ├── scripts
│   │   ├── 99-register.sh
│   │   ├── 99-mgmt-setup.sh
│   │   └── 99-alias.sh
│   └── files
│       ├── rancher.sh
│       ├── mgmt-stack-setup.service
│       ├── metal3.sh
│       └── basic-setup.sh
└── base-images
Note

The image SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso must be downloaded from the SUSE Customer Center or the SUSE Download page, and it must be located under the base-images folder.

You should check the SHA256 checksum of the image to ensure it has not been tampered with. The checksum can be found in the same location where the image was downloaded.
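
For example, from the eib directory, you can compute the checksum and compare it against the published value:

sha256sum base-images/SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso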

An example of the directory structure can be found in the SUSE Edge GitHub repository under the "telco-examples" folder.

29.3.2 Management cluster definition file

The mgmt-cluster.yaml file is the main definition file for the management cluster. It contains the following information:

apiVersion: 1.0
image:
  imageType: iso
  arch: x86_64
  baseImage: SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso
  outputImageName: eib-mgmt-cluster-image.iso
operatingSystem:
  isoConfiguration:
    installDevice: /dev/sda
  users:
  - username: root
    encryptedPassword: ${ROOT_PASSWORD}
  packages:
    packageList:
    - git
    - jq
    sccRegistrationCode: ${SCC_REGISTRATION_CODE}
kubernetes:
  version: ${KUBERNETES_VERSION}
  helm:
    charts:
      - name: cert-manager
        repositoryName: jetstack
        version: 1.14.2
        targetNamespace: cert-manager
        valuesFile: certmanager.yaml
        createNamespace: true
        installationNamespace: kube-system
      - name: longhorn-crd
        version: 103.3.0+up1.6.1
        repositoryName: rancher-charts
        targetNamespace: longhorn-system
        createNamespace: true
        installationNamespace: kube-system
      - name: longhorn
        version: 103.3.0+up1.6.1
        repositoryName: rancher-charts
        targetNamespace: longhorn-system
        createNamespace: true
        installationNamespace: kube-system
      - name: metal3-chart
        version: 0.7.4
        repositoryName: suse-edge-charts
        targetNamespace: metal3-system
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: metal3.yaml
      - name: neuvector-crd
        version: 103.0.3+up2.7.6
        repositoryName: rancher-charts
        targetNamespace: neuvector
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: neuvector.yaml
      - name: neuvector
        version: 103.0.3+up2.7.6
        repositoryName: rancher-charts
        targetNamespace: neuvector
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: neuvector.yaml
      - name: rancher
        version: 2.8.8
        repositoryName: rancher-prime
        targetNamespace: cattle-system
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: rancher.yaml
    repositories:
      - name: jetstack
        url: https://charts.jetstack.io
      - name: rancher-charts
        url: https://charts.rancher.io/
      - name: suse-edge-charts
        url: oci://registry.suse.com/edge
      - name: rancher-prime
        url: https://charts.rancher.com/server-charts/prime
  network:
    apiHost: ${API_HOST}
    apiVIP: ${API_VIP}
  nodes:
    - hostname: mgmt-cluster-node1
      initializer: true
      type: server
#   - hostname: mgmt-cluster-node2
#     type: server
#   - hostname: mgmt-cluster-node3
#     type: server

To explain the fields and values in the mgmt-cluster.yaml definition file, we have divided it into the following sections.

  • Image section (definition file):

image:
  imageType: iso
  arch: x86_64
  baseImage: SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso
  outputImageName: eib-mgmt-cluster-image.iso

where the baseImage is the original image you downloaded from the SUSE Customer Center or the SUSE Download page. outputImageName is the name of the new image that will be used to provision the management cluster.

  • Operating system section (definition file):

operatingSystem:
  isoConfiguration:
    installDevice: /dev/sda
  users:
  - username: root
    encryptedPassword: ${ROOT_PASSWORD}
  packages:
    packageList:
    - jq
    sccRegistrationCode: ${SCC_REGISTRATION_CODE}

where installDevice is the device on which to install the operating system, username and encryptedPassword are the credentials used to access the system, packageList is the list of packages to be installed (jq is required internally during the installation process), and sccRegistrationCode is the registration code used to get the packages and dependencies at build time; it can be obtained from the SUSE Customer Center. The encrypted password can be generated using the openssl command as follows:

openssl passwd -6 MyPassword!123

This outputs something similar to:

$6$UrXB1sAGs46DOiSq$HSwi9GFJLCorm0J53nF2Sq8YEoyINhHcObHzX2R8h13mswUIsMwzx4eUzn/rRx0QPV4JIb0eWCoNrxGiKH4R31
  • Kubernetes section (definition file):

kubernetes:
  version: ${KUBERNETES_VERSION}
  helm:
    charts:
      - name: cert-manager
        repositoryName: jetstack
        version: 1.14.2
        targetNamespace: cert-manager
        valuesFile: certmanager.yaml
        createNamespace: true
        installationNamespace: kube-system
      - name: longhorn-crd
        version: 103.3.0+up1.6.1
        repositoryName: rancher-charts
        targetNamespace: longhorn-system
        createNamespace: true
        installationNamespace: kube-system
      - name: longhorn
        version: 103.3.0+up1.6.1
        repositoryName: rancher-charts
        targetNamespace: longhorn-system
        createNamespace: true
        installationNamespace: kube-system
      - name: metal3-chart
        version: 0.7.4
        repositoryName: suse-edge-charts
        targetNamespace: metal3-system
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: metal3.yaml
      - name: neuvector-crd
        version: 103.0.3+up2.7.6
        repositoryName: rancher-charts
        targetNamespace: neuvector
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: neuvector.yaml
      - name: neuvector
        version: 103.0.3+up2.7.6
        repositoryName: rancher-charts
        targetNamespace: neuvector
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: neuvector.yaml
      - name: rancher
        version: 2.8.8
        repositoryName: rancher-prime
        targetNamespace: cattle-system
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: rancher.yaml
    repositories:
      - name: jetstack
        url: https://charts.jetstack.io
      - name: rancher-charts
        url: https://charts.rancher.io/
      - name: suse-edge-charts
        url: oci://registry.suse.com/edge
      - name: rancher-prime
        url: https://charts.rancher.com/server-charts/prime
    network:
      apiHost: ${API_HOST}
      apiVIP: ${API_VIP}
    nodes:
    - hostname: mgmt-cluster-node1
      initializer: true
      type: server
#   - hostname: mgmt-cluster-node2
#     type: server
#   - hostname: mgmt-cluster-node3
#     type: server

where version is the version of Kubernetes to be installed. In our case, we are using an RKE2 cluster, so the minor version must be lower than 1.29 to be compatible with Rancher (for example, v1.28.13+rke2r1).

The helm section contains the list of Helm charts to be installed, the repositories to be used, and the version configuration for all of them.

The network section contains the network configuration, such as the apiHost and apiVIP to be used by the RKE2 component. The apiVIP should be an IP address that is not in use in the network and is not part of the DHCP pool (in case DHCP is used). In a multi-node cluster, the apiVIP is used to access the Kubernetes API server. The apiHost is the host name that resolves to the apiVIP and is used by the RKE2 component.
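
For example, assuming an apiVIP of 192.168.100.10 and an apiHost of api.mgmt-cluster.example.com (both hypothetical values), the name resolution can be provided by a DNS record or, for a quick test, by an /etc/hosts entry on the machines that need to reach the API:

# Hypothetical sketch: make the apiHost name resolve to the apiVIP
echo "192.168.100.10 api.mgmt-cluster.example.com" >> /etc/hosts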

The nodes section contains the list of nodes to be used in the cluster. In this example, a single-node cluster is used, but it can be extended to a multi-node cluster by adding more nodes to the list (by uncommenting the lines).

Note
  • The names of the nodes must be unique in the cluster.

  • Optionally, use the initializer field to specify the bootstrap host; otherwise, the first node in the list is used.

  • The names of the nodes must be the same as the host names defined in the Network Folder (Section 29.3.5, “Networking folder”) when network configuration is required.

29.3.3 Custom folder

The custom folder contains the following subfolders:

...
├── custom
│   ├── scripts
│   │   ├── 99-register.sh
│   │   ├── 99-mgmt-setup.sh
│   │   └── 99-alias.sh
│   └── files
│       ├── rancher.sh
│       ├── mgmt-stack-setup.service
│       ├── metal3.sh
│       └── basic-setup.sh
...
  • The custom/files folder contains the configuration files to be used by the management cluster.

  • The custom/scripts folder contains the scripts to be used by the management cluster.

The custom/files folder contains the following files:

  • basic-setup.sh: contains the configuration parameters for the Metal3 version to be used, as well as the basic Rancher and MetalLB parameters. Only modify this file if you want to change the versions of the components or the namespaces to be used.

    #!/bin/bash
    # Pre-requisites. Cluster already running
    export KUBECTL="/var/lib/rancher/rke2/bin/kubectl"
    export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
    
    ##################
    # METAL3 DETAILS #
    ##################
    export METAL3_CHART_TARGETNAMESPACE="metal3-system"
    export METAL3_CLUSTERCTLVERSION="1.6.2"
    export METAL3_CAPICOREVERSION="1.6.2"
    export METAL3_CAPIMETAL3VERSION="1.6.0"
    export METAL3_CAPIRKE2VERSION="0.4.1"
    export METAL3_CAPIPROVIDER="rke2"
    export METAL3_CAPISYSTEMNAMESPACE="capi-system"
    export METAL3_RKE2BOOTSTRAPNAMESPACE="rke2-bootstrap-system"
    export METAL3_CAPM3NAMESPACE="capm3-system"
    export METAL3_RKE2CONTROLPLANENAMESPACE="rke2-control-plane-system"
    export METAL3_CAPI_IMAGES="registry.suse.com/edge"
    # Or registry.opensuse.org/isv/suse/edge/clusterapi/containerfile/suse for the upstream ones
    
    ###########
    # METALLB #
    ###########
    export METALLBNAMESPACE="metallb-system"
    
    ###########
    # RANCHER #
    ###########
    export RANCHER_CHART_TARGETNAMESPACE="cattle-system"
    export RANCHER_FINALPASSWORD="adminadminadmin"
    
    die(){
      echo ${1} 1>&2
      exit ${2}
    }
  • metal3.sh: contains the configuration for the Metal3 component to be used (no modifications needed). In future versions, this script will be replaced by Rancher Turtles to simplify the process.

    #!/bin/bash
    set -euo pipefail
    
    BASEDIR="$(dirname "$0")"
    source ${BASEDIR}/basic-setup.sh
    
    METAL3LOCKNAMESPACE="default"
    METAL3LOCKCMNAME="metal3-lock"
    
    trap 'catch $? $LINENO' EXIT
    
    catch() {
      if [ "$1" != "0" ]; then
        echo "Error $1 occurred on $2"
        ${KUBECTL} delete configmap ${METAL3LOCKCMNAME} -n ${METAL3LOCKNAMESPACE}
      fi
    }
    
    # Get or create the lock to run all those steps just in a single node
    # As the first node is created WAY before the others, this should be enough
    # TODO: Investigate if leases is better
    if [ $(${KUBECTL} get cm -n ${METAL3LOCKNAMESPACE} ${METAL3LOCKCMNAME} -o name | wc -l) -lt 1 ]; then
      ${KUBECTL} create configmap ${METAL3LOCKCMNAME} -n ${METAL3LOCKNAMESPACE} --from-literal foo=bar
    else
      exit 0
    fi
    
    # Wait for metal3
    while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_CHART_TARGETNAMESPACE} $(${KUBECTL} get pods -n ${METAL3_CHART_TARGETNAMESPACE} -l app.kubernetes.io/name=metal3-ironic -o name) --timeout=10s; do sleep 2 ; done
    
    # Get the ironic IP
    IRONICIP=$(${KUBECTL} get cm -n ${METAL3_CHART_TARGETNAMESPACE} ironic-bmo -o jsonpath='{.data.IRONIC_IP}')
    
    # If LoadBalancer, use metallb, else it is NodePort
    if [ $(${KUBECTL} get svc -n ${METAL3_CHART_TARGETNAMESPACE} metal3-metal3-ironic -o jsonpath='{.spec.type}') == "LoadBalancer" ]; then
      # Wait for metallb
      while ! ${KUBECTL} wait --for condition=ready -n ${METALLBNAMESPACE} $(${KUBECTL} get pods -n ${METALLBNAMESPACE} -l app.kubernetes.io/component=controller -o name) --timeout=10s; do sleep 2 ; done
    
      # Do not create the ippool if already created
      ${KUBECTL} get ipaddresspool -n ${METALLBNAMESPACE} ironic-ip-pool -o name || cat <<-EOF | ${KUBECTL} apply -f -
      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        name: ironic-ip-pool
        namespace: ${METALLBNAMESPACE}
      spec:
        addresses:
        - ${IRONICIP}/32
        serviceAllocation:
          priority: 100
          serviceSelectors:
          - matchExpressions:
            - {key: app.kubernetes.io/name, operator: In, values: [metal3-ironic]}
    	EOF
    
      # Same for L2 Advs
      ${KUBECTL} get L2Advertisement -n ${METALLBNAMESPACE} ironic-ip-pool-l2-adv -o name || cat <<-EOF | ${KUBECTL} apply -f -
      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: ironic-ip-pool-l2-adv
        namespace: ${METALLBNAMESPACE}
      spec:
        ipAddressPools:
        - ironic-ip-pool
    	EOF
    fi
    
    # If clusterctl is not installed, install it
    if ! command -v clusterctl > /dev/null 2>&1; then
      LINUXARCH=$(uname -m)
      case $(uname -m) in
        "x86_64")
          export GOARCH="amd64" ;;
        "aarch64")
          export GOARCH="arm64" ;;
        "*")
          echo "Arch not found, asumming amd64"
          export GOARCH="amd64" ;;
      esac
    
      # Clusterctl bin
      # Maybe just use the binary from hauler if available
      curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v${METAL3_CLUSTERCTLVERSION}/clusterctl-linux-${GOARCH} -o /usr/local/bin/clusterctl
      chmod +x /usr/local/bin/clusterctl
    fi
    
    # If rancher is deployed
    if [ $(${KUBECTL} get pods -n ${RANCHER_CHART_TARGETNAMESPACE} -l app=rancher -o name | wc -l) -ge 1 ]; then
      cat <<-EOF | ${KUBECTL} apply -f -
    	apiVersion: management.cattle.io/v3
    	kind: Feature
    	metadata:
    	  name: embedded-cluster-api
    	spec:
    	  value: false
    	EOF
    
      # Disable Rancher webhooks for CAPI
      ${KUBECTL} delete mutatingwebhookconfiguration.admissionregistration.k8s.io mutating-webhook-configuration
      ${KUBECTL} delete validatingwebhookconfigurations.admissionregistration.k8s.io validating-webhook-configuration
      ${KUBECTL} wait --for=delete namespace/cattle-provisioning-capi-system --timeout=300s
    fi
    
    # Deploy CAPI
    if [ $(${KUBECTL} get pods -n ${METAL3_CAPISYSTEMNAMESPACE} -o name | wc -l) -lt 1 ]; then
    
      # https://github.com/rancher-sandbox/cluster-api-provider-rke2#setting-up-clusterctl
      mkdir -p ~/.cluster-api
      cat <<-EOF > ~/.cluster-api/clusterctl.yaml
    	images:
    	  all:
    	    repository: ${METAL3_CAPI_IMAGES}
    	EOF
    
      # Try this command 3 times just in case, stolen from https://stackoverflow.com/a/33354419
      if ! (r=3; while ! clusterctl init \
        --core "cluster-api:v${METAL3_CAPICOREVERSION}"\
        --infrastructure "metal3:v${METAL3_CAPIMETAL3VERSION}"\
        --bootstrap "${METAL3_CAPIPROVIDER}:v${METAL3_CAPIRKE2VERSION}"\
        --control-plane "${METAL3_CAPIPROVIDER}:v${METAL3_CAPIRKE2VERSION}" ; do
                ((--r))||exit
                echo "Something went wrong, let's wait 10 seconds and retry"
                sleep 10;done) ; then
          echo "clusterctl failed"
          exit 1
      fi
    
      # Wait for capi-controller-manager
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_CAPISYSTEMNAMESPACE} $(${KUBECTL} get pods -n ${METAL3_CAPISYSTEMNAMESPACE} -l cluster.x-k8s.io/provider=cluster-api -o name) --timeout=10s; do sleep 2 ; done
    
      # Wait for capm3-controller-manager, there are two pods, the ipam and the capm3 one, just wait for the first one
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_CAPM3NAMESPACE} $(${KUBECTL} get pods -n ${METAL3_CAPM3NAMESPACE} -l cluster.x-k8s.io/provider=infrastructure-metal3 -o name | head -n1 ) --timeout=10s; do sleep 2 ; done
    
      # Wait for rke2-bootstrap-controller-manager
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_RKE2BOOTSTRAPNAMESPACE} $(${KUBECTL} get pods -n ${METAL3_RKE2BOOTSTRAPNAMESPACE} -l cluster.x-k8s.io/provider=bootstrap-rke2 -o name) --timeout=10s; do sleep 2 ; done
    
      # Wait for rke2-control-plane-controller-manager
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_RKE2CONTROLPLANENAMESPACE} $(${KUBECTL} get pods -n ${METAL3_RKE2CONTROLPLANENAMESPACE} -l cluster.x-k8s.io/provider=control-plane-rke2 -o name) --timeout=10s; do sleep 2 ; done
    
    fi
    
    # Clean up the lock cm
    
    ${KUBECTL} delete configmap ${METAL3LOCKCMNAME} -n ${METAL3LOCKNAMESPACE}
    • rancher.sh: contains the configuration for the Rancher component to be used (no modifications needed).

      #!/bin/bash
      set -euo pipefail
      
      BASEDIR="$(dirname "$0")"
      source ${BASEDIR}/basic-setup.sh
      
      RANCHERLOCKNAMESPACE="default"
      RANCHERLOCKCMNAME="rancher-lock"
      
      if [ -z "${RANCHER_FINALPASSWORD}" ]; then
        # If there is no final password, then finish the setup right away
        exit 0
      fi
      
      trap 'catch $? $LINENO' EXIT
      
      catch() {
        if [ "$1" != "0" ]; then
          echo "Error $1 occurred on $2"
          ${KUBECTL} delete configmap ${RANCHERLOCKCMNAME} -n ${RANCHERLOCKNAMESPACE}
        fi
      }
      
      # Get or create the lock to run all those steps just in a single node
      # As the first node is created WAY before the others, this should be enough
      # TODO: Investigate if leases is better
      if [ $(${KUBECTL} get cm -n ${RANCHERLOCKNAMESPACE} ${RANCHERLOCKCMNAME} -o name | wc -l) -lt 1 ]; then
        ${KUBECTL} create configmap ${RANCHERLOCKCMNAME} -n ${RANCHERLOCKNAMESPACE} --from-literal foo=bar
      else
        exit 0
      fi
      
      # Wait for rancher to be deployed
      while ! ${KUBECTL} wait --for condition=ready -n ${RANCHER_CHART_TARGETNAMESPACE} $(${KUBECTL} get pods -n ${RANCHER_CHART_TARGETNAMESPACE} -l app=rancher -o name) --timeout=10s; do sleep 2 ; done
      until ${KUBECTL} get ingress -n ${RANCHER_CHART_TARGETNAMESPACE} rancher > /dev/null 2>&1; do sleep 10; done
      
      RANCHERBOOTSTRAPPASSWORD=$(${KUBECTL} get secret -n ${RANCHER_CHART_TARGETNAMESPACE} bootstrap-secret -o jsonpath='{.data.bootstrapPassword}' | base64 -d)
      RANCHERHOSTNAME=$(${KUBECTL} get ingress -n ${RANCHER_CHART_TARGETNAMESPACE} rancher -o jsonpath='{.spec.rules[0].host}')
      
      # Skip the whole process if things have been set already
      if [ -z $(${KUBECTL} get settings.management.cattle.io first-login -ojsonpath='{.value}') ]; then
        # Add the protocol
        RANCHERHOSTNAME="https://${RANCHERHOSTNAME}"
        TOKEN=""
        while [ -z "${TOKEN}" ]; do
          # Get token
          sleep 2
          TOKEN=$(curl -sk -X POST ${RANCHERHOSTNAME}/v3-public/localProviders/local?action=login -H 'content-type: application/json' -d "{\"username\":\"admin\",\"password\":\"${RANCHERBOOTSTRAPPASSWORD}\"}" | jq -r .token)
        done
      
        # Set password
        curl -sk ${RANCHERHOSTNAME}/v3/users?action=changepassword -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN" -d "{\"currentPassword\":\"${RANCHERBOOTSTRAPPASSWORD}\",\"newPassword\":\"${RANCHER_FINALPASSWORD}\"}"
      
        # Create a temporary API token (ttl=60 minutes)
        APITOKEN=$(curl -sk ${RANCHERHOSTNAME}/v3/token -H 'content-type: application/json' -H "Authorization: Bearer ${TOKEN}" -d '{"type":"token","description":"automation","ttl":3600000}' | jq -r .token)
      
        curl -sk ${RANCHERHOSTNAME}/v3/settings/server-url -H 'content-type: application/json' -H "Authorization: Bearer ${APITOKEN}" -X PUT -d "{\"name\":\"server-url\",\"value\":\"${RANCHERHOSTNAME}\"}"
        curl -sk ${RANCHERHOSTNAME}/v3/settings/telemetry-opt -X PUT -H 'content-type: application/json' -H 'accept: application/json' -H "Authorization: Bearer ${APITOKEN}" -d '{"value":"out"}'
      fi
      
      # Clean up the lock cm
      ${KUBECTL} delete configmap ${RANCHERLOCKCMNAME} -n ${RANCHERLOCKNAMESPACE}
    • mgmt-stack-setup.service: contains the configuration to create the systemd service to run the scripts during the first boot (no modifications needed).

      [Unit]
      Description=Setup Management stack components
      Wants=network-online.target
      # It requires rke2 or k3s running, but it will not fail if those services are not present
      After=network.target network-online.target rke2-server.service k3s.service
      # At least, the basic-setup.sh one needs to be present
      ConditionPathExists=/opt/mgmt/bin/basic-setup.sh
      
      [Service]
      User=root
      Type=forking
      # Metal3 can take A LOT to download the IPA image
      TimeoutStartSec=1800
      
      ExecStartPre=/bin/sh -c "echo 'Setting up Management components...'"
      # Scripts are executed in ExecStartPre because ExecStart can only run a single command
      ExecStartPre=/opt/mgmt/bin/rancher.sh
      ExecStartPre=/opt/mgmt/bin/metal3.sh
      ExecStart=/bin/sh -c "echo 'Finished setting up Management components'"
      RemainAfterExit=yes
      KillMode=process
      # Disable & delete everything
      ExecStartPost=rm -f /opt/mgmt/bin/rancher.sh
      ExecStartPost=rm -f /opt/mgmt/bin/metal3.sh
      ExecStartPost=rm -f /opt/mgmt/bin/basic-setup.sh
      ExecStartPost=/bin/sh -c "systemctl disable mgmt-stack-setup.service"
      ExecStartPost=rm -f /etc/systemd/system/mgmt-stack-setup.service
      
      [Install]
      WantedBy=multi-user.target

The custom/scripts folder contains the following files:

  • 99-alias.sh script: contains the aliases and the KUBECONFIG environment variable used by the management cluster to load the kubeconfig file at first boot (no modifications needed).

    #!/bin/bash
    echo "alias k=kubectl" >> /etc/profile.local
    echo "alias kubectl=/var/lib/rancher/rke2/bin/kubectl" >> /etc/profile.local
    echo "export KUBECONFIG=/etc/rancher/rke2/rke2.yaml" >> /etc/profile.local
  • 99-mgmt-setup.sh script: contains the configuration to copy the scripts during the first boot (no modifications needed).

    #!/bin/bash
    
    # Copy the scripts from combustion to the final location
    mkdir -p /opt/mgmt/bin/
    for script in basic-setup.sh rancher.sh metal3.sh; do
    	cp ${script} /opt/mgmt/bin/
    done
    
    # Copy the systemd unit file and enable it at boot
    cp mgmt-stack-setup.service /etc/systemd/system/mgmt-stack-setup.service
    systemctl enable mgmt-stack-setup.service
  • 99-register.sh script: contains the configuration to register the system using the SCC registration code. The ${SCC_ACCOUNT_EMAIL} and ${SCC_REGISTRATION_CODE} have to be set properly to register the system with your account.

    #!/bin/bash
    set -euo pipefail
    
    # Registration https://www.suse.com/support/kb/doc/?id=000018564
    if ! which SUSEConnect > /dev/null 2>&1; then
    	zypper --non-interactive install suseconnect-ng
    fi
    SUSEConnect --email "${SCC_ACCOUNT_EMAIL}" --url "https://scc.suse.com" --regcode "${SCC_REGISTRATION_CODE}"

29.3.4 Kubernetes folder

The kubernetes folder contains the following subfolders:

...
├── kubernetes
│   ├── manifests
│   │   ├── rke2-ingress-config.yaml
│   │   ├── neuvector-namespace.yaml
│   │   ├── ingress-l2-adv.yaml
│   │   └── ingress-ippool.yaml
│   ├── helm
│   │   └── values
│   │       ├── rancher.yaml
│   │       ├── neuvector.yaml
│   │       ├── metal3.yaml
│   │       └── certmanager.yaml
│   └── config
│       └── server.yaml
...

The kubernetes/config folder contains the following files:

  • server.yaml: By default, Cilium is the CNI plug-in installed, so you do not need to create this folder and file. If you need to customize the CNI plug-in, you can use the server.yaml file under the kubernetes/config folder. It contains the following information:

    cni:
    - multus
    - cilium
Note

This is an optional file to define certain Kubernetes customizations, such as the CNI plug-ins to be used or many other options that you can check in the official documentation.
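
For illustration only, a server.yaml that keeps the CNI list above and additionally adds extra SANs to the API server certificate (tls-san is a standard RKE2 server configuration option; whether you need it depends on your environment) could look like this:

cni:
- multus
- cilium
tls-san:
- ${API_VIP}
- ${API_HOST}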

The kubernetes/manifests folder contains the following files:

  • rke2-ingress-config.yaml: contains the configuration to create the Ingress service for the management cluster (no modifications needed).

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          config:
            use-forwarded-headers: "true"
            enable-real-ip: "true"
          publishService:
            enabled: true
          service:
            enabled: true
            type: LoadBalancer
            externalTrafficPolicy: Local
  • neuvector-namespace.yaml: contains the configuration to create the NeuVector namespace (no modifications needed).

    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        pod-security.kubernetes.io/enforce: privileged
      name: neuvector
  • ingress-l2-adv.yaml: contains the configuration to create the L2Advertisement for the MetalLB component (no modifications needed).

    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: ingress-l2-adv
      namespace: metallb-system
    spec:
      ipAddressPools:
        - ingress-ippool
  • ingress-ippool.yaml: contains the configuration to create the IPAddressPool for the rke2-ingress-nginx component. The ${INGRESS_VIP} has to be set properly to define the IP address reserved to be used by the rke2-ingress-nginx component.

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: ingress-ippool
      namespace: metallb-system
    spec:
      addresses:
        - ${INGRESS_VIP}/32
      serviceAllocation:
        priority: 100
        serviceSelectors:
          - matchExpressions:
              - {key: app.kubernetes.io/name, operator: In, values: [rke2-ingress-nginx]}
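
Once the management cluster is running, you can optionally verify that MetalLB assigned the reserved ${INGRESS_VIP} to the ingress service by inspecting the LoadBalancer service. The service name below is the one usually created by the rke2-ingress-nginx chart and is an assumption; it may differ in your deployment:

/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get svc -n kube-system rke2-ingress-nginx-controller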

The kubernetes/helm/values folder contains the following files:

  • rancher.yaml: contains the configuration to create the Rancher component. The ${INGRESS_VIP} must be set properly to define the IP address to be consumed by the Rancher component. The URL to access the Rancher component will be https://rancher-${INGRESS_VIP}.sslip.io.

    hostname: rancher-${INGRESS_VIP}.sslip.io
    bootstrapPassword: "foobar"
    replicas: 1
    global.cattle.psp.enabled: "false"
  • neuvector.yaml: contains the configuration to create the NeuVector component (no modifications needed).

    controller:
      replicas: 1
      ranchersso:
        enabled: true
    manager:
      enabled: false
    cve:
      scanner:
        enabled: false
        replicas: 1
    k3s:
      enabled: true
    crdwebhook:
      enabled: false
  • metal3.yaml: contains the configuration to create the Metal3 component. The ${METAL3_VIP} must be set properly to define the IP address to be consumed by the Metal3 component.

    global:
      ironicIP: ${METAL3_VIP}
      enable_vmedia_tls: false
      additionalTrustedCAs: false
    metal3-ironic:
      global:
        predictableNicNames: "true"
      persistence:
        ironic:
          size: "5Gi"
Note

The Media Server is an optional feature included in Metal3 (disabled by default). To use the Metal3 media server, configure it in the previous manifest as follows:

  • Set enable_metal3_media_server to true in the global section to enable the media server feature.

  • Include the following configuration about the media server, where ${MEDIA_VOLUME_PATH} is the path to the media volume (for example, /home/metal3/bmh-image-cache):

    metal3-media:
      mediaVolume:
        hostPath: ${MEDIA_VOLUME_PATH}

An external media server can be used to store the images. In case you want to use it with TLS, you need to modify the following configurations:

  • Set additionalTrustedCAs to true in the previous metal3.yaml file to enable the additional trusted CAs from the external media server.

  • Include the following secret configuration in the file kubernetes/manifests/metal3-cacert-secret.yaml to store the CA certificate of the external media server.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: metal3-system
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: tls-ca-additional
      namespace: metal3-system
    type: Opaque
    data:
      ca-additional.crt: {{ additional_ca_cert | b64encode }}

The additional_ca_cert is the base64-encoded CA certificate of the external media server. You can use the following command to encode the certificate and generate the secret manually:

kubectl -n metal3-system create secret generic tls-ca-additional --from-file=ca-additional.crt=./ca-additional.crt
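
Alternatively, if you prefer to fill in the manifest above by hand, you can generate the base64-encoded value for the ca-additional.crt field as follows (a simple sketch using the GNU coreutils base64 tool):

base64 -w0 ./ca-additional.crt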
  • certmanager.yaml: contains the configuration to create the Cert-Manager component (no modifications needed).

    installCRDs: "true"

29.3.5 Networking folder

The network folder contains as many files as there are nodes in the management cluster. In our case, we have only one node, so we have only one file, called mgmt-cluster-node1.yaml. The name of the file must match the host name defined in the nodes section of the mgmt-cluster.yaml definition file described above.

If you need to customize the networking configuration, for example, to use a specific static IP address (DHCP-less scenario), you can use the mgmt-cluster-node1.yaml file under the network folder. It contains the following information:

  • ${MGMT_GATEWAY}: The gateway IP address.

  • ${MGMT_DNS}: The DNS server IP address.

  • ${MGMT_MAC}: The MAC address of the network interface.

  • ${MGMT_NODE_IP}: The IP address of the management cluster node.

routes:
  config:
  - destination: 0.0.0.0/0
    metric: 100
    next-hop-address: ${MGMT_GATEWAY}
    next-hop-interface: eth0
    table-id: 254
dns-resolver:
  config:
    server:
    - ${MGMT_DNS}
    - 8.8.8.8
interfaces:
- name: eth0
  type: ethernet
  state: up
  mac-address: ${MGMT_MAC}
  ipv4:
    address:
    - ip: ${MGMT_NODE_IP}
      prefix-length: 24
    dhcp: false
    enabled: true
  ipv6:
    enabled: false

If you want to use DHCP to get the IP address, you can use the following configuration (the MAC address must be set properly using the ${MGMT_MAC} variable):

## This is an example of a dhcp network configuration for a management cluster
interfaces:
- name: eth0
  type: ethernet
  state: up
  mac-address: ${MGMT_MAC}
  ipv4:
    dhcp: true
    enabled: true
  ipv6:
    enabled: false
Note
  • Depending on the number of nodes in the management cluster, you can create more files like mgmt-cluster-node2.yaml, mgmt-cluster-node3.yaml, etc. to configure the rest of the nodes.

  • The routes section is used to define the routing table for the management cluster.

29.4 Image preparation for air-gap environments

This section describes how to prepare the image for air-gap environments, showing only the differences from the previous sections. The following changes to the previous section (Image preparation for connected environments (Section 29.3, “Image preparation for connected environments”)) are required to prepare the image for air-gap environments:

  • The mgmt-cluster.yaml file must be modified to include the embeddedArtifactRegistry section with the images field set to all container images to be included into the EIB output image.

  • The custom/scripts/99-register.sh script must be removed when using an air-gap environment.

  • The custom/files/airgap-resources.tar.gz file must be included in the custom/files folder with all the resources needed to run the management cluster in an air-gap environment.

  • The custom/scripts/99-mgmt-setup.sh script must be modified to extract and copy the airgap-resources.tar.gz file to the final location.

  • The custom/files/metal3.sh script must be modified to use the local resources included in the airgap-resources.tar.gz file instead of downloading them from the internet.

29.4.1 Directory structure for air-gap environments

The directory structure for air-gap environments is the same as for connected environments, with the differences explained as follows:

eib
|-- base-images
|   |-- SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso
|-- custom
|   |-- files
|   |   |-- airgap-resources.tar.gz
|   |   |-- basic-setup.sh
|   |   |-- metal3.sh
|   |   |-- mgmt-stack-setup.service
|   |   |-- rancher.sh
|   |-- scripts
|       |-- 99-alias.sh
|       |-- 99-mgmt-setup.sh
|-- kubernetes
|   |-- config
|   |   |-- server.yaml
|   |-- helm
|   |   |-- values
|   |       |-- certmanager.yaml
|   |       |-- metal3.yaml
|   |       |-- neuvector.yaml
|   |       |-- rancher.yaml
|   |-- manifests
|       |-- neuvector-namespace.yaml
|-- mgmt-cluster.yaml
|-- network
    |-- mgmt-cluster-network.yaml
Note

The image SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso must be downloaded from the SUSE Customer Center or the SUSE Download page, and it must be located under the base-images folder before starting with the process.

You should check the SHA256 checksum of the image to ensure it has not been tampered with. The checksum can be found in the same location where the image was downloaded.

An example of the directory structure can be found in the SUSE Edge GitHub repository under the "telco-examples" folder.

29.4.2 Modifications in the definition file

The mgmt-cluster.yaml file must be modified to include the embeddedArtifactRegistry section, with the images field listing all the container images to be included in the EIB output image. The following is an example of the mgmt-cluster.yaml file with the embeddedArtifactRegistry section included:

apiVersion: 1.0
image:
  imageType: iso
  arch: x86_64
  baseImage: SLE-Micro.x86_64-5.5.0-Default-SelfInstall-GM2.install.iso
  outputImageName: eib-mgmt-cluster-image.iso
operatingSystem:
  isoConfiguration:
    installDevice: /dev/sda
  users:
  - username: root
    encryptedPassword: ${ROOT_PASSWORD}
  packages:
    packageList:
    - jq
    sccRegistrationCode: ${SCC_REGISTRATION_CODE}
kubernetes:
  version: ${KUBERNETES_VERSION}
  helm:
    charts:
      - name: cert-manager
        repositoryName: jetstack
        version: 1.14.2
        targetNamespace: cert-manager
        valuesFile: certmanager.yaml
        createNamespace: true
        installationNamespace: kube-system
      - name: longhorn-crd
        version: 103.3.0+up1.6.1
        repositoryName: rancher-charts
        targetNamespace: longhorn-system
        createNamespace: true
        installationNamespace: kube-system
      - name: longhorn
        version: 103.3.0+up1.6.1
        repositoryName: rancher-charts
        targetNamespace: longhorn-system
        createNamespace: true
        installationNamespace: kube-system
      - name: metal3-chart
        version: 0.7.4
        repositoryName: suse-edge-charts
        targetNamespace: metal3-system
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: metal3.yaml
      - name: neuvector-crd
        version: 103.0.3+up2.7.6
        repositoryName: rancher-charts
        targetNamespace: neuvector
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: neuvector.yaml
      - name: neuvector
        version: 103.0.3+up2.7.6
        repositoryName: rancher-charts
        targetNamespace: neuvector
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: neuvector.yaml
      - name: rancher
        version: 2.8.8
        repositoryName: rancher-prime
        targetNamespace: cattle-system
        createNamespace: true
        installationNamespace: kube-system
        valuesFile: rancher.yaml
    repositories:
      - name: jetstack
        url: https://charts.jetstack.io
      - name: rancher-charts
        url: https://charts.rancher.io/
      - name: suse-edge-charts
        url: oci://registry.suse.com/edge
      - name: rancher-prime
        url: https://charts.rancher.com/server-charts/prime
    network:
      apiHost: ${API_HOST}
      apiVIP: ${API_VIP}
    nodes:
    - hostname: mgmt-cluster-node1
      initializer: true
      type: server
#   - hostname: mgmt-cluster-node2
#     type: server
#   - hostname: mgmt-cluster-node3
#     type: server
embeddedArtifactRegistry:
  images:
    - name: registry.rancher.com/rancher/backup-restore-operator:v4.0.3
    - name: registry.rancher.com/rancher/calico-cni:v3.27.4-rancher1
    - name: registry.rancher.com/rancher/cis-operator:v1.0.15
    - name: registry.rancher.com/rancher/coreos-kube-state-metrics:v1.9.7
    - name: registry.rancher.com/rancher/coreos-prometheus-config-reloader:v0.38.1
    - name: registry.rancher.com/rancher/coreos-prometheus-operator:v0.38.1
    - name: registry.rancher.com/rancher/flannel-cni:v0.3.0-rancher9
    - name: registry.rancher.com/rancher/fleet-agent:v0.9.9
    - name: registry.rancher.com/rancher/fleet:v0.9.9
    - name: registry.rancher.com/rancher/gitjob:v0.9.13
    - name: registry.rancher.com/rancher/grafana-grafana:7.1.5
    - name: registry.rancher.com/rancher/hardened-addon-resizer:1.8.20-build20240410
    - name: registry.rancher.com/rancher/hardened-calico:v3.28.1-build20240806
    - name: registry.rancher.com/rancher/hardened-cluster-autoscaler:v1.8.10-build20240124
    - name: registry.rancher.com/rancher/hardened-cni-plugins:v1.5.1-build20240805
    - name: registry.rancher.com/rancher/hardened-coredns:v1.11.1-build20240305
    - name: registry.rancher.com/rancher/hardened-dns-node-cache:1.22.28-build20240125
    - name: registry.rancher.com/rancher/hardened-etcd:v3.5.13-k3s1-build20240531
    - name: registry.rancher.com/rancher/hardened-flannel:v0.25.5-build20240801
    - name: registry.rancher.com/rancher/hardened-k8s-metrics-server:v0.7.1-build20240401
    - name: registry.rancher.com/rancher/hardened-kubernetes:v1.28.13-rke2r1-build20240815
    - name: registry.rancher.com/rancher/hardened-multus-cni:v4.0.2-build20240612
    - name: registry.rancher.com/rancher/hardened-node-feature-discovery:v0.15.4-build20240513
    - name: registry.rancher.com/rancher/hardened-whereabouts:v0.7.0-build20240429
    - name: registry.rancher.com/rancher/helm-project-operator:v0.2.1
    - name: registry.rancher.com/rancher/istio-kubectl:1.5.10
    - name: registry.rancher.com/rancher/jimmidyson-configmap-reload:v0.3.0
    - name: registry.rancher.com/rancher/k3s-upgrade:v1.28.13-k3s1
    - name: registry.rancher.com/rancher/klipper-helm:v0.8.4-build20240523
    - name: registry.rancher.com/rancher/klipper-lb:v0.4.9
    - name: registry.rancher.com/rancher/kube-api-auth:v0.2.1
    - name: registry.rancher.com/rancher/kubectl:v1.28.12
    - name: registry.rancher.com/rancher/library-nginx:1.19.2-alpine
    - name: registry.rancher.com/rancher/local-path-provisioner:v0.0.28
    - name: registry.rancher.com/rancher/machine:v0.15.0-rancher116
    - name: registry.rancher.com/rancher/mirrored-cluster-api-controller:v1.4.4
    - name: registry.rancher.com/rancher/nginx-ingress-controller:v1.10.4-hardened2
    - name: registry.rancher.com/rancher/pause:3.6
    - name: registry.rancher.com/rancher/prom-alertmanager:v0.21.0
    - name: registry.rancher.com/rancher/prom-node-exporter:v1.0.1
    - name: registry.rancher.com/rancher/prom-prometheus:v2.18.2
    - name: registry.rancher.com/rancher/prometheus-auth:v0.2.2
    - name: registry.rancher.com/rancher/prometheus-federator:v0.3.4
    - name: registry.rancher.com/rancher/pushprox-client:v0.1.3-rancher2-client
    - name: registry.rancher.com/rancher/pushprox-proxy:v0.1.3-rancher2-proxy
    - name: registry.rancher.com/rancher/rancher-agent:v2.8.8
    - name: registry.rancher.com/rancher/rancher-csp-adapter:v3.0.1
    - name: registry.rancher.com/rancher/rancher-webhook:v0.4.11
    - name: registry.rancher.com/rancher/rancher:v2.8.8
    - name: registry.rancher.com/rancher/rke-tools:v0.1.102
    - name: registry.rancher.com/rancher/rke2-cloud-provider:v1.29.3-build20240515
    - name: registry.rancher.com/rancher/rke2-runtime:v1.28.13-rke2r1
    - name: registry.rancher.com/rancher/rke2-upgrade:v1.28.13-rke2r1
    - name: registry.rancher.com/rancher/security-scan:v0.2.17
    - name: registry.rancher.com/rancher/shell:v0.1.26
    - name: registry.rancher.com/rancher/system-agent-installer-k3s:v1.28.13-k3s1
    - name: registry.rancher.com/rancher/system-agent-installer-rke2:v1.28.13-rke2r1
    - name: registry.rancher.com/rancher/system-agent:v0.3.9-suc
    - name: registry.rancher.com/rancher/system-upgrade-controller:v0.13.4
    - name: registry.rancher.com/rancher/ui-plugin-catalog:2.1.0
    - name: registry.rancher.com/rancher/ui-plugin-operator:v0.1.1
    - name: registry.rancher.com/rancher/webhook-receiver:v0.2.5
    - name: registry.rancher.com/rancher/kubectl:v1.20.2
    - name: registry.rancher.com/rancher/shell:v0.1.24
    - name: registry.rancher.com/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.4.1
    - name: registry.rancher.com/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
    - name: registry.rancher.com/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20230312-helm-chart-4.5.2-28-g66a760794
    - name: registry.rancher.com/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20231011-8b53cabe0
    - name: registry.rancher.com/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20231226-1a7112e06
    - name: registry.rancher.com/rancher/mirrored-longhornio-csi-attacher:v4.4.2
    - name: registry.rancher.com/rancher/mirrored-longhornio-csi-provisioner:v3.6.2
    - name: registry.rancher.com/rancher/mirrored-longhornio-csi-resizer:v1.9.2
    - name: registry.rancher.com/rancher/mirrored-longhornio-csi-snapshotter:v6.3.2
    - name: registry.rancher.com/rancher/mirrored-longhornio-csi-node-driver-registrar:v2.9.2
    - name: registry.rancher.com/rancher/mirrored-longhornio-livenessprobe:v2.12.0
    - name: registry.rancher.com/rancher/mirrored-longhornio-backing-image-manager:v1.6.1
    - name: registry.rancher.com/rancher/mirrored-longhornio-longhorn-engine:v1.6.1
    - name: registry.rancher.com/rancher/mirrored-longhornio-longhorn-instance-manager:v1.6.1
    - name: registry.rancher.com/rancher/mirrored-longhornio-longhorn-manager:v1.6.1
    - name: registry.rancher.com/rancher/mirrored-longhornio-longhorn-share-manager:v1.6.1
    - name: registry.rancher.com/rancher/mirrored-longhornio-longhorn-ui:v1.6.1
    - name: registry.rancher.com/rancher/mirrored-longhornio-support-bundle-kit:v0.0.36
    - name: registry.suse.com/edge/cluster-api-provider-rke2-bootstrap:v0.4.1
    - name: registry.suse.com/edge/cluster-api-provider-rke2-controlplane:v0.4.1
    - name: registry.suse.com/edge/cluster-api-controller:v1.6.2
    - name: registry.suse.com/edge/cluster-api-provider-metal3:v1.6.0
    - name: registry.suse.com/edge/ip-address-manager:v1.6.0

29.4.3 Modifications in the custom folder

  • The custom/scripts/99-register.sh script must be removed when using an air-gap environment. As you can see in the directory structure, the 99-register.sh script is not included in the custom/scripts folder.

  • The custom/scripts/99-mgmt-setup.sh script must be modified to extract and copy the airgap-resources.tar.gz file to the final location. The following is an example of the 99-mgmt-setup.sh script with the modifications to extract and copy the airgap-resources.tar.gz file:

    #!/bin/bash
    
    # Copy the scripts from combustion to the final location
    mkdir -p /opt/mgmt/bin/
    for script in basic-setup.sh rancher.sh metal3.sh; do
    	cp ${script} /opt/mgmt/bin/
    done
    
    # Copy the systemd unit file and enable it at boot
    cp mgmt-stack-setup.service /etc/systemd/system/mgmt-stack-setup.service
    systemctl enable mgmt-stack-setup.service
    
    # Extract the airgap resources
    tar zxf airgap-resources.tar.gz
    
    # Copy the clusterctl binary to the final location
    cp airgap-resources/clusterctl /opt/mgmt/bin/ && chmod +x /opt/mgmt/bin/clusterctl
    
    # Copy the clusterctl.yaml and override
    mkdir -p /root/cluster-api
    cp -r airgap-resources/clusterctl.yaml airgap-resources/overrides /root/cluster-api/
  • The custom/files/metal3.sh script must be modified to use the local resources included in the airgap-resources.tar.gz file instead of downloading them from the internet. The following is an example of the metal3.sh script with the modifications to use the local resources:

    #!/bin/bash
    set -euo pipefail
    
    BASEDIR="$(dirname "$0")"
    source ${BASEDIR}/basic-setup.sh
    
    METAL3LOCKNAMESPACE="default"
    METAL3LOCKCMNAME="metal3-lock"
    
    trap 'catch $? $LINENO' EXIT
    
    catch() {
      if [ "$1" != "0" ]; then
        echo "Error $1 occurred on $2"
        ${KUBECTL} delete configmap ${METAL3LOCKCMNAME} -n ${METAL3LOCKNAMESPACE}
      fi
    }
    
    # Get or create the lock to run all those steps just in a single node
    # As the first node is created WAY before the others, this should be enough
    # TODO: Investigate if leases is better
    if [ $(${KUBECTL} get cm -n ${METAL3LOCKNAMESPACE} ${METAL3LOCKCMNAME} -o name | wc -l) -lt 1 ]; then
      ${KUBECTL} create configmap ${METAL3LOCKCMNAME} -n ${METAL3LOCKNAMESPACE} --from-literal foo=bar
    else
      exit 0
    fi
    
    # Wait for metal3
    while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_CHART_TARGETNAMESPACE} $(${KUBECTL} get pods -n ${METAL3_CHART_TARGETNAMESPACE} -l app.kubernetes.io/name=metal3-ironic -o name) --timeout=10s; do sleep 2 ; done
    
    # If rancher is deployed
    if [ $(${KUBECTL} get pods -n ${RANCHER_CHART_TARGETNAMESPACE} -l app=rancher -o name | wc -l) -ge 1 ]; then
      cat <<-EOF | ${KUBECTL} apply -f -
    	apiVersion: management.cattle.io/v3
    	kind: Feature
    	metadata:
    	  name: embedded-cluster-api
    	spec:
    	  value: false
    	EOF
    
      # Disable Rancher webhooks for CAPI
      ${KUBECTL} delete mutatingwebhookconfiguration.admissionregistration.k8s.io mutating-webhook-configuration
      ${KUBECTL} delete validatingwebhookconfigurations.admissionregistration.k8s.io validating-webhook-configuration
      ${KUBECTL} wait --for=delete namespace/cattle-provisioning-capi-system --timeout=300s
    fi
    
    # Deploy CAPI
    if [ $(${KUBECTL} get pods -n ${METAL3_CAPISYSTEMNAMESPACE} -o name | wc -l) -lt 1 ]; then
    
      # Try this command 3 times just in case, stolen from https://stackoverflow.com/a/33354419
      if ! (r=3; while ! /opt/mgmt/bin/clusterctl init \
        --core "cluster-api:v${METAL3_CAPICOREVERSION}"\
        --infrastructure "metal3:v${METAL3_CAPIMETAL3VERSION}"\
        --bootstrap "${METAL3_CAPIPROVIDER}:v${METAL3_CAPIRKE2VERSION}"\
        --control-plane "${METAL3_CAPIPROVIDER}:v${METAL3_CAPIRKE2VERSION}"\
        --config /root/cluster-api/clusterctl.yaml ; do
                ((--r))||exit
                echo "Something went wrong, let's wait 10 seconds and retry"
                sleep 10;done) ; then
          echo "clusterctl failed"
          exit 1
      fi
    
      # Wait for capi-controller-manager
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_CAPISYSTEMNAMESPACE} $(${KUBECTL} get pods -n ${METAL3_CAPISYSTEMNAMESPACE} -l cluster.x-k8s.io/provider=cluster-api -o name) --timeout=10s; do sleep 2 ; done
    
      # Wait for capm3-controller-manager, there are two pods, the ipam and the capm3 one, just wait for the first one
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_CAPM3NAMESPACE} $(${KUBECTL} get pods -n ${METAL3_CAPM3NAMESPACE} -l cluster.x-k8s.io/provider=infrastructure-metal3 -o name | head -n1 ) --timeout=10s; do sleep 2 ; done
    
      # Wait for rke2-bootstrap-controller-manager
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_RKE2BOOTSTRAPNAMESPACE} $(${KUBECTL} get pods -n ${METAL3_RKE2BOOTSTRAPNAMESPACE} -l cluster.x-k8s.io/provider=bootstrap-rke2 -o name) --timeout=10s; do sleep 2 ; done
    
      # Wait for rke2-control-plane-controller-manager
      while ! ${KUBECTL} wait --for condition=ready -n ${METAL3_RKE2CONTROLPLANENAMESPACE} $(${KUBECTL} get pods -n ${METAL3_RKE2CONTROLPLANENAMESPACE} -l cluster.x-k8s.io/provider=control-plane-rke2 -o name) --timeout=10s; do sleep 2 ; done
    
    fi
    
    # Clean up the lock cm
    
    ${KUBECTL} delete configmap ${METAL3LOCKCMNAME} -n ${METAL3LOCKNAMESPACE}
  • The custom/files/airgap-resources.tar.gz file must be included in the custom/files folder with all the resources needed to run the management cluster in an air-gap environment. This file must be prepared manually by downloading all the resources and compressing them into this single file. The airgap-resources.tar.gz file contains the following resources:

    |-- clusterctl
    |-- clusterctl.yaml
    |-- overrides
        |-- bootstrap-rke2
        |   |-- v0.4.1
        |       |-- bootstrap-components.yaml
        |       |-- metadata.yaml
        |-- cluster-api
        |   |-- v1.6.2
        |       |-- core-components.yaml
        |       |-- metadata.yaml
        |-- control-plane-rke2
        |   |-- v0.4.1
        |       |-- control-plane-components.yaml
        |       |-- metadata.yaml
        |-- infrastructure-metal3
            |-- v1.6.0
                |-- cluster-template.yaml
                |-- infrastructure-components.yaml
                |-- metadata.yaml

The clusterctl.yaml file contains the configuration to specify the image locations and the overrides to be used by the clusterctl tool. The overrides folder contains the YAML manifests to be used instead of downloading them from the internet.

providers:
  # override a pre-defined provider
  - name: "cluster-api"
    url: "/root/cluster-api/overrides/cluster-api/v1.6.2/core-components.yaml"
    type: "CoreProvider"
  - name: "metal3"
    url: "/root/cluster-api/overrides/infrastructure-metal3/v1.6.0/infrastructure-components.yaml"
    type: "InfrastructureProvider"
  - name: "rke2"
    url: "/root/cluster-api/overrides/bootstrap-rke2/v0.4.1/bootstrap-components.yaml"
    type: "BootstrapProvider"
  - name: "rke2"
    url: "/root/cluster-api/overrides/control-plane-rke2/v0.4.1/control-plane-components.yaml"
    type: "ControlPlaneProvider"
images:
  all:
    repository: registry.suse.com/edge

The clusterctl binary and the rest of the files included in the overrides folder can be downloaded using the following curl commands. The output paths below place the files following the airgap-resources.tar.gz layout shown above:

# clusterctl binary
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.6.2/clusterctl-linux-${GOARCH} -o clusterctl

# bootstrap-components (bootstrap-rke2)
mkdir -p overrides/bootstrap-rke2/v0.4.1
curl -L https://github.com/rancher-sandbox/cluster-api-provider-rke2/releases/download/v0.4.1/bootstrap-components.yaml -o overrides/bootstrap-rke2/v0.4.1/bootstrap-components.yaml
curl -L https://github.com/rancher-sandbox/cluster-api-provider-rke2/releases/download/v0.4.1/metadata.yaml -o overrides/bootstrap-rke2/v0.4.1/metadata.yaml

# control-plane-components (control-plane-rke2)
mkdir -p overrides/control-plane-rke2/v0.4.1
curl -L https://github.com/rancher-sandbox/cluster-api-provider-rke2/releases/download/v0.4.1/control-plane-components.yaml -o overrides/control-plane-rke2/v0.4.1/control-plane-components.yaml
curl -L https://github.com/rancher-sandbox/cluster-api-provider-rke2/releases/download/v0.4.1/metadata.yaml -o overrides/control-plane-rke2/v0.4.1/metadata.yaml

# cluster-api components
mkdir -p overrides/cluster-api/v1.6.2
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.6.2/core-components.yaml -o overrides/cluster-api/v1.6.2/core-components.yaml
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.6.2/metadata.yaml -o overrides/cluster-api/v1.6.2/metadata.yaml

# infrastructure-components (infrastructure-metal3)
mkdir -p overrides/infrastructure-metal3/v1.6.0
curl -L https://github.com/metal3-io/cluster-api-provider-metal3/releases/download/v1.6.0/infrastructure-components.yaml -o overrides/infrastructure-metal3/v1.6.0/infrastructure-components.yaml
curl -L https://github.com/metal3-io/cluster-api-provider-metal3/releases/download/v1.6.0/metadata.yaml -o overrides/infrastructure-metal3/v1.6.0/metadata.yaml
Note

If you want to use different versions of the components, you can change the version in the URL to download the specific version of the components.

With the previous resources downloaded, you can compress them into a single file using the following command:

tar -czvf airgap-resources.tar.gz clusterctl clusterctl.yaml overrides
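
Optionally, you can verify the layout of the archive before placing it under the custom/files folder:

tar -tzf airgap-resources.tar.gz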

29.5 Image creation

Once the directory structure is prepared following the previous sections (for both connected and air-gap scenarios), run the following command to build the image:

podman run --rm --privileged -it -v $PWD:/eib \
 registry.suse.com/edge/edge-image-builder:1.0.2 \
 build --definition-file mgmt-cluster.yaml

This creates the ISO output image file, which, based on the image definition described above, is named eib-mgmt-cluster-image.iso.

29.6 Provision the management cluster

The previous image contains all components explained above, and it can be used to provision the management cluster using a virtual machine or a bare-metal server (using the virtual-media feature).
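
For example, to provision a physical server, a common approach (a suggestion, not mandated by this guide) is to write the ISO to a USB flash drive and boot the server from it, where /dev/sdX is the USB device (all data on it is erased):

sudo dd if=eib-mgmt-cluster-image.iso of=/dev/sdX bs=4M status=progress oflag=sync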