Advanced Options and Configuration

This section contains advanced information describing the different ways you can run and manage RKE2.

Certificate Rotation

By default, certificates in RKE2 expire in 12 months.

If the certificates are expired or have fewer than 90 days remaining before they expire, the certificates are rotated when RKE2 is restarted.
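To check how much validity remains on a server's certificates, you can inspect one of them directly with openssl. This is a minimal sketch; the file name and path assume a default server installation and may vary between RKE2 versions:

# Print the expiration date of the kube-apiserver serving certificate
openssl x509 -noout -enddate -in /var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt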

As of v1.21.8+rke2r1, certificates can also be rotated manually. To do this, it is best to stop the rke2-server process, rotate the certificates, then start the process up again:

systemctl stop rke2-server
rke2 certificate rotate
systemctl start rke2-server

To renew agent certificates, restart rke2-agent on the agent nodes. Agent certificates are renewed every time the agent starts.

systemctl restart rke2-agent

It is also possible to rotate the certificates for an individual service by passing the --service flag, for example: rke2 certificate rotate --service api-server. See Certificate Management for more details.

Auto-Deploying Manifests

Any file found in /var/lib/rancher/rke2/server/manifests will automatically be deployed to Kubernetes in a manner similar to kubectl apply.
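For example, a plain Kubernetes manifest dropped into that directory on a server node is applied automatically; the file name and namespace below are purely illustrative:

# /var/lib/rancher/rke2/server/manifests/my-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-apps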

For information about deploying Helm charts using the manifests directory, refer to the section about Helm.

Configuring containerd

RKE2 will generate the config.toml for containerd in /var/lib/rancher/rke2/agent/etc/containerd/config.toml.

For advanced customization of this file, you can create another file called config.toml.tmpl in the same directory, and it will be used instead.

The config.toml.tmpl will be treated as a Go template file, and the config.Node structure is passed to the template. See this template for an example of how to use the structure to customize the configuration file.
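A minimal sketch of one way to use this, assuming you start from a copy of the generated config.toml and only append to it (the config.Node fields available to the template depend on your RKE2 version, so check the linked template for your release):

# /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
# Paste the full contents of the generated config.toml here, then add any
# extra containerd settings below; a file containing no template directives
# is rendered verbatim. The section below merely enables containerd debug logging.
[debug]
  level = "debug"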

Configuring an HTTP proxy

If you are running RKE2 in an environment that only has external connectivity through an HTTP proxy, you can configure your proxy settings on the RKE2 systemd service. These proxy settings will then be used by RKE2 and passed down to the embedded containerd and kubelet.

Add the necessary HTTP_PROXY, HTTPS_PROXY and NO_PROXY variables to the environment file of your systemd service, usually:

  • /etc/default/rke2-server

  • /etc/default/rke2-agent

RKE2 will automatically add the cluster internal Pod and Service IP ranges and cluster DNS domain to the list of NO_PROXY entries. You should ensure that the IP address ranges used by the Kubernetes nodes themselves (i.e. the public and private IPs of the nodes) are included in the NO_PROXY list, or that the nodes can be reached through the proxy.

HTTP_PROXY=http://your-proxy.example.com:8888
HTTPS_PROXY=http://your-proxy.example.com:8888
NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16

If you want to configure the proxy settings for containerd without affecting RKE2 and the Kubelet, you can prefix the variables with CONTAINERD_:

CONTAINERD_HTTP_PROXY=http://your-proxy.example.com:8888
CONTAINERD_HTTPS_PROXY=http://your-proxy.example.com:8888
CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16

Node Labels and Taints

RKE2 agents can be configured with the options node-label and node-taint, which add labels and taints to the kubelet. These options only apply labels and taints at registration time; they can be added only once and cannot be changed or removed afterwards through rke2 commands.

If you want to change node labels and taints after node registration you should use kubectl. Refer to the official Kubernetes documentation for details on how to add taints and node labels.
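For example, registration-time labels and taints can be set in the config file; the keys, values, and taint effect shown here are illustrative:

# /etc/rancher/rke2/config.yaml
node-label:
  - "environment=production"
  - "gpu=enabled"
node-taint:
  - "dedicated=gpu:NoSchedule"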

How Agent Node Registration Works

Agent nodes are registered via a websocket connection initiated by the rke2 agent process, and the connection is maintained by a client-side load balancer running as part of the agent process.

Agents register with the server using the cluster secret portion of the join token, along with a randomly generated node-specific password, which is stored on the agent at /etc/rancher/node/password. The server will store the passwords for individual nodes as Kubernetes secrets, and any subsequent attempts must use the same password. Node password secrets are stored in the kube-system namespace with names using the template <host>.node-password.rke2. These secrets are deleted when the corresponding Kubernetes node is deleted.

Prior to RKE2 v1.20.2 servers stored passwords on disk at /var/lib/rancher/rke2/server/cred/node-passwd.

If the /etc/rancher/node directory of an agent is removed, the password file should be recreated for the agent prior to startup, or the entry removed from the server or Kubernetes cluster (depending on the RKE2 version).
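On releases that store node passwords as Kubernetes secrets, removing the server-side entry for a rebuilt agent might look roughly like this (the node name is a placeholder):

# Delete the stored node password secret for the agent named "worker-1"
kubectl delete secret -n kube-system worker-1.node-password.rke2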

Starting the Server with the Installation Script

The installation script provides units for systemd, but does not enable or start the service by default.

When running with systemd, logs will be created in /var/log/syslog and can be viewed using journalctl -u rke2-server or journalctl -u rke2-agent.

An example of installing with the install script:

curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server
systemctl start rke2-server

Disabling Server Charts

The server charts bundled with rke2, which are deployed during cluster bootstrapping, can be disabled and replaced with alternatives. A common use case is replacing the bundled rke2-ingress-nginx chart with an alternative.

To disable any of the bundled system charts, set the disable parameter in the config file before bootstrapping. An example of disabling all available system charts is:

# /etc/rancher/rke2/config.yaml
disable:
  - rke2-coredns
  - rke2-ingress-nginx
  - rke2-metrics-server
  - rke2-snapshot-controller
  - rke2-snapshot-controller-crd
  - rke2-snapshot-validation-webhook

It is the cluster operator’s responsibility to ensure that components are disabled or replaced with care, as the server charts play important roles in cluster operability. Refer to the architecture overview for more information on the role of each system chart within the cluster.
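As a sketch of the common use case above, you could disable the bundled ingress chart and deploy an alternative through the auto-deploying manifests directory using a HelmChart resource; the chart, repository, and file names below are illustrative, not a recommendation:

# /etc/rancher/rke2/config.yaml
disable:
  - rke2-ingress-nginx

# /var/lib/rancher/rke2/server/manifests/my-ingress.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: ingress-nginx
  namespace: kube-system
spec:
  repo: https://kubernetes.github.io/ingress-nginx
  chart: ingress-nginx
  targetNamespace: ingress-nginx
  createNamespace: true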

Installation on classified AWS regions or networks with custom AWS API endpoints

In public AWS regions, to ensure RKE2 is cloud-enabled and capable of auto-provisioning certain cloud resources, configure RKE2 with:

# /etc/rancher/rke2/config.yaml
cloud-provider-name: aws

When installing RKE2 on classified regions (such as SC2S or C2S), there are a few additional prerequisites to be aware of to ensure RKE2 knows how and where to communicate securely with the appropriate AWS endpoints:

  1. Ensure all the common AWS cloud-provider prerequisites are met. These are independent of regions and are always required.

  2. Ensure RKE2 knows where to send API requests for the ec2 and elasticloadbalancing services by creating a cloud.conf file. Below is an example for the us-iso-east-1 (C2S) region:

    # /etc/rancher/rke2/cloud.conf
    [Global]
    [ServiceOverride "ec2"]
      Service=ec2
      Region=us-iso-east-1
      URL=https://ec2.us-iso-east-1.c2s.ic.gov
      SigningRegion=us-iso-east-1
    [ServiceOverride "elasticloadbalancing"]
      Service=elasticloadbalancing
      Region=us-iso-east-1
      URL=https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
      SigningRegion=us-iso-east-1

    Alternatively, if you are using private AWS endpoints, ensure the appropriate URL is used for each of the private endpoints.

  3. Ensure the appropriate AWS CA bundle is loaded into the system’s root ca trust store. This may already be done for you depending on the AMI you are using.

    # on CentOS/RHEL 7/8
    cp <ca.pem> /etc/pki/ca-trust/source/anchors/
    update-ca-trust
  4. Configure RKE2 to use the aws cloud-provider with the custom cloud.conf created in step 2:

    # /etc/rancher/rke2/config.yaml
    ...
    cloud-provider-name: aws
    cloud-provider-config: "/etc/rancher/rke2/cloud.conf"
    ...
  5. Install RKE2 normally (most likely in an airgapped capacity)

  6. Validate successful installation by confirming the existence of AWS metadata on cluster node labels with kubectl get nodes --show-labels

Control Plane Component Resource Requests/Limits

The following options are available under the server sub-command for RKE2. The options allow for specifying CPU and memory requests and limits for the control plane components within RKE2.

   --control-plane-resource-requests value       (components) Control Plane resource requests [$RKE2_CONTROL_PLANE_RESOURCE_REQUESTS]
   --control-plane-resource-limits value         (components) Control Plane resource limits [$RKE2_CONTROL_PLANE_RESOURCE_LIMITS]

Values are a comma-delimited list of [controlplane-component]-(cpu|memory)=[desired-value]. The possible values for controlplane-component are:

  • kube-apiserver

  • kube-scheduler

  • kube-controller-manager

  • kube-proxy

  • etcd

  • cloud-controller-manager

Thus, an example config value may look like:

# /etc/rancher/rke2/config.yaml
control-plane-resource-requests:
  - kube-apiserver-cpu=500m
  - kube-apiserver-memory=512M
  - kube-scheduler-cpu=250m
  - kube-scheduler-memory=512M
  - etcd-cpu=1000m

The unit values for CPU/memory are identical to Kubernetes resource units (See: Resource Limits in Kubernetes).
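Limits use the same format as requests; an illustrative example with arbitrary values:

# /etc/rancher/rke2/config.yaml
control-plane-resource-limits:
  - kube-apiserver-cpu=1000m
  - kube-apiserver-memory=1G
  - etcd-memory=2G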

Extra Control Plane Component Volume Mounts

The following options are available under the server sub-command for RKE2. These options specify host-path mounting of files or directories from the node filesystem into the static pod component that corresponds to the prefixed name.

Flag                                      ENV VAR                                       Description
--kube-apiserver-extra-mount              RKE2_KUBE_APISERVER_EXTRA_MOUNT               kube-apiserver extra volume mounts
--kube-scheduler-extra-mount              RKE2_KUBE_SCHEDULER_EXTRA_MOUNT               kube-scheduler extra volume mounts
--kube-controller-manager-extra-mount     RKE2_KUBE_CONTROLLER_MANAGER_EXTRA_MOUNT      kube-controller-manager extra volume mounts
--kube-proxy-extra-mount                  RKE2_KUBE_PROXY_EXTRA_MOUNT                   kube-proxy extra volume mounts
--etcd-extra-mount                        RKE2_ETCD_EXTRA_MOUNT                         etcd extra volume mounts
--cloud-controller-manager-extra-mount    RKE2_CLOUD_CONTROLLER_MANAGER_EXTRA_MOUNT     cloud-controller-manager extra volume mounts

RW Host Path Volume Mount

/source/volume/path/on/host:/destination/volume/path/in/staticpod

RO Host Path Volume Mount

In order to mount a volume as read only, append :ro to the end of the volume mount: /source/volume/path/on/host:/destination/volume/path/in/staticpod:ro

Multiple volume mounts can be specified for the same component by passing the flag values as an array in the config file.

Version Gate

Prior to the April 2024 releases (v1.27.13+rke2r1, v1.28.9+rke2r1, v1.29.4+rke2r1), only directories could be mounted.

# /etc/rancher/rke2/config.yaml
kube-apiserver-extra-mount:
   - "/tmp/foo:/root/foo"
   - "/tmp/bar.txt:/etc/bar.txt:ro"

Extra Control Plane Component Environment Variables

The following options are available under the server sub-command for RKE2. These options specify additional environment variables, in the standard KEY=VALUE format, for the static pod component that corresponds to the prefixed name.

Flag                                    ENV VAR
--kube-apiserver-extra-env              RKE2_KUBE_APISERVER_EXTRA_ENV
--kube-scheduler-extra-env              RKE2_KUBE_SCHEDULER_EXTRA_ENV
--kube-controller-manager-extra-env     RKE2_KUBE_CONTROLLER_MANAGER_EXTRA_ENV
--kube-proxy-extra-env                  RKE2_KUBE_PROXY_EXTRA_ENV
--etcd-extra-env                        RKE2_ETCD_EXTRA_ENV
--cloud-controller-manager-extra-env    RKE2_CLOUD_CONTROLLER_MANAGER_EXTRA_ENV

Multiple environment variables can be specified for the same component by passing the flag values as an array in the config file.

# /etc/rancher/rke2/config.yaml
kube-apiserver-extra-env:
  - "MY_FOO=FOO"
  - "MY_BAR=BAR"
kube-scheduler-extra-env: "TZ=America/Los_Angeles"

Deploy NVIDIA operator

The NVIDIA operator allows administrators of Kubernetes clusters to manage GPUs just like CPUs. It includes everything needed for pods to be able to operate GPUs.

Depending on the underlying OS, some preliminary steps need to be completed:

  • SLES

  • Ubuntu

  • RHEL

The NVIDIA operator cannot automatically install kernel drivers on SLES. NVIDIA drivers must be manually installed on all GPU nodes before deploying the operator in the cluster. This can be done with the following steps:

# Assuming you are using sle15sp5; if different, change the URL accordingly
sudo zypper addrepo --refresh 'https://download.nvidia.com/suse/sle15sp5' NVIDIA
sudo zypper --gpg-auto-import-keys refresh
sudo zypper install -y --auto-agree-with-licenses nvidia-gl-G06 nvidia-video-G06 nvidia-compute-utils-G06

Then reboot.

If everything worked correctly, after the reboot, you should see the NVRM and GCC version of the driver when executing the command:

cat /proc/driver/nvidia/version

Finally, create the symlink:

sudo ln -s /sbin/ldconfig /sbin/ldconfig.real

The NVIDIA operator can automatically install kernel drivers on Ubuntu using the nvidia-driver-daemonset, although not all versions are supported. You can also pre-install them manually and the operator will detect them:

sudo apt install nvidia-driver-535-server

Then reboot.

If everything worked correctly, after the reboot, you should see the driver version reported when executing the command:

cat /proc/driver/nvidia/version

The NVIDIA operator can automatically install kernel drivers on RHEL using the nvidia-driver-daemonset. You would only need to create the symlink:

sudo ln -s /sbin/ldconfig /sbin/ldconfig.real

Once the OS is ready and RKE2 is running, install the GPU Operator with the following yaml manifest:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator
  createNamespace: true
  valuesContent: |-
    toolkit:
      env:
      - name: CONTAINERD_SOCKET
        value: /run/k3s/containerd/containerd.sock

The NVIDIA operator restarts containerd with a hangup call, which in turn restarts RKE2.

After approximately one minute, you can perform the following checks to verify that everything worked as expected:

  1. Check if the operator detected the driver and GPU correctly:

    kubectl get node $NODENAME -o jsonpath='{.metadata.labels}' | jq | grep "nvidia.com"

    You should see labels specifying driver and GPU (e.g. nvidia.com/gpu.machine or nvidia.com/cuda.driver.major).

  2. Check if the GPU was added (by nvidia-device-plugin-daemonset) as an allocatable resource on the node:

    kubectl get node $NODENAME -o jsonpath='{.status.allocatable}' | jq

    You should see "nvidia.com/gpu": followed by the number of GPUs on the node.

  3. Check that the container runtime binary was installed by the operator (in particular by the nvidia-container-toolkit-daemonset):

    ls /usr/local/nvidia/toolkit/nvidia-container-runtime
  4. Verify if containerd config was updated to include the nvidia container runtime:

    grep nvidia /etc/containerd/config.toml
  5. Run a pod to verify that the GPU resource can successfully be scheduled on a pod and the pod can detect it.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nbody-gpu-benchmark
      namespace: default
    spec:
      restartPolicy: OnFailure
      runtimeClassName: nvidia
      containers:
      - name: cuda-container
        image: nvcr.io/nvidia/k8s/cuda-sample:nbody
        args: ["nbody", "-gpu", "-benchmark"]
        resources:
          limits:
            nvidia.com/gpu: 1
        env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all