Applies to SUSE CaaS Platform 4.2.4

2 Cluster Management

Cluster management refers to several processes in the life cycle of a cluster and its individual nodes: bootstrapping, joining, and removing nodes. For maximum automation and ease, SUSE CaaS Platform uses the skuba tool, which simplifies Kubernetes cluster creation and reconfiguration.

2.1 Prerequisites

In order to perform many of these steps, you must have the proper SSH keys for accessing the nodes set up, and passwordless sudo must be allowed on the nodes. If you have followed the standard deployment procedures, this should already be the case.

Please note: If you are using a different management workstation than the one you used during the initial deployment, you might have to transfer the SSH identities from the original management workstation.
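For example, assuming the original key pair is ~/.ssh/id_rsa (the path and hostname below are illustrative), you could copy it to the new workstation and load it into the SSH agent, which skuba uses for authentication:

scp old-workstation:~/.ssh/id_rsa ~/.ssh/id_rsa
ssh-add ~/.ssh/id_rsa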

2.2 Bootstrap and Initial Configuration

Bootstrapping the cluster is the initial process of starting up a minimal viable cluster and joining the first master node. Only the first master node needs to be bootstrapped; later nodes can simply be joined as described in Section 2.3, “Adding Nodes”.

Before bootstrapping any nodes to the cluster, you need to create an initial cluster definition folder (initialize the cluster). This is done using skuba cluster init and its --control-plane flag.
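For example, to create a cluster definition folder named my-cluster that points at a load balancer reachable at 10.86.2.100 (both values are illustrative), you would run:

skuba cluster init --control-plane 10.86.2.100 my-cluster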

For a step-by-step guide on how to initialize the cluster, configure updates using kured, and subsequently bootstrap nodes to it, refer to the SUSE CaaS Platform Deployment Guide.

2.3 Adding Nodes

Once you have added the first master node to the cluster using skuba node bootstrap, use the skuba node join command to add more nodes. Joining master or worker nodes to an existing cluster must be done sequentially, meaning the nodes have to be added one after another, never in parallel.

skuba node join --role <MASTER/WORKER> --user <USER_NAME> --sudo --target <IP/FQDN> <NODE_NAME>

The mandatory flags for the join command are --role, --user, --sudo and --target.

  • --role specifies whether the node joins as a master or worker.

  • --sudo runs the command with superuser privileges, which is necessary for all node operations.

  • --user <USER_NAME> is the name of the user that exists on your SLES machine (default: sles).

  • --target <IP/FQDN> is the IP address or FQDN of the relevant machine.

  • <NODE_NAME> is the name you assign to the node you are adding.

Important

New master nodes that you did not initially include in your Terraform configuration have to be manually added to your load balancer’s configuration.

To add a new worker node, you would run something like:

skuba node join --role worker --user sles --sudo --target 10.86.2.164 worker1
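Similarly, to join an additional master node (address and name are again illustrative), you would run:

skuba node join --role master --user sles --sudo --target 10.86.2.165 master2

Remember to add the new master node to your load balancer’s configuration as noted above.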

2.3.1 Adding Nodes from Template

If you are using a virtual machine template for creating new cluster nodes, you must make sure that the cloned machine is updated to the same software versions as the other nodes in the cluster before joining it.

Refer to Section 4.1, “Update Requirements”.

Nodes with mismatching package or container software versions might not be fully functional.
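One quick way to compare versions across the cluster is kubectl get nodes -o wide, which lists the kubelet version and container runtime of every node:

kubectl get nodes -o wide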

2.4 Removing Nodes

2.4.1 Temporary Removal

If you wish to remove a node temporarily, the recommended approach is to first drain the node.

When you want to bring the node back, you only have to uncordon it.

Tip

For instructions on how to perform these operations refer to Section 2.6, “Node Operations”.

2.4.2 Permanent Removal

Important

Nodes removed with this method cannot be added back to the cluster or any other skuba-initiated cluster. You must reinstall the entire node and then join it again to the cluster.

The skuba node remove command permanently removes a node from the cluster. It works even if the target virtual machine is down, which makes it the safest way to remove a node.

skuba node remove <NODE_NAME> [flags]

Note

By default, node removal waits indefinitely for the node to drain. If the node is unreachable, it cannot be drained, and the removal will fail or get stuck indefinitely. Use the --drain-timeout <DURATION> flag to specify a time after which the removal proceeds without waiting for the drain to finish.

For example, waiting for the node to drain for 1 minute and 5 seconds:

skuba node remove caasp-worker1 --drain-timeout 1m5s

For a list of supported time formats, run skuba node remove -h.

Important

After the removal of a master node, you have to manually delete its entries from your load balancer’s configuration.

2.5 Reconfiguring Nodes

To reconfigure a node, for example to change the node’s role from worker to master, you will need to use a combination of commands.

  1. Run skuba node remove <NODE_NAME>.

  2. Reinstall the node from scratch.

  3. Run skuba node join --role <DESIRED_ROLE> --user <USER_NAME> --sudo --target <IP/FQDN> <NODE_NAME>.
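For example, to rejoin the worker node from the earlier example as a master after reinstalling it (address and names are illustrative):

skuba node remove worker1
skuba node join --role master --user sles --sudo --target 10.86.2.164 worker1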

2.6 Node Operations

2.6.1 Uncordon and Cordon

These two commands mark a node as schedulable (uncordon) or unschedulable (cordon), that is, they control whether the node is allowed to receive new workloads. This can be useful when troubleshooting a node.

To mark a node as unschedulable run:

kubectl cordon <NODE_NAME>

To mark a node as schedulable run:

kubectl uncordon <NODE_NAME>
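You can verify the result with kubectl get nodes; a cordoned node reports the status SchedulingDisabled:

kubectl get nodes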

2.6.2 Draining Nodes

Draining a node evicts all running pods from it so that maintenance can be performed. This is a mandatory step to ensure that the workloads keep functioning properly. Draining is done with kubectl.

To drain a node run:

kubectl drain <NODE_NAME>

This action will also implicitly cordon the node. Therefore, once the maintenance is done, uncordon the node to set it back to schedulable.
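Note that draining may refuse to proceed for pods managed by a DaemonSet or pods using local (emptyDir) storage. In that case you may need additional flags, for example (check kubectl drain -h for your version; in newer Kubernetes releases --delete-local-data was renamed --delete-emptydir-data):

kubectl drain <NODE_NAME> --ignore-daemonsets --delete-local-data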

Refer to the official Kubernetes documentation for more information: https://v1-17.docs.kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#use-kubectl-drain-to-remove-a-node-from-service
