Applies to SUSE CaaS Platform 4.2.4

1 Requirements

1.1 Platform

We currently support deployment on the following platforms:

  • SUSE OpenStack Cloud 8

  • VMware ESXi 6.7

  • KVM

  • Bare Metal x86_64

  • Amazon Web Services (technology preview)

SUSE CaaS Platform itself is based on SLE 15 SP1.

The steps for obtaining the correct installation image for each platform type are detailed in the respective platform deployment instructions.

1.2 Nodes

SUSE CaaS Platform consists of a number of (virtual) machines that run as a cluster.

You will need at least two machines:

  • 1 master node

  • 1 worker node

SUSE CaaS Platform 4.2.4 supports deployments with a single or multiple master nodes. Production environments must be deployed with multiple master nodes for resilience.

All communication with the cluster goes through a load balancer that talks to the respective nodes. For that reason, any fault-tolerant environment must provide at least two load balancers for incoming communication.

The minimum viable fault-tolerant production environment configuration consists of:

Cluster nodes:
  • 3 master nodes

  • 2 worker nodes

Important: Dedicated Cluster Nodes

All cluster nodes must be dedicated (virtual) machines reserved for the purpose of running SUSE CaaS Platform.

Additional systems:
  • Fault-tolerant load balancing solution

    (for example SUSE Linux Enterprise High Availability Extension with pacemaker and haproxy; a minimal haproxy sketch follows this list)

  • 1 management workstation
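A minimal sketch of the haproxy part of such a load balancing solution, balancing the Kubernetes API server across three master nodes, might look like the fragment below (to be placed in /etc/haproxy/haproxy.cfg). The addresses are placeholders, and a complete setup also needs the global/defaults sections, the Dex (32000) and Gangway (32001) ports, and the corresponding pacemaker resources.

    listen kubernetes-api
        bind *:6443
        mode tcp
        balance roundrobin
        option tcp-check
        server master1 192.168.0.11:6443 check
        server master2 192.168.0.12:6443 check
        server master3 192.168.0.13:6443 check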

1.3 Hardware

1.3.1 Management Workstation

In order to deploy and control a SUSE CaaS Platform cluster you will need at least one machine capable of running skuba. This typically is a regular desktop workstation or laptop running SLE 15 SP1 or later.

The skuba CLI package is available from the SUSE CaaS Platform module. You will need a valid SUSE Linux Enterprise and SUSE CaaS Platform subscription to install this tool on the workstation.
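A minimal sketch of installing skuba on the workstation; the product identifier shown is illustrative, list the exact one available for your subscription with SUSEConnect --list-extensions:

    # List available modules and extensions for your subscription
    sudo SUSEConnect --list-extensions

    # Activate the SUSE CaaS Platform module (identifier is illustrative) and install skuba
    sudo SUSEConnect --product caasp/4.0/x86_64 --regcode <REGISTRATION_CODE>
    sudo zypper install skuba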

Important: Time Synchronization

It is vital that the management workstation runs an NTP client and that time synchronization is configured to the same NTP servers that you will later use to synchronize the cluster nodes.
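A minimal sketch, assuming chrony as the NTP client (the default on SLE 15); the server names are placeholders and should match the servers you will configure on the cluster nodes:

    # /etc/chrony.d/caasp.conf (illustrative content):
    #   server ntp1.example.com iburst
    #   server ntp2.example.com iburst

    # Enable the service and verify that the sources are reachable and selected
    sudo systemctl enable --now chronyd
    chronyc sources
    chronyc tracking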

1.3.2 Storage Sizing

The storage sizes in the following lists are absolute minimum configurations.

Sizing of the storage for worker nodes depends largely on the expected number of container images, their size and their change rate. The base operating system of all nodes might also include snapshots (when using Btrfs) that can quickly fill up the existing space.

We recommend provisioning a separate storage partition for container images on each (worker) node that can be resized when needed (see the sketch below). Storage for /var/lib/containers on the worker nodes should be approximately 50 GB in addition to the base OS storage.
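A minimal sketch of providing such a partition, assuming an otherwise unused disk /dev/sdb on the worker node (the device name and the choice of XFS are placeholders):

    # Create a filesystem on the spare disk and mount it at /var/lib/containers
    sudo mkfs.xfs /dev/sdb
    sudo mkdir -p /var/lib/containers
    echo '/dev/sdb  /var/lib/containers  xfs  defaults  0 0' | sudo tee -a /etc/fstab
    sudo mount /var/lib/containers
    df -h /var/lib/containers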

1.3.2.1 Master Nodes

Up to 5 worker nodes (minimum):

  • Storage: 50 GB+

  • (v)CPU: 2

  • RAM: 4 GB

  • Network: Minimum 1Gb/s (faster is preferred)

Up to 10 worker nodes:

  • Storage: 50 GB+

  • (v)CPU: 2

  • RAM: 8 GB

  • Network: Minimum 1Gb/s (faster is preferred)

Up to 100 worker nodes:

  • Storage: 50 GB+

  • (v)CPU: 4

  • RAM: 16 GB

  • Network: Minimum 1Gb/s (faster is preferred)

Up to 250 worker nodes:

  • Storage: 50 GB+

  • (v)CPU: 8

  • RAM: 16 GB

  • Network: Minimum 1Gb/s (faster is preferred)

Important

Using a minimum of 2 (v)CPUs is a hard requirement; deploying a cluster with fewer processing units is not possible.

1.3.2.2 Worker Nodes

Important

The worker nodes must have sufficient memory, CPU and disk space for the Pods/containers/applications that you plan to host on them.

A worker node requires the following resources:

  • CPU cores: 1.250

  • RAM: 1.2 GB

Based on these values, the minimal configuration of a worker node is:

  • Storage: Depending on workloads, minimum 20-30 GB to hold the base OS and required packages. Mount additional storage volumes as needed.

  • (v)CPU: 2

  • RAM: 2 GB

  • Network: Minimum 1Gb/s (faster is preferred)

Calculate the size of the required (v)CPU by adding up the base requirements, the estimated additional essential cluster components (logging agent, monitoring agent, configuration management, etc.) and the estimated CPU workloads:

  • 1.250 (base requirements) + 0.250 (estimated additional cluster components) + estimated workload CPU requirements

Calculate the size of the RAM using a similar formula:

  • 1.2 GB (base requirements) + 500 MB (estimated additional cluster components) + estimated workload RAM requirements
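For example, a hypothetical worker node expected to run workloads that themselves need 2.5 (v)CPU and 6 GB of RAM would be sized as follows:

  • (v)CPU: 1.250 + 0.250 + 2.5 = 4 (v)CPUs

  • RAM: 1.2 GB + 500 MB + 6 GB = 7.7 GB, so provision at least 8 GB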

Note

These values are provided as a guide and should work in most cases. They may vary based on the type of workloads you run.

1.3.3 Storage Performance

For master nodes, you must ensure storage performance of at least 50 sequential IOPS (500 for heavily loaded clusters), with disk bandwidth requirements depending on your cluster size. It is highly recommended to use SSDs.

"Typically 50 sequential IOPS (for example, a 7200 RPM disk) is required.
For heavily loaded clusters, 500 sequential IOPS (for example, a typical local SSD
or a high performance virtualized block device) is recommended."
"Typically 10MB/s will recover 100MB data within 15 seconds.
For large clusters, 100MB/s or higher is suggested for recovering 1GB data
within 15 seconds."

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#disks

This is extremely important to ensure the proper functioning of etcd, a critical cluster component.

You can preliminarily validate these requirements using fio. This tool simulates etcd I/O, and the resulting statistics show whether or not the storage is suitable.

  1. Install the tool:

    zypper in -y fio
  2. Run the testing:

    fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-etcd-dir --size=22m --bs=2300 --name=test-etcd-io
    • Replace test-etcd-dir with a directory located on the same disk as the incoming etcd data under /var/lib/etcd

From the outputs, the interesting part is fsync/fdatasync/sync_file_range where the values are expressed in microseconds (usec). A disk is considered sufficient when the value of the 99.00th percentile is below 10000usec (10ms).

Be careful though: this benchmark covers etcd only and does not take other disk usage into account. A value only slightly under 10 ms should therefore be treated with caution, as other workloads will also have an impact on the disks.

Warning

If the storage is very slow, the values can be expressed directly in milliseconds.

Let’s see two different examples:

[...]
  fsync/fdatasync/sync_file_range:
    sync (usec): min=251, max=1894, avg=377.78, stdev=69.89
    sync percentiles (usec):
     |  1.00th=[  273],  5.00th=[  285], 10.00th=[  297], 20.00th=[  330],
     | 30.00th=[  343], 40.00th=[  355], 50.00th=[  367], 60.00th=[  379],
     | 70.00th=[  400], 80.00th=[  424], 90.00th=[  465], 95.00th=[  506],
     | 99.00th=[  594], 99.50th=[  635], 99.90th=[  725], 99.95th=[  742],
     | 99.99th=[ 1188]
[...]

Here the 99.00th percentile value is 594 usec (about 0.6 ms), so the storage meets the requirements.

[...]
  fsync/fdatasync/sync_file_range:
    sync (msec): min=10, max=124, avg=17.62, stdev= 3.38
    sync percentiles (usec):
     |  1.00th=[11731],  5.00th=[11994], 10.00th=[12911], 20.00th=[16712],
     | 30.00th=[17695], 40.00th=[17695], 50.00th=[17695], 60.00th=[17957],
     | 70.00th=[17957], 80.00th=[17957], 90.00th=[19530], 95.00th=[22676],
     | 99.00th=[28705], 99.50th=[30016], 99.90th=[41681], 99.95th=[59507],
     | 99.99th=[89654]
[...]

Here the 99.00th percentile value is 28705 usec (about 28.7 ms), so the storage clearly does not meet the requirements.
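To benchmark the disk that will actually hold the etcd data, you might point fio at a throwaway directory under /var/lib/etcd on the master node (assuming that path exists and sits on the disk in question) and remove it afterwards:

    # Run the same benchmark against a temporary directory on the etcd disk
    sudo mkdir -p /var/lib/etcd/fio-test
    sudo fio --rw=write --ioengine=sync --fdatasync=1 \
        --directory=/var/lib/etcd/fio-test --size=22m --bs=2300 --name=test-etcd-io

    # Clean up the test files afterwards
    sudo rm -rf /var/lib/etcd/fio-test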

1.4 Networking

The management workstation needs at least the following networking permissions:

  • SSH access to all machines in the cluster

  • Access to the Kubernetes API server on port 6443 (exposed by the load balancer), which will in turn talk to any master in the cluster

  • Access to Dex on the configured NodePort, port 32000 (also exposed by the load balancer), so that kubectl can request a new token via the refresh token when the OIDC token has expired
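A rough sketch of verifying these permissions from the management workstation; <USER>, <NODE> and <LOAD_BALANCER> are placeholders for a node login user, a cluster node and the load balancer address:

    # SSH reachability to a cluster node
    ssh <USER>@<NODE> true

    # TCP reachability of the API server and Dex through the load balancer
    nc -zv <LOAD_BALANCER> 6443
    nc -zv <LOAD_BALANCER> 32000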

Important

It is good security practice not to expose the Kubernetes API server on the public internet. Use network firewalls that only allow access from trusted subnets.

1.4.1 Sub-Network Sizing

Important

The service subnet and pod subnet must not overlap.

Please plan the networks generously for your workloads and the expected size of the cluster before bootstrapping.

The default pod subnet is 10.244.0.0/16, which allows for 65536 IP addresses overall. By default, each node is assigned a /24 CIDR from this range (254 usable IP addresses per node).

The default node allocation of /24 means a hard cluster node limit of 256 since this is the number of /24 ranges that fit in a /16 range.

Depending on the size of the nodes you plan to use (in terms of resources) or on the number of nodes you plan to have, the per-node CIDR can be made larger, but the cluster will then accommodate fewer nodes overall.
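For example, switching the per-node assignment from /24 to /23 doubles the usable addresses per node (510 instead of 254) but halves the maximum number of nodes to 128 within the default /16 pod subnet.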

If you are planning to run more or fewer pods per node, or to have a higher number of nodes, you can adjust these settings to match your requirements. Please make sure the networks are sized to accommodate future changes in the cluster.

You can also adjust the service subnet size; it must not overlap with the pod CIDR and should be big enough to accommodate all services.

For more advanced network requirements please refer to: https://docs.cilium.io/en/v1.6/concepts/ipam/#address-management

1.4.2 Ports

Port            Protocol    Accessibility         Description

All nodes
22              TCP         Internal              SSH (required in public clouds)
4240            TCP         Internal              Cilium health check
8472            UDP         Internal              Cilium VXLAN
10250           TCP         Internal              Kubelet (API server → kubelet communication)
10256           TCP         Internal              kube-proxy health check
30000 - 32767   TCP + UDP   Internal              Range of ports used by Kubernetes when allocating services of type NodePort
32000           TCP         External              Dex (OIDC Connect)
32001           TCP         External              Gangway (RBAC Authenticate)

Masters
2379            TCP         Internal              etcd (client communication)
2380            TCP         Internal              etcd (server-to-server traffic)
6443            TCP         Internal / External   Kubernetes API server

1.4.3 IP Addresses

Warning

Using IPv6 addresses is currently not supported.

All nodes must be assigned static IPv4 addresses, which must not be changed manually afterwards.

Important

Plan carefully for required IP ranges and future scenarios as it is not possible to reconfigure the IP ranges once the deployment is complete.

1.4.4 IP Forwarding

The Kubernetes networking model requires that your nodes have IP forwarding enabled in the kernel. skuba checks this value when installing your cluster and installs a rule in /etc/sysctl.d/90-skuba-net-ipv4-ip-forward.conf to make it persistent.

Other software can potentially install rules with a higher priority that override this value, causing machines to not behave as expected after rebooting.

You can manually check if this is enabled using the following command:

# sysctl net.ipv4.ip_forward

net.ipv4.ip_forward = 1

net.ipv4.ip_forward must be set to 1. Additionally, you can check in what order persisted rules are processed by running sysctl --system -a.
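For example, to inspect the rule that skuba persists and re-apply all persisted sysctl settings:

    # Show the persisted rule; it should contain "net.ipv4.ip_forward = 1"
    cat /etc/sysctl.d/90-skuba-net-ipv4-ip-forward.conf

    # Re-apply all persisted sysctl configuration and verify the effective value
    sudo sysctl --system
    sysctl net.ipv4.ip_forward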

1.4.5 Networking Whitelist

Besides the SUSE-provided packages and containers, SUSE CaaS Platform is typically used with third-party containers and Helm charts.

The following SUSE-provided resources must be available:

URL                          Name                           Purpose

scc.suse.com                 SUSE Customer Center           Allow registration and license activation
registry.suse.com            SUSE container registry        Provide container images
*.cloudfront.net             Cloudfront                     CDN/distribution backend for registry.suse.com
kubernetes-charts.suse.com   SUSE helm charts repository    Provide helm charts
updates.suse.com             SUSE package update channel    Provide package updates

If you wish to use upstream or third-party resources, please also allow the following:

URL                                         Name                            Purpose

k8s.gcr.io                                  Google Container Registry       Provide container images
kubernetes-charts.storage.googleapis.com    Google Helm charts repository   Provide helm charts
docker.io                                   Docker Container Registry      Provide container images
quay.io                                     Red Hat Container Registry      Provide container images

Please note that not all installation scenarios will need all of these resources.
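A rough sketch of checking HTTPS reachability of the SUSE-provided hosts (this only verifies connectivity, not that the content you need is available; the wildcard *.cloudfront.net cannot be tested directly and is omitted):

    for host in scc.suse.com registry.suse.com kubernetes-charts.suse.com updates.suse.com; do
        curl -sI --connect-timeout 5 "https://${host}" >/dev/null \
            && echo "${host}: reachable" \
            || echo "${host}: NOT reachable"
    done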

Note

If you are deploying into an air-gapped scenario, you must ensure that the resources required from these locations are present and available on your internal mirror server.

1.4.6 Communication

Please make sure that all your Kubernetes components can communicate with each other. This might require the configuration of routing when using multiple network adapters per node.

Refer to: https://v1-17.docs.kubernetes.io/docs/setup/independent/install-kubeadm/#check-network-adapters.

Configure firewall and other network security to allow communication on the default ports required by Kubernetes: https://v1-17.docs.kubernetes.io/docs/setup/independent/install-kubeadm/#check-required-ports

1.4.7 Performance

All master nodes of the cluster must have a minimum 1Gb/s network connection to fulfill the requirements for etcd.

"1GbE is sufficient for common etcd deployments. For large etcd clusters,
a 10GbE network will reduce mean time to recovery."

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#network

1.4.8 Security

Authentication is done via the kubeconfig file generated during deployment. In its current state, this file grants full administrative access to the cluster and all workloads.

Do not grant access to the kubeconfig file, or to any workstation configured with it, to unauthorized personnel. Apply best practices for access control to all workstations configured to administer the SUSE CaaS Platform cluster.
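For example, you might restrict filesystem permissions on the generated kubeconfig on the management workstation (the path is illustrative; use the location of your cluster definition directory):

    # Allow only the owning administrator to read the kubeconfig
    chmod 600 ~/my-cluster/admin.conf
    ls -l ~/my-cluster/admin.conf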

1.4.9 Replicas

Some add-on services need to be highly available. These services require enough cluster nodes to run the configured number of replicas.

When the cluster is deployed with enough nodes for the replica sizes, the replicas will be balanced across the cluster.

For clusters deployed with fewer nodes than the default replica sizes, the services will still try to find a suitable node to run on. However, it is likely that several replicas will end up running on the same nodes, defeating the purpose of high availability.

You can check the deployment replica size after node bootstrap. The number of cluster nodes should be equal to or greater than the DESIRED replica size.

kubectl get rs -n kube-system

After deployment, if the number of healthy nodes falls below the number required to fulfill the replica sizing, service replicas will show in Pending state until either the unhealthy node recovers or a new node is joined to the cluster.
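To see which replicas are stuck in Pending, you can filter the kube-system pods by phase, for example:

    kubectl -n kube-system get pods --field-selector=status.phase=Pending -o wide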

The following describes two methods for replica management if you wish to work with a cluster below the default replica size requirement.

1.4.9.1 Update replica number

One method is to update the overall number of replicas created by a service. Please consult the documentation of the respective service to find out what the replica limits for proper high availability are. If the replica number is too high for the cluster, you must increase the cluster size to provide more resources.

  1. Update deployment replica size before node joining.

    Note

    You can use the same steps to increase the replica size again if more resources become available later on.

    kubectl -n kube-system scale --replicas=<DESIRED_REPLICAS> deployment <NAME>
  2. Join new nodes.

1.4.9.2 Re-distribute replicas

When multiple replicas of a service are running on the same node, you will want to redistribute them manually to ensure proper high availability.

  1. Find the pods to redistribute. Check the NAME and NODE columns for duplicate pods running on the same node.

    kubectl -n kube-system get pod -o wide
  2. Delete the duplicated pod. This will trigger the creation of a replacement pod.

    kubectl -n kube-system delete pod <POD_NAME>