This is unreleased documentation for SUSE® Storage 1.12 (Dev).
Best Practices
The following setup is recommended for production environments.
|
It is recommended to enable only one data engine (V1 or V2) per cluster. When both data engines are enabled simultaneously, each node runs separate instance-manager pods for V1 and V2, each with its own guaranteed CPU reservation. This increases CPU overhead per node (the V2 instance-manager pod alone consumes at least 1 dedicated CPU core, which is configurable, for the |
Minimum Recommended Hardware
V1 Data Engine
- 3 nodes
- 4 vCPUs per node
- 4 GiB of memory per node
- SSD/NVMe or similar performance block device on the node for storage (recommended)
- HDD/Spinning Disk or similar performance block device on the node for storage (verified)
- 500/250 max IOPS per volume (1 MiB I/O)
- 500/250 max throughput per volume (MiB/s)
|
While SUSE Storage can function with HDDs (spinning disks) as storage, it is important to understand that latency plays a much more important role in volume stability than IOPS or throughput. This is because HDDs are mechanical, relying on spinning platters and moving read or write heads to access data. This physical movement introduces inherent delays (seek time and rotational delay), leading to much higher latency compared to the SSDs or NVMe drives, which utilize flash memory and have no moving parts. This can directly cause instability, especially when multiple input-output intensive tasks are running, such as:
The increased latency due to the use of HDDs, combined with other input-output workloads, can lead to volume instability. Therefore, we recommend SSD or NVMe drives for better performance and stability, especially for production workloads. The mentioned IOPS and throughput (500/250 max IOPS per volume and 500/250 max throughput per volume) are intended as general references based on the test setup but should not be treated as hard requirements. Latency, not just throughput, is the most important factor in ensuring system stability. |
V2 Data Engine
Along with the V1 requirements above, nodes hosting V2 volumes have these additional requirements:
- 3 nodes
- Additional 1 CPU core per node dedicated to each V2 instance-manager pod (the `spdk_tgt` process uses intensive polling and consumes 100% of a dedicated CPU core)
- Additional 2 GiB memory per node reserved for huge pages (1024 × 2 MiB-sized pages)
- Local NVMe SSDs are strongly recommended for V2 volumes to achieve optimal storage performance
- Linux kernel 6.7 or later for NVMe over TCP support and better stability
- Required kernel modules: `vfio_pci`, `uio_pci_generic`, `nvme-tcp`
- AMD64 CPUs require SSE4.2 instruction support
The V2 Data Engine leverages the Storage Performance Development Kit (SPDK) with user space NVMe drivers that provide zero-copy, highly parallel, direct access to SSDs. Using local NVMe disks is strongly recommended for enabling V2 volumes to achieve optimal storage performance. For the full setup guide, see V2 Data Engine Requirements.
Operating System
CentOS Linux has been removed from the verified OS list below, as it has been discontinued in favor of CentOS Stream, a rolling-release Linux distribution. Testing for RHEL-based downstream open-source distributions focuses on enterprise-grade versions, such as Rocky Linux and Oracle Linux.
The following Linux OS distributions and versions have been verified during the v1.12.0 release testing. However, this does not imply that SUSE Storage exclusively supports these distributions. SUSE Storage should function well on any certified Kubernetes cluster running on Linux nodes with a wide range of general-purpose operating systems, as well as verified container-optimized operating systems like SLE Micro.
| No. | OS | Versions |
|---|---|---|
| 1. | Ubuntu | 26.04 |
| 2. | SUSE Linux Enterprise Server | 16.0 |
| 3. | SUSE Linux Enterprise Micro | 6.1 |
| 4. | Red Hat Enterprise Linux | 10.1 |
| 5. | Oracle Linux | 10.0 |
| 6. | Rocky Linux | 10.1 |
| 7. | Talos Linux | 1.11.5 |
| 8. | Container-Optimized OS (GKE) | 125 |
SUSE Storage relies heavily on kernel functionality and performs better on certain kernel versions. The following activities, in particular, benefit from usage of specific kernel versions.
- Optimizing or improving the file system: Use a kernel with version `v5.8` or later. See Issue #2507 for details.
- Enabling the Freeze Filesystem for Snapshot setting: Use a kernel with version `5.17` or later to ensure that a volume crash during a file system freeze cannot lock up a node.
- Enabling the V2 Data Engine: Use a kernel with version `5.19` or later to ensure NVMe over TCP support. Use a kernel with version `6.7` or later for improved stability (avoids potential memory corruption from SPDK upstream issue #3116).
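The kernel minimums listed above can be checked programmatically. The following is a minimal sketch, not an official tool: the function names and the labels in `MINIMUM_KERNELS` are hypothetical, and the parser assumes the release string starts with `major.minor` (as strings returned by `uname -r` usually do).

```python
import re

# Minimum (major, minor) kernel versions taken from the list above.
# The labels are illustrative names chosen for this sketch.
MINIMUM_KERNELS = {
    "file system optimization": (5, 8),
    "Freeze Filesystem for Snapshot": (5, 17),
    "V2 Data Engine (NVMe over TCP)": (5, 19),
    "V2 Data Engine (improved stability)": (6, 7),
}


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"v?(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def unmet_requirements(release):
    """List the activities whose minimum kernel version is not satisfied."""
    version = parse_kernel_release(release)
    return [name for name, minimum in MINIMUM_KERNELS.items() if version < minimum]


# Example with a literal release string; on a node you could pass
# platform.release() from the standard library instead.
print(unmet_requirements("6.8.0-45-generic"))
```

The check compares only the major and minor numbers, so it does not detect the specific broken patch-level kernels that this page lists separately.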
The list below contains known broken kernel versions that users should avoid using:
| No. | Version | Distro | Additional Context |
|---|---|---|---|
| 1. | 6.5.6 | Vanilla kernel | Related to this bug https://longhorn.io/kb/troubleshooting-rwx-volume-fails-to-attached-caused-by-protocol-not-supported/ |
| 2. | 5.15.0-94 | Ubuntu | Related to this bug https://longhorn.io/kb/troubleshooting-rwx-volume-fails-to-attached-caused-by-protocol-not-supported/ |
| 3. | 6.5.0-21 | Ubuntu | Related to this bug https://longhorn.io/kb/troubleshooting-rwx-volume-fails-to-attached-caused-by-protocol-not-supported/ |
| 4. | 6.5.0-1014-aws | Ubuntu | Related to this bug https://longhorn.io/kb/troubleshooting-rwx-volume-fails-to-attached-caused-by-protocol-not-supported/ |
Kubernetes
Kubernetes Version
Ensure that your cluster is running Kubernetes v1.21 or later before upgrading SUSE Storage.
We recommend running your Kubernetes cluster on one of the following versions. These were the actively supported Kubernetes versions at the time of the SUSE Storage release and have been tested with SUSE Storage v1.12.0.
| Release | Released | End-of-life |
|---|---|---|
| 1.36 | 22 Apr 2026 | 28 Jun 2027 |
| 1.35 | 17 Dec 2025 | 28 Feb 2027 |
| 1.34 | 27 Aug 2025 | 27 Oct 2026 |
| 1.33 | 23 Apr 2025 | 28 Jun 2026 |
Source: https://endoflife.date/kubernetes.
Nodes and Disk Setup
The following setup is recommended for nodes and disks.
Use a Dedicated Disk
For production, dedicate a disk to SUSE Storage instead of using the root disk.
Minimal Available Storage and Over-provisioning
If you need to use the root disk, keep the default minimal available storage percentage of 25% and set the over-provisioning percentage to 100% to minimize the chance of DiskPressure.
If you are using a dedicated disk for SUSE Storage, you can lower the setting minimal available storage percentage to 10%.
The right over-provisioning percentage depends on how much space your volumes use on average. For example, if your workloads use only half of the provisioned volume size, you can set the over-provisioning percentage to 200. SUSE Storage then treats the schedulable size of the disk as twice the difference between its full size and the reserved space.
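The over-provisioning arithmetic can be illustrated with a short calculation. This is a sketch of the relationship described above, not the scheduler's actual code: the function name is hypothetical, and the disk and reserved sizes are example values.

```python
GIB = 1024 ** 3


def schedulable_storage(disk_size, reserved, over_provisioning_pct):
    """Schedulable bytes: (full size - reserved space) scaled by the percentage."""
    if reserved > disk_size:
        raise ValueError("reserved space cannot exceed the disk size")
    return (disk_size - reserved) * over_provisioning_pct // 100


# Example values (assumptions): a 500 GiB disk with 50 GiB reserved.
print(schedulable_storage(500 * GIB, 50 * GIB, 100) // GIB)  # 450
print(schedulable_storage(500 * GIB, 50 * GIB, 200) // GIB)  # 900
```

With a percentage of 200, replicas totalling 900 GiB of nominal size can be scheduled on the disk, which is only safe if the volumes stay about half full on average.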
Disk Space Management
Because SUSE Storage does not currently support sharding across disks, we recommend using LVM to aggregate all the disks for SUSE Storage into a single partition, so that it can be easily extended in the future.
Setting up Extra Disks
Any extra disks must be listed in the `/etc/fstab` file so that they are mounted automatically after the machine reboots.
Do not use a symbolic link for the extra disks. Use `mount --bind` instead of `ln -s` and make sure the bind mount is in the fstab file. For details, see the section about multiple disk support.
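Both rules (no symbolic links, and an entry in fstab) can be checked with a small script. The following is a rough sketch: the function names and the paths in the example are hypothetical, and the parser relies only on the standard fstab layout, in which the second whitespace-separated field is the mount point.

```python
import os


def fstab_mount_points(fstab_text):
    """Return the mount points (second field) declared in fstab-formatted text."""
    points = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1].rstrip("/") or "/")
    return points


def disk_path_problems(path, fstab_text):
    """List reasons why a disk path does not follow the two rules above."""
    problems = []
    if os.path.islink(path):
        problems.append("path is a symbolic link")
    if (path.rstrip("/") or "/") not in fstab_mount_points(fstab_text):
        problems.append("path is not listed in fstab")
    return problems


# Hypothetical fstab content with a bind mount for an extra disk.
SAMPLE_FSTAB = """\
# <source>   <mount point>            <type> <options> <dump> <pass>
/mnt/nvme0   /var/lib/longhorn-disk2  none   bind      0      0
"""
print(disk_path_problems("/var/lib/longhorn-disk2", SAMPLE_FSTAB))
```

On a real node you would pass the content of `/etc/fstab` instead of `SAMPLE_FSTAB`. The sketch does not verify that the path is currently mounted.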
V2 Data Engine: Block-Type Disks
Unlike the V1 Data Engine which uses filesystem-type disks, the V2 Data Engine stores volume data on block-type disks. The following best practices apply to V2 block-type disk setup:
- Use local NVMe disks: SPDK is equipped with a user space NVMe driver that provides zero-copy, highly parallel, direct access to SSDs. Using local NVMe disks is strongly recommended for V2 volumes.
- Ensure disk is clean before adding: Starting with v1.11.0, SUSE Storage prevents adding block disks that contain an existing file system or partition table. Run `wipefs -a /path/to/block/device` before adding a disk.
- IOMMU group isolation: For SPDK to claim a disk via `vfio-pci`, the NVMe device must be in an isolatable IOMMU group. If the device shares an IOMMU group with a PCIe bridge, it cannot be used with the SPDK NVMe driver and must be used in AIO mode instead.
- Huge pages configuration: Ensure 2 GiB of 2 MiB-sized huge pages (1024 pages) are configured persistently on each V2 node via kernel boot parameters. See Enable HugePages for detailed instructions.
- Kernel modules: Ensure `vfio_pci`, `uio_pci_generic`, and `nvme-tcp` modules are loaded and configured to load automatically at boot. See Load Kernel Modules.
For the complete V2 Data Engine setup guide, see V2 Data Engine Requirements.
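The huge pages requirement (1024 pages × 2 MiB = 2 GiB) can be verified from the `HugePages_Total` and `Hugepagesize` fields that Linux reports in `/proc/meminfo`. The sketch below is illustrative only: the function names are hypothetical, and it checks the preallocated total, not whether the pages are still free.

```python
REQUIRED_HUGEPAGE_BYTES = 1024 * 2 * 1024 * 1024  # 1024 pages x 2 MiB = 2 GiB


def hugepage_bytes(meminfo_text):
    """Total bytes of preallocated huge pages in /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.split()
    total_pages = int(fields["HugePages_Total"][0])
    page_size_kib = int(fields["Hugepagesize"][0])  # the kernel reports this in kB
    return total_pages * page_size_kib * 1024


def has_enough_hugepages(meminfo_text):
    """True if the preallocated huge pages cover the 2 GiB requirement."""
    return hugepage_bytes(meminfo_text) >= REQUIRED_HUGEPAGE_BYTES


# Sample excerpt in the /proc/meminfo format (values are an example).
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
HugePages_Total:    1024
HugePages_Free:     1024
Hugepagesize:       2048 kB
"""
print(has_enough_hugepages(SAMPLE_MEMINFO))  # True
```

On a node, the same check can be run against the live file, for example `has_enough_hugepages(open("/proc/meminfo").read())`.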
Configuring Default Disks Before and After Installation
To use a directory other than the default /var/lib/longhorn for storage, the Default Data Path setting can be changed before installing the system. For details on changing pre-installation settings, refer to this section.
The Default node/disk configuration feature can be used to customize the default disk after installation. Customizing the default configurations for disks and nodes is useful for scaling the cluster because it eliminates the need to configure SUSE Storage manually for each new node if the node contains more than one disk, or if the disk configuration is different for new nodes. Remember to enable Create default disk only on labeled node if applicable.
Volumes Performance Optimization
Before configuring workloads, ensure that you have set up the following basic requirements for optimal volume performance.
- SATA/NVMe SSDs or disk drives with similar performance
- 10 Gbps network bandwidth between nodes
- Dedicated Priority Class for system-managed and user-deployed SUSE Storage components. By default, SUSE Storage installs the default Priority Class `longhorn-critical`.
The following sections outline other recommendations for production environments.
IO Performance
- Storage network: Use a dedicated storage network to improve IO performance and stability.
- SUSE Storage disk: Use a dedicated disk for SUSE Storage instead of using the root disk.
- Replica count: Set the default replica count to "2" to achieve data availability with better disk space usage or less impact to system performance. This practice is especially beneficial to data-intensive applications.
- Storage tag: Use storage tags to define storage tiering for data-intensive applications. For example, only high-performance disks can be used for storing performance-sensitive data.
- Data locality: Use `best-effort` as the default data locality of SUSE Storage StorageClasses.

  For applications that support data replication (for example, a distributed database), you can use the `strict-local` option to ensure that only one replica is created for each volume. This practice prevents the extra disk space usage and IO performance overhead associated with volume replication.

  For data-intensive applications, you can use pod scheduling functions such as node selector or taint toleration. These functions allow you to schedule the workload to a specific storage-tagged node together with one replica.
Space Efficiency
- Recurring snapshots: Periodically clean up system-generated snapshots and retain only the number of snapshots that makes sense for your implementation.

  For applications with replication capability, periodically delete all types of snapshots.
- Recurring filesystem trim: Periodically trim the filesystem inside volumes to reclaim disk space.
- Snapshot space management: Configure global and volume-specific settings to prevent unexpected disk space exhaustion.
Disaster Recovery
- Recurring backups: Create recurring backup jobs for mission-critical application volumes.
- System backup: Create periodic system backups.
Deploying Workloads
If you are using ext4 as the file system of the volume, we recommend adding a liveness check to workloads to help automatically recover from a network-caused interruption, a node reboot, or a Docker restart. See this section for details.
Volumes Maintenance
Using SUSE Storage’s built-in backup feature is highly recommended. You can save backups to an object store such as S3 or to an NFS server. Saving to an object store is preferable because it generally offers better reliability. Another advantage is that you do not need to mount and unmount the target, which can complicate failover and upgrades.
For each volume, schedule at least one recurring backup. If you must run SUSE Storage in production without a backupstore, then schedule at least one recurring snapshot for each volume.
SUSE Storage creates snapshots automatically when rebuilding a replica. Recurring snapshots or backups can also automatically clean up these system-generated snapshots.
Guaranteed Instance Manager CPU
We recommend setting the CPU request for SUSE Storage instance manager pods.
V1 Data Engine
The Guaranteed Instance Manager CPU setting allows you to reserve a percentage of the total allocatable CPU resources on each node for each instance manager pod when the V1 Data Engine is enabled. The default value is 12 (percent).
Set a specific milli-CPU value for the instance manager pods on a particular node by updating the Instance Manager CPU Request field for that node.
This field overrides the above setting for the specified node.
Refer to Guaranteed Instance Manager CPU for more details.
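To estimate what the percentage means on a given node, the reservation can be computed in milli-CPU. This is a simplified sketch under stated assumptions: the function name is hypothetical, the result is rounded down to a whole milli-CPU, and a per-node Instance Manager CPU Request is modeled as an optional override value.

```python
def instance_manager_cpu_request(allocatable_millicpu, guaranteed_pct=12,
                                 node_override_millicpu=None):
    """Estimated CPU request (milli-CPU) for one instance manager pod.

    The per-node override, when given, takes precedence over the percentage.
    """
    if node_override_millicpu is not None:
        return node_override_millicpu
    if not 0 <= guaranteed_pct <= 100:
        raise ValueError("guaranteed_pct must be between 0 and 100")
    # Assumption: round down to a whole milli-CPU.
    return allocatable_millicpu * guaranteed_pct // 100


# A node with 4 allocatable CPUs (4000m) and the default 12% setting.
print(instance_manager_cpu_request(4000))  # 480
# The same node with a node-specific request of 600m.
print(instance_manager_cpu_request(4000, node_override_millicpu=600))  # 600
```

On the minimum recommended 4-vCPU node, the default setting therefore reserves roughly half a core per instance manager pod.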
V2 Data Engine
The Guaranteed Instance Manager CPU setting allows you to reserve a percentage of the total allocatable CPU resources on each node for each instance manager pod when the V2 Data Engine is enabled. This reservation applies to the entire V2 instance manager pod.
The primary CPU consumer in the pod is the Storage Performance Development Kit (SPDK) target daemon (spdk_tgt). By default, spdk_tgt typically uses 1 dedicated CPU core in polling mode. The Data Engine CPU Mask setting controls which CPU cores spdk_tgt runs on.
Reserving sufficient CPU is essential for maintaining engine and replica stability, especially during periods of high node workload. The default value of the Guaranteed Instance Manager CPU setting is 12%.
When both V1 and V2 Data Engines are enabled on the same node, separate instance-manager pods are created for each engine. Plan CPU resources accordingly to ensure each instance-manager pod has sufficient CPU allocation.
StorageClass
Avoid modifying the default StorageClass named longhorn. Changing its parameters can cause issues during future upgrades. To change the parameters set in the StorageClass, you can create a new StorageClass by referring to the StorageClass examples.
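A custom StorageClass can be generated as a manifest rather than edited by hand. The sketch below builds one as JSON, which Kubernetes accepts alongside YAML. Several details are assumptions and should be checked against the StorageClass examples: the provisioner name `driver.longhorn.io` and the parameter keys `numberOfReplicas` and `dataLocality` follow upstream Longhorn conventions, while the class name `longhorn-fast` and the function name are hypothetical.

```python
import json


def build_storage_class(name, replicas=2, data_locality="best-effort"):
    """Build a StorageClass manifest as a plain dictionary."""
    if name == "longhorn":
        raise ValueError("create a new StorageClass instead of reusing 'longhorn'")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: provisioner name as used by upstream Longhorn.
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        # StorageClass parameter values must be strings.
        "parameters": {
            "numberOfReplicas": str(replicas),
            "dataLocality": data_locality,
        },
    }


print(json.dumps(build_storage_class("longhorn-fast"), indent=2))
```

The defaults mirror the recommendations on this page (two replicas and `best-effort` data locality), and the name check keeps the default `longhorn` class untouched.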
Scheduling Settings
Replica Node Level Soft Anti-Affinity
Recommendation: false
Set this setting to false in production environments to ensure the best availability of the volume. Otherwise, a single node failure may bring down more than one replica of a volume.
Allow Volumes Creation with Degraded Availability
Recommendation: false
Disable this setting (false) in production environments to ensure maximum volume availability upon creation. When enabled (true), volume creation succeeds even if the system can only schedule one replica. This creates a risk where the cluster runs out of space without notifying the user immediately.
Replica Auto Balance
Recommendation: least-effort
For production environments, we recommend setting Replica Auto Balance to least-effort. This setting ensures that at least one replica is placed on a different node in each zone, providing extra high availability (HA).
In certain edge cases, you might consider using best-effort, which continuously attempts to evenly distribute replicas across nodes and zones. However, this setting can lead to frequent rebuilds if the cluster is unstable.
For most users, having multiple replicas without enabling Replica Auto Balance is sufficient to achieve basic HA, especially if you prefer to avoid excessive rebuilds and resource usage.