This is unreleased documentation for SUSE® Storage 1.12 (Dev).

Volume Recovery

SUSE Storage provides two mechanisms for maintaining volume functionality in a variety of situations.

Automatic Workload Pod Deletion

This recovery mechanism is enabled by the setting Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly.

When one of the following situations occurs, SUSE Storage automatically attempts to delete workload pods that are managed by a controller (for example, a Deployment, StatefulSet, or DaemonSet). After deletion, the controller re-creates the workload pod, and Kubernetes handles volume reattachment and remounting.

A volume was unexpectedly detached because of a Kubernetes upgrade, a container runtime reboot, a network connectivity issue, or a volume engine crash.
A volume was automatically salvaged after all replicas became faulty, possibly because of a network connectivity issue. SUSE Storage attempts to identify the usable replicas and uses them for the volume.
An error occurred on a Share Manager pod that uses an RWX volume.

To prevent SUSE Storage from automatically deleting workload pods, disable the setting Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly on the SUSE Storage UI.

SUSE Storage does not delete pods without a controller because such pods cannot be restarted after deletion. To recover volumes that are unexpectedly detached, you must manually delete and re-create these pods.
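The deletion decision described above can be sketched in Python as follows. This is a minimal illustration, not SUSE Storage's implementation: the `Trigger` enum, `controller_owner`, and `should_delete_pod` are hypothetical names, and a pod is modeled as a plain dictionary shaped like Kubernetes pod metadata. The sketch relies on one real Kubernetes convention: a controller-managed pod has an entry in `metadata.ownerReferences` with `controller: true`. For a Deployment, that owner is the ReplicaSet the Deployment creates rather than the Deployment itself.

```python
from enum import Enum, auto
from typing import Optional


class Trigger(Enum):
    """The three situations listed above (hypothetical names)."""
    UNEXPECTED_DETACH = auto()
    AUTO_SALVAGED = auto()
    SHARE_MANAGER_ERROR = auto()


def controller_owner(pod: dict) -> Optional[dict]:
    """Return the owner reference marked as the pod's controller, if any."""
    refs = pod.get("metadata", {}).get("ownerReferences") or []
    for ref in refs:
        if ref.get("controller") is True:
            return ref
    return None


def should_delete_pod(pod: dict, trigger: Optional[Trigger], auto_delete_enabled: bool) -> bool:
    """Hypothetical decision: delete a pod only when the setting is enabled,
    one of the listed situations occurred, and a controller manages the pod."""
    if not auto_delete_enabled or trigger is None:
        return False
    # A pod without a controller would not be re-created after deletion,
    # so it is left for the user to handle manually.
    return controller_owner(pod) is not None


if __name__ == "__main__":
    # Illustrative pods; the names are made up.
    deployment_pod = {"metadata": {"ownerReferences": [
        {"kind": "ReplicaSet", "name": "web-5d9c7", "controller": True}]}}
    bare_pod = {"metadata": {"name": "debug"}}
    print(should_delete_pod(deployment_pod, Trigger.UNEXPECTED_DETACH, True))  # True
    print(should_delete_pod(bare_pod, Trigger.UNEXPECTED_DETACH, True))        # False
    print(should_delete_pod(deployment_pod, Trigger.AUTO_SALVAGED, False))     # False
```

Checking the `controller` flag, rather than the owner's kind, covers any workload controller, including custom ones, without maintaining a list of kinds.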

Automatic Volume Remounting

This recovery mechanism is not controlled by any specific setting.

The state of a volume can change to read-only when IO errors occur. IO errors can be caused by a variety of issues, including the following:

Network disconnection: Interrupted connection between the engine and replicas.
High disk latency: Significant delay in the transfer of data between a replica and the corresponding disk.

SUSE Storage checks the state of the volume’s global mount point every 10 seconds. When the volume’s file system changes to read-only, SUSE Storage records this as a condition on the volume’s data engine. SUSE Storage then automatically attempts to remount the global mount point on the host to change the state back to read-write. If remounting succeeds, the workload pods continue functioning without disruption. However, if the mount point becomes write-protected and SUSE Storage fails to remount it, you may still need to manually re-create the workload to force it to reattach and remount the volume.
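The check-and-remount cycle described above might look roughly like the Python sketch below. It assumes, without confirmation from this documentation, that the read-only state is read from the mount options in `/proc/mounts` and that remounting uses the standard util-linux command `mount -o remount,rw`. The names `mount_is_read_only`, `check_once`, `watch`, and the `report_condition` callback are hypothetical. Reading the mount table, remounting, and reporting the condition are passed in as functions so the logic can be exercised without root access or a real volume.

```python
import subprocess
import time
from typing import Callable, List, Optional

CHECK_INTERVAL_SECONDS = 10  # polling interval stated in the documentation


def read_proc_mounts() -> str:
    """Return the host's current mount table (Linux only)."""
    with open("/proc/mounts") as f:
        return f.read()


def mount_is_read_only(mounts_text: str, mount_point: str) -> Optional[bool]:
    """True if the mount point has the "ro" option, False if it is mounted
    without it, None if it is not mounted at all.

    When the same path is mounted more than once, the last (topmost) entry
    wins. Paths containing spaces are octal-escaped in /proc/mounts and are
    not unescaped by this sketch.
    """
    result: Optional[bool] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            result = "ro" in fields[3].split(",")
    return result


def remount_read_write(mount_point: str) -> bool:
    """Ask the kernel to switch the mount back to read-write (needs root)."""
    proc = subprocess.run(["mount", "-o", "remount,rw", mount_point], capture_output=True)
    return proc.returncode == 0


def check_once(
    mount_point: str,
    read_mounts: Callable[[], str],
    remount: Callable[[str], bool],
    report_condition: Callable[[str, str], None],
) -> str:
    """One polling cycle: detect read-only, report it, try to remount."""
    state = mount_is_read_only(read_mounts(), mount_point)
    if state is None:
        return "not-mounted"
    if not state:
        return "read-write"
    report_condition(mount_point, "read-only")
    # Re-read the mount table to confirm the remount actually took effect.
    if remount(mount_point) and mount_is_read_only(read_mounts(), mount_point) is False:
        return "remounted"
    # For example, the device is write-protected: the workload must be re-created.
    return "remount-failed"


def watch(
    mount_point: str,
    read_mounts: Callable[[], str],
    remount: Callable[[str], bool],
    report_condition: Callable[[str, str], None],
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run check_once every CHECK_INTERVAL_SECONDS for a fixed number of cycles."""
    results = []
    for i in range(iterations):
        results.append(check_once(mount_point, read_mounts, remount, report_condition))
        if i < iterations - 1:
            sleep(CHECK_INTERVAL_SECONDS)
    return results
```

On a real host you would pass `read_proc_mounts` and `remount_read_write`, which require Linux and root privileges. Injecting them, together with `sleep`, keeps the 10-second loop testable with fakes.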

This mechanism might not work in some situations. For example, when the volume’s data engine crashes, SUSE Storage automatically detaches and reattaches the volume, and the file system changes to read-only. SUSE Storage detects the read-only mode and updates the state, but Automatic Volume Remounting cannot change it back to read-write because the device is now write-protected. In this case, you can only rely on the Automatic Workload Pod Deletion mechanism, which remounts the volume after the workload pod is re-created.
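One way a host could tell the write-protected case apart from a recoverable read-only file system is to check the block device's read-only flag, which the Linux kernel exposes as `0` or `1` in `/sys/class/block/<device>/ro`. The Python sketch below assumes that this flag reflects the write protection described above; SUSE Storage's actual check is not documented here, and `block_device_read_only` and `choose_recovery` are hypothetical names.

```python
from pathlib import Path


def block_device_read_only(device: str, sysfs_block: Path = Path("/sys/class/block")) -> bool:
    """Read the kernel's read-only flag for a block device such as "sdb"."""
    return (sysfs_block / device / "ro").read_text().strip() == "1"


def choose_recovery(fs_read_only: bool, device_read_only: bool) -> str:
    """Hypothetical mapping from the observed state to the mechanism that can recover it."""
    if not fs_read_only:
        return "none"
    if device_read_only:
        # Remounting cannot make a write-protected device writable again;
        # only re-creating the workload pod reattaches and remounts the volume.
        return "workload-pod-deletion"
    return "automatic-remount"
```

The `sysfs_block` parameter exists only so the function can be pointed at a test directory; on a host, the default path is used.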

Summary

Automatic Workload Pod Deletion is triggered when a volume is unexpectedly detached, when a volume is automatically salvaged, or when an error occurs on a Share Manager pod. SUSE Storage deletes the workload pod, the pod's controller re-creates it, and Kubernetes handles volume reattachment and remounting. The process may cause interruptions to the workload. To prevent SUSE Storage from automatically deleting workload pods, disable the setting Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly on the SUSE Storage UI.

Automatic Volume Remounting is triggered when the volume’s file system changes to read-only. SUSE Storage attempts to remount the global mount point on the host to change the state back to read-write.