Disk Health Monitoring

Disk health metrics

Starting with SUSE Storage v1.11.0, disk health monitoring metrics are available for both the V1 and V2 data engines. These metrics provide insights into disk health status.

SUSE Storage collects health data every 10 minutes.
Certain virtualized or cloud environments (for example, AWS EBS) do not expose full SMART data, which results in zero values for certain attributes.
Available health attributes vary depending on disk type and hardware.
The full set of collected health data is available in the nodes.longhorn.io custom resources (CRs).

Data sources

V1 data engine: Health data is collected using the SMART monitoring tool (smartctl).
V2 data engine:
- NVMe disks: Health data is retrieved through SPDK.
- AIO disks: Health data is collected using the SMART monitoring tool (smartctl).
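
The mapping above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name and the string values ("v1", "v2", "nvme", "aio", "smartctl", "spdk") are hypothetical conventions chosen for this example, not identifiers defined by SUSE Storage.

```python
def health_data_source(engine: str, disk_type: str = "") -> str:
    """Return the collection method described above for a disk.

    engine: "v1" or "v2". disk_type: "nvme" or "aio" (only used for v2).
    All names and string values are assumptions made for illustration.
    """
    engine = engine.lower()
    disk_type = disk_type.lower()
    if engine == "v1":
        # V1 data engine: SMART data collected with smartctl.
        return "smartctl"
    if engine == "v2":
        if disk_type == "nvme":
            # V2 NVMe disks: health data retrieved through SPDK.
            return "spdk"
        if disk_type == "aio":
            # V2 AIO disks: SMART data collected with smartctl.
            return "smartctl"
    raise ValueError(
        f"unsupported combination: engine={engine!r}, disk_type={disk_type!r}"
    )
```

For example, health_data_source("v2", "nvme") returns "spdk", while a V1 disk returns "smartctl" whatever its disk type.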

Available attributes and formats vary by disk type and hardware. For details, see the following:

SMART attributes: smartmontools documentation
SPDK NVMe health data: bdev_nvme_get_controller_health_info JSON-RPC

Health attributes

The longhorn_disk_health_attribute_raw metric exposes raw attribute values with the following labels:

attribute: Name of the attribute.
attribute_id: Attribute ID, when provided by the collection method.
disk: Longhorn disk identifier.
node: Name of the node.
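
As a rough illustration of how these labels might be consumed, the sketch below parses metric lines in the Prometheus text exposition format and extracts the four labels. The sample lines, label values, and numbers are invented for this example and do not come from a real cluster; the exact attribute names exposed depend on the disk and the collection method.

```python
import re

# Hypothetical scrape output. Label values and numbers are made up for
# illustration; real attribute names vary by disk type and hardware.
SAMPLE = '''\
# TYPE longhorn_disk_health_attribute_raw gauge
longhorn_disk_health_attribute_raw{attribute="Temperature_Celsius",attribute_id="194",disk="disk-1",node="node-a"} 35
longhorn_disk_health_attribute_raw{attribute="Power_On_Hours",attribute_id="9",disk="disk-1",node="node-a"} 1200
'''

METRIC = "longhorn_disk_health_attribute_raw"
# metric_name{labels} value   (optional timestamps are not handled here)
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_health_metrics(text):
    """Return a list of (labels_dict, value) for the health attribute metric."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match or match.group(1) != METRIC:
            continue  # ignore other metrics
        labels = dict(LABEL_RE.findall(match.group(2)))
        samples.append((labels, float(match.group(3))))
    return samples


if __name__ == "__main__":
    for labels, value in parse_health_metrics(SAMPLE):
        print(labels["node"], labels["disk"], labels["attribute"], value)
```

In practice a Prometheus client library or a PromQL query would normally replace the hand-written parser. The regular expressions here are kept only to make the label structure visible.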

SMART data may not be available on all platforms, particularly in cloud provider environments. Where SMART is not supported, the affected health metrics report 0.
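
Because a value of 0 can be either a genuine reading or a sign that SMART data is unavailable, one possible heuristic is to flag only disks whose attributes are all zero. The sketch below assumes the samples have already been reduced to (node, disk, value) tuples. The function name and input shape are hypothetical, and the heuristic itself is an assumption rather than documented SUSE Storage behavior.

```python
from collections import defaultdict


def disks_without_health_data(samples):
    """Return sorted (node, disk) pairs whose reported values are all zero.

    samples: iterable of (node, disk, value) tuples, one per attribute.
    A disk is flagged only when every attribute is 0, since a single zero
    can be a legitimate reading (for example, an error counter).
    """
    values = defaultdict(list)
    for node, disk, value in samples:
        values[(node, disk)].append(value)
    return sorted(
        key for key, vals in values.items() if all(v == 0 for v in vals)
    )
```

A flagged disk is only a candidate for "SMART not supported". Confirming it still requires checking the platform, for example whether the disk is a cloud block volume.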

References

Related GitHub issue #12016.