Applies to SUSE Linux Enterprise Server 15 SP4

13 Troubleshooting software RAIDs

Check the /proc/mdstat file to find out whether a RAID partition has been damaged. If a disk fails, replace the defective hard disk with a new one partitioned the same way. Then restart your system and enter the command mdadm /dev/mdX --add /dev/sdX. Replace each X with the identifier of your RAID device and of the new partition, respectively. This integrates the hard disk automatically into the RAID system and fully reconstructs it (for all RAID levels except RAID 0, which has no redundancy).

Although you can access all data during the rebuild, you might encounter some performance issues until the RAID has been fully rebuilt.
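The check of /proc/mdstat described at the start of this chapter can be sketched as a small shell helper. This is only an illustration, not part of the product: the function name degraded_arrays and the device names in the comments are hypothetical. It relies on the status field of /proc/mdstat, where U marks a working member and _ marks a missing or failed one.

```shell
# List md arrays that /proc/mdstat reports as degraded.
# In the status field (for example [_U]), "U" marks a working member
# and "_" marks a missing or failed one.
# Optional argument: a file in /proc/mdstat format (default: /proc/mdstat).
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[^ ]* :/ { name = $1 }
        # A status field containing at least one "_" means a member is missing
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# Example use on a live system (commented out, device names are hypothetical):
# degraded_arrays
# mdadm --detail /dev/md0            # inspect the degraded array
# mdadm /dev/md0 --add /dev/sdb1     # add the replacement partition
```

An empty result means that no array listed in the file is currently missing a member.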

13.1 Recovery after failing disk is back again

There are several reasons a disk included in a RAID array may fail. Here is a list of the most common ones:

Problems with the disk media.
Disk drive controller failure.
Broken connection to the disk.

In the case of disk media or controller failure, the device needs to be replaced or repaired. If a hot spare was not configured within the RAID, then manual intervention is required.

In the case of a broken connection, the failed device can be automatically re-added with the mdadm command after the connection is repaired (which might happen automatically).

Because md/mdadm cannot reliably determine what caused the disk failure, it assumes a serious disk error and treats any failed device as faulty until it is explicitly told that the device is reliable.

Under some circumstances, such as storage devices with an internal RAID array, connection problems are very often the cause of the device failure. In such cases, you can tell mdadm that it is safe to automatically --re-add the device after it appears. You can do this by adding the following line to /etc/mdadm.conf:

POLICY action=re-add

Note that the device will be automatically re-added after re-appearing only if the udev rules cause mdadm -I DISK_DEVICE_NAME to be run on any device that spontaneously appears (default behavior), and if write-intent bitmaps are configured (they are by default).
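Because automatic re-adding depends on a write-intent bitmap, it can be useful to confirm that each array has one. The following is a minimal sketch, assuming that /proc/mdstat prints a line starting with "bitmap:" below every array that has a bitmap. The function name arrays_without_bitmap is hypothetical.

```shell
# List md arrays for which /proc/mdstat shows no "bitmap:" line.
# Optional argument: a file in /proc/mdstat format (default: /proc/mdstat).
arrays_without_bitmap() {
    awk '
        # A new array entry starts: report the previous one if it had no bitmap
        /^md[^ ]* :/ {
            if (name != "" && !bitmap) print name
            name = $1
            bitmap = 0
        }
        # Indented "bitmap:" line belonging to the current array
        /^[ \t]*bitmap:/ { bitmap = 1 }
        # Report the last array in the file if it had no bitmap
        END { if (name != "" && !bitmap) print name }
    ' "${1:-/proc/mdstat}"
}

# Example use on a live system (commented out, array name is hypothetical):
# arrays_without_bitmap
# mdadm --grow --bitmap=internal /dev/md0   # add an internal bitmap
```

Arrays printed by this helper would not be covered by the re-add policy until a bitmap is configured for them.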

If you want this policy to apply only to some devices, add the path= option to the POLICY line in /etc/mdadm.conf to restrict the non-default action to the selected devices. Wildcards can be used to identify groups of devices. See man 5 mdadm.conf for more information.
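As an illustration of a restricted policy, the sketch below appends a POLICY line with a path= pattern to a configuration file, but only if that exact line is not already present. The function name add_readd_policy and the pattern in the usage comment are hypothetical. According to man 5 mdadm.conf, the pattern is matched against the device names under /dev/disk/by-path, so check that directory on your system before choosing a value.

```shell
# Append "POLICY action=re-add path=PATTERN" to a configuration file,
# unless exactly this line already exists.
# $1: configuration file (for example /etc/mdadm.conf)
# $2: path pattern, quoted so that the shell does not expand wildcards
add_readd_policy() {
    conf=$1
    path_glob=$2
    line="POLICY action=re-add path=$path_glob"
    # -x: match the whole line, -F: treat the line as a fixed string
    grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Example use (commented out, the pattern is a hypothetical placeholder):
# ls /dev/disk/by-path
# add_readd_policy /etc/mdadm.conf 'pci-0000:00:1f.2-ata-*'
```

Calling the function repeatedly with the same arguments leaves a single POLICY line in the file.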