13 Troubleshooting software RAIDs
Check the /proc/mdstat file to find out whether a RAID partition has been damaged. If a disk fails, replace the defective hard disk with a new one partitioned the same way. Then restart your system and enter the command mdadm /dev/mdX --add /dev/sdX. Replace X with your particular device identifiers. This integrates the hard disk automatically into the RAID system and fully reconstructs it (for all RAID levels except for RAID 0).
Although you can access all data during the rebuild, you might encounter some performance issues until the RAID has been fully rebuilt.
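The check-and-replace procedure above can be sketched as follows. This is a minimal sketch, assuming the usual /proc/mdstat layout in which each array line starts with the array name and the member status appears as a bracketed string such as [U_]. The helper name degraded_arrays and the device names /dev/md1 and /dev/sdb2 are placeholders, not part of mdadm.

```shell
#!/bin/sh
# Sketch: list arrays whose member status (for example [U_]) shows a missing disk.
# degraded_arrays is a hypothetical helper name; the optional argument lets you
# point it at a saved copy of /proc/mdstat instead of the live file.
degraded_arrays() {
    awk '
        # Remember the name of the array currently being described.
        /^md[0-9a-zA-Z_]* :/ { name = $1 }
        # An underscore inside the status brackets marks a missing member.
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# Typical use (as root), with placeholder device names:
#   degraded_arrays                   # prints the names of degraded arrays
#   mdadm /dev/md1 --add /dev/sdb2    # add the replacement partition
#   cat /proc/mdstat                  # watch the rebuild progress
```

Reading the status from a file argument keeps the check usable on a saved copy of /proc/mdstat, for example one taken from a support archive.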
13.1 Recovery after a failed disk is back again
There are several reasons a disk included in a RAID array may fail. Here is a list of the most common ones:
Problems with the disk media.
Disk drive controller failure.
Broken connection to the disk.
In the case of disk media or controller failure, the device needs to be replaced or repaired. If a hot spare was not configured within the RAID, manual intervention is required.
In the last case, the failed device can be automatically re-added by the mdadm command after the connection is repaired (which might be automatic).
Because md/mdadm cannot reliably determine what caused the disk failure, it assumes a serious disk error and treats any failed device as faulty until it is explicitly told that the device is reliable.
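Telling md explicitly that a device can be trusted again might look like the following. This is a sketch only: readd_device is a hypothetical wrapper, DRY_RUN is an illustrative variable that makes it print the command instead of running it, and the device names are placeholders.

```shell
#!/bin/sh
# Sketch: re-add a device that md treats as faulty after a connection problem.
# readd_device is a hypothetical wrapper; set DRY_RUN=1 to print the command
# instead of running it.
readd_device() {
    array=$1     # for example /dev/md0
    device=$2    # for example /dev/sdb1
    ${DRY_RUN:+echo} mdadm "$array" --re-add "$device"
}

# Typical use (as root), with placeholder device names:
#   mdadm --detail /dev/md0            # confirm which member is marked faulty
#   readd_device /dev/md0 /dev/sdb1
```

If the device is still listed as a faulty member of the array, it may need to be removed first, for example with mdadm /dev/md0 --remove /dev/sdb1.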
Under some circumstances, such as storage devices with an internal RAID array, connection problems are very often the cause of the device failure. In such cases, you can tell mdadm that it is safe to automatically --re-add the device after it appears. You can do this by adding the following line to /etc/mdadm.conf:
POLICY action=re-add
Note that the device will be automatically re-added after re-appearing only if the udev rules cause mdadm -I DISK_DEVICE_NAME to be run on any device that spontaneously appears (default behavior), and if write-intent bitmaps are configured (they are by default).
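One way to verify the bitmap precondition is sketched below. It assumes that /proc/mdstat prints a line beginning with "bitmap:" under each array that has a write-intent bitmap. The helper name has_bitmap and the array name md0 are placeholders.

```shell
#!/bin/sh
# Sketch: succeed if /proc/mdstat reports a write-intent bitmap for an array.
# has_bitmap is a hypothetical helper name.
has_bitmap() {
    # $1: array name such as md0; $2: mdstat file (defaults to /proc/mdstat)
    awk -v want="$1" '
        # Track which array the following detail lines belong to.
        /^md[0-9a-zA-Z_]* :/ { current = $1 }
        # A "bitmap:" line under the wanted array means a bitmap is present.
        current == want && $1 == "bitmap:" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mdstat}"
}

# Typical use (as root), with a placeholder array name:
#   has_bitmap md0 || mdadm --grow /dev/md0 --bitmap=internal
```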
If you want this policy to only apply to some devices and not to the others, then the path= option can be added to the POLICY line in /etc/mdadm.conf to restrict the non-default action to only selected devices. Wild cards can be used to identify groups of devices. See man 5 mdadm.conf for more information.
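A restricted policy line could be drafted as in the following sketch. It assumes that path= patterns are matched against the device names under /dev/disk/by-path. Check man 5 mdadm.conf before relying on this. The helper name policy_line and the pattern shown are placeholders.

```shell
#!/bin/sh
# Sketch: print a POLICY line that limits automatic re-adding to matching devices.
# policy_line is a hypothetical helper; review its output before copying it
# into /etc/mdadm.conf.
policy_line() {
    # $1: path pattern, possibly with wild cards
    printf 'POLICY path=%s action=re-add\n' "$1"
}

# Typical use, with a placeholder pattern:
#   ls /dev/disk/by-path/             # find names to base the pattern on
#   policy_line 'pci-0000:00:1f.2-*'
```

Printing the line rather than appending it directly leaves room to review the pattern against the actual device paths first.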