25 Cluster multi-device (Cluster MD) #
Cluster multi-device (Cluster MD) is a software-based RAID storage solution for a cluster. Currently, Cluster MD provides the redundancy of RAID1 mirroring to the cluster. With SUSE Linux Enterprise High Availability 15 SP4, RAID10 is included as a technology preview. If you want to try RAID10, replace mirror with 10 in the related mdadm command.
This chapter shows you how to create and use Cluster MD.
25.1 Conceptual overview #
Cluster MD provides support for the use of RAID1 across a cluster environment. The disks or devices used by Cluster MD are accessed by each node. If one device of the Cluster MD fails, it can be replaced at runtime by another device, which is then re-synced to provide the same amount of redundancy. Cluster MD requires Corosync and the Distributed Lock Manager (DLM) for coordination and messaging.
A Cluster MD device is not automatically started on boot like the rest of the regular MD devices. A clustered device needs to be started using resource agents to ensure the DLM resource has been started.
25.2 Creating a clustered MD RAID device #
Before you begin, make sure the following requirements are met:
A running cluster with Pacemaker.
A resource agent for DLM (see Section 20.2, “Configuring DLM cluster resources”).
At least two shared disk devices. You can use an additional device as a spare, which fails over automatically in case of device failure.
An installed package cluster-md-kmp-default.
Always use cluster-wide persistent device names, such as /dev/disk/by-id/DEVICE_ID. Unstable device names like /dev/sdX or /dev/dm-X might become mismatched on different nodes, causing major problems across the cluster.
Make sure the DLM resource is up and running on every node of the cluster, and check the resource status with the following command:

# crm_resource -r dlm -W
Create the Cluster MD device:
If you do not have an existing normal RAID device, create the Cluster MD device on the node running the DLM resource with the following command:
# mdadm --create /dev/md0 --bitmap=clustered \
   --metadata=1.2 --raid-devices=2 --level=mirror \
   /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2

As Cluster MD only works with version 1.2 of the metadata, it is recommended to specify the version using the --metadata option. For other useful options, refer to the man page of mdadm. Monitor the progress of the re-sync in /proc/mdstat.

If you already have an existing normal RAID, first clear the existing bitmap and then create the clustered bitmap:

# mdadm --grow /dev/mdX --bitmap=none
# mdadm --grow /dev/mdX --bitmap=clustered
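As a hedged illustration, the re-sync progress can be pulled out of mdstat-style text with a small shell filter. The sample text below is illustrative, not from a real system; on a live node, read /proc/mdstat instead:

```shell
# Sketch: extract the re-sync percentage from mdstat-style output.
# "sample" is made-up example data in the usual /proc/mdstat layout.
sample='md0 : active raid1 sdb[1] sda[0]
      1048512 blocks super 1.2 [2/2] [UU]
      [=====>...............]  resync = 27.3% (286272/1048512) finish=0.6min speed=20448K/sec'

# Print only the percentage captured from the "resync = N%" part.
pct=$(printf '%s\n' "$sample" | sed -n 's/.*resync = \([0-9.]*\)%.*/\1/p')
echo "resync progress: ${pct}%"
```

The same `sed` filter works unchanged when fed `cat /proc/mdstat` on a node where a re-sync is running.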
Optionally, to create a Cluster MD device with a spare device for automatic failover, run the following command on one cluster node:
# mdadm --create /dev/md0 --bitmap=clustered --raid-devices=2 \
   --level=mirror --spare-devices=1 --metadata=1.2 \
   /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2 /dev/disk/by-id/DEVICE_ID3
Get the UUID and the related md path:
# mdadm --detail --scan
The UUID must match the UUID stored in the superblock. For details on the UUID, refer to the mdadm.conf man page.

Open /etc/mdadm.conf and add the md device name and the devices associated with it. Use the UUID from the previous step:

DEVICE /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
ARRAY /dev/md0 UUID=1d70f103:49740ef1:af2afce5:fcf6a489
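If you script this step, the UUID can be cut out of the scan output and turned into the ARRAY line directly. This is a sketch on sample data: the scan line below is illustrative output in the usual mdadm --detail --scan format, reusing the example UUID from this section:

```shell
# Sketch: extract the UUID from "mdadm --detail --scan" style output and
# build the ARRAY line for /etc/mdadm.conf. "scan" is sample data.
scan='ARRAY /dev/md0 metadata=1.2 UUID=1d70f103:49740ef1:af2afce5:fcf6a489'

uuid=$(printf '%s\n' "$scan" | sed -n 's/.*UUID=\([0-9a-f:]*\).*/\1/p')
printf 'ARRAY /dev/md0 UUID=%s\n' "$uuid"
```

On a live node, replace the sample variable with the real command output, for example `scan=$(mdadm --detail --scan)`.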
Open Csync2's configuration file /etc/csync2/csync2.cfg and add /etc/mdadm.conf:

group ha_group
{
   # ... list of files pruned ...
   include /etc/mdadm.conf;
}
25.3 Configuring a resource agent #
Configure a CRM resource as follows:
Create a Raid1 primitive:

crm(live)configure# primitive raider Raid1 \
  params raidconf="/etc/mdadm.conf" raiddev=/dev/md0 \
  force_clones=true \
  op monitor timeout=20s interval=10 \
  op start timeout=20s interval=0 \
  op stop timeout=20s interval=0
Add the raider resource to the base group for storage that you have created for DLM:

crm(live)configure# modgroup g-storage add raider
The add sub-command appends the new group member by default.

If not already done, clone the g-storage group so that it runs on all nodes:

crm(live)configure# clone cl-storage g-storage \
  meta interleave=true target-role=Started
Review your changes with show. If everything seems correct, submit your changes with commit.
25.4 Adding a device #
To add a device to an existing, active Cluster MD device, first ensure that the device is “visible” on each node, using the command cat /proc/mdstat. If the device is not visible on a node, the add operation will fail.
Use the following command on one cluster node:
# mdadm --manage /dev/md0 --add /dev/disk/by-id/DEVICE_ID
The behavior of the newly added device depends on the state of the Cluster MD device:
If only one of the mirrored devices is active, the new device becomes the second device of the mirror and a recovery is initiated.
If both devices of the Cluster MD device are active, the newly added device becomes a spare device.
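You can tell which of the two cases applies from /proc/mdstat: a spare is flagged with (S) after its slot number. The sketch below checks a sample mdstat line (illustrative data, not real output):

```shell
# Sketch: check whether a newly added device joined the array as a spare.
# Spares carry an "(S)" flag in /proc/mdstat; "line" is sample data.
line='md0 : active raid1 sdc[2](S) sdb[1] sda[0]'

case $line in
  *"(S)"*) state="spare" ;;
  *)       state="active mirror (recovery running)" ;;
esac
echo "new device is a $state"
```

On a live node, obtain the line with something like `grep '^md0' /proc/mdstat` instead of the sample variable.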
25.5 Re-adding a temporarily failed device #
Quite often, failures are transient and limited to a single node. If any of the nodes encounters a failure during an I/O operation, the device is marked as failed for the entire cluster. This can happen, for example, because of a cable failure on one of the nodes. After correcting the problem, you can re-add the device. Only the outdated parts are synchronized, as opposed to synchronizing the entire device by adding a new one.
To re-add the device, run the following command on one cluster node:
# mdadm --manage /dev/md0 --re-add /dev/disk/by-id/DEVICE_ID
25.6 Removing a device #
Before removing a device at runtime for replacement, do the following:
Make sure the device is marked as failed by inspecting /proc/mdstat. Look for an (F) flag next to the device.
Run the following command on one cluster node to make a device fail:

# mdadm --manage /dev/md0 --fail /dev/disk/by-id/DEVICE_ID
Remove the failed device by running the following command on one cluster node:

# mdadm --manage /dev/md0 --remove /dev/disk/by-id/DEVICE_ID
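As a small hedged sketch, the (F)-flagged members can be picked out of an mdstat line with plain shell pattern matching. The line below is illustrative sample data in the usual /proc/mdstat layout:

```shell
# Sketch: list member devices flagged failed ("(F)") in an mdstat line.
# "line" is sample data; on a live system, read the array's line
# from /proc/mdstat instead.
line='md0 : active raid1 sdb[1](F) sda[0]'

failed=""
for w in $line; do
  case $w in
    # Strip the "[slot](F)" suffix to keep only the device name.
    *"(F)") failed="${failed}${w%\[*} " ;;
  esac
done
echo "failed devices: $failed"
```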
25.7 Assembling Cluster MD as normal RAID at the disaster recovery site #
In a disaster recovery scenario, you might not have a Pacemaker cluster stack in the infrastructure at the disaster recovery site, but applications still need to access the data on the existing Cluster MD disks, or from the backups.
You can convert a Cluster MD RAID to a normal RAID by using the --assemble operation with the -U no-bitmap option to change the metadata of the RAID disks accordingly.
Find an example below of how to assemble all arrays on the data recovery site:

while read i; do
  NAME=`echo $i | sed 's/.*name=//' | awk '{print $1}' | sed 's/.*://'`
  UUID=`echo $i | sed 's/.*UUID=//' | awk '{print $1}'`
  mdadm -AR "/dev/md/$NAME" -u $UUID -U no-bitmap
  echo "NAME =" $NAME ", UUID =" $UUID ", assembled."
done < <(mdadm -Es)
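To see what the sed/awk parsing in that loop produces, you can run it on a single sample line. The line below is illustrative mdadm -Es style output, reusing the example UUID from this chapter:

```shell
# Sketch: the NAME/UUID parsing from the assembly loop, applied to one
# sample "mdadm -Es" line ("i" is made-up example data).
i='ARRAY /dev/md/0 metadata=1.2 UUID=1d70f103:49740ef1:af2afce5:fcf6a489 name=cluster:0'

# "name=cluster:0" -> keep the part after the colon.
NAME=$(echo $i | sed 's/.*name=//' | awk '{print $1}' | sed 's/.*://')
# "UUID=..." -> keep the first whitespace-separated token after "UUID=".
UUID=$(echo $i | sed 's/.*UUID=//' | awk '{print $1}')
echo "NAME=$NAME UUID=$UUID"
```

This shows why the loop assembles the array as /dev/md/0: the host part of name=cluster:0 is stripped and only the trailing array name is kept.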