Applies to SUSE Linux Enterprise High Availability 15 SP7

28 Cluster Logical Volume Manager (Cluster LVM)

The term Cluster LVM indicates that LVM is being used in a cluster environment. When managing shared storage on a cluster, every node must be informed about changes to the storage subsystem. Logical Volume Manager (LVM) supports transparent management of volume groups across the whole cluster. Volume groups shared among multiple nodes can be managed using the same commands as local storage.

Important

From SUSE Linux Enterprise 15 onward, the High Availability extension uses lvmlockd instead of clvmd.

28.1 Conceptual overview

Cluster LVM relies on the following components working together:

Logical Volume Manager (LVM)

LVM provides a virtual pool of disk space and enables flexible distribution of one logical volume over several disks.

Volume groups and logical volumes

Volume groups (VGs) and logical volumes (LVs) are basic concepts of LVM. A volume group is a storage pool of multiple physical disks. A logical volume belongs to a volume group and can be seen as an elastic volume on which you can create a file system.
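
As a plain, non-cluster illustration of these concepts, the following commands create a VG from two disks and carve out an LV with a file system on it (the device names /dev/sdb and /dev/sdc and the names vg0 and lv0 are placeholders):

# pvcreate /dev/sdb /dev/sdc
# vgcreate vg0 /dev/sdb /dev/sdc
# lvcreate --size 20G --name lv0 vg0
# mkfs.ext4 /dev/vg0/lv0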

LVM-activate resource agent

In a cluster environment, VGs consist of shared storage and can be used either by multiple nodes concurrently, or by one node at a time with the ability to migrate to other nodes. To protect the LVM metadata on shared storage, the cluster uses LVM-activate to manage the activation of LVs in a particular VG. LVM-activate has two LV activation modes:

  • In shared mode, LVs can be active on multiple nodes concurrently. Locking is managed by lvmlockd. This mode is useful for cluster file systems like GFS2, for example.

  • In exclusive mode, LVs can only be active on one node at a time. Exclusive access to the LV is managed either by lvmlockd or by using the system_id. This mode is useful for virtual machine disks or for local file systems like ext4, for example.

Important

lvmlockd with sanlock is not officially supported.

Distributed Lock Manager (DLM)

DLM coordinates cluster-wide locking. It is required when using lvmlockd.

For more information
  • crm ra info LVM-activate

  • man lvmlockd

  • man lvmsystemid

28.2 Requirements

  • A shared storage device is available, provided, for example, by Fibre Channel, FCoE, SCSI, iSCSI SAN, or NVMe-oF. Volume groups must have at least one disk, but typically have multiple disks.

  • Make sure the following packages are installed: lvm2 and lvm2-lockd.

  • From SUSE Linux Enterprise 15 onward, the High Availability extension uses lvmlockd instead of clvmd. Make sure the clvmd daemon is not running, otherwise lvmlockd will fail to start.
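
For example, you can install the required packages and check for a leftover clvmd process with commands like the following (a minimal check; adjust it to your setup):

# zypper install lvm2 lvm2-lockd
# ps -e | grep -w clvmd || echo "clvmd is not running"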

28.3 Configuring Cluster LVM with lvmlockd

You can configure Cluster LVM in one of the following ways, depending on your storage and cluster setup:

28.3.1 Configuring Cluster LVM in shared mode

Perform the following steps on one node to configure a shared VG for an active/active cluster:

Procedure 28.1: Creating a DLM resource
  1. Start a shell and log in as root.

  2. Check the current configuration of the cluster resources:

    # crm configure show
  3. If you have already configured a DLM resource (and a corresponding base group and base clone), continue with Procedure 28.2, “Creating an lvmlockd resource”.

    Otherwise, configure a DLM resource and a corresponding base group and base clone as described in Procedure 24.1, “Configuring a base group for DLM”.
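
    For reference, the base group and clone created by that procedure typically look similar to the following sketch (the names g-storage and cl-storage match the names used in the rest of this chapter; follow Procedure 24.1 for the exact parameters):

    # crm configure primitive dlm ocf:pacemaker:controld \
        op monitor interval=60 timeout=60
    # crm configure group g-storage dlm
    # crm configure clone cl-storage g-storage \
        meta interleave=true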

Procedure 28.2: Creating an lvmlockd resource
  1. Start a shell and log in as root.

  2. Run the following command to see the usage of this resource:

    # crm configure ra info lvmlockd
  3. Configure an lvmlockd resource as follows:

    # crm configure primitive lvmlockd lvmlockd \
        op start timeout="90" \
        op stop timeout="100" \
        op monitor interval="30" timeout="90"
  4. To ensure the lvmlockd resource is started on every node, add the primitive resource to the base group for storage you have created in Procedure 28.1, “Creating a DLM resource”:

    # crm configure modgroup g-storage add lvmlockd
  5. Review your changes:

    # crm configure show
  6. Check the status of the resources:

    # crm status
Procedure 28.3: Creating a shared VG and LV
  1. Start a shell and log in as root.

  2. Assuming you already have two shared disks, create a shared VG with them:

    # vgcreate --shared vg1 /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
  3. Create an LV and do not activate it initially:

    # lvcreate --activate n --size 10G --name lv1 vg1
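
    Optionally, verify that the VG was created as a shared VG. The s flag at the end of the Attr column in the vgs output indicates a shared VG:

    # vgs vg1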
Procedure 28.4: Creating an LVM-activate resource
  1. Start a shell and log in as root.

  2. Run the following command to see the usage of this resource:

    # crm configure ra info LVM-activate
  3. Configure a resource to manage the activation of your VG. For shared mode managed by lvmlockd, you must specify vg_access_mode=lvmlockd and activation_mode=shared:

    # crm configure primitive vg1 LVM-activate \
        params vgname=vg1 vg_access_mode=lvmlockd activation_mode=shared \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=30s timeout=90s
  4. Make sure the VG can only be activated on nodes where the DLM and lvmlockd resources are already running:

    • One VG:

      Because this VG is active on multiple nodes, you can add it to the cloned g-storage group, which already has internal colocation and order constraints:

      # crm configure modgroup g-storage add vg1
    • Multiple VGs:

      Do not add multiple VGs to the group, because this creates a dependency between the VGs. For multiple VGs, clone the resources and add constraints to the clones:

      # crm configure clone cl-vg1 vg1 meta interleave=true
      # crm configure clone cl-vg2 vg2 meta interleave=true
      # crm configure colocation col-vg-with-dlm inf: ( cl-vg1 cl-vg2 ) cl-storage
      # crm configure order o-dlm-before-vg Mandatory: cl-storage ( cl-vg1 cl-vg2 )
  5. Check the status of the resources:

    # crm status
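
When all resources are running, the LV should be active on every node. For example, running lvs on each node should list lv1 with the a (active) flag in its attributes:

# lvs vg1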

28.3.2 Configuring Cluster LVM in exclusive mode

Perform the following steps on the active node to configure a local VG for an active/passive cluster:

Procedure 28.5: Creating a DLM resource
  1. Start a shell and log in as root.

  2. Check the current configuration of the cluster resources:

    # crm configure show
  3. If you have already configured a DLM resource (and a corresponding base group and base clone), continue with Procedure 28.6, “Creating an lvmlockd resource”.

    Otherwise, configure a DLM resource and a corresponding base group and base clone as described in Procedure 24.1, “Configuring a base group for DLM”.

Procedure 28.6: Creating an lvmlockd resource
  1. Start a shell and log in as root.

  2. Run the following command to see the usage of this resource:

    # crm configure ra info lvmlockd
  3. Configure an lvmlockd resource as follows:

    # crm configure primitive lvmlockd lvmlockd \
        op start timeout="90" \
        op stop timeout="100" \
        op monitor interval="30" timeout="90"
  4. To ensure the lvmlockd resource is started on every node, add the primitive resource to the base group for storage you have created in Procedure 28.5, “Creating a DLM resource”:

    # crm configure modgroup g-storage add lvmlockd
  5. Review your changes:

    # crm configure show
  6. Check the status of the resources:

    # crm status
Procedure 28.7: Creating a local VG and LV on shared storage
  1. Start a shell and log in as root.

  2. Assuming you already have two shared disks, create a local VG with them:

    # vgcreate vg1 /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
  3. Create an LV and do not activate it initially:

    # lvcreate --activate n --size 10G --name lv1 vg1
Procedure 28.8: Creating an LVM-activate resource
  1. Start a shell and log in as root.

  2. Run the following command to see the usage of this resource:

    # crm configure ra info LVM-activate
  3. Configure a resource to manage the activation of your VG. For exclusive mode managed by lvmlockd, you must specify vg_access_mode=lvmlockd:

    # crm configure primitive vg1 LVM-activate \
        params vgname=vg1 vg_access_mode=lvmlockd \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=30s timeout=90s

    Exclusive mode is the default setting, so you don't need to specify activation_mode=exclusive.

  4. Make sure the VG can only be activated on nodes where the DLM and lvmlockd resources are already running:

    • One VG:

      Because this VG is only active on a single node, do not add it to the cloned g-storage group. Instead, add constraints directly to the resource:

      # crm configure colocation col-vg-with-dlm inf: vg1 cl-storage
      # crm configure order o-dlm-before-vg Mandatory: cl-storage vg1
    • Multiple VGs:

      For multiple VGs, you can add constraints to multiple resources at once:

      # crm configure colocation col-vg-with-dlm inf: ( vg1 vg2 ) cl-storage
      # crm configure order o-dlm-before-vg Mandatory: cl-storage ( vg1 vg2 )
  5. Check the status of the resources:

    # crm status
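
Because the VG is activated exclusively, it should only be active on one node at a time. To see which node currently holds it, you can query the location of the resource, for example:

# crm resource locate vg1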

28.4 Configuring LVM with system_id

Perform the following steps on the active node to configure a local VG for an active/passive cluster:

Procedure 28.9: Configuring LVM to use system_id
  1. Start a shell and log in as root.

  2. Open the /etc/lvm/lvm.conf file.

  3. Make sure the use_lvmlockd line is commented out or set to 0.

  4. Uncomment the system_id_source line and change the value to uname:

    system_id_source = "uname"

    This means that the VG gets its system_id from the host name (uname -n) of whichever node the VG is currently active on. If the VG moves to another node, the system_id changes to the host name of the new node.

  5. Save and exit the file.

  6. Copy the updated configuration to all nodes:

    # crm cluster copy /etc/lvm/lvm.conf
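
    With system_id_source set to uname, each node derives its system ID from its host name. To check the system ID a node will use, run the following command (see man lvmsystemid for details):

    # lvm systemid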
Procedure 28.10: Creating a local VG and LV on shared storage
  1. Start a shell and log in as root.

  2. Assuming you already have two shared disks, create a local VG with them:

    # vgcreate vg1 /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
  3. Create an LV and do not activate it initially:

    # lvcreate --activate n --size 10G --name lv1 vg1
  4. Reboot the other cluster nodes to refresh their LVM metadata. When LVM metadata is not protected by lvmlockd, the other nodes might still use the old disk layout cached in their memory and thus remain unaware that the on-disk metadata has changed.

    Tip: Refreshing LVM metadata without rebooting the nodes

    If you can't reboot the other nodes immediately, you can run the following commands on each node to help refresh their LVM metadata. However, we still recommend rebooting the nodes as soon as you can.

    1. Rescan the physical volumes:

      # pvscan --cache
    2. Rescan the volume groups:

      # vgscan
    3. Refresh the status of the logical volumes:

      # lvscan
Procedure 28.11: Creating an LVM-activate resource
  1. Start a shell and log in as root.

  2. Run the following command to see the usage of this resource:

    # crm configure ra info LVM-activate
  3. Configure a resource to manage the activation of your VG. For exclusive mode managed by the system_id, you must specify vg_access_mode=system_id:

    # crm configure primitive vg1 LVM-activate \
        params vgname=vg1 vg_access_mode=system_id \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=30s timeout=90s

    Exclusive mode is the default setting, so you don't need to specify activation_mode=exclusive.

  4. Check the status of the resource:

    # crm status
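
After the resource has started, the VG is owned by the node it is currently active on. You can check the owning system ID with the systemid field of the vgs command (see man lvmsystemid):

# vgs -o+systemid vg1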

28.5 Scenario: Cluster LVM with iSCSI on SANs

The following scenario uses two SAN boxes, which export their iSCSI targets to several clients. The general idea is displayed in Figure 28.1, “Setup of a shared disk with Cluster LVM”.

This diagram shows an LV on top of a VG on top of a PV. The PV is then connected to two separate iSCSI instances from SAN 1 and SAN 2, respectively.
Figure 28.1: Setup of a shared disk with Cluster LVM
Warning: Data loss

The following procedures will destroy any data on your disks.

Configure only one SAN box first. Each SAN box needs to export its own iSCSI target. Proceed as follows:

Procedure 28.12: Configuring iSCSI targets (SAN)
  1. Run YaST and click Network Services › iSCSI LIO Target to start the iSCSI Server module.

  2. If you want to start the iSCSI target whenever your computer is booted, choose When Booting, otherwise choose Manually.

  3. If you have a firewall running, enable Open Port in Firewall.

  4. Switch to the Global tab. If you need authentication, enable incoming or outgoing authentication or both. In this example, we select No Authentication.

  5. Add a new iSCSI target:

    1. Switch to the Targets tab.

    2. Click Add.

    3. Enter a target name. The name needs to be formatted like this:

      iqn.DATE.DOMAIN

      For more information about the format, refer to Section 3.2.6.3.1, “Type "iqn." (iSCSI Qualified Name)”, of RFC 3720 at https://www.ietf.org/rfc/rfc3720.txt.

    4. If you want a more descriptive name, you can change it, as long as the identifier remains unique across your targets.

    5. Click Add.

    6. Enter the device name in Path and use a Scsiid.

    7. Click Next twice.

  6. Confirm the warning box with Yes.

  7. Open the configuration file /etc/iscsi/iscsid.conf and change the parameter node.startup to automatic.
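
    The relevant line in /etc/iscsi/iscsid.conf then reads:

    node.startup = automatic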

Now set up your iSCSI initiators as follows:

Procedure 28.13: Configuring iSCSI initiators
  1. Run YaST and click Network Services › iSCSI Initiator.

  2. If you want to start the iSCSI initiator whenever your computer is booted, choose When Booting, otherwise set Manually.

  3. Change to the Discovery tab and click the Discovery button.

  4. Add the IP address and the port of your iSCSI target (see Procedure 28.12, “Configuring iSCSI targets (SAN)”). Normally, you can leave the port as it is and use the default value.

  5. If you use authentication, insert the incoming and outgoing user name and password, otherwise activate No Authentication.

  6. Select Next. The connections that were found are displayed in the list.

  7. Proceed with Finish.

  8. Start a shell and log in as root.

  9. Test if the iSCSI initiator has been started successfully:

    # iscsiadm -m discovery -t st -p 192.168.3.100
    192.168.3.100:3260,1 iqn.2010-03.de.jupiter:san1
  10. Establish a session:

    # iscsiadm -m node -l -p 192.168.3.100 -T iqn.2010-03.de.jupiter:san1
    Logging in to [iface: default, target: iqn.2010-03.de.jupiter:san1, portal: 192.168.3.100,3260]
    Login to [iface: default, target: iqn.2010-03.de.jupiter:san1, portal: 192.168.3.100,3260]: successful

    See the device names with lsscsi:

    # lsscsi
    ...
    [4:0:0:2]    disk    IET      ...     0     /dev/sdd
    [5:0:0:1]    disk    IET      ...     0     /dev/sde

    Look for entries with IET in their third column. In this case, the devices are /dev/sdd and /dev/sde.

Procedure 28.14: Creating the shared volume groups
  1. Open a root shell on one of the nodes on which you ran the iSCSI initiator from Procedure 28.13, “Configuring iSCSI initiators”.

  2. Create the shared volume group on disks /dev/sdd and /dev/sde, using their stable device names (for example, in /dev/disk/by-id/):

    # vgcreate --shared testvg /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
  3. Create logical volumes as needed:

    # lvcreate --name lv1 --size 500M testvg
  4. Check the volume group with vgdisplay:

    # vgdisplay testvg
      --- Volume group ---
          VG Name               testvg
          System ID
          Format                lvm2
          Metadata Areas        2
          Metadata Sequence No  1
          VG Access             read/write
          VG Status             resizable
          MAX LV                0
          Cur LV                0
          Open LV               0
          Max PV                0
          Cur PV                2
          Act PV                2
          VG Size               1016,00 MB
          PE Size               4,00 MB
          Total PE              254
          Alloc PE / Size       0 / 0
          Free  PE / Size       254 / 1016,00 MB
          VG UUID               UCyWw8-2jqV-enuT-KH4d-NXQI-JhH3-J24anD
  5. Check the shared state of the volume group with the command vgs:

    # vgs
      VG       #PV #LV #SN Attr   VSize     VFree
      testvg     2   1   0 wz--ns 1016.00m  1016.00m

    The Attr column shows the volume attributes. In this example, the volume group is writable (w), resizable (z), the allocation policy is normal (n), and it is a shared resource (s). See the man page of vgs for details.

After you have created the volumes and started your resources, you should have new device names under /dev/testvg, for example, /dev/testvg/lv1. This indicates that the LV has been activated for use.
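
For example, listing that directory should show the device node for the activated LV:

# ls -l /dev/testvg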

28.6 Configuring eligible LVM devices explicitly

When several devices seemingly share the same physical volume signature (as can be the case for multipath devices or DRBD), we recommend explicitly configuring the devices that LVM scans for PVs.

For example, if the command vgcreate uses the physical device instead of using the mirrored block device, DRBD will be confused. This may result in a split-brain condition for DRBD.

To deactivate a single device for LVM, do the following:

  1. Edit the file /etc/lvm/lvm.conf and search for the line starting with filter.

  2. The patterns there are handled as regular expressions. A leading a accepts a device pattern for the scan, and a leading r rejects the devices that match the pattern.

  3. To remove a device named /dev/sdb1, add the following expression to the filter rule:

    "r|^/dev/sdb1$|"

    The complete filter line will look like the following:

    filter = [ "r|^/dev/sdb1$|", "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", "a/.*/" ]

    A filter line that accepts DRBD and MPIO devices but rejects all other devices would look like this:

    filter = [ "a|/dev/drbd.*|", "a|/dev/.*/by-id/dm-uuid-mpath-.*|", "r/.*/" ]
  4. Write the configuration file and copy it to all cluster nodes.
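
    For example, you can distribute the file with the same command used for /etc/lvm/lvm.conf earlier in this chapter:

    # crm cluster copy /etc/lvm/lvm.conf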

28.7 Online migration from mirror LV to cluster MD

Starting with SUSE Linux Enterprise High Availability 15, cmirrord in Cluster LVM is deprecated. We highly recommend migrating the mirror logical volumes in your cluster to cluster MD. Cluster MD stands for cluster multi-device and is a software-based RAID storage solution for a cluster.

28.7.1 Example setup before migration

Let us assume you have the following example setup:

  • You have a two-node cluster consisting of the nodes alice and bob.

  • A mirror logical volume named test-lv was created from a volume group named cluster-vg2.

  • The volume group cluster-vg2 is composed of the disks /dev/vdb and /dev/vdc.

# lsblk
NAME                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                                   253:0    0   40G  0 disk
├─vda1                                253:1    0    4G  0 part [SWAP]
└─vda2                                253:2    0   36G  0 part /
vdb                                   253:16   0   20G  0 disk
├─cluster--vg2-test--lv_mlog_mimage_0 254:0    0    4M  0 lvm
│ └─cluster--vg2-test--lv_mlog        254:2    0    4M  0 lvm
│   └─cluster--vg2-test--lv           254:5    0   12G  0 lvm
└─cluster--vg2-test--lv_mimage_0      254:3    0   12G  0 lvm
  └─cluster--vg2-test--lv             254:5    0   12G  0 lvm
vdc                                   253:32   0   20G  0 disk
├─cluster--vg2-test--lv_mlog_mimage_1 254:1    0    4M  0 lvm
│ └─cluster--vg2-test--lv_mlog        254:2    0    4M  0 lvm
│   └─cluster--vg2-test--lv           254:5    0   12G  0 lvm
└─cluster--vg2-test--lv_mimage_1      254:4    0   12G  0 lvm
  └─cluster--vg2-test--lv             254:5    0   12G  0 lvm
Important: Avoiding migration failures

Before you start the migration procedure, check the capacity and degree of utilization of your logical and physical volumes. If the logical volume uses 100% of the physical volume capacity, the migration might fail with an insufficient free space error on the target volume. How to prevent this migration failure depends on the options used for the mirror log:

  • Is the mirror log itself mirrored (mirrored option) and allocated on the same device as the mirror leg? (For example, this might be the case if you have created the logical volume for a cmirrord setup on SUSE Linux Enterprise High Availability 11 or 12 as described in the Administration Guide for those versions.)

    By default, mdadm reserves a certain amount of space between the start of a device and the start of array data. During migration, you can check for the unused padding space and reduce it with the data-offset option as shown in Step 1.d and following.

    The data-offset must leave enough space on the device for cluster MD to write its metadata to it. However, the offset must be small enough for the remaining capacity of the device to accommodate all physical volume extents of the migrated volume. Because the volume may have spanned the complete device minus the mirror log, the offset must be smaller than the size of the mirror log.

    We recommend setting the data-offset to 128 kB. If no value is specified for the offset, its default value is 1 kB (1024 bytes).

  • Is the mirror log written to a different device (disk option) or kept in memory (core option)? Before starting the migration, either enlarge the size of the physical volume or reduce the size of the logical volume (to free more space for the physical volume).

28.7.2 Migrating a mirror LV to cluster MD

The following procedure is based on Section 28.7.1, “Example setup before migration”. Adjust the instructions to match your setup and replace the names for the LVs, VGs, disks, and the cluster MD device accordingly.

The migration does not involve any downtime. The file system can still be mounted during the migration procedure.

  1. On node alice, execute the following steps:

    1. Convert the mirror logical volume test-lv to a linear logical volume:

      # lvconvert -m0 cluster-vg2/test-lv /dev/vdc
    2. Remove the physical volume /dev/vdc from the volume group cluster-vg2:

      # vgreduce cluster-vg2 /dev/vdc
    3. Remove this physical volume from LVM:

      # pvremove /dev/vdc

      When you run lsblk now, you get:

      NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      vda                     253:0    0   40G  0 disk
      ├─vda1                  253:1    0    4G  0 part [SWAP]
      └─vda2                  253:2    0   36G  0 part /
      vdb                     253:16   0   20G  0 disk
      └─cluster--vg2-test--lv 254:5    0   12G  0 lvm
      vdc                     253:32   0   20G  0 disk
    4. Create a cluster MD device /dev/md0 with the disk /dev/vdc:

      # mdadm --create /dev/md0 --bitmap=clustered \
           --metadata=1.2 --raid-devices=1 --force --level=mirror \
           /dev/vdc --data-offset=128

      For details on why to use the data-offset option, see Important: Avoiding migration failures.

  2. On node bob, assemble this MD device:

    # mdadm --assemble /dev/md0 /dev/vdc

    If your cluster consists of more than two nodes, execute this step on all remaining nodes in your cluster.

  3. Back on node alice:

    1. Initialize the MD device /dev/md0 as physical volume for use with LVM:

      # pvcreate /dev/md0
    2. Add the MD device /dev/md0 to the volume group cluster-vg2:

      # vgextend cluster-vg2 /dev/md0
    3. Move the data from the disk /dev/vdb to the /dev/md0 device:

      # pvmove /dev/vdb /dev/md0
    4. Remove the physical volume /dev/vdb from the volume group cluster-vg2:

      # vgreduce cluster-vg2 /dev/vdb
    5. Remove the label from the device so that LVM no longer recognizes it as a physical volume:

      # pvremove /dev/vdb
    6. Add /dev/vdb to the MD device /dev/md0:

      # mdadm --grow /dev/md0 --raid-devices=2 --add /dev/vdb
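
Optionally, you can check the state of the cluster MD device after the migration, for example:

# mdadm --detail /dev/md0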

28.7.3 Example setup after migration

When you run lsblk now, you get:

NAME                      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
vda                       253:0    0   40G  0 disk
├─vda1                    253:1    0    4G  0 part  [SWAP]
└─vda2                    253:2    0   36G  0 part  /
vdb                       253:16   0   20G  0 disk
└─md0                       9:0    0   20G  0 raid1
  └─cluster--vg2-test--lv 254:5    0   12G  0 lvm
vdc                       253:32   0   20G  0 disk
└─md0                       9:0    0   20G  0 raid1
  └─cluster--vg2-test--lv 254:5    0   12G  0 lvm