Applies to SUSE Linux Enterprise High Availability Extension 11 SP4

16 Cluster Logical Volume Manager (cLVM)

When managing shared storage on a cluster, every node must be informed about changes to the storage subsystem. The Logical Volume Manager 2 (LVM2), which is widely used to manage local storage, has been extended to support transparent management of volume groups across the whole cluster. Clustered volume groups can be managed with the same commands as local storage.

16.1 Conceptual Overview

Clustered LVM is coordinated by the following tools:

Distributed Lock Manager (DLM)

Coordinates disk access for cLVM.

Logical Volume Manager 2 (LVM2)

Enables flexible distribution of one file system over several disks. LVM provides a virtual pool of disk space.

Clustered Logical Volume Manager (cLVM)

Coordinates access to the LVM2 metadata so every node knows about changes. cLVM does not coordinate access to the shared data itself; to enable cLVM to do so, you must configure OCFS2 or other cluster-aware applications on top of the cLVM-managed storage.

16.2 Configuration of cLVM

Depending on your scenario, it is possible to create a RAID 1 device with cLVM using the following layers:

  • LVM.  This is a very flexible solution if you want to increase or decrease your file system size, add more physical storage, or create snapshots of your file systems. This method is described in Section 16.2.3, “Scenario: cLVM With iSCSI on SANs”.

  • DRBD.  This solution only provides RAID 0 (striping) and RAID 1 (mirroring). This method is described in Section 16.2.4, “Scenario: cLVM With DRBD”.

Although MD devices (Linux Software RAID or mdadm) provide all RAID levels, they do not support clusters yet. Therefore they are not covered in the above list.

Make sure you have fulfilled the following prerequisites:

  • A shared storage device is available, for example provided by Fibre Channel, FCoE, SCSI, an iSCSI SAN, or DRBD.

  • In case of DRBD, both nodes must be primary (as described in the following procedure).

  • Check if the locking type of LVM2 is cluster-aware. The keyword locking_type in /etc/lvm/lvm.conf must contain the value 3 (should be the default). Copy the configuration to all nodes, if necessary.
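The locking-type check above can be scripted. The following is a minimal sketch that parses a sample configuration fragment written to a temporary file; on a real node you would read `/etc/lvm/lvm.conf` instead:

```shell
# Illustration only: parse locking_type from a sample lvm.conf fragment.
# On a real node, point LVM_CONF at /etc/lvm/lvm.conf instead.
LVM_CONF=$(mktemp)
cat > "$LVM_CONF" <<'EOF'
global {
    # locking_type = 1
    locking_type = 3
}
EOF
# Pick the last uncommented locking_type setting.
locking_type=$(sed -n 's/^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$LVM_CONF" | tail -n1)
rm -f "$LVM_CONF"
echo "locking_type=$locking_type"
```

If the printed value is not 3, LVM2 is not cluster-aware on that node and the configuration must be corrected and copied to all nodes.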

Note: Create Cluster Resources First

First create your cluster resources as described in Section 16.2.2, “Creating the Cluster Resources” and then your LVM volumes. Otherwise it is impossible to remove the volumes later.

16.2.1 Configuring Cmirrord

To track mirror log information in a cluster, the cmirrord daemon is used. Cluster mirrors are not possible without this daemon running.

We assume that /dev/sda and /dev/sdb are the shared storage devices. Replace these with your own device name(s), if necessary. Proceed as follows:

  1. Create a cluster with at least two nodes.

  2. Configure your cluster to run dlm, clvmd, and STONITH:

    root # crm configure
    crm(live)configure# primitive clvmd ocf:lvm2:clvmd \
            op stop interval="0" timeout="100" \
            op start interval="0" timeout="90" \
            op monitor interval="20" timeout="20"
    crm(live)configure# primitive dlm ocf:pacemaker:controld \
            op start interval="0" timeout="90" \
            op stop interval="0" timeout="100" \
            op monitor interval="60" timeout="60"
    crm(live)configure# primitive sbd_stonith stonith:external/sbd \
            params pcmk_delay_max=30
    crm(live)configure# group base-group dlm clvmd
    crm(live)configure# clone base-clone base-group \
            meta interleave="true"
  3. Commit your changes with commit and leave crmsh with exit.

  4. Create a clustered volume group (VG):

    root # pvcreate /dev/sda /dev/sdb
    root # vgcreate -cy vg /dev/sda /dev/sdb
  5. Create a mirrored-log logical volume (LV) in your cluster:

    root # lvcreate -nlv -m1 -l10%VG vg --mirrorlog mirrored
  6. Use lvs to show the progress. If the percentage number has reached 100%, the mirrored disk is successfully synchronized.

  7. To test the clustered volume /dev/vg/lv, use the following steps:

    1. Read or write to /dev/vg/lv.

    2. Deactivate your LV with lvchange -an.

    3. Activate your LV with lvchange -ay.

    4. Use lvconvert to convert a mirrored log to a disk log.

  8. Create a mirrored-log LV in another clustered VG, using a volume group different from the previous one.
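Step 6 above reads the synchronization percentage from lvs. As a minimal sketch, the value can be extracted from the command's output like this (a captured sample line stands in for a live lvs call; on a live cluster you could use `lvs --noheadings -o copy_percent vg/lv`):

```shell
# Illustration: extract the mirror synchronization percentage from lvs output.
# The sample line below stands in for real lvs output.
sample="  lv   vg   mwi-a-m--- 102.00m   lv_mlog  100.00"
copy_percent=$(echo "$sample" | awk '{print $NF}')
echo "Cpy%Sync: $copy_percent"
if [ "$copy_percent" = "100.00" ]; then
    echo "mirror is fully synchronized"
fi
```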

The current cLVM can only handle one physical volume (PV) per mirror side. If one mirror actually consists of several PVs that need to be concatenated or striped, lvcreate does not understand this. Therefore, lvcreate and the cmirrord metadata need to understand the grouping of PVs into one mirror side, effectively supporting RAID 10.

To support RAID10 for cmirrord, use the following procedure (assuming that /dev/sda and /dev/sdb are the shared storage devices):

  1. Create a volume group (VG):

    root # pvcreate /dev/sda /dev/sdb
    root # vgcreate vg /dev/sda /dev/sdb
  2. Open the file /etc/lvm/lvm.conf and go to the section allocation. Set the following line and save the file:

    mirror_legs_require_separate_pvs = 1
  3. Add your tags to your PVs:

    root # pvchange --addtag @a /dev/sda
    root # pvchange --addtag @b /dev/sdb

    A tag is an unordered keyword or term assigned to the metadata of a storage object. Tagging allows you to classify collections of LVM storage objects in ways that you find useful by attaching an unordered list of tags to their metadata.

  4. List your tags:

    root # pvs -o pv_name,vg_name,pv_tags /dev/sd{a,b}

    You should receive this output:

      PV         VG  PV Tags
      /dev/sda   vg  a
      /dev/sdb   vg  b
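Step 2 above edits /etc/lvm/lvm.conf by hand. As an illustration, the same change can be scripted with sed; the sketch below runs against a temporary sample file rather than the real configuration:

```shell
# Illustration: enable mirror_legs_require_separate_pvs via sed.
# A temporary sample file stands in for /etc/lvm/lvm.conf here.
conf=$(mktemp)
cat > "$conf" <<'EOF'
allocation {
    mirror_legs_require_separate_pvs = 0
}
EOF
# Flip the setting to 1, as required for separate mirror legs.
sed -i 's/\(mirror_legs_require_separate_pvs[[:space:]]*=[[:space:]]*\)0/\11/' "$conf"
value=$(sed -n 's/.*mirror_legs_require_separate_pvs[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$conf")
rm -f "$conf"
echo "mirror_legs_require_separate_pvs=$value"
```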

If you need further information regarding LVM, refer to the SUSE Linux Enterprise Server 11 SP4 Storage Administration Guide, chapter LVM Configuration. It is available from http://documentation.suse.com/.

16.2.2 Creating the Cluster Resources

Preparing the cluster for use of cLVM includes the following basic steps:

Procedure 16.1: Creating a DLM Resource
Note: DLM Resource for Both cLVM and OCFS2

Both cLVM and OCFS2 need a DLM resource that runs on all nodes in the cluster and therefore is usually configured as a clone. If you have a setup that includes both OCFS2 and cLVM, configuring one DLM resource for both OCFS2 and cLVM is enough.

  1. Start a shell and log in as root.

  2. Run crm configure.

  3. Check the current configuration of the cluster resources with show.

  4. If you have already configured a DLM resource (and a corresponding base group and base clone), continue with Procedure 16.2, “Creating LVM and cLVM Resources”.

    Otherwise, configure a DLM resource and a corresponding base group and base clone as described in Procedure 14.1, “Configuring DLM and O2CB Resources”.

  5. Leave the crm live configuration with exit.

Procedure 16.2: Creating LVM and cLVM Resources
  1. Start a shell and log in as root.

  2. Run crm configure.

  3. Configure a cLVM resource as follows:

    crm(live)configure# primitive clvm ocf:lvm2:clvmd \
          params daemon_timeout="30"
  4. Configure an LVM resource for the volume group as follows:

    crm(live)configure# primitive vg1 ocf:heartbeat:LVM \
          params volgrpname="cluster-vg" \
          op monitor interval="60" timeout="60"
  5. If you want the volume group to be activated exclusively on one node, configure the LVM resource as described below and omit Step 6:

    crm(live)configure# primitive vg1 ocf:heartbeat:LVM \
          params volgrpname="cluster-vg" exclusive="yes" \
          op monitor interval="60" timeout="60"

    In this case, cLVM will protect all logical volumes within the VG from being activated on multiple nodes, as an additional measure of protection for non-clustered applications.

  6. To ensure that the cLVM and LVM resources are activated cluster-wide, add both primitives to the base group you have created in Procedure 14.1, “Configuring DLM and O2CB Resources”:

    1. Enter

      crm(live)configure# edit base-group
    2. In the vi editor that opens, modify the group as follows and save your changes:

      crm(live)configure# group base-group dlm clvm vg1 ocfs2-1
      Important: Setup Without OCFS2

      If your setup does not include OCFS2, omit the ocfs2-1 primitive from the base group. The o2cb primitive can be configured and included in the group anyway, regardless of whether you use OCFS2 or not.

  7. Review your changes with show. To check if you have configured all needed resources, also refer to Appendix C, Example Configuration for OCFS2 and cLVM.

  8. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

16.2.3 Scenario: cLVM With iSCSI on SANs

The following scenario uses two SAN boxes which export their iSCSI targets to several clients. The general idea is displayed in Figure 16.1, “Setup of iSCSI with cLVM”.

Setup of iSCSI with cLVM
Figure 16.1: Setup of iSCSI with cLVM
Warning: Data Loss

The following procedures will destroy any data on your disks!

Configure only one SAN box first. Each SAN box needs to export its own iSCSI target. Proceed as follows:

Procedure 16.3: Configuring iSCSI Targets (SAN)
  1. Run YaST and click Network Services › iSCSI Target to start the iSCSI Server module.

  2. If you want to start the iSCSI target whenever your computer is booted, choose When Booting, otherwise choose Manually.

  3. If you have a firewall running, enable Open Port in Firewall.

  4. Switch to the Global tab. If you need authentication, enable incoming or outgoing authentication, or both. In this example, we select No Authentication.

  5. Add a new iSCSI target:

    1. Switch to the Targets tab.

    2. Click Add.

    3. Enter a target name. The name needs to be formatted like this:

       iqn.DATE.DOMAIN

      For more information about the format, refer to Section Type "iqn." (iSCSI Qualified Name) at http://www.ietf.org/rfc/rfc3720.txt.

    4. If you want a more descriptive name, you can change it as long as your identifier is unique for your different targets.

    5. Click Add.

    6. Enter the device name in Path and use a SCSI ID.

    7. Click Next twice.

  6. Confirm the warning box with Yes.

  7. Open the configuration file /etc/iscsi/iscsid.conf and change the parameter node.startup to automatic.
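The last step changes node.startup by hand; as an illustration, the edit can also be scripted with sed (shown here against a temporary sample file, not the real initiator configuration):

```shell
# Illustration: set node.startup to automatic with sed.
# A temporary sample file stands in for the real initiator configuration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
node.startup = manual
EOF
sed -i 's/^node.startup = .*/node.startup = automatic/' "$conf"
startup=$(sed -n 's/^node.startup = //p' "$conf")
rm -f "$conf"
echo "node.startup = $startup"
```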

Now set up your iSCSI initiators as follows:

Procedure 16.4: Configuring iSCSI Initiators
  1. Run YaST and click Network Services › iSCSI Initiator.

  2. If you want to start the iSCSI initiator whenever your computer is booted, choose When Booting, otherwise set Manually.

  3. Change to the Discovery tab and click the Discovery button.

  4. Add the IP address and the port of your iSCSI target (see Procedure 16.3, “Configuring iSCSI Targets (SAN)”). Normally, you can leave the port as it is and use the default value.

  5. If you use authentication, insert the incoming and outgoing user name and password, otherwise activate No Authentication.

  6. Select Next. The found connections are displayed in the list.

  7. To test if the iSCSI initiator has been started successfully, select one of the displayed targets and click Log In.

Procedure 16.5: Creating the LVM Volume Groups
  1. Open a root shell on one of the nodes on which you have run the iSCSI initiator from Procedure 16.4, “Configuring iSCSI Initiators”.

  2. Prepare the physical volume for LVM with the command pvcreate on the disks /dev/sdd and /dev/sde:

    root # pvcreate /dev/sdd
    root # pvcreate /dev/sde
  3. Create the cluster-aware volume group on both disks:

    root # vgcreate --clustered y clustervg /dev/sdd /dev/sde
  4. Create logical volumes as needed:

    root # lvcreate --name clusterlv --size 500M clustervg
  5. Check the physical volume with pvdisplay:

      --- Physical volume ---
          PV Name               /dev/sdd
          VG Name               clustervg
          PV Size               509,88 MB / not usable 1,88 MB
          Allocatable           yes
          PE Size (KByte)       4096
          Total PE              127
          Free PE               127
          Allocated PE          0
          PV UUID               52okH4-nv3z-2AUL-GhAN-8DAZ-GMtU-Xrn9Kh
          --- Physical volume ---
          PV Name               /dev/sde
          VG Name               clustervg
          PV Size               509,84 MB / not usable 1,84 MB
          Allocatable           yes
          PE Size (KByte)       4096
          Total PE              127
          Free PE               127
          Allocated PE          0
          PV UUID               Ouj3Xm-AI58-lxB1-mWm2-xn51-agM2-0UuHFC
  6. Check the volume group with vgdisplay:

      --- Volume group ---
          VG Name               clustervg
          System ID
          Format                lvm2
          Metadata Areas        2
          Metadata Sequence No  1
          VG Access             read/write
          VG Status             resizable
          Clustered             yes
          Shared                no
          MAX LV                0
          Cur LV                0
          Open LV               0
          Max PV                0
          Cur PV                2
          Act PV                2
          VG Size               1016,00 MB
          PE Size               4,00 MB
          Total PE              254
          Alloc PE / Size       0 / 0
          Free  PE / Size       254 / 1016,00 MB
          VG UUID               UCyWw8-2jqV-enuT-KH4d-NXQI-JhH3-J24anD

After you have created the volumes and started your resources you should have a new device named /dev/dm-*. It is recommended to use a clustered file system on top of your LVM resource, for example OCFS2. For more information, see Chapter 14, OCFS2.

16.2.4 Scenario: cLVM With DRBD

The following scenarios can be used if you have data centers located in different parts of your city, country, or continent.

Procedure 16.6: Creating a Cluster-Aware Volume Group With DRBD
  1. Create a primary/primary DRBD resource:

    1. First, set up a DRBD device as primary/secondary as described in Procedure 15.1, “Manually Configuring DRBD”. Make sure the disk state is up-to-date on both nodes. Check this with cat /proc/drbd or with rcdrbd status.

    2. Add the following options to your configuration file (usually something like /etc/drbd.d/r0.res):

      resource r0 {
        startup {
          become-primary-on both;
        }
        net {
          allow-two-primaries;
        }
        ...
      }
    3. Copy the changed configuration file to the other node, for example:

      root # scp /etc/drbd.d/r0.res venus:/etc/drbd.d/
    4. Run the following commands on both nodes:

      root # drbdadm disconnect r0
      root # drbdadm connect r0
      root # drbdadm primary r0
    5. Check the status of your nodes:

      root # cat /proc/drbd
       0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
  2. Include the clvmd resource as a clone in the pacemaker configuration, and make it depend on the DLM clone resource. See Procedure 16.1, “Creating a DLM Resource” for detailed instructions. Before proceeding, confirm that these resources have started successfully on your cluster. You may use crm_mon or the Web interface to check the running services.

  3. Prepare the physical volume for LVM with the command pvcreate. For example, on the device /dev/drbd_r0 the command would look like this:

    root # pvcreate /dev/drbd_r0
  4. Create a cluster-aware volume group:

    root # vgcreate --clustered y myclusterfs /dev/drbd_r0
  5. Create logical volumes as needed. Adjust the size of the logical volume to your requirements. For example, create a 4 GB logical volume with the following command:

    root # lvcreate --name testlv -L 4G myclusterfs
  6. The logical volumes within the VG are now available as file system mounts or for raw usage. Ensure that services using them have proper dependencies so that they are colocated with the VG resource and ordered after the VG has been activated.
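The status check in Step 1 can be automated by matching the ro: and ds: fields; the sketch below uses a captured sample line instead of reading /proc/drbd on a live node:

```shell
# Illustration: check for primary/primary, up-to-date DRBD state.
# A captured sample line stands in for reading /proc/drbd live.
status=" 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----"
case "$status" in
    *"ro:Primary/Primary"*"ds:UpToDate/UpToDate"*)
        drbd_ok=yes ;;
    *)
        drbd_ok=no ;;
esac
echo "primary/primary and up to date: $drbd_ok"
```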

After finishing these configuration steps, the LVM2 configuration can be done like on any stand-alone workstation.

16.3 Configuring Eligible LVM2 Devices Explicitly

When several devices seemingly share the same physical volume signature (as can be the case for multipath devices or DRBD), it is recommended to explicitly configure the devices which LVM2 scans for PVs.

For example, if the command vgcreate uses the physical device instead of the mirrored block device, DRBD may become confused, which can result in a split brain condition for DRBD.

To deactivate a single device for LVM2, do the following:

  1. Edit the file /etc/lvm/lvm.conf and search for the line starting with filter.

  2. The patterns there are handled as regular expressions. A leading a accepts devices matching the pattern for the scan; a leading r rejects devices matching the pattern.

  3. To remove a device named /dev/sdb1, add the following expression to the filter rule:

     "r|^/dev/sdb1$|"
    The complete filter line will look like the following:

    filter = [ "r|^/dev/sdb1$|", "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", "a/.*/" ]

    A filter line, that accepts DRBD and MPIO devices but rejects all other devices would look like this:

    filter = [ "a|/dev/drbd.*|", "a|/dev/.*/by-id/dm-uuid-mpath-.*|", "r/.*/" ]
  4. Write the configuration file and copy it to all cluster nodes.
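LVM2 evaluates the filter patterns in order, and the first matching pattern decides whether a device is scanned. The following sketch emulates that behavior for the example filter line above, with grep -E standing in for LVM's internal regex matching:

```shell
# Illustration: emulate LVM2 filter evaluation (first matching pattern wins).
# The patterns mirror the example filter line above.
match_filter() {
    dev=$1
    for pat in 'r:^/dev/sdb1$' 'r:/dev/.*/by-path/.*' 'r:/dev/.*/by-id/.*' 'a:.*'; do
        action=${pat%%:*}
        regex=${pat#*:}
        if printf '%s\n' "$dev" | grep -Eq "$regex"; then
            if [ "$action" = a ]; then echo accept; else echo reject; fi
            return
        fi
    done
    echo accept  # LVM accepts a device that matches no pattern
}
match_filter /dev/sdb1   # rejected by the first pattern
match_filter /dev/sdd    # accepted by the catch-all "a/.*/"
```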

16.4 For More Information

Thorough information is available from the Pacemaker mailing list, available at http://www.clusterlabs.org/wiki/Help:Contents.

The official cLVM FAQ can be found at http://sources.redhat.com/cluster/wiki/FAQ/CLVM.
