SUSE Linux Enterprise High Availability 15 SP3

Highly Available NFS Storage with DRBD and Pacemaker

Publication Date: December 12, 2024

This document describes how to set up highly available NFS storage in a two-node cluster, using the following components: DRBD* (Distributed Replicated Block Device), LVM (Logical Volume Manager), and Pacemaker as cluster resource manager.

Copyright © 2006–2024 SUSE LLC and contributors. All rights reserved.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled GNU Free Documentation License.

For SUSE trademarks, see https://www.suse.com/company/legal/. All third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.

1 Usage scenario

This document helps you set up a highly available NFS server. The cluster used for the highly available NFS storage has the following properties:

  • Two nodes: alice (IP: 192.168.1.1) and bob (IP: 192.168.1.2), connected to each other via network.

  • Two floating, virtual IP addresses (192.168.1.10 and 192.168.1.11), allowing clients to connect to a service no matter which physical node it is running on. One IP address is used for cluster administration with Hawk2, and the other IP address is used exclusively for the NFS exports.

  • SBD used as a STONITH fencing device to avoid split-brain scenarios. STONITH is mandatory for the HA cluster.

  • Failover of resources from one node to the other if the active host breaks down (active/passive setup).

  • Local storage on each node. The data is synchronized between the nodes using DRBD on top of LVM.

  • A file system exported through NFS and a separate file system used to track the NFS client states.

After installing and setting up the basic two-node cluster, and extending it with storage and cluster resources for NFS, you will have a highly available NFS storage server.

2 Preparing a two-node cluster

Before you can set up highly available NFS storage, you must prepare a High Availability cluster:

Procedure 1: Preparing a two-node cluster for NFS storage
  1. Install and set up a basic two-node cluster as described in Installation and Setup Quick Start.

  2. On both nodes, install the package nfs-kernel-server:

    # zypper install nfs-kernel-server
  3. On both nodes, set the NFS server scope:

    1. Create a new directory named nfs-server.service.d:

      # mkdir -p /etc/systemd/system/nfs-server.service.d
    2. Create the file /etc/systemd/system/nfs-server.service.d/scope.conf and add the following content:

      [Service]
      ExecStart= 1
      ExecStart=/usr/bin/unshare -u /bin/sh -c "hostname SUSE; /usr/sbin/rpc.nfsd" 2

      1

      A service can only have one ExecStart setting, so the empty ExecStart line in this override file is used to undo any existing ExecStart setting in the NFS service file.

      2

      The scope must be the same on all nodes in the cluster that run the NFS server. All clusters using SUSE software can use the same scope, so we recommend setting the value to SUSE.

    3. Reload the systemd files:

      # systemctl daemon-reload
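
When several nodes have to be prepared, the drop-in file from step 3 can also be written by a small script. The following is a minimal sketch and not part of the original procedure. The function name write_nfs_scope_dropin and both of its parameters are hypothetical. The defaults are the directory and the scope value used above.

```shell
# Hypothetical helper: writes the scope.conf drop-in described in step 3.
write_nfs_scope_dropin() {
    # $1 = drop-in directory (defaults to the path used in this procedure)
    # $2 = NFS server scope (defaults to SUSE, as recommended above)
    dir="${1:-/etc/systemd/system/nfs-server.service.d}"
    scope="${2:-SUSE}"
    mkdir -p "$dir" || return 1
    # The empty ExecStart= line resets the setting from the packaged unit file.
    cat > "$dir/scope.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/unshare -u /bin/sh -c "hostname $scope; /usr/sbin/rpc.nfsd"
EOF
}

# Possible usage on each node, run as root:
#   write_nfs_scope_dropin && systemctl daemon-reload
#   systemctl cat nfs-server.service   # shows the unit together with the drop-in
```

Using the same call on every node keeps the scope identical across the cluster, which is the requirement stated in callout 2.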

3 Creating LVM devices

LVM (Logical Volume Manager) enables flexible distribution of storage space across several file systems.

Use crm cluster run to run these commands on both nodes at once.

Procedure 2: Creating LVM devices for DRBD
  1. Create an LVM physical volume, replacing /dev/disk/by-id/DEVICE_ID with your corresponding device for LVM:

    # crm cluster run "pvcreate /dev/disk/by-id/DEVICE_ID"
  2. Create an LVM volume group nfs that includes this physical volume:

    # crm cluster run "vgcreate nfs /dev/disk/by-id/DEVICE_ID"
  3. Create a logical volume named share in the volume group nfs:

    # crm cluster run "lvcreate -n share -L 20G nfs"

    This volume is for the NFS exports.

  4. Create a logical volume named state in the volume group nfs:

    # crm cluster run "lvcreate -n state -L 8G nfs"

    This volume is for the NFS client states. The 8 GB volume size used in this example should support several thousand concurrent NFS clients.

  5. Activate the volume group:

    # crm cluster run "vgchange -ay nfs"

You should now see the following devices on the system: /dev/nfs/share and /dev/nfs/state.
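
To confirm that both device paths exist before continuing, a short check such as the following can be run on each node. This is only a sketch: the function name missing_paths is hypothetical, and it tests for existence only. It does not verify that the path is a block device of the expected size.

```shell
# Hypothetical helper: prints every argument that does not exist as a path.
missing_paths() {
    for path in "$@"; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Possible usage on each node; no output means both devices are present:
#   missing_paths /dev/nfs/share /dev/nfs/state
```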

4 Creating DRBD devices

This section describes how to set up DRBD devices on top of LVM. Using LVM as a back-end of DRBD has the following benefits:

  • Easier setup than with LVM on top of DRBD.

  • Easier administration in case the LVM disks need to be resized or more disks are added to the volume group.

The following procedures result in two DRBD devices: one device for the NFS exports, and a second device to track the NFS client states.

4.1 Creating the DRBD configuration

DRBD configuration files are kept in the /etc/drbd.d/ directory and must end with a .res extension. In this procedure, the configuration file is named /etc/drbd.d/nfs.res.

Procedure 3: Creating a DRBD configuration
  1. Create the file /etc/drbd.d/nfs.res with the following contents:

    resource nfs {
       volume 0 { 1
          device           /dev/drbd0; 2
          disk             /dev/nfs/state; 3
          meta-disk        internal; 4
       }
       volume 1 {
          device           /dev/drbd1;
          disk             /dev/nfs/share;
          meta-disk        internal;
       }
    
       net {
          protocol C; 5
          fencing resource-and-stonith; 6
       }
    
       handlers { 7
          fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
          after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
       }
    
       connection-mesh { 8
          hosts     alice bob;
       }
       on alice { 9
          address   192.168.1.1:7790;
          node-id   0;
       }
       on bob {
          address   192.168.1.2:7790;
          node-id   1;
       }
    }

    1

    The volume number for each DRBD device you want to create.

    2

    The DRBD device that applications will access.

    3

    The lower-level block device used by DRBD to store the actual data. This is the LVM device that was created in Section 3, “Creating LVM devices”.

    4

    Where the metadata is stored. Using internal, the metadata is stored together with the user data on the same device. See the drbd.conf man page for further information.

    5

    The protocol to use for this connection. Protocol C is the default option. It provides better data availability and does not consider a write to be complete until it has reached all local and remote disks.

    6

    Specifies the fencing policy resource-and-stonith at the DRBD level. This policy immediately suspends active I/O operations until STONITH completes.

    7

    Enables resource-level fencing to prevent Pacemaker from starting a service with outdated data. If the DRBD replication link becomes disconnected, the crm-fence-peer.9.sh script stops the DRBD resource from being promoted to another node until the replication link becomes connected again and DRBD completes its synchronization process.

    8

    Defines all nodes of a mesh. The hosts parameter contains all host names that share the same DRBD setup.

    9

    Contains the IP address and a unique identifier for each node.

  2. Open /etc/csync2/csync2.cfg and check whether the following two lines exist:

    include /etc/drbd.conf;
    include /etc/drbd.d;

    If not, add them to the file.

  3. Copy the file to the other nodes:

    # csync2 -xv

    For information about Csync2, see Book “Administration Guide”, Chapter 4 “Using the YaST cluster module”, Section 4.7 “Transferring the configuration to all nodes”.
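
The port in the address lines of the resource file is the port that DRBD uses for replication, so any firewall between the nodes must allow it. The following sketch extracts the configured ports from a resource file. The function name drbd_ports is hypothetical, and the commented firewall-cmd calls assume that firewalld is the firewall in use.

```shell
# Hypothetical helper: prints each distinct port found in "address IP:PORT;" lines.
drbd_ports() {
    # $1 = DRBD resource file, for example /etc/drbd.d/nfs.res
    sed -n 's/^[[:space:]]*address[[:space:]].*:\([0-9][0-9]*\);.*$/\1/p' "$1" | sort -un
}

# Possible usage (assumes firewalld; run on every node as root):
#   for port in $(drbd_ports /etc/drbd.d/nfs.res); do
#       firewall-cmd --permanent --add-port="${port}/tcp"
#   done
#   firewall-cmd --reload
```

With the configuration shown in this procedure, the function prints the single port 7790.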

4.2 Activating the DRBD devices

After preparing the DRBD configuration, activate the devices:

Procedure 4: Activating DRBD devices
  1. If you use a firewall in the cluster, open port 7790 in the firewall configuration.

  2. Initialize the metadata storage:

    # crm cluster run "drbdadm create-md nfs"
  3. Create the DRBD devices:

    # crm cluster run "drbdadm up nfs"
  4. The devices do not have data yet, so you can run these commands to skip the initial synchronization:

    # drbdadm new-current-uuid --clear-bitmap nfs/0
    # drbdadm new-current-uuid --clear-bitmap nfs/1
  5. On alice, promote the nfs resource to the primary role:

    # drbdadm primary --force nfs
  6. Check the DRBD status of nfs:

    # drbdadm status nfs

    This returns the following message:

    nfs role:Primary
      volume:0 disk:UpToDate
      volume:1 disk:UpToDate
      bob role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate

You can access the DRBD resources on the block devices /dev/drbd0 and /dev/drbd1.
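
If you want to script the status check from step 6, for example before creating file systems, the output of drbdadm status can be parsed. The following is a sketch under the assumption that the output has the format shown above. The function name drbd_all_uptodate is hypothetical.

```shell
# Hypothetical helper: reads `drbdadm status` output on stdin and succeeds
# only if at least one disk is listed and every disk and peer-disk is UpToDate.
drbd_all_uptodate() {
    awk '
        /disk:/ {
            seen = 1
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(peer-)?disk:/ && $i !~ /:UpToDate$/) bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Possible usage:
#   drbdadm status nfs | drbd_all_uptodate && echo "nfs is fully synchronized"
```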

4.3 Creating the file systems

After activating the DRBD devices, create file systems on them:

Procedure 5: Creating file systems for DRBD
  1. Create an ext4 file system on /dev/drbd0:

    # mkfs.ext4 /dev/drbd0
  2. Create an ext4 file system on /dev/drbd1:

    # mkfs.ext4 /dev/drbd1

5 Creating cluster resources

The following procedures describe how to configure the resources required for a highly available NFS cluster.

Overview of cluster resources
DRBD primitive and promotable clone resources

These resources are used to replicate data. The cluster resource manager switches the promotable clone resource between the primary and secondary roles as necessary.

File system resources

These resources manage the file system that will be exported, and the file system that will track NFS client states.

NFS kernel server resource

This resource manages the NFS server daemon.

NFS exports

This resource is used to export the directory /srv/nfs/share to clients.

Virtual IP address

The initial installation creates an administrative virtual IP address for Hawk2. Create another virtual IP address exclusively for NFS exports. This makes it easier to apply security restrictions later.

Example NFS scenario
  • The following configuration examples assume that 192.168.1.11 is the virtual IP address to use for an NFS server which serves clients in the 192.168.1.0/24 subnet.

  • The service exports data served from /srv/nfs/share.

  • Into this export directory, the cluster mounts an ext4 file system from the DRBD device /dev/drbd1. This DRBD device sits on top of an LVM logical volume named /dev/nfs/share.

  • The DRBD device /dev/drbd0 is used to share the NFS client states from /var/lib/nfs. This DRBD device sits on top of an LVM logical volume named /dev/nfs/state.

5.1 Creating DRBD primitive and promotable clone resources

Create a cluster resource to manage the DRBD devices, and a promotable clone to allow this resource to run on both nodes:

Procedure 6: Creating a DRBD resource for NFS
  1. Start the crm interactive shell:

    # crm configure
  2. Create a primitive for the DRBD configuration nfs:

    crm(live)configure# primitive drbd-nfs ocf:linbit:drbd \
      params drbd_resource="nfs" \
      op monitor interval=15 role=Master \
      op monitor interval=30 role=Slave
  3. Create a promotable clone for the drbd-nfs primitive:

    crm(live)configure# ms ms-drbd-nfs drbd-nfs \
      meta master-max="1" master-node-max="1" \
      clone-max="2" clone-node-max="1" notify="true"
  4. Commit this configuration:

    crm(live)configure# commit

Pacemaker activates the DRBD resources on both nodes and promotes them to the primary role on one of the nodes. Check the state of the cluster with the crm status command, or run drbdadm status.
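
To find out in a script whether the local node currently holds the primary role, the first line of the drbdadm status output can be evaluated. This is a sketch that assumes the output format shown in Procedure 4. The function name drbd_local_role is hypothetical.

```shell
# Hypothetical helper: reads `drbdadm status` output on stdin and prints the
# role of the local node for the given resource (for example Primary).
drbd_local_role() {
    # $1 = DRBD resource name
    awk -v res="$1" '
        $1 == res && $2 ~ /^role:/ { sub(/^role:/, "", $2); print $2; exit }
    '
}

# Possible usage:
#   drbdadm status nfs | drbd_local_role nfs
```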

5.2 Creating file system resources

Create cluster resources to manage the file systems for export and state tracking:

Procedure 7: Creating file system resources for NFS
  1. Create a primitive for the NFS client states on /dev/drbd0:

    crm(live)configure# primitive fs-nfs-state Filesystem \
      params device=/dev/drbd0 directory=/var/lib/nfs fstype=ext4
  2. Create a primitive for the file system to be exported on /dev/drbd1:

    crm(live)configure# primitive fs-nfs-share Filesystem \
      params device=/dev/drbd1 directory=/srv/nfs/share fstype=ext4

    Do not commit this configuration until after you add the colocation and order constraints.

  3. Add both of these resources to a resource group named g-nfs:

    crm(live)configure# group g-nfs fs-nfs-state fs-nfs-share

    Resources start in the order they are added to the group and stop in reverse order.

  4. Add a colocation constraint to make sure that the resource group always starts on the node where the DRBD promotable clone is in the primary role:

    crm(live)configure# colocation col-nfs-on-drbd inf: g-nfs ms-drbd-nfs:Master
  5. Add an order constraint to make sure the DRBD promotable clone always starts before the resource group:

    crm(live)configure# order o-drbd-before-nfs Mandatory: ms-drbd-nfs:promote g-nfs:start
  6. Commit this configuration:

    crm(live)configure# commit

Pacemaker mounts /dev/drbd0 to /var/lib/nfs, and /dev/drbd1 to /srv/nfs/share. Confirm this with mount, or by looking at /proc/mounts.
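
The check against /proc/mounts can be scripted as in the following sketch. The function name is_mounted is hypothetical. The optional third parameter exists only so that the function can be tried against a copy of the mount table.

```shell
# Hypothetical helper: succeeds if the device is mounted on the directory.
is_mounted() {
    # $1 = device, $2 = mount point, $3 = mount table (defaults to /proc/mounts)
    awk -v dev="$1" -v dir="$2" '
        $1 == dev && $2 == dir { found = 1 }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}

# Possible usage on the node where the g-nfs group is running:
#   is_mounted /dev/drbd0 /var/lib/nfs && is_mounted /dev/drbd1 /srv/nfs/share \
#       && echo "both file systems are mounted"
```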

5.3 Creating an NFS kernel server resource

Create a cluster resource to manage the NFS server daemon:

Procedure 8: Creating an NFS kernel server resource
  1. Create a primitive to manage the NFS server daemon:

    crm(live)configure# primitive nfsserver nfsserver \
      params nfs_shared_infodir="/var/lib/nfs"
    Warning: Low lease time can cause loss of file state

    NFS clients regularly renew their state with the NFS server. If the lease time is too low, system or network delays can cause the timer to expire before the renewal is complete. This can lead to I/O errors and loss of file state.

    NFSV4LEASETIME is set on the NFS server in the file /etc/sysconfig/nfs. The default is 90 seconds. If lowering the lease time is necessary, we recommend a value of 60 or higher. We strongly discourage values lower than 30.

  2. Append this resource to the existing g-nfs resource group:

    crm(live)configure# modgroup g-nfs add nfsserver
  3. Commit this configuration:

    crm(live)configure# commit
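
The lease time recommendation from the warning in step 1 can be checked with a short script. The following is a sketch only. The function name check_lease_time is hypothetical, it assumes that NFSV4LEASETIME is set as a plain number of seconds, and it treats an empty or missing setting as the default of 90 seconds mentioned above.

```shell
# Hypothetical helper: classifies the configured NFSv4 lease time.
check_lease_time() {
    # $1 = sysconfig file (defaults to /etc/sysconfig/nfs)
    conf="${1:-/etc/sysconfig/nfs}"
    value=$(sed -n 's/^NFSV4LEASETIME="\{0,1\}\([0-9]*\)"\{0,1\}.*$/\1/p' "$conf" | tail -n 1)
    value="${value:-90}"    # empty or unset means the default of 90 seconds
    if [ "$value" -lt 30 ]; then
        echo "discouraged: $value"
        return 2
    elif [ "$value" -lt 60 ]; then
        echo "low: $value"
        return 1
    fi
    echo "ok: $value"
}

# Possible usage on each node:
#   check_lease_time
```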

5.4 Creating an NFS export resource

Create a cluster resource to manage the NFS exports:

Procedure 9: Creating an NFS export resource
  1. Create a primitive for the NFS exports:

    crm(live)configure# primitive exportfs-nfs exportfs \
      params directory="/srv/nfs/share" \
      options="rw,mountpoint" clientspec="192.168.1.0/24" fsid=101 \ 1
      op monitor interval=30s timeout=90s 2

    1

    The fsid must be unique for each NFS export resource.

    2

    The value of op monitor timeout must be higher than the value of stonith-timeout. To find the stonith-timeout value, run crm configure show and look under the property section.

    Important: Do not set wait_for_leasetime_on_stop=true

    Setting this option to true in a highly available NFS setup can cause unnecessary delays and loss of locks.

    The default value for wait_for_leasetime_on_stop is false. There is no need to set it to true when /var/lib/nfs and nfsserver are configured as described in this guide.

  2. Append this resource to the existing g-nfs resource group:

    crm(live)configure# modgroup g-nfs add exportfs-nfs
  3. Commit this configuration:

    crm(live)configure# commit
  4. Confirm that the NFS exports are set up properly:

    # exportfs -v
    /srv/nfs/share   IP_ADDRESS_OF_CLIENT(OPTIONS)
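
The requirement from callout 2, that the monitor timeout is higher than stonith-timeout, can be checked against the output of crm configure show. The following is a sketch with a hypothetical function name, monitor_timeout_ok. It assumes that stonith-timeout is given in seconds, with or without a trailing s. For any other unit it fails rather than guessing.

```shell
# Hypothetical helper: reads `crm configure show` output on stdin and succeeds
# only if the given monitor timeout (in seconds) is higher than stonith-timeout.
monitor_timeout_ok() {
    # $1 = monitor timeout of the exportfs resource in seconds, for example 90
    st=$(sed -n 's/.*stonith-timeout="\{0,1\}\([0-9][0-9]*\)s\{0,1\}"\{0,1\}\([[:space:]].*\)\{0,1\}$/\1/p' | head -n 1)
    [ -n "$st" ] && [ "$1" -gt "$st" ]
}

# Possible usage:
#   crm configure show | monitor_timeout_ok 90 || echo "check the monitor timeout"
```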

5.5 Creating a virtual IP address for NFS exports

Create a cluster resource to manage the virtual IP address for the NFS exports:

Procedure 10: Creating a virtual IP address for NFS exports
  1. Create a primitive for the virtual IP address:

    crm(live)configure# primitive vip-nfs IPaddr2 params ip=192.168.1.11
  2. Append this resource to the existing g-nfs resource group:

    crm(live)configure# modgroup g-nfs add vip-nfs
  3. Commit this configuration:

    crm(live)configure# commit
  4. Leave the crm interactive shell:

    crm(live)configure# quit
  5. Check the status of the cluster. The resources in the g-nfs group should appear in the following order:

    # crm status
      [...]
      Full List of Resources
        [...]
        * Resource Group: g-nfs:
          * fs-nfs-state    (ocf:heartbeat:Filesystem):   Started alice
          * fs-nfs-share    (ocf:heartbeat:Filesystem):   Started alice
          * nfsserver       (ocf:heartbeat:nfsserver):    Started alice
          * exportfs-nfs    (ocf:heartbeat:exportfs):     Started alice
          * vip-nfs         (ocf:heartbeat:IPaddr2):      Started alice

6 Using the NFS service

This section outlines how to use the highly available NFS service from an NFS client.

To connect to the NFS service, make sure to use the virtual IP address to connect to the cluster rather than a physical IP configured on one of the cluster nodes' network interfaces. For compatibility reasons, use the full path of the NFS export on the server.

The command to mount the NFS export looks like this:

# mount 192.168.1.11:/srv/nfs/share /home/share

If you need to configure other mount options, such as a specific transport protocol (proto), maximum read and write request sizes (rsize and wsize), or a specific NFS version (vers), use the -o option. For example:

# mount -o proto=tcp,rsize=32768,wsize=32768,vers=3 \
192.168.1.11:/srv/nfs/share /home/share

For further NFS mount options, see the nfs man page.

Note: Loopback mounts

Loopback mounts are only supported for NFS version 3, not NFS version 4. For more information, see https://www.suse.com/support/kb/doc/?id=000018709.

7 Adding more NFS shares to the cluster

If you need to increase the available storage, you can add more NFS shares to the cluster.

In this example, a new DRBD device named /dev/drbd2 sits on top of an LVM logical volume named /dev/nfs/share2.

Procedure 11: Adding more NFS shares to the cluster
  1. Create an LVM logical volume for the new share:

    # crm cluster run "lvcreate -n share2 -L 20G nfs"
  2. Update the file /etc/drbd.d/nfs.res to add the new volume under the existing volumes:

       volume 2 {
          device           /dev/drbd2;
          disk             /dev/nfs/share2;
          meta-disk        internal;
       }
  3. Copy the updated file to the other nodes:

    # csync2 -xv
  4. Initialize the metadata storage for the new volume:

    # crm cluster run "drbdadm create-md nfs/2 --force"
  5. Update the nfs configuration to create the new device:

    # crm cluster run "drbdadm adjust nfs"
  6. Skip the initial synchronization for the new device:

    # drbdadm new-current-uuid --clear-bitmap nfs/2
  7. The NFS cluster resources might have moved to another node since they were created. Check the DRBD status with drbdadm status nfs, and make a note of which node is in the Primary role.

  8. On the node that is in the Primary role, create an ext4 file system on /dev/drbd2:

    # mkfs.ext4 /dev/drbd2
  9. Start the crm interactive shell:

    # crm configure
  10. Create a primitive for the file system to be exported on /dev/drbd2:

    crm(live)configure# primitive fs-nfs-share2 Filesystem \
      params device="/dev/drbd2" directory="/srv/nfs/share2" fstype=ext4
  11. Add the new file system resource to the g-nfs group before the nfsserver resource:

    crm(live)configure# modgroup g-nfs add fs-nfs-share2 before nfsserver
  12. Create a primitive for NFS exports from the new share:

    crm(live)configure# primitive exportfs-nfs2 exportfs \
      params directory="/srv/nfs/share2" \
      options="rw,mountpoint" clientspec="192.168.1.0/24" fsid=102 \
      op monitor interval=30s timeout=90s
  13. Add the new NFS export resource to the g-nfs group before the vip-nfs resource:

    crm(live)configure# modgroup g-nfs add exportfs-nfs2 before vip-nfs
  14. Commit this configuration:

    crm(live)configure# commit
  15. Leave the crm interactive shell:

    crm(live)configure# quit
  16. Check the status of the cluster. The resources in the g-nfs group should appear in the following order:

    # crm status
    [...]
    Full List of Resources
      [...]
      * Resource Group: g-nfs:
        * fs-nfs-state    (ocf:heartbeat:Filesystem):   Started alice
        * fs-nfs-share    (ocf:heartbeat:Filesystem):   Started alice
        * fs-nfs-share2   (ocf:heartbeat:Filesystem):   Started alice
        * nfsserver       (ocf:heartbeat:nfsserver):    Started alice
        * exportfs-nfs    (ocf:heartbeat:exportfs):     Started alice
        * exportfs-nfs2   (ocf:heartbeat:exportfs):     Started alice
        * vip-nfs         (ocf:heartbeat:IPaddr2):      Started alice
  17. Confirm that the NFS exports are set up properly:

    # exportfs -v
    /srv/nfs/share   IP_ADDRESS_OF_CLIENT(OPTIONS)
    /srv/nfs/share2  IP_ADDRESS_OF_CLIENT(OPTIONS)
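
Each NFS export resource needs its own fsid, and a duplicate is easy to introduce when copying an existing primitive for a new share. The following sketch lists fsid values that occur more than once in the cluster configuration. The function name duplicate_fsids is hypothetical, and the parsing assumes at most one fsid parameter per line of crm configure show output.

```shell
# Hypothetical helper: reads `crm configure show` output on stdin and prints
# every fsid value that is used by more than one resource.
duplicate_fsids() {
    sed -n 's/.*fsid="\{0,1\}\([^"[:space:]]*\)"\{0,1\}.*/\1/p' | sort | uniq -d
}

# Possible usage; no output means that all fsid values are unique:
#   crm configure show | duplicate_fsids
```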

8 For more information