The distributed replicated block device (DRBD) allows you to create a mirror of two block devices that are located at two different sites across an IP network. When used with Corosync, DRBD supports distributed high-availability Linux clusters. This chapter shows you how to install and set up DRBD.
DRBD replicates data on the primary device to the secondary device in a way that ensures that both copies of the data remain identical. Think of it as a networked RAID 1. It mirrors data in real-time, so its replication occurs continuously. Applications do not need to know that in fact their data is stored on different disks.
DRBD is a Linux kernel module and sits between the I/O scheduler at the lower end and the file system at the upper end, see Figure 20.1, “Position of DRBD within Linux”. To communicate with DRBD, use the high-level command drbdadm. For maximum flexibility, DRBD also comes with the low-level tool drbdsetup.
The data traffic between mirrors is not encrypted. For secure data exchange, you should deploy a Virtual Private Network (VPN) solution for the connection.
DRBD allows you to use any block device supported by Linux, usually:
partition or complete hard disk
software RAID
Logical Volume Manager (LVM)
Enterprise Volume Management System (EVMS)
By default, DRBD uses TCP ports 7788 and higher for communication between DRBD nodes. Make sure that your firewall does not block the used ports.
You must set up the DRBD devices before creating file systems on them. Everything pertaining to user data should be done solely via the /dev/drbdN device and not on the raw device, as DRBD uses the last part of the raw device for metadata. Using the raw device will cause inconsistent data.

With udev integration, you also get symbolic links in the form /dev/drbd/by-res/RESOURCES, which are easier to use and provide safety against misremembering the devices' minor numbers.

For example, if the raw device is 1024 MB in size, the DRBD device has only 1023 MB available for data, with about 70 KB hidden and reserved for the metadata. Any attempt to access the remaining kilobytes via the raw device fails because they are not available for user data.
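The arithmetic of this example can be sketched in shell. This is illustrative only; the exact metadata size reserved by DRBD depends on the device size:

```shell
# Illustrative only: approximate usable space on a 1024 MB backing device,
# assuming roughly 70 KB of internal DRBD metadata as in the example above.
RAW_MB=1024
META_KB=70
USABLE_KB=$(( RAW_MB * 1024 - META_KB ))
echo "${USABLE_KB} KB usable for data"
```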
Install the High Availability Extension on both SUSE Linux Enterprise Server machines in your networked cluster as described in Part I, “Installation, Setup and Upgrade”. Installing High Availability Extension also installs the DRBD program files.
If you do not need the complete cluster stack but only want to use DRBD, install the packages drbd, drbd-kmp-FLAVOR, drbd-utils, and yast2-drbd.
To simplify working with drbdadm, use the Bash completion support. To enable it in your current shell session, run the following command:

root # source /etc/bash_completion.d/drbdadm.sh

To use it permanently for root, create or extend the file /root/.bashrc and insert the previous line.
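One way to add that line idempotently is sketched below. For root the target file would be /root/.bashrc; a temporary file is used here so the sketch can be run safely anywhere:

```shell
# Append the completion line to the shell startup file only if it is not
# already present. Use BASHRC=/root/.bashrc on a real system; a temporary
# file stands in here for demonstration.
BASHRC=/tmp/demo_bashrc
LINE='source /etc/bash_completion.d/drbdadm.sh'
touch "$BASHRC"
grep -qxF "$LINE" "$BASHRC" || echo "$LINE" >> "$BASHRC"
```

Because of the grep guard, running the snippet repeatedly adds the line only once.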
The following procedure uses the server names alice and bob, and the cluster resource name r0. It sets up alice as the primary node and /dev/sda1 for storage. Make sure to modify the instructions to use your own node and file names.

The following sections assume that you have two nodes, alice and bob, and that they use TCP port 7788. Make sure this port is open in your firewall.
Prepare your system:
Make sure the block devices in your Linux nodes are ready and partitioned (if needed).
If your disk already contains a file system that you no longer need, destroy the file system structure with the following command:

root # dd if=/dev/zero of=YOUR_DEVICE count=16 bs=1M

If you have more file systems to destroy, repeat this step on all devices you want to include in your DRBD setup.
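Repeating the step can be done with a simple loop. The device names below are placeholders; the sketch writes to temporary files so it is safe to run as-is (on a real system, list your /dev/... devices instead):

```shell
# Wipe the first 16 MB of every device intended for DRBD.
# The paths are placeholders standing in for real block devices.
for dev in /tmp/demo_disk1 /tmp/demo_disk2; do
  dd if=/dev/zero of="$dev" count=16 bs=1M status=none
done
```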
If the cluster is already using DRBD, put your cluster in maintenance mode:

root # crm configure property maintenance-mode=true

If you skip this step when your cluster already uses DRBD, a syntax error in the live configuration will lead to a service shutdown. As an alternative, you can also use drbdadm -c FILE to test a configuration file.
Configure DRBD by choosing your method:
If you have configured Csync2 (which should be the default), the DRBD configuration files are already included in the list of files that need to be synchronized. To synchronize them, use:

root # csync2 -xv /etc/drbd.d/
If you do not have Csync2 (or do not want to use it), copy the DRBD configuration files manually to the other node:

root # scp /etc/drbd.conf bob:/etc/
root # scp /etc/drbd.d/* bob:/etc/drbd.d/
Perform the initial synchronization (see Section 20.3.3, “Initializing and Formatting DRBD Resource”).
Reset the cluster's maintenance mode flag:

root # crm configure property maintenance-mode=false
The DRBD9 feature “auto promote” can use a clone and file system resource instead of a master/slave connection. When using this feature while a file system is being mounted, DRBD will change to primary mode automatically.
The auto promote feature has currently restricted support. With DRBD 9, SUSE supports the same use cases that were also supported with DRBD 8. Use cases beyond that, such as setups with more than two nodes, are not supported.
To set up DRBD manually, proceed as follows:
Beginning with DRBD version 8.3, the former configuration file is split into separate files, located under the directory /etc/drbd.d/.

Open the file /etc/drbd.d/global_common.conf. It already contains some global, pre-defined values. Go to the startup section and insert these lines:
startup {
  # wfc-timeout degr-wfc-timeout outdated-wfc-timeout
  # wait-after-sb;
  wfc-timeout 100;
  degr-wfc-timeout 120;
}
These options are used to reduce the timeouts when booting. See https://docs.linbit.com/docs/users-guide-9.0/#ch-configure for more details.
Create the file /etc/drbd.d/r0.res. Change the lines according to your situation and save it:
resource r0 { 1
  device /dev/drbd0; 2
  disk /dev/sda1; 3
  meta-disk internal; 4
  on alice { 5
    address 192.168.1.10:7788; 6
    node-id 0; 7
  }
  on bob { 5
    address 192.168.1.11:7788; 6
    node-id 1; 7
  }
  disk {
    resync-rate 10M; 8
  }
  connection-mesh { 9
    hosts alice bob;
  }
}
1. DRBD resource name that allows some association to the service that needs it.
2. The device name for DRBD and its minor number. In the example above, the minor number 0 is used for DRBD. The udev integration scripts give you a symbolic link below /dev/drbd/by-res/.
3. The raw device that is replicated between nodes. Note that in this example the devices are the same on both nodes. If you need different devices, move the disk parameter into the on sections.
4. The meta-disk parameter usually contains the value internal, but it is also possible to specify an explicit device to hold the metadata.
5. The on section states which host this part of the configuration applies to.
6. The IP address and port number of the respective node. Each resource needs an individual port, usually starting with 7788.
7. The node ID is required when configuring more than two nodes. It is a unique, non-negative integer to distinguish the different nodes.
8. The synchronization rate. Set it to one third of the lower of the disk and network bandwidth. It only limits the resynchronization, not the replication.
9. Defines all nodes of a mesh. The hosts parameter contains all host names that share the same DRBD setup.
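For repetitive setups, the resource file can be generated from shell variables with a here-document. A minimal sketch, using the example names and addresses from above and writing to /tmp for demonstration (the real target would be /etc/drbd.d/r0.res):

```shell
# Generate a DRBD resource file from variables. /tmp is used here for
# demonstration; on a real system write to /etc/drbd.d/${RES}.res.
RES=r0
DEV=/dev/drbd0
DISK=/dev/sda1
cat > "/tmp/${RES}.res" <<EOF
resource ${RES} {
  device ${DEV};
  disk ${DISK};
  meta-disk internal;
  on alice {
    address 192.168.1.10:7788;
    node-id 0;
  }
  on bob {
    address 192.168.1.11:7788;
    node-id 1;
  }
  disk {
    resync-rate 10M;
  }
  connection-mesh {
    hosts alice bob;
  }
}
EOF
```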
Check the syntax of your configuration file(s). If the following command returns an error, verify your files:

root # drbdadm dump all
Continue with Section 20.3.3, “Initializing and Formatting DRBD Resource”.
YaST can be used to start with an initial setup of DRBD. After you have created your DRBD setup, you can fine-tune the generated files manually.
However, once you have changed the configuration files manually, do not use the YaST DRBD module anymore. The DRBD module supports only a limited set of basic configuration options. If you use it again, it is very likely that the module will not show your changes.
To set up DRBD with YaST, proceed as follows:
Start YaST and select the DRBD configuration module. If you already have a DRBD configuration, YaST warns you: it changes the existing setup and saves your old configuration files as *.YaSTsave.
Leave the booting flag set to off; do not change it, as Pacemaker manages this service.
If you have a firewall running, enable the option to open the needed port in the firewall.

Go to the resource configuration entry. Click the button to create a new resource (see Figure 20.2, “Resource Configuration”). The following parameters need to be set:
The name of the DRBD resource (mandatory)
The host name of the relevant node
The IP address and port number (default 7788) for the respective node
The block device path that is used to access the replicated data. If the device contains a minor number, the associated block device is usually named /dev/drbdX, where X is the device minor number. If the device does not contain a minor number, make sure to add minor 0 after the device name.
The raw device that is replicated between both nodes. If you use LVM, insert your LVM device name.
The meta-disk parameter is either set to the value internal or specifies an explicit device extended by an index to hold the metadata needed by DRBD. A real device may also be used for multiple DRBD resources. For example, if you use /dev/sda6[0] for the first resource, you may use /dev/sda6[1] for the second resource. However, there must be at least 128 MB of space for each resource available on this disk. The fixed metadata size limits the maximum data size that you can replicate.
All of these options are explained in the examples in the /usr/share/doc/packages/drbd/drbd.conf file and in the man page of drbd.conf(5).
Save the resource.

Add the second DRBD resource in the same way and save it.

Close the resource configuration.

If you use LVM with DRBD, it is necessary to change some options in the LVM configuration file (see the LVM configuration entry). This change can be done by the YaST DRBD module automatically.
The disk name of localhost for the DRBD resource and the default filter will be rejected in the LVM filter; only /dev/drbd devices can be scanned for an LVM device. For example, if /dev/sda1 is used as a DRBD disk, its device name will be inserted as the first entry in the LVM filter.
To change the filter manually, click the corresponding check box.
Save your changes and close YaST.

Continue with Section 20.3.3, “Initializing and Formatting DRBD Resource”.
After you have prepared your system and configured DRBD, initialize your disk for the first time:
On both nodes (alice and bob), initialize the metadata storage and bring up the resource:

root # drbdadm create-md r0
root # drbdadm up r0
To shorten the initial resynchronization of your DRBD resource, check the following:

If the DRBD devices on all nodes have the same data (for example, by destroying the file system structure with the dd command as shown in Section 20.3, “Setting Up DRBD Service”), then skip the initial resynchronization with the following command (on both nodes):

root # drbdadm new-current-uuid --clear-bitmap r0/0

The state will be Secondary/Secondary and UpToDate/UpToDate.

Otherwise, proceed with the next step.
On the primary node alice, start the resynchronization process:

root # drbdadm primary --force r0
Check the status with:

root # drbdadm status r0
r0 role:Primary
  disk:UpToDate
  bob role:Secondary
    peer-disk:UpToDate
Create your file system on top of your DRBD device, for example:

root # mkfs.ext3 /dev/drbd0
Mount the file system and use it:

root # mount /dev/drbd0 /mnt/
Between DRBD 8 (shipped with SUSE Linux Enterprise High Availability Extension 12 SP1) and DRBD 9 (shipped with SUSE Linux Enterprise High Availability Extension 12 SP2), the metadata format has changed. DRBD 9 does not automatically convert previous metadata files to the new format.
After migrating to 12 SP2 and before starting DRBD, convert the DRBD metadata to the version 9 format manually. To do so, use drbdadm create-md. No configuration needs to be changed.
With DRBD 9, SUSE supports the same use cases that were also supported with DRBD 8. Use cases beyond that, such as setups with more than two nodes, are not supported.
DRBD 9 will fall back to be compatible with version 8. For three nodes and more, you need to re-create the metadata to use DRBD version 9 specific options.
If you have a stacked DRBD resource, refer also to Section 20.5, “Creating a Stacked DRBD Device” for more information.
To keep your data and add new nodes without re-creating resources, do the following:
Set one node in standby mode.
Update all the DRBD packages on all of your nodes, see Section 20.2, “Installing DRBD Services”.
Add the new node information to your resource configuration:

Add the node-id parameter to every on section.

Make sure the connection-mesh section contains all host names in the hosts parameter.

See the example configuration in Procedure 20.1, “Manually Configuring DRBD”.
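With those two changes applied, a three-node resource could look like the following sketch. The third node charlie, its address, and the node IDs are hypothetical example values:

```
resource r0 {
  device /dev/drbd0;
  disk /dev/sda1;
  meta-disk internal;
  on alice {
    address 192.168.1.10:7788;
    node-id 0;
  }
  on bob {
    address 192.168.1.11:7788;
    node-id 1;
  }
  on charlie {
    address 192.168.1.12:7788;
    node-id 2;
  }
  connection-mesh {
    hosts alice bob charlie;
  }
}
```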
Enlarge the space of your DRBD disks when using internal as the meta-disk key. Use a device that supports enlarging the space, like LVM. As an alternative, change to an external disk for metadata and use meta-disk DEVICE;.
Re-create the metadata based on the new configuration:

root # drbdadm create-md RESOURCE
Cancel the standby mode.
A stacked DRBD device contains two other devices of which at least one device is also a DRBD resource. In other words, DRBD adds an additional node on top of an already existing DRBD resource (see Figure 20.3, “Resource Stacking”). Such a replication setup can be used for backup and disaster recovery purposes.
Three-way replication uses asynchronous (DRBD protocol A) and synchronous replication (DRBD protocol C). The asynchronous part is used for the stacked resource whereas the synchronous part is used for the backup.
Your production environment uses the stacked device. For example, if you have a DRBD device /dev/drbd0 and a stacked device /dev/drbd10 on top, the file system will be created on /dev/drbd10. See Example 20.1, “Configuration of a Three-Node Stacked DRBD Resource” for more details.
# /etc/drbd.d/r0.res
resource r0 {
  protocol C;
  device /dev/drbd0;
  disk /dev/sda1;
  meta-disk internal;
  on amsterdam-alice {
    address 192.168.1.1:7900;
  }
  on amsterdam-bob {
    address 192.168.1.2:7900;
  }
}

resource r0-U {
  protocol A;
  device /dev/drbd10;
  stacked-on-top-of r0 {
    address 192.168.2.1:7910;
  }
  on berlin-charlie {
    disk /dev/sda10;
    address 192.168.2.2:7910; # Public IP of the backup node
    meta-disk internal;
  }
}
When a DRBD replication link becomes interrupted, Pacemaker tries to promote the DRBD resource to another node. To prevent Pacemaker from starting a service with outdated data, enable resource-level fencing in the DRBD configuration file as shown in Example 20.2, “Configuration of DRBD with Resource-Level Fencing Using the Cluster Information Base (CIB)”.
resource RESOURCE {
  net {
    fencing resource-only;
    # ...
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    # ...
  }
  ...
}
If the DRBD replication link becomes disconnected, DRBD does the following:
DRBD calls the crm-fence-peer.9.sh script.
The script contacts the cluster manager.
The script determines the Pacemaker resource associated with this DRBD resource.
The script ensures that the DRBD resource no longer gets promoted to any other node. It stays on the currently active one.
If the replication link becomes connected again and DRBD completes its synchronization process, then the constraint is removed. The cluster manager is now free to promote the resource.
If the install and configuration procedures worked as expected, you are ready to run a basic test of the DRBD functionality. This test also helps with understanding how the software works.
Test the DRBD service on alice.
Open a terminal console, then log in as root.
Create a mount point on alice, such as /srv/r0:

root # mkdir -p /srv/r0
Mount the DRBD device:

root # mount -o rw /dev/drbd0 /srv/r0
Create a file from the primary node:

root # touch /srv/r0/from_alice
Unmount the disk on alice:

root # umount /srv/r0
Downgrade the DRBD service on alice by typing the following command on alice:

root # drbdadm secondary r0
Test the DRBD service on bob.
Open a terminal console, then log in as root on bob.
On bob, promote the DRBD service to primary:

root # drbdadm primary r0
On bob, check to see if bob is primary:

root # drbdadm status r0
On bob, create a mount point such as /srv/r0:

root # mkdir /srv/r0
On bob, mount the DRBD device:

root # mount -o rw /dev/drbd0 /srv/r0
Verify that the file you created on alice exists:

root # ls /srv/r0/from_alice

The /srv/r0/from_alice file should be listed.
If the service is working on both nodes, the DRBD setup is complete.
Set up alice as the primary again.
Unmount the disk on bob by typing the following command on bob:

root # umount /srv/r0
Downgrade the DRBD service on bob by typing the following command on bob:

root # drbdadm secondary r0
On alice, promote the DRBD service to primary:

root # drbdadm primary r0
On alice, check to see if alice is primary:

root # drbdadm status r0
To get the service to automatically start and fail over if the server has a problem, you can set up DRBD as a high availability service with Pacemaker/Corosync. For information about installing and configuring for SUSE Linux Enterprise 12 SP5 see Part II, “Configuration and Administration”.
There are several ways to tune DRBD:
Use an external disk for your metadata. This might help, at the cost of maintenance ease.
Tune your network connection by changing the receive and send buffer settings via sysctl.
Change max-buffers, max-epoch-size, or both in the DRBD configuration.
Increase the al-extents value, depending on your I/O patterns.
If you have a hardware RAID controller with a BBU (Battery Backup Unit), you might benefit from setting no-disk-flushes, no-disk-barrier, and/or no-md-flushes.
Enable read-balancing depending on your workload. See https://www.linbit.com/en/read-balancing/ for more details.
The DRBD setup involves many components and problems may arise from different sources. The following sections cover several common scenarios and recommend various solutions.
If the initial DRBD setup does not work as expected, there is probably something wrong with your configuration.
To get information about the configuration:
Open a terminal console, then log in as root.
Test the configuration file by running drbdadm with the -d option. Enter the following command:

root # drbdadm -d adjust r0
In a dry run of the adjust option, drbdadm compares the actual configuration of the DRBD resource with your DRBD configuration file, but it does not execute the calls. Review the output to make sure you know the source and cause of any errors.
If there are errors in the /etc/drbd.d/* and drbd.conf files, correct them before continuing.

If the partitions and settings are correct, run drbdadm again without the -d option:

root # drbdadm adjust r0
This applies the configuration file to the DRBD resource.
For DRBD, host names are case-sensitive (Node0 would be a different host than node0), and they are compared to the host name as stored in the kernel (see the uname -n output).
If you have several network devices and want to use a dedicated network device, the host name will likely not resolve to the used IP address. In this case, use the parameter disable-ip-verification.
If your system cannot connect to the peer, this might be a problem with your local firewall. By default, DRBD uses the TCP port 7788 to access the other node. Make sure that this port is accessible on both nodes.
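A quick reachability check for the DRBD port can be done with Bash's built-in /dev/tcp redirection. The node name bob is the example peer from above; note this only verifies TCP connectivity, not DRBD itself:

```shell
# Print "open" if a TCP connection to host:port succeeds, "closed" otherwise.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null && echo open || echo closed
}
check_port bob 7788   # replace bob with your peer node's name or IP
```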
In cases when DRBD does not know which of the real devices holds the latest data, it changes to a split brain condition: the respective DRBD subsystems come up as secondary and do not connect to each other. In this case, the following message can be found in the logging data:
Split-Brain detected, dropping connection!
To resolve this situation, enter the following commands on the node which has data to be discarded:
root # drbdadm secondary r0
If the state is in WFConnection, disconnect first:

root # drbdadm disconnect r0
On the node which has the latest data enter the following:
root #
drbdadm
connect --discard-my-data r0
This resolves the issue by overwriting one node's data with the peer's data, resulting in a consistent view on both nodes.
The following open source resources are available for DRBD:
The project home page http://www.drbd.org.
http://clusterlabs.org/wiki/DRBD_HowTo_1.0 by the Linux Pacemaker Cluster Stack Project.
The following man pages for DRBD are available in the distribution:
drbd(8), drbdmeta(8), drbdsetup(8), drbdadm(8), drbd.conf(5).

Find a commented example configuration for DRBD at /usr/share/doc/packages/drbd-utils/drbd.conf.example.
Furthermore, for easier storage administration across your cluster, see the recent announcement about the DRBD-Manager at https://www.linbit.com/en/drbd-manager/.