Applies to SUSE Enterprise Storage 5.5 (SES 5 & SES 5.5)

9 RADOS Block Device

A block is a sequence of bytes, for example a 4MB block of data. Block-based storage interfaces are the most common way to store data on rotating media such as hard disks, CDs, and floppy disks. The ubiquity of block device interfaces makes a virtual block device an ideal candidate for interacting with a mass data storage system like Ceph.

Ceph block devices allow sharing of physical resources, and are resizable. They store data striped over multiple OSDs in a Ceph cluster. Ceph block devices leverage RADOS capabilities such as snapshotting, replication, and consistency. Ceph's RADOS Block Devices (RBD) interact with OSDs using kernel modules or the librbd library.

Figure 9.1: RADOS Protocol

Ceph's block devices deliver high performance and excellent scalability via kernel modules. They support virtualization solutions such as QEMU, and cloud-based computing systems such as OpenStack that rely on libvirt. You can use the same cluster to operate the Object Gateway, CephFS, and RADOS Block Devices simultaneously.

9.1 Block Device Commands

The rbd command enables you to create, list, introspect, and remove block device images. You can also use it, for example, to clone images, create snapshots, roll back an image to a snapshot, or view a snapshot.

9.1.1 Creating a Block Device Image in a Replicated Pool

Before you can add a block device to a client, you need to create a related image in an existing pool (see Chapter 8, Managing Storage Pools):

cephadm > rbd create --size MEGABYTES POOL-NAME/IMAGE-NAME

For example, to create a 1GB image named 'myimage' that stores information in a pool named 'mypool', execute the following:

cephadm > rbd create --size 1024 mypool/myimage
Tip
Tip: Image Size Units

If you omit a size unit shortcut ('G' or 'T'), the image's size is in megabytes. Use 'G' or 'T' after the size number to specify gigabytes or terabytes.

9.1.2 Creating a Block Device Image in an Erasure Coded Pool

As of SUSE Enterprise Storage 5.5, it is possible to store the data of a block device image directly in erasure coded (EC) pools. A RADOS Block Device image consists of a data part and a metadata part. You can store only the 'data' part of a RADOS Block Device image in an EC pool. The pool needs to have the 'overwrite' flag set to true, which is only possible if all OSDs where the pool is stored use BlueStore.

You cannot store the image's 'metadata' part in an EC pool. You need to specify a replicated pool for storing the image's metadata with the --pool= option of the rbd create command.

Use the following steps to create an RBD image in a newly created EC pool:

cephadm > ceph osd pool create POOL_NAME 12 12 erasure
cephadm > ceph osd pool set POOL_NAME allow_ec_overwrites true

# Metadata will reside in pool "OTHER_POOL", and data in pool "POOL_NAME"
cephadm > rbd create IMAGE_NAME --size=1G --data-pool POOL_NAME --pool=OTHER_POOL
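For instance, assuming a hypothetical EC data pool named 'ecpool' and an existing replicated metadata pool named 'rbdpool' (both names are examples), the sequence could look like this:

```
cephadm > ceph osd pool create ecpool 12 12 erasure
cephadm > ceph osd pool set ecpool allow_ec_overwrites true
cephadm > rbd create myimage --size=1G --data-pool ecpool --pool=rbdpool
```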

9.1.3 Listing Block Device Images

To list block devices in a pool named 'mypool', execute the following:

cephadm > rbd ls mypool

9.1.4 Retrieving Image Information

To retrieve information from an image 'myimage' within a pool named 'mypool', run the following:

cephadm > rbd info mypool/myimage

9.1.5 Resizing a Block Device Image

RADOS Block Device images are thin provisioned—they do not actually use any physical storage until you begin saving data to them. However, they do have a maximum capacity that you set with the --size option. If you want to increase (or decrease) the maximum size of the image, run the following:

cephadm > rbd resize --size 2048 POOL_NAME/IMAGE_NAME # to increase
cephadm > rbd resize --size 2048 POOL_NAME/IMAGE_NAME --allow-shrink # to decrease

9.1.6 Removing a Block Device Image

To remove a block device that corresponds to an image 'myimage' in a pool named 'mypool', run the following:

cephadm > rbd rm mypool/myimage

9.2 Mounting and Unmounting

After you create a RADOS Block Device, you can use it like any other disk device: format it, mount it to be able to exchange files, and unmount it when done.

  1. Make sure your Ceph cluster includes a pool with the disk image you want to map. Assume the pool is called mypool and the image is myimage.

    cephadm > rbd list mypool
  2. Map the image to a new block device.

    cephadm > rbd map --pool mypool myimage
    Tip
    Tip: User Name and Authentication

    To specify a user name, use --id user-name. If you use cephx authentication, you also need to specify a secret. It may come from a keyring or a file containing the secret:

    cephadm > rbd map --pool rbd myimage --id admin --keyring /path/to/keyring

    or

    cephadm > rbd map --pool rbd myimage --id admin --keyfile /path/to/file
  3. List all mapped devices:

    cephadm > rbd showmapped
     id pool   image   snap device
     0  mypool myimage -    /dev/rbd0

    The device we want to work on is /dev/rbd0.

    Tip
    Tip: RBD Device Path

    Instead of /dev/rbdDEVICE_NUMBER, you can use /dev/rbd/POOL_NAME/IMAGE_NAME as a persistent device path. For example:

    /dev/rbd/mypool/myimage
  4. Make an XFS file system on the /dev/rbd0 device.

    root # mkfs.xfs /dev/rbd0
     log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
     log stripe unit adjusted to 32KiB
     meta-data=/dev/rbd0              isize=256    agcount=9, agsize=261120 blks
              =                       sectsz=512   attr=2, projid32bit=1
              =                       crc=0        finobt=0
     data     =                       bsize=4096   blocks=2097152, imaxpct=25
              =                       sunit=1024   swidth=1024 blks
     naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
     log      =internal log           bsize=4096   blocks=2560, version=2
              =                       sectsz=512   sunit=8 blks, lazy-count=1
     realtime =none                   extsz=4096   blocks=0, rtextents=0
  5. Mount the device and check it is correctly mounted. Replace /mnt with your mount point.

    root # mount /dev/rbd0 /mnt
    root # mount | grep rbd0
    /dev/rbd0 on /mnt type xfs (rw,relatime,attr2,inode64,sunit=8192,...

    Now you can move data to and from the device as if it was a local directory.

    Tip
    Tip: Increasing the Size of RBD Device

    If you find that the size of the RBD device is no longer enough, you can easily increase it.

    1. Increase the size of the RBD image, for example up to 10GB.

      root # rbd resize --size 10000 mypool/myimage
       Resizing image: 100% complete...done.
    2. Grow the file system to fill up the new size of the device.

      root # xfs_growfs /mnt
       [...]
       data blocks changed from 2097152 to 2560000
  6. After you finish accessing the device, you can unmap and unmount it.

    cephadm > rbd unmap /dev/rbd0
    root # umount /mnt
Tip
Tip: Manual (Un)mounting

Since manually mapping and mounting RBD images after boot and unmounting and unmapping them before shutdown can be tedious, an rbdmap script and a systemd unit are provided. Refer to Section 9.2.1, “rbdmap: Map RBD Devices at Boot Time”.

9.2.1 rbdmap: Map RBD Devices at Boot Time

rbdmap is a shell script that automates rbd map and rbd unmap operations on one or more RBD images. Although you can run the script manually at any time, its main advantage is automatic mapping and mounting of RBD images at boot time (and unmounting and unmapping at shutdown), as triggered by the Init system. A systemd unit file, rbdmap.service, is included in the ceph-common package for this purpose.

The script takes a single argument, which can be either map or unmap. In either case, the script parses a configuration file. It defaults to /etc/ceph/rbdmap, but can be overridden via the RBDMAPFILE environment variable. Each line of the configuration file corresponds to an RBD image that is to be mapped or unmapped.

The configuration file has the following format:

image_specification rbd_options
image_specification

Path to an image within a pool. Specify as pool_name/image_name.

rbd_options

An optional list of parameters to be passed to the underlying rbd map command. These parameters and their values should be specified as a comma-separated string, for example:

PARAM1=VAL1,PARAM2=VAL2,...

The example makes the rbdmap script run the following command:

cephadm > rbd map POOL_NAME/IMAGE_NAME --PARAM1 VAL1 --PARAM2 VAL2

In the following example you can see how to specify a user name and a keyring with a corresponding secret in the configuration file:

mypool/myimage id=rbd_user,keyring=/etc/ceph/ceph.client.rbd.keyring

When run as rbdmap map, the script parses the configuration file, and for each specified RBD image, it attempts to first map the image (using the rbd map command) and then mount the image.

When run as rbdmap unmap, images listed in the configuration file will be unmounted and unmapped.

rbdmap unmap-all attempts to unmount and subsequently unmap all currently mapped RBD images, regardless of whether they are listed in the configuration file.

If successful, the rbd map operation maps the image to a /dev/rbdX device, at which point a udev rule is triggered to create a friendly device name symbolic link /dev/rbd/pool_name/image_name pointing to the real mapped device.

In order for mounting and unmounting to succeed, the 'friendly' device name needs to have a corresponding entry in /etc/fstab. When writing /etc/fstab entries for RBD images, specify the 'noauto' (or 'nofail') mount option. This prevents the Init system from trying to mount the device too early—before the device in question even exists, as rbdmap.service is typically triggered quite late in the boot sequence.
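For example, a hypothetical /etc/fstab entry for the image used earlier in this chapter could look as follows (the mount point /mnt/rbd is an example):

```
/dev/rbd/mypool/myimage /mnt/rbd xfs noauto 0 0
```

The 'noauto' option keeps the Init system from mounting the device at boot; rbdmap.service mounts it once the image has been mapped.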

For a complete list of rbd options, see the rbd manual page (man 8 rbd).

For examples of the rbdmap usage, see the rbdmap manual page (man 8 rbdmap).


9.3 Snapshots

An RBD snapshot is a snapshot of a RADOS Block Device image. With snapshots, you retain a history of the image's state. Ceph also supports snapshot layering, which allows you to clone VM images quickly and easily. Ceph supports block device snapshots using the rbd command and many higher-level interfaces, including QEMU, libvirt, OpenStack, and CloudStack.

Note
Note

Stop input and output operations and flush all pending writes before snapshotting an image. If the image contains a file system, the file system must be in a consistent state at the time of snapshotting.

9.3.1 Cephx Notes

When cephx is enabled (see http://ceph.com/docs/master/rados/configuration/auth-config-ref/ for more information), you must specify a user name or ID and a path to the keyring containing the corresponding key for the user. See User Management for more details. You can also set the CEPH_ARGS environment variable to avoid re-entering these parameters.

cephadm > rbd --id user-ID --keyring=/path/to/secret commands
cephadm > rbd --name username --keyring=/path/to/secret commands

For example:

cephadm > rbd --id admin --keyring=/etc/ceph/ceph.keyring commands
cephadm > rbd --name client.admin --keyring=/etc/ceph/ceph.keyring commands
Tip
Tip

Add the user and secret to the CEPH_ARGS environment variable so that you do not need to enter them each time.
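A minimal sketch of this approach follows (the user name and keyring path are assumptions for illustration):

```shell
# Set once per shell session; subsequent rbd commands pick up these arguments.
export CEPH_ARGS="--id admin --keyring=/etc/ceph/ceph.keyring"
# rbd snap ls mypool/myimage    # would now run without repeating --id/--keyring
```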

9.3.2 Snapshot Basics

The following procedures demonstrate how to create, list, and remove snapshots using the rbd command on the command line.

9.3.2.1 Create Snapshot

To create a snapshot with rbd, specify the snap create option, the pool name, and the image name.

cephadm > rbd --pool pool-name snap create --snap snap-name image-name
cephadm > rbd snap create pool-name/image-name@snap-name

For example:

cephadm > rbd --pool rbd snap create --snap snapshot1 image1
cephadm > rbd snap create rbd/image1@snapshot1

9.3.2.2 List Snapshots

To list snapshots of an image, specify the pool name and the image name.

cephadm > rbd --pool pool-name snap ls image-name
cephadm > rbd snap ls pool-name/image-name

For example:

cephadm > rbd --pool rbd snap ls image1
cephadm > rbd snap ls rbd/image1

9.3.2.3 Rollback Snapshot

To roll back to a snapshot with rbd, specify the snap rollback option, the pool name, the image name, and the snapshot name.

cephadm > rbd --pool pool-name snap rollback --snap snap-name image-name
cephadm > rbd snap rollback pool-name/image-name@snap-name

For example:

cephadm > rbd --pool pool1 snap rollback --snap snapshot1 image1
cephadm > rbd snap rollback pool1/image1@snapshot1
Note
Note

Rolling back an image to a snapshot means overwriting the current version of the image with data from the snapshot. The time it takes to execute a rollback increases with the size of the image. It is faster to clone from a snapshot than to roll back an image to a snapshot, and cloning is the preferred method of returning to a pre-existing state.

9.3.2.4 Delete a Snapshot

To delete a snapshot with rbd, specify the snap rm option, the pool name, the image name, and the snapshot name.

cephadm > rbd --pool pool-name snap rm --snap snap-name image-name
cephadm > rbd snap rm pool-name/image-name@snap-name

For example:

cephadm > rbd --pool pool1 snap rm --snap snapshot1 image1
cephadm > rbd snap rm pool1/image1@snapshot1
Note
Note

Ceph OSDs delete data asynchronously, so deleting a snapshot does not free up the disk space immediately.

9.3.2.5 Purge Snapshots

To delete all snapshots for an image with rbd, specify the snap purge option and the image name.

cephadm > rbd --pool pool-name snap purge image-name
cephadm > rbd snap purge pool-name/image-name

For example:

cephadm > rbd --pool pool1 snap purge image1
cephadm > rbd snap purge pool1/image1

9.3.3 Layering

Ceph supports the ability to create multiple copy-on-write (COW) clones of a block device snapshot. Snapshot layering enables Ceph block device clients to create images very quickly. For example, you might create a block device image with a Linux VM written to it, then snapshot the image, protect the snapshot, and create as many copy-on-write clones as you like. A snapshot is read-only, so cloning a snapshot simplifies semantics, making it possible to create clones rapidly.

Note
Note

The terms 'parent' and 'child' mentioned in the command line examples below mean a Ceph block device snapshot (parent) and the corresponding image cloned from the snapshot (child).

Each cloned image (child) stores a reference to its parent image, which enables the cloned image to open the parent snapshot and read it.

A COW clone of a snapshot behaves exactly like any other Ceph block device image. You can read from, write to, clone, and resize cloned images. There are no special restrictions with cloned images. However, the copy-on-write clone of a snapshot refers to the snapshot, so you must protect the snapshot before you clone it.

Note
Note: --image-format 1 not Supported

You cannot create snapshots of images created with the deprecated rbd create --image-format 1 option. Ceph only supports cloning of the default format 2 images.

9.3.3.1 Getting Started with Layering

Ceph block device layering is a simple process. You must have an image, create a snapshot of it, and protect the snapshot. After you have performed these steps, you can begin cloning the snapshot.
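These steps can be sketched as a single session (pool, image, and snapshot names are hypothetical; each command is detailed in the sections that follow):

```
cephadm > rbd snap create pool1/image1@snapshot1
cephadm > rbd snap protect pool1/image1@snapshot1
cephadm > rbd clone pool1/image1@snapshot1 pool1/image2
```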

The cloned image has a reference to the parent snapshot, and includes the pool ID, image ID, and snapshot ID. The inclusion of the pool ID means that you may clone snapshots from one pool to images in another pool.

  • Image Template: A common use case for block device layering is to create a master image and a snapshot that serves as a template for clones. For example, a user may create an image for a Linux distribution (for example, SUSE Linux Enterprise Server), and create a snapshot for it. Periodically, the user may update the image and create a new snapshot (for example, zypper ref && zypper patch followed by rbd snap create). As the image matures, the user can clone any one of the snapshots.

  • Extended Template: A more advanced use case includes extending a template image that provides more information than a base image. For example, a user may clone an image (a VM template) and install other software (for example, a database, a content management system, or an analytics system), and then snapshot the extended image, which itself may be updated in the same way as the base image.

  • Template Pool: One way to use block device layering is to create a pool that contains master images that act as templates, and snapshots of those templates. You may then extend read-only privileges to users so that they may clone the snapshots without the ability to write or execute within the pool.

  • Image Migration/Recovery: One way to use block device layering is to migrate or recover data from one pool into another pool.

9.3.3.2 Protecting a Snapshot

Clones access the parent snapshots. All clones would break if a user inadvertently deleted the parent snapshot. To prevent data loss, you need to protect the snapshot before you can clone it.

cephadm > rbd --pool pool-name snap protect \
 --image image-name --snap snapshot-name
cephadm > rbd snap protect pool-name/image-name@snapshot-name

For example:

cephadm > rbd --pool pool1 snap protect --image image1 --snap snapshot1
cephadm > rbd snap protect pool1/image1@snapshot1
Note
Note

You cannot delete a protected snapshot.

9.3.3.3 Cloning a Snapshot

To clone a snapshot, you need to specify the parent pool, image, snapshot, the child pool, and the image name. You need to protect the snapshot before you can clone it.

cephadm > rbd clone --pool pool-name --image parent-image \
 --snap snap-name --dest-pool pool-name \
 --dest child-image
cephadm > rbd clone pool-name/parent-image@snap-name \
pool-name/child-image-name

For example:

cephadm > rbd clone pool1/image1@snapshot1 pool1/image2
Note
Note

You may clone a snapshot from one pool to an image in another pool. For example, you may maintain read-only images and snapshots as templates in one pool, and writable clones in another pool.

9.3.3.4 Unprotecting a Snapshot

Before you can delete a snapshot, you must unprotect it. Additionally, you cannot delete snapshots that are referenced by clones. You need to flatten each clone of a snapshot before you can delete the snapshot.

cephadm > rbd --pool pool-name snap unprotect --image image-name \
 --snap snapshot-name
cephadm > rbd snap unprotect pool-name/image-name@snapshot-name

For example:

cephadm > rbd --pool pool1 snap unprotect --image image1 --snap snapshot1
cephadm > rbd snap unprotect pool1/image1@snapshot1

9.3.3.5 Listing Children of a Snapshot

To list the children of a snapshot, execute the following:

cephadm > rbd --pool pool-name children --image image-name --snap snap-name
cephadm > rbd children pool-name/image-name@snapshot-name

For example:

cephadm > rbd --pool pool1 children --image image1 --snap snapshot1
cephadm > rbd children pool1/image1@snapshot1

9.3.3.6 Flattening a Cloned Image

Cloned images retain a reference to the parent snapshot. When you remove the reference from the child clone to the parent snapshot, you effectively 'flatten' the image by copying the information from the snapshot to the clone. The time it takes to flatten a clone increases with the size of the snapshot. To delete a snapshot, you must flatten the child images first.

cephadm > rbd --pool pool-name flatten --image image-name
cephadm > rbd flatten pool-name/image-name

For example:

cephadm > rbd --pool pool1 flatten --image image1
cephadm > rbd flatten pool1/image1
Note
Note

Since a flattened image contains all the information from the snapshot, a flattened image will take up more storage space than a layered clone.

9.4 Mirroring

RBD images can be asynchronously mirrored between two Ceph clusters. This capability uses the RBD journaling image feature to ensure crash-consistent replication between clusters. Mirroring is configured on a per-pool basis within peer clusters and can be configured to automatically mirror all images within a pool or only a specific subset of images. Mirroring is configured using the rbd command. The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the image within the local cluster.

Note
Note: rbd-mirror Daemon

To use RBD mirroring, you need to have two Ceph clusters, each running the rbd-mirror daemon.

Important
Important: RADOS Block Devices Exported via iSCSI

You cannot mirror RBD devices that are exported via iSCSI using lrbd.

Refer to Chapter 14, Ceph iSCSI Gateway for more details on iSCSI.

9.4.1 rbd-mirror Daemon

The two rbd-mirror daemons are responsible for watching image journals on the remote, peer cluster and replaying the journal events against the local cluster. The RBD image journaling feature records all modifications to the image in the order they occur. This ensures that a crash-consistent mirror of the remote image is available locally.

The rbd-mirror daemon is available in the rbd-mirror package. You can install the package on OSD nodes, gateway nodes, or even on dedicated nodes. We do not recommend installing rbd-mirror on the Salt master/admin node. Install, enable, and start rbd-mirror:

root # zypper install rbd-mirror
root # systemctl enable ceph-rbd-mirror@server_name.service
root # systemctl start ceph-rbd-mirror@server_name.service
Important
Important

Each rbd-mirror daemon requires the ability to connect to both clusters simultaneously.

9.4.2 Pool Configuration

The following procedures demonstrate how to perform the basic administrative tasks to configure mirroring using the rbd command. Mirroring is configured on a per-pool basis within the Ceph clusters.

You need to perform the pool configuration steps on both peer clusters. These procedures assume two clusters, named 'local' and 'remote', are accessible from a single host for clarity.

See the rbd manual page (man 8 rbd) for additional details on how to connect to different Ceph clusters.

Tip
Tip: Multiple Clusters

The cluster name in the following examples corresponds to a Ceph configuration file of the same name /etc/ceph/remote.conf. See the ceph-conf documentation for how to configure multiple clusters.

9.4.2.1 Enable Mirroring on a Pool

To enable mirroring on a pool, specify the mirror pool enable subcommand, the pool name, and the mirroring mode. The mirroring mode can either be pool or image:

pool

All images in the pool with the journaling feature enabled are mirrored.

image

Mirroring needs to be explicitly enabled on each image. See Section 9.4.3.2, “Enable Image Mirroring” for more information.

For example:

cephadm > rbd --cluster local mirror pool enable POOL_NAME pool
cephadm > rbd --cluster remote mirror pool enable POOL_NAME pool

9.4.2.2 Disable Mirroring

To disable mirroring on a pool, specify the mirror pool disable subcommand and the pool name. When mirroring is disabled on a pool in this way, mirroring will also be disabled on any images (within the pool) for which mirroring was enabled explicitly.

cephadm > rbd --cluster local mirror pool disable POOL_NAME
cephadm > rbd --cluster remote mirror pool disable POOL_NAME

9.4.2.3 Add Cluster Peer

In order for the rbd-mirror daemon to discover its peer cluster, the peer needs to be registered to the pool. To add a mirroring peer cluster, specify the mirror pool peer add subcommand, the pool name, and a cluster specification:

cephadm > rbd --cluster local mirror pool peer add POOL_NAME client.remote@remote
cephadm > rbd --cluster remote mirror pool peer add POOL_NAME client.local@local

9.4.2.4 Remove Cluster Peer

To remove a mirroring peer cluster, specify the mirror pool peer remove subcommand, the pool name, and the peer UUID (available from the rbd mirror pool info command):

cephadm > rbd --cluster local mirror pool peer remove POOL_NAME \
 55672766-c02b-4729-8567-f13a66893445
cephadm > rbd --cluster remote mirror pool peer remove POOL_NAME \
 60c0e299-b38f-4234-91f6-eed0a367be08

9.4.3 Image Configuration

Unlike pool configuration, image configuration only needs to be performed against a single mirroring peer Ceph cluster.

Mirrored RBD images are designated as either primary or non-primary. This is a property of the image and not the pool. Images that are designated as non-primary cannot be modified.

Images are automatically promoted to primary when mirroring is first enabled on an image (either implicitly if the pool mirror mode was 'pool' and the image has the journaling image feature enabled, or explicitly (see Section 9.4.3.2, “Enable Image Mirroring”) by the rbd command).

9.4.3.1 Image Journaling Support

RBD mirroring uses the RBD journaling feature to ensure that the replicated image always remains crash-consistent. Before an image can be mirrored to a peer cluster, the journaling feature must be enabled. The feature can be enabled at the time of image creation by providing the --image-feature exclusive-lock,journaling option to the rbd command.

Alternatively, the journaling feature can be dynamically enabled on pre-existing RBD images. To enable journaling, specify the feature enable subcommand, the pool and image name, and the feature name:

cephadm > rbd --cluster local feature enable POOL_NAME/IMAGE_NAME journaling
Note
Note: Option Dependency

The journaling feature is dependent on the exclusive-lock feature. If the exclusive-lock feature is not already enabled, you need to enable it prior to enabling the journaling feature.

Warning
Warning: Journaling on All New Images

You can enable journaling on all new images by default by appending the journaling value to the rbd default features option in the Ceph configuration file. For example:

rbd default features = layering,exclusive-lock,object-map,deep-flatten,journaling

Before applying such change, carefully consider if enabling journaling on all new images is good for your deployment because it can have negative performance impact.

9.4.3.2 Enable Image Mirroring

If mirroring is configured in the 'image' mode, then it is necessary to explicitly enable mirroring for each image within the pool. To enable mirroring for a specific image, specify the mirror image enable subcommand along with the pool and image name:

cephadm > rbd --cluster local mirror image enable POOL_NAME/IMAGE_NAME

9.4.3.3 Disable Image Mirroring

To disable mirroring for a specific image, specify the mirror image disable subcommand along with the pool and image name:

cephadm > rbd --cluster local mirror image disable POOL_NAME/IMAGE_NAME

9.4.3.4 Image Promotion and Demotion

In a failover scenario where the primary designation needs to be moved to the image in the peer cluster, you need to stop access to the primary image, demote the current primary image, promote the new primary image, and resume access to the image on the alternate cluster.

Note
Note: Forced Promotion

Promotion can be forced using the --force option. Forced promotion is needed when the demotion cannot be propagated to the peer cluster (for example, in case of cluster failure or communication outage). This will result in a split-brain scenario between the two peers, and the image will no longer be synchronized until a resync subcommand is issued.

To demote a specific image to non-primary, specify the mirror image demote subcommand along with the pool and image name:

cephadm > rbd --cluster local mirror image demote POOL_NAME/IMAGE_NAME

To demote all primary images within a pool to non-primary, specify the mirror pool demote subcommand along with the pool name:

cephadm > rbd --cluster local mirror pool demote POOL_NAME

To promote a specific image to primary, specify the mirror image promote subcommand along with the pool and image name:

cephadm > rbd --cluster remote mirror image promote POOL_NAME/IMAGE_NAME

To promote all non-primary images within a pool to primary, specify the mirror pool promote subcommand along with the pool name:

cephadm > rbd --cluster local mirror pool promote POOL_NAME
Tip
Tip: Split I/O Load

Since the primary or non-primary status is per-image, it is possible to have two clusters split the I/O load and stage failover or failback.

9.4.3.5 Force Image Resync

If a split-brain event is detected by the rbd-mirror daemon, it will not attempt to mirror the affected image until corrected. To resume mirroring for an image, first demote the image determined to be out of date and then request a resync to the primary image. To request an image resync, specify the mirror image resync subcommand along with the pool and image name:

cephadm > rbd mirror image resync POOL_NAME/IMAGE_NAME

9.4.4 Mirror Status

The peer cluster replication status is stored for every primary mirrored image. This status can be retrieved using the mirror image status and mirror pool status subcommands.

To request the mirror image status, specify the mirror image status subcommand along with the pool and image name:

cephadm > rbd mirror image status POOL_NAME/IMAGE_NAME

To request the mirror pool summary status, specify the mirror pool status subcommand along with the pool name:

cephadm > rbd mirror pool status POOL_NAME
Tip
Tip: Verbose Output

Adding the --verbose option to the mirror pool status subcommand will additionally output status details for every mirroring image in the pool.

9.5 Advanced Features

RADOS Block Device supports advanced features that enhance the functionality of RBD images. You can specify the features either on the command line when creating an RBD image, or in the Ceph configuration file by using the rbd_default_features option.

You can specify the values of the rbd_default_features option in two ways:

  • As a sum of features' internal values. Each feature has its own internal value, for example 'layering' has 1 and 'fast-diff' has 16. Therefore, to activate these two features by default, include the following:

    rbd_default_features = 17
  • As a comma-separated list of features. The previous example will look as follows:

    rbd_default_features = layering,fast-diff
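The mapping between the two notations is plain addition of the internal values. The following shell sketch illustrates it (the helper function rbd_feature_value is hypothetical, not part of the rbd tool):

```shell
# Internal values of the RBD features listed later in this section.
declare -A RBD_FEATURES=(
  [layering]=1 [striping]=2 [exclusive-lock]=4 [object-map]=8
  [fast-diff]=16 [deep-flatten]=32 [journaling]=64
)

# Sum the internal values for a comma-separated feature list.
rbd_feature_value() {
  local sum=0 name names
  IFS=',' read -ra names <<< "$1"
  for name in "${names[@]}"; do
    sum=$(( sum + ${RBD_FEATURES[$name]} ))
  done
  echo "$sum"
}

rbd_feature_value layering,fast-diff    # prints 17
```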
Note
Note: Features not Supported by iSCSI

RBD images with the following features are not supported by iSCSI: deep-flatten, striping, exclusive-lock, object-map, journaling, fast-diff.

The list of advanced RBD features follows:

layering

Layering enables you to use cloning.

Internal value is 1, default is 'yes'.

striping

Striping spreads data across multiple objects and helps with parallelism for sequential read/write workloads. It prevents single-node bottlenecks for large or busy RADOS Block Devices.

Internal value is 2, default is 'yes'.

exclusive-lock

When enabled, it requires a client to get a lock on an object before making a write. Enable the exclusive lock only when a single client is accessing an image at the same time. Internal value is 4. Default is 'yes'.

object-map

Object map support depends on exclusive lock support. Block devices are thin provisioned, meaning that they only store data that actually exists. Object map support helps track which objects actually exist (have data stored on a drive). Enabling object map support speeds up I/O operations for cloning, for importing and exporting a sparsely populated image, and for deleting images.

Internal value is 8, default is 'yes'.

fast-diff

Fast-diff support depends on object map support and exclusive lock support. It adds another property to the object map, which makes it much faster to generate diffs between snapshots of an image, and the actual data usage of a snapshot.

Internal value is 16, default is 'yes'.

deep-flatten

Deep-flatten makes the rbd flatten (see Section 9.3.3.6, “Flattening a Cloned Image”) work on all the snapshots of an image, in addition to the image itself. Without it, snapshots of an image will still rely on the parent, therefore you will not be able to delete the parent image until the snapshots are deleted. Deep-flatten makes a parent independent of its clones, even if they have snapshots.

Internal value is 32, default is 'yes'.

journaling

Journaling support depends on exclusive lock support. Journaling records all modifications to an image in the order they occur. RBD mirroring (see Section 9.4, “Mirroring”) utilizes the journal to replicate a crash consistent image to a remote cluster.

Internal value is 64, default is 'no'.
