Operational tasks | Administration and Operations Guide

13.1 Modifying the cluster configuration #

To modify the configuration of an existing Ceph cluster, follow these steps:

Export the current configuration of the cluster to a file:

cephuser@adm > ceph orch ls --export --format yaml > cluster.yaml

Edit the file with the configuration and update the relevant lines. Find specification examples in Chapter 8, Deploying the remaining core services using cephadm and Section 13.4.3, “Adding OSDs using DriveGroups specification”.

Apply the new configuration:

cephuser@adm > ceph orch apply -i cluster.yaml

13.2 Adding nodes #

To add a new node to a Ceph cluster, follow these steps:

Install SUSE Linux Enterprise Server and SUSE Enterprise Storage on the new host. Refer to Chapter 5, Installing and configuring SUSE Linux Enterprise Server for more information.
Configure the host as a Salt Minion of an already existing Salt Master. Refer to Chapter 6, Deploying Salt for more information.

Add the new host to ceph-salt and make cephadm aware of it, for example:

root@master # ceph-salt config /ceph_cluster/minions add ses-node5.example.com
root@master # ceph-salt config /ceph_cluster/roles/cephadm add ses-node5.example.com

Refer to Section 7.2.2, “Adding Salt Minions” for more information.

Verify that the node was added to ceph-salt:

root@master # ceph-salt config /ceph_cluster/minions ls
o- minions ................................................. [Minions: 5]
[...]
  o- ses-node5.example.com ................................... [no roles]

Apply the configuration to the new cluster host:

root@master # ceph-salt apply ses-node5.example.com

Verify that the newly added host now belongs to the cephadm environment:

cephuser@adm > ceph orch host ls
HOST                    ADDR                    LABELS   STATUS
[...]
ses-node5.example.com   ses-node5.example.com
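
The commands in this section follow the same pattern for every new host. The following is a minimal sketch in Python that assembles them for a given host name. The function name `add_node_commands` is hypothetical, and the sketch only builds the command strings shown above; it does not run them.

```python
def add_node_commands(host: str) -> list[str]:
    """Return the commands from this section for one new host, in order."""
    return [
        # Register the minion and give it the cephadm role.
        f"ceph-salt config /ceph_cluster/minions add {host}",
        f"ceph-salt config /ceph_cluster/roles/cephadm add {host}",
        # Verify that ceph-salt knows the minion.
        "ceph-salt config /ceph_cluster/minions ls",
        # Apply the configuration to the new host only.
        f"ceph-salt apply {host}",
        # Verify that cephadm now lists the host.
        "ceph orch host ls",
    ]


if __name__ == "__main__":
    for command in add_node_commands("ses-node5.example.com"):
        print(command)
```

Printing the list first gives you a chance to review the commands before running them on the Salt Master and the Admin Node.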

13.3 Removing nodes #

Tip: Remove OSDs

If the node that you are going to remove runs OSDs, remove the OSDs from it first and check that no OSDs are running on that node. Refer to Section 13.4.4, “Removing OSDs” for more details on removing OSDs.

To remove a node from a cluster, do the following:

For all Ceph service types except for node-exporter and crash, remove the node's host name from the cluster placement specification file (for example, cluster.yml). Refer to Section 8.2, “Service and placement specification” for more details. For example, if you are removing the host named ses-node2, remove all occurrences of - ses-node2 from all placement: sections:
Update
```
service_type: rgw
service_id: EXAMPLE_NFS
placement:
  hosts:
  - ses-node2
  - ses-node3
```
to
```
service_type: rgw
service_id: EXAMPLE_NFS
placement:
  hosts:
  - ses-node3
```
Apply your changes to the configuration file:
```
cephuser@adm > ceph orch apply -i rgw-example.yaml
```

If the node is running crash.osd.1 and crash.osd.2 services, remove them by running the following command on the host:

root@minion > cephadm rm-daemon --fsid CLUSTER_ID --name SERVICE_NAME

For example:

root@minion > cephadm rm-daemon --fsid b4b30c6e... --name crash.osd.1
root@minion > cephadm rm-daemon --fsid b4b30c6e... --name crash.osd.2

Remove the node from cephadm's environment:

cephuser@adm > ceph orch host rm ses-node2

Remove all the roles from the minion you want to delete:

cephuser@adm > ceph-salt config /ceph_cluster/roles/tuned/throughput remove ses-node2
cephuser@adm > ceph-salt config /ceph_cluster/roles/tuned/latency remove ses-node2
cephuser@adm > ceph-salt config /ceph_cluster/roles/cephadm remove ses-node2
cephuser@adm > ceph-salt config /ceph_cluster/roles/admin remove ses-node2

If the minion you want to remove is the bootstrap minion, you also need to remove the bootstrap role:

cephuser@adm > ceph-salt config /ceph_cluster/roles/bootstrap reset

After removing all OSDs on a single host, remove the host from the CRUSH map:
```
cephuser@adm > ceph osd crush remove bucket-name
```
Note
The bucket name should be the same as the host name.

You can now remove the minion from the cluster:

cephuser@adm > ceph-salt config /ceph_cluster/minions remove ses-node2

Important

If a failure has left the minion you are trying to remove permanently powered off, you need to remove the node from the Salt Master:

root@master # salt-key -d minion_id

Then, manually remove the node from pillar_root/ceph-salt.sls. This is typically located in /srv/pillar/ceph-salt.sls.
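
The first step of the removal procedure, deleting a host name from every placement: section, can be sketched in Python as follows. This is an illustration only: it assumes the service specifications have already been loaded into plain dictionaries (for example by a YAML parser, which is not shown), and the function name `remove_host_from_specs` is hypothetical.

```python
import copy


def remove_host_from_specs(specs: list[dict], host: str) -> list[dict]:
    """Return copies of the specs with `host` removed from placement.hosts.

    Specs of type node-exporter and crash are left untouched, because
    the procedure excludes these service types.
    """
    skipped_types = {"node-exporter", "crash"}
    result = []
    for spec in specs:
        spec = copy.deepcopy(spec)  # never modify the caller's data
        hosts = spec.get("placement", {}).get("hosts")
        if spec.get("service_type") not in skipped_types and hosts is not None:
            spec["placement"]["hosts"] = [h for h in hosts if h != host]
        result.append(spec)
    return result


if __name__ == "__main__":
    specs = [{
        "service_type": "rgw",
        "service_id": "EXAMPLE_NFS",
        "placement": {"hosts": ["ses-node2", "ses-node3"]},
    }]
    print(remove_host_from_specs(specs, "ses-node2"))
```

Specifications that use host_pattern or labels instead of an explicit hosts list are not changed by this sketch and need to be reviewed by hand.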

13.4 OSD management #

This section describes how to add, erase, or remove OSDs in a Ceph cluster.

13.4.1 Listing disk devices #

To identify used and unused disk devices on all cluster nodes, list them by running the following command:

cephuser@adm > ceph orch device ls
HOST       PATH      TYPE SIZE  DEVICE  AVAIL REJECT REASONS
ses-admin  /dev/vda  hdd  42.0G         False locked
ses-node1  /dev/vda  hdd  42.0G         False locked
ses-node1  /dev/vdb  hdd  8192M  387836 False locked, LVM detected, Insufficient space (<5GB) on vgs
ses-node2  /dev/vdc  hdd  8192M  450575 True

13.4.2 Erasing disk devices #

To re-use a disk device, you need to erase (or zap) it first:

ceph orch device zap HOST_NAME DISK_DEVICE

For example:

cephuser@adm > ceph orch device zap ses-node2 /dev/vdc

Note

If you previously deployed OSDs by using DriveGroups or the --all-available-devices option while the unmanaged flag was not set, cephadm will deploy these OSDs automatically after you erase them.

13.4.3 Adding OSDs using DriveGroups specification #

DriveGroups specify the layouts of OSDs in the Ceph cluster. They are defined in a single YAML file. In this section, we will use drive_groups.yml as an example.

An administrator should manually specify a group of OSDs that are interrelated (hybrid OSDs that are deployed on a mixture of HDDs and SSDs) or share identical deployment options (for example, the same object store, same encryption option, stand-alone OSDs). To avoid explicitly listing devices, DriveGroups use a list of filter items that correspond to a few selected fields of ceph-volume's inventory reports. cephadm will provide code that translates these DriveGroups into actual device lists for inspection by the user.

The command to apply the OSD specification to the cluster is:

cephuser@adm > ceph orch apply osd -i drive_groups.yml

To see a preview of actions and test your application, you can use the --dry-run option together with the ceph orch apply osd command. For example:

cephuser@adm > ceph orch apply osd -i drive_groups.yml --dry-run
...
+---------+------+------+----------+----+-----+
|SERVICE  |NAME  |HOST  |DATA      |DB  |WAL  |
+---------+------+------+----------+----+-----+
|osd      |test  |mgr0  |/dev/sda  |-   |-    |
|osd      |test  |mgr0  |/dev/sdb  |-   |-    |
+---------+------+------+----------+----+-----+

If the --dry-run output matches your expectations, then simply re-run the command without the --dry-run option.

13.4.3.1 Unmanaged OSDs #

All available clean disk devices that match the DriveGroups specification will be used as OSDs automatically after you add them to the cluster. This behavior is called managed mode.

To disable the managed mode, add the unmanaged: true line to the relevant specifications, for example:

service_type: osd
service_id: example_drvgrp_name
placement:
 hosts:
 - ses-node2
 - ses-node3
encrypted: true
unmanaged: true

Tip

To change already deployed OSDs from the managed to unmanaged mode, add the unmanaged: true lines where applicable during the procedure described in Section 13.1, “Modifying the cluster configuration”.

13.4.3.2 DriveGroups specification #

Following is an example DriveGroups specification file:

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  drive_spec: DEVICE_SPECIFICATION
db_devices:
  drive_spec: DEVICE_SPECIFICATION
wal_devices:
  drive_spec: DEVICE_SPECIFICATION
block_wal_size: '5G'  # (optional, unit suffixes permitted)
block_db_size: '5G'   # (optional, unit suffixes permitted)
encrypted: true       # 'True' or 'False' (defaults to 'False')

Note

The option previously called "encryption" in DeepSea has been renamed to "encrypted". When applying DriveGroups in SUSE Enterprise Storage 7, ensure you use this new terminology in your service specification, otherwise the ceph orch apply operation will fail.

13.4.3.3 Matching disk devices #

You can describe the specification using the following filters:

By a disk model:
```
model: DISK_MODEL_STRING
```

By a disk vendor:

vendor: DISK_VENDOR_STRING

Tip

Always enter the DISK_VENDOR_STRING in lowercase.

To obtain details about disk model and vendor, examine the output of the following command:

cephuser@adm > ceph orch device ls
HOST      PATH     TYPE  SIZE DEVICE_ID                  MODEL            VENDOR
ses-node1 /dev/sdb ssd  29.8G SATA_SSD_AF34075704240015  SATA SSD         ATA
ses-node2 /dev/sda ssd   223G Micron_5200_MTFDDAK240TDN  Micron_5200_MTFD ATA
[...]

Whether a disk is rotational or not. SSDs and NVMe drives are not rotational.
```
rotational: 0
```
Deploy a node using all available drives for OSDs:
```
data_devices:
  all: true
```
Additionally, by limiting the number of matching disks:
```
limit: 10
```
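
To illustrate how the filters in this section narrow down a device list, here is a minimal Python sketch. It is not cephadm's implementation. It assumes that all filters in one specification must match at the same time, that model is compared as a substring, and that vendor is compared case-insensitively. The function names, the dictionary layout, and the /dev/sdc entry are hypothetical.

```python
def matches(device: dict, spec: dict) -> bool:
    """Return True if the device satisfies every filter present in spec."""
    if spec.get("all"):
        return True
    if "model" in spec and spec["model"] not in device["model"]:
        return False  # assumption: substring match on the model string
    if "vendor" in spec and spec["vendor"].lower() != device["vendor"].lower():
        return False  # assumption: case-insensitive vendor comparison
    if "rotational" in spec and int(spec["rotational"]) != int(device["rotational"]):
        return False
    return True


def select_devices(devices: list[dict], spec: dict) -> list[dict]:
    """Apply the filters, then cap the result at spec['limit'] if given."""
    selected = [d for d in devices if matches(d, spec)]
    limit = spec.get("limit")
    return selected if limit is None else selected[:limit]


if __name__ == "__main__":
    devices = [
        {"path": "/dev/sdb", "model": "SATA SSD", "vendor": "ATA", "rotational": 0},
        {"path": "/dev/sda", "model": "Micron_5200_MTFD", "vendor": "ATA", "rotational": 0},
        {"path": "/dev/sdc", "model": "HDD-EXAMPLE", "vendor": "ATA", "rotational": 1},
    ]
    print(select_devices(devices, {"rotational": 0, "limit": 1}))
```

Use the --dry-run option of ceph orch apply osd to see which devices cephadm itself selects for a specification.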

13.4.3.4 Filtering devices by size #

You can filter disk devices by their size—either by an exact size, or a size range. The size: parameter accepts arguments in the following form:

'10G' - Includes disks of an exact size.
'10G:40G' - Includes disks whose size is within the range.
':10G' - Includes disks less than or equal to 10 GB in size.
'40G:' - Includes disks equal to or greater than 40 GB in size.

Example 13.1: Matching by disk size #

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  size: '40TB:'
db_devices:
  size: ':2TB'

Note: Quotes required

When using the ':' delimiter, you need to enclose the size in quotes, otherwise the ':' sign will be interpreted as a new configuration hash.

Tip: Unit shortcuts

Instead of Gigabytes (G), you can specify the sizes in Megabytes (M) or Terabytes (T).
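
The four forms of the size: argument can be read as an optional lower bound and an optional upper bound, both inclusive. The following Python sketch shows that reading. It is an illustration, not the code cephadm uses: the function names are hypothetical, and the sketch assumes decimal units (1 G = 10^9 bytes).

```python
import re

# Assumption: decimal units. The real tool may round or convert differently.
UNITS = {"M": 10**6, "G": 10**9, "T": 10**12}


def to_bytes(text: str) -> int:
    """Convert a size such as '10G', '512GB' or '4T' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([MGT])B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def size_matches(spec: str, disk_bytes: int) -> bool:
    """Check a disk size against 'X', 'LOW:HIGH', ':HIGH' or 'LOW:'."""
    if ":" not in spec:
        return disk_bytes == to_bytes(spec)  # exact size
    low, high = spec.split(":", 1)
    if low and disk_bytes < to_bytes(low):
        return False  # below the lower bound
    if high and disk_bytes > to_bytes(high):
        return False  # above the upper bound
    return True


if __name__ == "__main__":
    for spec in ("10G", "10G:40G", ":10G", "40G:"):
        print(spec, size_matches(spec, 20 * 10**9))
```

Because the exact form compares for equality, a range such as '10G:40G' is usually the more robust choice when the reported disk sizes vary slightly.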

13.4.3.5 DriveGroups examples #

This section includes examples of different OSD setups.

Example 13.2: Simple setup #

This example describes two nodes with the same setup:

20 HDDs
- Vendor: Intel
- Model: SSD-123-foo
- Size: 4 TB
2 SSDs
- Vendor: Micron
- Model: MC-55-44-ZX
- Size: 512 GB

The corresponding drive_groups.yml file will be as follows:

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
db_devices:
  model: MC-55-44-ZX

Such a configuration is simple and valid. The problem is that an administrator may add disks from different vendors in the future, and these will not be included. You can improve it by reducing the filters on core properties of the drives:

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0

In the previous example, all rotating devices are declared as 'data devices' and all non-rotating devices are used as 'shared devices' (wal, db).

If you know that drives with more than 2 TB will always be the slower data devices, you can filter by size:

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  size: '2TB:'
db_devices:
  size: ':2TB'

Example 13.3: Advanced setup #

This example describes two distinct setups: 20 HDDs should share 2 SSDs, while 10 SSDs should share 2 NVMes.

20 HDDs
- Vendor: Intel
- Model: SSD-123-foo
- Size: 4 TB
12 SSDs
- Vendor: Micron
- Model: MC-55-44-ZX
- Size: 512 GB
2 NVMes
- Vendor: Samsung
- Model: NVME-QQQQ-987
- Size: 256 GB

Such a setup can be defined with two layouts as follows:

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  model: MC-55-44-ZX

service_type: osd
service_id: example_drvgrp_name2
placement:
  host_pattern: '*'
data_devices:
  model: MC-55-44-ZX
db_devices:
  vendor: samsung
  size: 256GB

Example 13.4: Advanced setup with non-uniform nodes #

The previous examples assumed that all nodes have the same drives. However, that is not always the case:

Nodes 1-5:

20 HDDs
- Vendor: Intel
- Model: SSD-123-foo
- Size: 4 TB
2 SSDs
- Vendor: Micron
- Model: MC-55-44-ZX
- Size: 512 GB

Nodes 6-10:

5 NVMes
- Vendor: Intel
- Model: SSD-123-foo
- Size: 4 TB
20 SSDs
- Vendor: Micron
- Model: MC-55-44-ZX
- Size: 512 GB

You can use the host_pattern key in the placement section of the layout to target specific nodes. Host name patterns help to keep things simple:

service_type: osd
service_id: example_drvgrp_one2five
placement:
  host_pattern: 'node[1-5]'
data_devices:
  rotational: 1
db_devices:
  rotational: 0

followed by

service_type: osd
service_id: example_drvgrp_rest
placement:
  host_pattern: 'node[6-10]'
data_devices:
  model: MC-55-44-ZX
db_devices:
  model: SSD-123-foo

Example 13.5: Expert setup #

All previous cases assumed that the WALs and DBs use the same device. It is however possible to deploy the WAL on a dedicated device as well:

20 HDDs
- Vendor: Intel
- Model: SSD-123-foo
- Size: 4 TB
2 SSDs
- Vendor: Micron
- Model: MC-55-44-ZX
- Size: 512 GB
2 NVMes
- Vendor: Samsung
- Model: NVME-QQQQ-987
- Size: 256 GB

service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  model: MC-55-44-ZX
db_devices:
  model: SSD-123-foo
wal_devices:
  model: NVME-QQQQ-987

Example 13.6: Complex (and unlikely) setup #

In the following setup, we are trying to define:

20 HDDs backed by 1 NVMe
2 HDDs backed by 1 SSD(db) and 1 NVMe (wal)
8 SSDs backed by 1 NVMe
2 SSDs stand-alone (encrypted)
1 HDD is spare and should not be deployed

The summary of used drives is as follows:

23 HDDs
- Vendor: Intel
- Model: SSD-123-foo
- Size: 4 TB
10 SSDs
- Vendor: Micron
- Model: MC-55-44-ZX
- Size: 512 GB
1 NVMe
- Vendor: Samsung
- Model: NVME-QQQQ-987
- Size: 256 GB

The DriveGroups definition will be the following:

service_type: osd
service_id: example_drvgrp_hdd_nvme
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  model: NVME-QQQQ-987

service_type: osd
service_id: example_drvgrp_hdd_ssd_nvme
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  model: MC-55-44-ZX
wal_devices:
  model: NVME-QQQQ-987

service_type: osd
service_id: example_drvgrp_ssd_nvme
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
db_devices:
  model: NVME-QQQQ-987

service_type: osd
service_id: example_drvgrp_standalone_encrypted
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
encrypted: True

One HDD will remain as the file is being parsed from top to bottom.

13.4.4 Removing OSDs #

Before removing an OSD node from the cluster, verify that the cluster has more free disk space than the OSD disk you are going to remove. Be aware that removing an OSD results in rebalancing of the whole cluster.

Identify which OSD to remove by getting its ID:

cephuser@adm > ceph orch ps --daemon_type osd
NAME   HOST            STATUS        REFRESHED  AGE  VERSION
osd.0  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
osd.1  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
osd.2  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
osd.3  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...

Remove one or more OSDs from the cluster:

cephuser@adm > ceph orch osd rm OSD1_ID OSD2_ID ...

For example:

cephuser@adm > ceph orch osd rm 1 2

You can query the state of the removal operation:

cephuser@adm > ceph orch osd rm status
OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158
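
If you want to wait for a removal to finish from a script, the status table can be turned into structured data. The following Python sketch parses the output shown above. It assumes that columns are separated by at least two spaces, as in this example; the layout of real output may differ, and the function names are hypothetical.

```python
import re

# Sample copied from the example output in this section.
SAMPLE = """\
OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158
"""


def parse_rm_status(output: str) -> list[dict]:
    """Split the table into one dict per OSD, keyed by the header names."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[1:]:
        # Cells such as 'done, waiting for purge' contain single spaces only.
        cells = re.split(r"\s{2,}", line.strip())
        rows.append(dict(zip(header, cells)))
    return rows


def osds_still_draining(rows: list[dict]) -> list[str]:
    """Return the IDs of OSDs that still hold placement groups."""
    return [row["OSD_ID"] for row in rows if int(row["PG_COUNT"]) > 0]


if __name__ == "__main__":
    print(osds_still_draining(parse_rm_status(SAMPLE)))
```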

13.4.4.1 Stopping OSD removal #

After you have scheduled an OSD removal, you can stop the removal if needed. The following command resets the OSD to its initial state and removes it from the queue:

cephuser@adm > ceph orch osd rm stop OSD_SERVICE_ID

13.4.5 Replacing OSDs #

There are several reasons why you may need to replace an OSD disk. For example:

The OSD disk failed or is soon going to fail based on SMART information, and can no longer be used to store data safely.
You need to upgrade the OSD disk, for example to increase its size.
You need to change the OSD disk layout.
You plan to move from a non-LVM to an LVM-based layout.

To replace an OSD while preserving its ID, run:

cephuser@adm > ceph orch osd rm OSD_SERVICE_ID --replace

For example:

cephuser@adm > ceph orch osd rm 4 --replace

Replacing an OSD is identical to removing an OSD (see Section 13.4.4, “Removing OSDs” for more details) with the exception that the OSD is not permanently removed from the CRUSH hierarchy and is assigned a destroyed flag instead.

The destroyed flag is used to determine OSD IDs that will be reused during the next OSD deployment. Newly added disks that match the DriveGroups specification (see Section 13.4.3, “Adding OSDs using DriveGroups specification” for more details) will be assigned OSD IDs of their replaced counterpart.

Note

In the case of replacing an OSD after a failure, we highly recommend triggering a deep scrub of the placement groups. See Section 17.6, “Scrubbing placement groups” for more details.

Run the following command to initiate a deep scrub:

cephuser@adm > ceph osd deep-scrub osd.OSD_NUMBER

Important: Shared device failure

If a shared device for DB/WAL fails, you will need to perform the replacement procedure for all OSDs that share the failed device.

13.4.6 Migrating OSD's DB device #

A DB device belongs to an OSD and stores its metadata (see Section 1.4, “BlueStore” for more details). There are several reasons why you may want to migrate an existing DB device to a new one, for example, when OSDs have different DB sizes and you need to align them.

Tip: ceph-volume naming convention

Some clusters may have old volume group (VG) or logical volume (LV) names prefixed with ceph-block-dbs and osd-block-db, for example:

ceph-block-dbs-c3dc9227-ca3e-49bc-992c-00602cb3eec7/osd-block-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c

The current naming convention is as follows:

ceph-c3dc9227-ca3e-49bc-992c-00602cb3eec7/osd-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c
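
To tell the two naming conventions apart in a script, you can match the VG/LV name against both patterns. The following Python sketch does this with regular expressions derived from the two examples above. The function name `db_naming_style` is hypothetical, and the patterns assume lowercase UUIDs as shown.

```python
import re

# Lowercase UUID as it appears in the examples above.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
OLD_STYLE = re.compile(rf"ceph-block-dbs-{UUID}/osd-block-db-{UUID}")
CURRENT_STYLE = re.compile(rf"ceph-{UUID}/osd-db-{UUID}")


def db_naming_style(vg_lv: str) -> str:
    """Classify a 'VG/LV' name (with or without /dev/) as old, current or unknown."""
    name = vg_lv.removeprefix("/dev/")
    if OLD_STYLE.fullmatch(name):
        return "old"
    if CURRENT_STYLE.fullmatch(name):
        return "current"
    return "unknown"


if __name__ == "__main__":
    print(db_naming_style(
        "ceph-block-dbs-c3dc9227-ca3e-49bc-992c-00602cb3eec7/"
        "osd-block-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c"
    ))
```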

Procedure 13.1: Migrating a DB device to a new device #

Identify the db device and osd fsid values by running the following command:

cephuser@adm > cephadm ceph-volume lvm list
[...]
====== osd.0 =======

[block]       /dev/ceph-b03b5ad4-98e8-446a-9a9f-840ecd90215c/osd-block-c276d2a4-5578-4847-94c6-8e2e6abf81c4

block device              /dev/ceph-b03b5ad4-98e8-446a-9a9f-840ecd90215c/osd-block-c276d2a4-5578-4847-94c6-8e2e6abf81c4
block uuid                Kg3ySP-ykP8-adFE-UrHY-OSiv-0WQ5-uuUEJ9
cephx lockbox secret
cluster fsid              9c8d3126-9faf-11ec-a2cf-52540035cdc1
cluster name              ceph
crush device class
db device                 /dev/ceph-block-dbs-c3dc9227-ca3e-49bc-992c-00602cb3eec7/osd-block-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c
encrypted                 0
osd fsid                  c276d2a4-5578-4847-94c6-8e2e6abf81c4
osd id                    0
osdspec affinity          sesdev_osd_deployment
type                      block
vdo                       0
devices                   /dev/vdb
[...]

Create a new logical volume (LV) for the new DB device. Refer to Section 2.4.3, “Recommended size for the BlueStore's WAL and DB device” when determining the right size for the DB device. For example:
```
# lvcreate -n osd-db-$(cat /proc/sys/kernel/random/uuid) \
 ceph-c3dc9227-ca3e-49bc-992c-00602cb3eec7 --size DB_SIZE
```
Stop the OSD. Run the following command on the OSD node where the OSD daemon runs:
```
cephuser@osd > cephadm unit stop --name osd.0
```
Enter the shell on the stopped OSD container:
```
cephuser@osd > cephadm shell --name osd.0
```

If the OSD does not have a preexisting DB device, create a new DB with the new-db command:

[ceph: root@pacific /]# ceph-volume lvm new-db --osd-id 0 \
 --osd-fsid c276d2a4-5578-4847-94c6-8e2e6abf81c4 \
 --target ceph-c3dc9227-ca3e-49bc-992c-00602cb3eec7/osd-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c

Then, migrate data using the --from data flag:

[ceph: root@pacific /]# ceph-volume lvm migrate --osd-id 0 \
 --osd-fsid c276d2a4-5578-4847-94c6-8e2e6abf81c4 --from data \
 --target ceph-c3dc9227-ca3e-49bc-992c-00602cb3eec7/osd-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c

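If the OSD does have a preexisting DB device, migrate the DB using the --from db flag:

[ceph: root@pacific /]# ceph-volume lvm migrate --osd-id 0 \
 --osd-fsid c276d2a4-5578-4847-94c6-8e2e6abf81c4 --from db \
 --target ceph-c3dc9227-ca3e-49bc-992c-00602cb3eec7/osd-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c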

Exit the cephadm shell:
```
[ceph: root@pacific /]# exit
```
Start the OSD. Run the following command on the OSD node where the OSD daemon runs:
```
cephuser@osd > cephadm unit start --name osd.0
```
Remove the old DB logical volume.
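
The choice between the two migration paths in this procedure depends only on whether the OSD already has a DB device. The following Python sketch assembles the ceph-volume commands for either case. It only builds the command strings named in the procedure and does not run them; the function name `db_migration_commands` is hypothetical.

```python
def db_migration_commands(osd_id: int, osd_fsid: str, target: str,
                          has_db: bool) -> list[str]:
    """Return the ceph-volume commands to run inside the cephadm shell."""
    common = f"--osd-id {osd_id} --osd-fsid {osd_fsid}"
    if has_db:
        # Existing DB device: move its content to the new logical volume.
        return [f"ceph-volume lvm migrate {common} --from db --target {target}"]
    # No DB device yet: attach a new one, then move the DB data off the block device.
    return [
        f"ceph-volume lvm new-db {common} --target {target}",
        f"ceph-volume lvm migrate {common} --from data --target {target}",
    ]


if __name__ == "__main__":
    target = ("ceph-c3dc9227-ca3e-49bc-992c-00602cb3eec7/"
              "osd-db-b346b9ff-dbbe-40db-a95e-2419ccd31f2c")
    fsid = "c276d2a4-5578-4847-94c6-8e2e6abf81c4"
    for command in db_migration_commands(0, fsid, target, has_db=False):
        print(command)
```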

13.5 Moving the Salt Master to a new node #

If you need to replace the Salt Master host with a new one, follow these steps:

Export the cluster configuration and back up the exported JSON file. Find more details in Section 7.2.14, “Exporting cluster configurations”.
If the old Salt Master is also the only administration node in the cluster, then manually move /etc/ceph/ceph.client.admin.keyring and /etc/ceph/ceph.conf to the new Salt Master.

Stop and disable the Salt Master systemd service on the old Salt Master node:

root@master # systemctl stop salt-master.service
root@master # systemctl disable salt-master.service

If the old Salt Master node is no longer in the cluster, also stop and disable the Salt Minion systemd service:
```
root@master # systemctl stop salt-minion.service
root@master # systemctl disable salt-minion.service
```
Warning
Do not stop or disable the salt-minion.service if the old Salt Master node has any Ceph daemons (MON, MGR, OSD, MDS, gateway, monitoring) running on it.
Install SUSE Linux Enterprise Server 15 SP3 on the new Salt Master following the procedure described in Chapter 5, Installing and configuring SUSE Linux Enterprise Server.
Tip: Transition of Salt Minion
To simplify the transition of Salt Minions to the new Salt Master, remove the original Salt Master's public key from each of them:
```
root@minion > rm /etc/salt/pki/minion/minion_master.pub
root@minion > systemctl restart salt-minion.service
```
Install the salt-master package and, if applicable, the salt-minion package on the new Salt Master.
Install ceph-salt on the new Salt Master node:
```
root@master # zypper install ceph-salt
root@master # systemctl restart salt-master.service
root@master # salt '*' saltutil.sync_all
```
Important
Make sure to run all three commands before continuing. The commands are idempotent; it does not matter if they get repeated.
Include the new Salt Master in the cluster as described in Section 7.1, “Installing ceph-salt”, Section 7.2.2, “Adding Salt Minions” and Section 7.2.4, “Specifying Admin Node”.
Import the backed up cluster configuration and apply it:
```
root@master # ceph-salt import CLUSTER_CONFIG.json
root@master # ceph-salt apply
```
Important
Rename the Salt Master's minion id in the exported CLUSTER_CONFIG.json file before importing it.
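
Renaming the minion ID in the exported file can be scripted. The following Python sketch replaces an old ID wherever it appears as a complete key or string value in the JSON data. The layout of the exported file is not described here, so the sample data in the sketch is hypothetical, as are the function name and the host names. IDs embedded inside longer strings are not changed, so review the result before importing it.

```python
import json


def rename_minion(node, old_id: str, new_id: str):
    """Recursively replace old_id wherever it is a key or a whole string value."""
    if isinstance(node, dict):
        return {
            (new_id if key == old_id else key): rename_minion(value, old_id, new_id)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_minion(item, old_id, new_id) for item in node]
    if isinstance(node, str) and node == old_id:
        return new_id
    return node


if __name__ == "__main__":
    # Hypothetical structure, for illustration only.
    exported = json.loads(
        '{"minions": ["old-master.example.com", "ses-node1.example.com"],'
        ' "roles": {"admin": ["old-master.example.com"]}}'
    )
    updated = rename_minion(exported, "old-master.example.com",
                            "new-master.example.com")
    print(json.dumps(updated, indent=2))
```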

13.6 Updating the cluster nodes #

Keep the Ceph cluster nodes up-to-date by applying rolling updates regularly.

13.6.1 Software repositories #

Before patching the cluster with the latest software packages, verify that all the cluster's nodes have access to the relevant repositories. Refer to Section 10.1.5.1, “Software repositories” for a complete list of the required repositories.

13.6.2 Repository staging #

If you use a staging tool—for example, SUSE Manager, Subscription Management Tool, or RMT—that serves software repositories to the cluster nodes, verify that stages for both 'Updates' repositories for SUSE Linux Enterprise Server and SUSE Enterprise Storage are created at the same point in time.

We strongly recommend using a staging tool to apply patches that have frozen or staged patch levels. This ensures that new nodes joining the cluster have the same patch level as the nodes already running in the cluster. This way you avoid the need to apply the latest patches to all the cluster's nodes before new nodes can join the cluster.

13.6.3 Downtime of Ceph services #

Depending on the configuration, cluster nodes may be rebooted during the update. If there is a single point of failure for services such as Object Gateway, Samba Gateway, NFS Ganesha, or iSCSI, the client machines may be temporarily disconnected from services whose nodes are being rebooted.

13.6.4 Running the update #

To update the software packages on all cluster nodes to the latest version, run the following command:

root@master # ceph-salt update

13.7 Updating Ceph #

You can instruct cephadm to update Ceph from one bugfix release to another. The automated update of Ceph services respects the recommended order—it starts with Ceph Managers, Ceph Monitors, and then continues on to other services such as Ceph OSDs, Metadata Servers, and Object Gateways. Each daemon is restarted only after Ceph indicates that the cluster will remain available.

Note

The following update procedure uses the ceph orch upgrade command. Keep in mind that the following instructions detail how to update your Ceph cluster within a product version (for example, a maintenance update), and do not provide instructions on how to upgrade your cluster from one product version to another.

13.7.1 Starting the update #

Before you start the update, verify that all nodes are currently online and your cluster is healthy:

cephuser@adm > cephadm shell -- ceph -s

To update to a specific Ceph release:

cephuser@adm > ceph orch upgrade start --image REGISTRY_URL

For example:

cephuser@adm > ceph orch upgrade start --image registry.suse.com/ses/7.1/ceph/ceph:latest

Upgrade packages on the hosts:

cephuser@adm > ceph-salt update

13.7.2 Monitoring the update #

Run the following command to determine whether an update is in progress:

cephuser@adm > ceph orch upgrade status

While the update is in progress, you will see a progress bar in the Ceph status output:

cephuser@adm > ceph -s
[...]
  progress:
    Upgrade to registry.suse.com/ses/7.1/ceph/ceph:latest (00h 20m 12s)
      [=======.....................] (time remaining: 01h 43m 31s)

You can also watch the cephadm log:

cephuser@adm > ceph -W cephadm

13.7.3 Cancelling an update #

You can stop the update process at any time:

cephuser@adm > ceph orch upgrade stop

13.8 Halting or rebooting cluster #

In some cases it may be necessary to halt or reboot the whole cluster. We recommend carefully checking for dependencies of running services. The following steps provide an outline for stopping and starting the cluster:

Tell the Ceph cluster not to mark OSDs as out:
```
cephuser@adm > ceph osd set noout
```
Stop daemons and nodes in the following order:
1. Storage clients
2. Gateways, for example NFS Ganesha or Object Gateway
3. Metadata Server
4. Ceph OSD
5. Ceph Manager
6. Ceph Monitor
If required, perform maintenance tasks.
Start the nodes and servers in the reverse order of the shutdown process:
1. Ceph Monitor
2. Ceph Manager
3. Ceph OSD
4. Metadata Server
5. Gateways, for example NFS Ganesha or Object Gateway
6. Storage clients
Remove the noout flag:
```
cephuser@adm > ceph osd unset noout
```
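
The start order is the exact reverse of the stop order, so a maintenance checklist can be derived from a single list. The following Python sketch builds such a checklist from the steps above. The function name `maintenance_plan` and the "stop:" and "start:" labels are hypothetical; the sketch prints a plan and does not stop or start anything.

```python
# Stop order as listed in this section.
STOP_ORDER = [
    "Storage clients",
    "Gateways",
    "Metadata Server",
    "Ceph OSD",
    "Ceph Manager",
    "Ceph Monitor",
]


def maintenance_plan(stop_order: list[str] = STOP_ORDER) -> list[str]:
    """Return the full halt and restart sequence as a list of steps."""
    steps = ["ceph osd set noout"]
    steps += [f"stop: {name}" for name in stop_order]
    steps.append("perform maintenance tasks (if required)")
    # Starting happens in the reverse order of the shutdown.
    steps += [f"start: {name}" for name in reversed(stop_order)]
    steps.append("ceph osd unset noout")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(maintenance_plan(), start=1):
        print(f"{number:2d}. {step}")
```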

13.9 Removing an entire Ceph cluster #

The ceph-salt purge command removes the entire Ceph cluster. If there are more Ceph clusters deployed, the one reported by ceph -s is purged. This way you can clean the cluster environment when testing different setups.

To prevent accidental deletion, the orchestration checks if the safety is disengaged. You can disengage the safety measures and remove the Ceph cluster by running:

root@master # ceph-salt disengage-safety
root@master # ceph-salt purge

13.10 Offline container management #

You can run specific commands, for example, ceph-objectstore-tool and ceph-monstore-tool, inside stopped containers by calling a cephadm shell. The following examples illustrate common use cases:

Tip

When stopping an OSD daemon, we recommend setting the noout flag to prevent unnecessary data movement:

cephuser@adm > ceph osd add-noout osd.DAEMON_ID

Remember to unset the noout flag after you finish maintaining the OSD:

cephuser@adm > ceph osd rm-noout osd.DAEMON_ID

To query an OSD, run the following:

cephuser@adm > ceph osd add-noout osd.1
cephuser@adm > cephadm unit stop --name osd.1
cephuser@adm > cephadm shell --name osd.1
[ceph: root@pacific /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --op list
cephuser@adm > cephadm unit start --name osd.1
cephuser@adm > ceph osd rm-noout osd.1

To query a MON, run the following:

cephuser@adm > cephadm unit stop --name mon.pacific
cephuser@adm > cephadm shell --name mon.pacific
[ceph: root@pacific /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-pacific/ dump-keys
cephuser@adm > cephadm unit start --name mon.pacific

To print a MON map, run the following:

cephuser@adm > cephadm unit stop --name mon.pacific
cephuser@adm > cephadm shell --name mon.pacific
[ceph: root@pacific /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-pacific get monmap > /tmp/monmap
[ceph: root@pacific /]# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 1
fsid 28596f44-3b56-11ec-9034-482ae35a5fbb
last_changed 2021-11-01T20:57:19.755111+0000
created 2021-11-01T20:57:19.755111+0000
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:127.0.0.1:3300/0,v1:127.0.0.1:6789/0] mon.pacific
cephuser@adm > cephadm unit start --name mon.pacific

An example of migrating OSD's DB device is in Section 13.4.6, “Migrating OSD's DB device”.

13.11 Refreshing expired SSL certificates #

Multiple Ceph services use SSL certificates to secure communication between the client and the server. The validity of an SSL certificate is normally limited and expires after the time period specified at its creation. The following are procedures to renew SSL certificates for affected Ceph services.

Tip

The following procedures start with renewing an expired certificate. By renewing, we mean obtaining a valid certificate and key file with the expiration time in the future. A certificate authority (CA) can provide one for you, or you can create a self-signed certificate yourself.

13.11.1 iSCSI Gateway #

Renew the certificate.
Insert the new certificate and key into an iSCSI Gateway service specification file as described in Section 8.3.5.1, “Secure SSL configuration” and save it as iscsi.yaml, for example.
Apply the new iSCSI Gateway service specification by running the following command:
```
cephuser@adm > ceph orch apply -i iscsi.yaml
```
Reconfigure the iSCSI Gateway service to use the new certificate:
```
cephuser@adm > ceph orch reconfig NAME_OF_ISCSI_SERVICE
```

13.11.2 Object Gateway #

Renew the certificate.
Concatenate the certificate and key files into a single file if they are in separate files.

Apply the new certificate to the Object Gateway:

cephuser@adm > ceph config-key set rgw/cert/REALM_NAME/ZONE_NAME.crt \
-i SSL_CERT_FILE

Restart the Object Gateway by running the following command:
```
cephuser@adm > ceph orch restart NAME_OF_RGW_SERVICE
```

Note

If you originally deployed the SSL certificate by specifying the rgw_frontend_ssl_certificate option in the Object Gateway specification file, delete it from the specification to avoid having two different certificate specifications.
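
The concatenation step of this procedure can be sketched in Python as follows. The function name `concatenate_pem` and the file names are hypothetical, and the sketch assumes that the certificate is written first, followed by the key; this section does not prescribe an order.

```python
from pathlib import Path


def concatenate_pem(cert_path: str, key_path: str, out_path: str) -> Path:
    """Write the certificate followed by the key into a single file."""
    cert = Path(cert_path).read_text()
    key = Path(key_path).read_text()
    if not cert.endswith("\n"):
        cert += "\n"  # keep the key's BEGIN line on its own line
    out = Path(out_path)
    out.write_text(cert + key)
    return out


if __name__ == "__main__":
    import sys

    # Usage: python3 concat.py CERT_FILE KEY_FILE SSL_CERT_FILE
    print(concatenate_pem(sys.argv[1], sys.argv[2], sys.argv[3]))
```

The resulting file is the SSL_CERT_FILE that you pass to ceph config-key set with the -i option. Because it contains the private key, restrict its file permissions accordingly.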

13.11.3 Ceph Dashboard #

The procedure of refreshing the Ceph Dashboard SSL certificate is detailed in Section 10.1, “Configuring TLS/SSL support”:

If you are using a self-signed certificate, generate a new one and restart the Ceph Manager by disabling and re-enabling the dashboard module:
```
cephuser@adm > ceph dashboard create-self-signed-cert
cephuser@adm > ceph mgr module disable dashboard
cephuser@adm > ceph mgr module enable dashboard
```
If you are using a certificate signed by a CA, obtain a renewed certificate and key files and configure Ceph Dashboard to use them. Then restart the Ceph Manager by disabling and re-enabling the dashboard module:
```
cephuser@adm > ceph dashboard set-ssl-certificate -i dashboard.crt
cephuser@adm > ceph dashboard set-ssl-certificate-key -i dashboard.key
cephuser@adm > ceph mgr module disable dashboard
cephuser@adm > ceph mgr module enable dashboard
```

13.11.4 Grafana #

Renewing the Grafana SSL certificate is almost identical to the initial SSL certificate setup.

If you are using a self-signed certificate, remove the existing one from the Ceph configuration and reconfigure the Grafana service to have a new certificate and key files automatically generated and applied:
```
cephuser@adm > ceph config-key rm mgr/cephadm/admin/grafana_key
cephuser@adm > ceph config-key rm mgr/cephadm/admin/grafana_crt
cephuser@adm > ceph orch reconfig grafana
```
Important
Ceph Pacific prior to version 16.2.11 uses old configuration paths for specifying certificate files; it lacks the /admin path. For example, the path to Grafana SSL certificate key file was as follows:
```
mgr/cephadm/grafana_key
```
instead of
```
mgr/cephadm/admin/grafana_key
```

If you are using a certificate signed by a CA, obtain a renewed certificate and key files, specify them, and apply the changes:

cephuser@adm > ceph config-key set mgr/cephadm/admin/grafana_key -i key.pem
cephuser@adm > ceph config-key set mgr/cephadm/admin/grafana_crt -i certificate.pem
cephuser@adm > ceph orch reconfig grafana
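
Because the configuration path depends on the Ceph version, a script that renews the Grafana certificate needs to pick the right key names. The following Python sketch selects them based on the 16.2.11 boundary mentioned in this section. The function name `grafana_config_keys` is hypothetical. The old-style path for grafana_crt is inferred by analogy with grafana_key, and the sketch assumes a purely numeric, dot-separated version string.

```python
def grafana_config_keys(ceph_version: str) -> dict[str, str]:
    """Return the config-key names for the Grafana key and certificate."""
    # Compare only major.minor.patch, for example '16.2.10' -> (16, 2, 10).
    parts = tuple(int(part) for part in ceph_version.split(".")[:3])
    # Versions before 16.2.11 lack the /admin path component.
    prefix = "mgr/cephadm" if parts < (16, 2, 11) else "mgr/cephadm/admin"
    return {
        "key": f"{prefix}/grafana_key",
        "crt": f"{prefix}/grafana_crt",  # old-style path inferred by analogy
    }


if __name__ == "__main__":
    for version in ("16.2.10", "16.2.11"):
        print(version, grafana_config_keys(version))
```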

13.11.5 Object Gateway HA (behind HAProxy/Keepalived) #

Renew the certificate.
Insert the new certificate and key into an Ingress service specification file as described in Section 8.3.4.3, “Deploying High Availability for the Object Gateway” and save it as ingress.yaml, for example.
Apply the new Ingress service specification by running the following command:
```
cephuser@adm > ceph orch apply -i ingress.yaml
```

Reconfigure the Ingress service to use the new certificate:

cephuser@adm > ceph orch reconfig NAME_OF_INGRESS_SERVICE
