13 Operational tasks #
13.1 Modifying the cluster configuration #
To modify the configuration of an existing Ceph cluster, follow these steps:
- Export the current configuration of the cluster to a file:
  cephuser@adm > ceph orch ls --export --format yaml > cluster.yaml
- Edit the exported configuration file and update the relevant lines. Find specification examples in Chapter 8, Deploying the remaining core services using cephadm and Section 13.4.3, “Adding OSDs using DriveGroups specification”.
- Apply the new configuration:
  cephuser@adm > ceph orch apply -i cluster.yaml
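For illustration, the exported cluster.yaml contains one specification per deployed service. A hypothetical excerpt might look like the following; the service ID and count value are assumptions, not part of the original example. Changing the count value and re-applying the file adjusts the number of deployed MDS daemons:
service_type: mds
service_id: cephfs        # hypothetical service ID
placement:
  count: 3                # edit this value, then apply the file again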
13.2 Adding nodes #
To add a new node to a Ceph cluster, follow these steps:
- Install SUSE Linux Enterprise Server and SUSE Enterprise Storage on the new host. Refer to Chapter 5, Installing and configuring SUSE Linux Enterprise Server for more information. 
- Configure the host as a Salt Minion of an already existing Salt Master. Refer to Chapter 6, Deploying Salt for more information. 
- Add the new host to ceph-salt and make cephadm aware of it, for example:
  root@master # ceph-salt config /ceph_cluster/minions add ses-min5.example.com
  root@master # ceph-salt config /ceph_cluster/roles/cephadm add ses-min5.example.com
  Refer to Section 7.2.2, “Adding Salt Minions” for more information.
- Verify that the node was added to ceph-salt:
  root@master # ceph-salt config /ceph_cluster/minions ls
  o- minions ................................................. [Minions: 5]
  [...]
  o- ses-min5.example.com .................................... [no roles]
- Apply the configuration to the new cluster host:
  root@master # ceph-salt apply ses-min5.example.com
- Verify that the newly added host now belongs to the cephadm environment:
  cephuser@adm > ceph orch host ls
  HOST                  ADDR                  LABELS  STATUS
  [...]
  ses-min5.example.com  ses-min5.example.com
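To check which daemons cephadm eventually schedules on the new host, you can additionally list them per host (a quick verification sketch, assuming the host name used above):
cephuser@adm > ceph orch ps ses-min5.example.com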
13.3 Removing nodes #
If the node that you are going to remove runs OSDs, remove the OSDs from it first and check that no OSDs are running on that node. Refer to Section 13.4.4, “Removing OSDs” for more details on removing OSDs.
To remove a node from a cluster, do the following:
- For all Ceph service types except for node-exporter and crash, remove the node's host name from the cluster placement specification file (for example, cluster.yml). Refer to Section 8.2, “Service and placement specification” for more details. For example, if you are removing the host named ses-min2, remove all occurrences of ses-min2 from all placement: sections.
  Update
  service_type: rgw
  service_id: EXAMPLE_NFS
  placement:
    hosts:
    - ses-min2
    - ses-min3
  to
  service_type: rgw
  service_id: EXAMPLE_NFS
  placement:
    hosts:
    - ses-min3
  Apply your changes to the configuration file:
  cephuser@adm > ceph orch apply -i rgw-example.yaml
- Remove the node from cephadm's environment:
  cephuser@adm > ceph orch host rm ses-min2
- If the node is running crash.osd.1 and crash.osd.2 services, remove them by running the following command on the host:
  root@minion > cephadm rm-daemon --fsid CLUSTER_ID --name SERVICE_NAME
  For example:
  root@minion > cephadm rm-daemon --fsid b4b30c6e... --name crash.osd.1
  root@minion > cephadm rm-daemon --fsid b4b30c6e... --name crash.osd.2
- Remove all the roles from the minion you want to delete:
  cephuser@adm > ceph-salt config /ceph_cluster/roles/tuned/throughput remove ses-min2
  cephuser@adm > ceph-salt config /ceph_cluster/roles/tuned/latency remove ses-min2
  cephuser@adm > ceph-salt config /ceph_cluster/roles/cephadm remove ses-min2
  cephuser@adm > ceph-salt config /ceph_cluster/roles/admin remove ses-min2
  If the minion you want to remove is the bootstrap minion, you also need to remove the bootstrap role:
  cephuser@adm > ceph-salt config /ceph_cluster/roles/bootstrap reset
- After removing all OSDs on a single host, remove the host from the CRUSH map:
  cephuser@adm > ceph osd crush remove bucket-name
  Note: The bucket name should be the same as the host name.
- You can now remove the minion from the cluster:
  cephuser@adm > ceph-salt config /ceph_cluster/minions remove ses-min2
If the removal fails and the minion you are trying to remove is permanently powered off, you will need to remove the node from the Salt Master:
root@master # salt-key -d minion_id
Then, manually remove the node from pillar_root/ceph-salt.sls. This is typically located in /srv/pillar/ceph-salt.sls.
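After the node has been removed, you can confirm that it no longer appears in the orchestrator host list or in the CRUSH hierarchy (a quick verification sketch):
cephuser@adm > ceph orch host ls
cephuser@adm > ceph osd tree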
   
13.4 OSD management #
This section describes how to add, erase, or remove OSDs in a Ceph cluster.
13.4.1 Listing disk devices #
To identify used and unused disk devices on all cluster nodes, list them by running the following command:
cephuser@adm > ceph orch device ls
HOST       PATH      TYPE SIZE  DEVICE  AVAIL REJECT REASONS
ses-master /dev/vda  hdd  42.0G         False locked
ses-min1   /dev/vda  hdd  42.0G         False locked
ses-min1   /dev/vdb  hdd  8192M  387836 False locked, LVM detected, Insufficient space (<5GB) on vgs
ses-min2   /dev/vdc  hdd  8192M  450575 True
13.4.2 Erasing disk devices #
To re-use a disk device, you need to erase (or zap) it first:
cephuser@adm > ceph orch device zap HOST_NAME DISK_DEVICE
For example:
cephuser@adm > ceph orch device zap ses-min2 /dev/vdc
If you previously deployed OSDs by using DriveGroups or the --all-available-devices option while the unmanaged flag was not set, cephadm will deploy these OSDs automatically after you erase them.
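If you want to prevent cephadm from re-creating an OSD on the device right after zapping it, one option (a sketch, assuming the OSDs were deployed via the all-available-devices service) is to switch that service to unmanaged mode first:
cephuser@adm > ceph orch apply osd --all-available-devices --unmanaged=true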
    
13.4.3 Adding OSDs using DriveGroups specification #
DriveGroups specify the layouts of OSDs in the Ceph cluster. They are defined in a single YAML file. In this section, we will use drive_groups.yml as an example.
An administrator should manually specify a group of OSDs that are interrelated (hybrid OSDs that are deployed on a mixture of HDDs and SSDs) or share identical deployment options (for example, the same object store, the same encryption option, or stand-alone OSDs). To avoid explicitly listing devices, DriveGroups use a list of filter items that correspond to a few selected fields of ceph-volume's inventory reports. cephadm translates these DriveGroups into actual device lists that you can inspect.
The command to apply the OSD specification to the cluster is:
cephuser@adm > ceph orch apply osd -i drive_groups.yml
To see a preview of actions and test your application, you can use the --dry-run option together with the ceph orch apply osd command. For example:
cephuser@adm > ceph orch apply osd -i drive_groups.yml --dry-run
...
+---------+------+------+----------+----+-----+
|SERVICE  |NAME  |HOST  |DATA      |DB  |WAL  |
+---------+------+------+----------+----+-----+
|osd      |test  |mgr0  |/dev/sda  |-   |-    |
|osd      |test  |mgr0  |/dev/sdb  |-   |-    |
+---------+------+------+----------+----+-----+
If the --dry-run output matches your expectations, re-run the command without the --dry-run option.
13.4.3.1 Unmanaged OSDs #
All available clean disk devices that match the DriveGroups specification will be used as OSDs automatically after you add them to the cluster. This behavior is called managed mode.
To disable the managed mode, add the unmanaged: true line to the relevant specifications, for example:
service_type: osd
service_id: example_drvgrp_name
placement:
  hosts:
  - ses-min2
  - ses-min3
encrypted: true
unmanaged: true
To change already deployed OSDs from the managed to the unmanaged mode, add the unmanaged: true lines where applicable during the procedure described in Section 13.1, “Modifying the cluster configuration”.
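For example, you can export only the OSD specifications, add the flag, and re-apply them (a sketch using a hypothetical osd_specs.yaml file name):
cephuser@adm > ceph orch ls osd --export --format yaml > osd_specs.yaml
# edit osd_specs.yaml and add 'unmanaged: true' to each relevant specification
cephuser@adm > ceph orch apply -i osd_specs.yaml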
     
13.4.3.2 DriveGroups specification #
Following is an example DriveGroups specification file:
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  drive_spec: DEVICE_SPECIFICATION
db_devices:
  drive_spec: DEVICE_SPECIFICATION
wal_devices:
  drive_spec: DEVICE_SPECIFICATION
block_wal_size: '5G'  # (optional, unit suffixes permitted)
block_db_size: '5G'   # (optional, unit suffixes permitted)
encrypted: true       # 'True' or 'False' (defaults to 'False')
The option previously called "encryption" in DeepSea has been renamed to "encrypted". When applying DriveGroups in SUSE Enterprise Storage 7, ensure you use this new terminology in your service specification, otherwise the ceph orch apply operation will fail.
13.4.3.3 Matching disk devices #
You can describe the specification using the following filters:
- By a disk model:
  model: DISK_MODEL_STRING
- By a disk vendor:
  vendor: DISK_VENDOR_STRING
  Tip: Always enter the DISK_VENDOR_STRING in lowercase.
  To obtain details about disk model and vendor, examine the output of the following command:
  cephuser@adm > ceph orch device ls
  HOST      PATH      TYPE  SIZE   DEVICE_ID                  MODEL             VENDOR
  ses-min1  /dev/sdb  ssd   29.8G  SATA_SSD_AF34075704240015  SATA SSD          ATA
  ses-min2  /dev/sda  ssd   223G   Micron_5200_MTFDDAK240TDN  Micron_5200_MTFD  ATA
  [...]
- Whether a disk is rotational or not. SSDs and NVMe drives are not rotational.
  rotational: 0
- Deploy a node using all available drives for OSDs:
  data_devices:
    all: true
- Additionally, by limiting the number of matching disks:
  limit: 10
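Filters within one device selection can be combined. For example, the following hypothetical data_devices section matches at most ten rotational disks from one vendor (a sketch; adjust the values to your hardware):
data_devices:
  rotational: 1
  vendor: 'intel'
  limit: 10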
13.4.3.4 Filtering devices by size #
You can filter disk devices by their size, either by an exact size or by a size range. The size: parameter accepts arguments in the following form:
- '10G' - Includes disks of an exact size. 
- '10G:40G' - Includes disks whose size is within the range. 
- ':10G' - Includes disks less than or equal to 10 GB in size. 
- '40G:' - Includes disks equal to or greater than 40 GB in size. 
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  size: '40TB:'
db_devices:
  size: ':2TB'
When using the ':' delimiter, you need to enclose the size in quotes, otherwise the ':' sign will be interpreted as a new configuration hash.
Instead of Gigabytes (G), you can specify the sizes in Megabytes (M) or Terabytes (T).
13.4.3.5 DriveGroups examples #
This section includes examples of different OSD setups.
This example describes two nodes with the same setup:
- 20 HDDs
  - Vendor: Intel
  - Model: SSD-123-foo
  - Size: 4 TB
- 2 SSDs
  - Vendor: Micron
  - Model: MC-55-44-ZX
  - Size: 512 GB
The corresponding drive_groups.yml file will be as follows:
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
db_devices:
  model: MC-55-44-ZX
Such a configuration is simple and valid. The problem is that an administrator may add disks from different vendors in the future, and these will not be included. You can improve it by reducing the filters on core properties of the drives:
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
In the previous example, all rotational devices are declared as 'data devices', and all non-rotational devices are used as 'shared devices' (wal, db).
If you know that drives with more than 2 TB will always be the slower data devices, you can filter by size:
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  size: '2TB:'
db_devices:
  size: ':2TB'
This example describes two distinct setups: 20 HDDs should share 2 SSDs, while 10 SSDs should share 2 NVMes.
- 20 HDDs
  - Vendor: Intel
  - Model: SSD-123-foo
  - Size: 4 TB
- 12 SSDs
  - Vendor: Micron
  - Model: MC-55-44-ZX
  - Size: 512 GB
- 2 NVMes
  - Vendor: Samsung
  - Model: NVME-QQQQ-987
  - Size: 256 GB
Such a setup can be defined with two layouts as follows:
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  model: MC-55-44-ZX
service_type: osd
service_id: example_drvgrp_name2
placement:
  host_pattern: '*'
data_devices:
  model: MC-55-44-ZX
db_devices:
  vendor: samsung
  size: 256GB
The previous examples assumed that all nodes have the same drives. However, that is not always the case:
Nodes 1-5:
- 20 HDDs
  - Vendor: Intel
  - Model: SSD-123-foo
  - Size: 4 TB
- 2 SSDs
  - Vendor: Micron
  - Model: MC-55-44-ZX
  - Size: 512 GB
Nodes 6-10:
- 5 NVMes
  - Vendor: Intel
  - Model: SSD-123-foo
  - Size: 4 TB
- 20 SSDs
  - Vendor: Micron
  - Model: MC-55-44-ZX
  - Size: 512 GB
You can use the host_pattern key in the placement section to target specific nodes. Salt-style target notation helps to keep things simple:
service_type: osd
service_id: example_drvgrp_one2five
placement:
  host_pattern: 'node[1-5]'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
followed by
service_type: osd
service_id: example_drvgrp_rest
placement:
  host_pattern: 'node[6-10]'
data_devices:
  model: MC-55-44-ZX
db_devices:
  model: SSD-123-foo
All previous cases assumed that the WALs and DBs use the same device. It is however possible to deploy the WAL on a dedicated device as well:
- 20 HDDs
  - Vendor: Intel
  - Model: SSD-123-foo
  - Size: 4 TB
- 2 SSDs
  - Vendor: Micron
  - Model: MC-55-44-ZX
  - Size: 512 GB
- 2 NVMes
  - Vendor: Samsung
  - Model: NVME-QQQQ-987
  - Size: 256 GB
service_type: osd
service_id: example_drvgrp_name
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
db_devices:
  model: MC-55-44-ZX
wal_devices:
  model: NVME-QQQQ-987
In the following setup, we are trying to define:
- 20 HDDs backed by 1 NVMe 
- 2 HDDs backed by 1 SSD(db) and 1 NVMe (wal) 
- 8 SSDs backed by 1 NVMe 
- 2 SSDs stand-alone (encrypted) 
- 1 HDD is spare and should not be deployed 
The summary of used drives is as follows:
- 23 HDDs
  - Vendor: Intel
  - Model: SSD-123-foo
  - Size: 4 TB
- 10 SSDs
  - Vendor: Micron
  - Model: MC-55-44-ZX
  - Size: 512 GB
- 1 NVMe
  - Vendor: Samsung
  - Model: NVME-QQQQ-987
  - Size: 256 GB
The DriveGroups definition will be the following:
service_type: osd
service_id: example_drvgrp_hdd_nvme
placement:
  host_pattern: '*'
data_devices:
  rotational: 0
db_devices:
  model: NVME-QQQQ-987
service_type: osd
service_id: example_drvgrp_hdd_ssd_nvme
placement:
  host_pattern: '*'
data_devices:
  rotational: 0
db_devices:
  model: MC-55-44-ZX
wal_devices:
  model: NVME-QQQQ-987
service_type: osd
service_id: example_drvgrp_ssd_nvme
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
db_devices:
  model: NVME-QQQQ-987
service_type: osd
service_id: example_drvgrp_standalone_encrypted
placement:
  host_pattern: '*'
data_devices:
  model: SSD-123-foo
encrypted: True
One HDD will remain unused as a spare, because the file is parsed from top to bottom.
13.4.4 Removing OSDs #
Before removing an OSD node from the cluster, verify that the cluster has more free disk space than the OSD disk you are going to remove. Be aware that removing an OSD results in rebalancing of the whole cluster.
- Identify which OSD to remove by getting its ID:
  cephuser@adm > ceph orch ps --daemon_type osd
  NAME   HOST            STATUS        REFRESHED  AGE  VERSION
  osd.0  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
  osd.1  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
  osd.2  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
  osd.3  target-ses-090  running (3h)  7m ago     3h   15.2.7.689 ...
- Remove one or more OSDs from the cluster:
  cephuser@adm > ceph orch osd rm OSD1_ID OSD2_ID ...
  For example:
  cephuser@adm > ceph orch osd rm 1 2
- You can query the state of the removal operation:
  cephuser@adm > ceph orch osd rm status
  OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
  2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
  3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
  4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158
13.4.4.1 Stopping OSD removal #
After you have scheduled an OSD removal, you can stop the removal if needed. The following command will reset the initial state of the OSD and remove it from the queue:
cephuser@adm > ceph orch osd rm stop OSD_SERVICE_ID
13.4.5 Replacing OSDs #
There are several reasons why you may need to replace an OSD disk. For example:
- The OSD disk failed or is soon going to fail based on SMART information, and can no longer be used to store data safely. 
- You need to upgrade the OSD disk, for example to increase its size. 
- You need to change the OSD disk layout. 
- You plan to move from a non-LVM to an LVM-based layout.
To replace an OSD while preserving its ID, run:
cephuser@adm > ceph orch osd rm OSD_SERVICE_ID --replace
For example:
cephuser@adm > ceph orch osd rm 4 --replace
Replacing an OSD is identical to removing an OSD (see Section 13.4.4, “Removing OSDs” for more details) with the exception that the OSD is not permanently removed from the CRUSH hierarchy and is assigned a destroyed flag instead.
The destroyed flag is used to determine the OSD IDs that will be reused during the next OSD deployment. Newly added disks that match the DriveGroups specification (see Section 13.4.3, “Adding OSDs using DriveGroups specification” for more details) will be assigned the OSD IDs of their replaced counterparts.
Appending the --dry-run option will not execute the actual replacement, but will preview the steps that would normally happen.
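For example, the following sketch previews the replacement of the OSD used above without executing it:
cephuser@adm > ceph orch osd rm 4 --replace --dry-run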
    
In the case of replacing an OSD after a failure, we highly recommend triggering a deep scrub of the placement groups. See Section 17.6, “Scrubbing placement groups” for more details.
Run the following command to initiate a deep scrub:
cephuser@adm > ceph osd deep-scrub osd.OSD_NUMBER
If a shared device for DB/WAL fails, you will need to perform the replacement procedure for all OSDs that share the failed device.
13.5 Moving the Salt Master to a new node #
If you need to replace the Salt Master host with a new one, follow these steps:
- Export the cluster configuration and back up the exported JSON file. Find more details in Section 7.2.14, “Exporting cluster configurations”. 
- If the old Salt Master is also the only administration node in the cluster, then manually move /etc/ceph/ceph.client.admin.keyring and /etc/ceph/ceph.conf to the new Salt Master.
- Stop and disable the Salt Master systemd service on the old Salt Master node:
  root@master # systemctl stop salt-master.service
  root@master # systemctl disable salt-master.service
- If the old Salt Master node is no longer in the cluster, also stop and disable the Salt Minion systemd service:
  root@master # systemctl stop salt-minion.service
  root@master # systemctl disable salt-minion.service
  Warning: Do not stop or disable the salt-minion.service if the old Salt Master node has any Ceph daemons (MON, MGR, OSD, MDS, gateway, monitoring) running on it.
- Install SUSE Linux Enterprise Server 15 SP2 on the new Salt Master following the procedure described in Chapter 5, Installing and configuring SUSE Linux Enterprise Server.
  Tip: Transition of Salt Minions
  To simplify the transition of Salt Minions to the new Salt Master, remove the original Salt Master's public key from each of them:
  root@minion > rm /etc/salt/pki/minion/minion_master.pub
  root@minion > systemctl restart salt-minion.service
- Install the salt-master package and, if applicable, the salt-minion package on the new Salt Master. 
- Install ceph-salt on the new Salt Master node:
  root@master # zypper install ceph-salt
  root@master # systemctl restart salt-master.service
  root@master # salt '*' saltutil.sync_all
  Important: Make sure to run all three commands before continuing. The commands are idempotent; it does not matter if they get repeated.
- Include the new Salt Master in the cluster as described in Section 7.1, “Installing ceph-salt”, Section 7.2.2, “Adding Salt Minions” and Section 7.2.4, “Specifying Admin Node”.
- Import the backed up cluster configuration and apply it:
  root@master # ceph-salt import CLUSTER_CONFIG.json
  root@master # ceph-salt apply
  Important: Rename the Salt Master's minion id in the exported CLUSTER_CONFIG.json file before importing it.
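To verify that the minions respond to the new Salt Master and that the cluster is healthy after the move, you can run the following commands (a quick verification sketch):
root@master # salt '*' test.ping
cephuser@adm > ceph -s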
13.6 Updating the cluster nodes #
Keep the Ceph cluster nodes up-to-date by applying rolling updates regularly.
13.6.1 Software repositories #
Before patching the cluster with the latest software packages, verify that all the cluster's nodes have access to the relevant repositories.
13.6.2 Repository staging #
If you use a staging tool—for example, SUSE Manager, Subscription Management Tool, or RMT—that serves software repositories to the cluster nodes, verify that stages for both 'Updates' repositories for SUSE Linux Enterprise Server and SUSE Enterprise Storage are created at the same point in time.
We strongly recommend using a staging tool to apply patches with frozen or staged patch levels. This ensures that new nodes joining the cluster have the same patch level as the nodes already running in the cluster. This way you avoid the need to apply the latest patches to all the cluster's nodes before new nodes can join the cluster.
   
13.6.3 Downtime of Ceph services #
Depending on the configuration, cluster nodes may be rebooted during the update. If there is a single point of failure for services such as Object Gateway, Samba Gateway, NFS Ganesha, or iSCSI, the client machines may be temporarily disconnected from services whose nodes are being rebooted.
13.6.4 Running the update #
To update the software packages on all cluster nodes to the latest version, run the following command:
root@master # ceph-salt update
13.7 Updating Ceph #
You can instruct cephadm to update Ceph from one bugfix release to another. The automated update of Ceph services respects the recommended order—it starts with Ceph Managers, Ceph Monitors, and then continues on to other services such as Ceph OSDs, Metadata Servers, and Object Gateways. Each daemon is restarted only after Ceph indicates that the cluster will remain available.
The following update procedure uses the ceph orch upgrade command. Keep in mind that the following instructions detail how to update your Ceph cluster within a product version (for example, a maintenance update), and do not provide instructions on how to upgrade your cluster from one product version to another.
13.7.1 Starting the update #
Before you start the update, verify that all nodes are currently online and your cluster is healthy:
cephuser@adm > cephadm shell -- ceph -s
To update to a specific Ceph release:
cephuser@adm > ceph orch upgrade start --image REGISTRY_URL
For example:
cephuser@adm > ceph orch upgrade start --image registry.suse.com/ses/7/ceph/ceph:latest
Upgrade packages on the hosts:
cephuser@adm > ceph-salt update
13.7.2 Monitoring the update #
Run the following command to determine whether an update is in progress:
cephuser@adm > ceph orch upgrade status
While the update is in progress, you will see a progress bar in the Ceph status output:
cephuser@adm > ceph -s
[...]
  progress:
    Upgrade to registry.suse.com/ses/7/ceph/ceph:latest (00h 20m 12s)
      [=======.....................] (time remaining: 01h 43m 31s)
You can also watch the cephadm log:
cephuser@adm > ceph -W cephadm
13.7.3 Cancelling an update #
You can stop the update process at any time:
cephuser@adm > ceph orch upgrade stop
13.8 Halting or rebooting cluster #
In some cases it may be necessary to halt or reboot the whole cluster. We recommend carefully checking for dependencies of running services. The following steps provide an outline for stopping and starting the cluster:
- Tell the Ceph cluster not to mark OSDs as out:
  cephuser@adm > ceph osd set noout
- Stop daemons and nodes in the following order:
  - Storage clients
  - Gateways, for example NFS Ganesha or Object Gateway
  - Metadata Server
  - Ceph OSD
  - Ceph Manager
  - Ceph Monitor
- If required, perform maintenance tasks. 
- Start the nodes and servers in the reverse order of the shutdown process:
  - Ceph Monitor
  - Ceph Manager
  - Ceph OSD
  - Metadata Server
  - Gateways, for example NFS Ganesha or Object Gateway
  - Storage clients
- Remove the noout flag:
  cephuser@adm > ceph osd unset noout
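How you stop the individual daemons depends on your deployment. For cephadm-managed services, one option (a sketch; the service and daemon names are hypothetical and can be listed with ceph orch ls and ceph orch ps) is to stop them via the orchestrator, or to stop all Ceph daemons on a node at once using its systemd target:
cephuser@adm > ceph orch stop nfs.EXAMPLE
cephuser@adm > ceph orch daemon stop osd.3
root@minion > systemctl stop ceph.target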
13.9 Removing an entire Ceph cluster #
The ceph-salt purge command removes the entire Ceph cluster. If there are more Ceph clusters deployed, the one reported by ceph -s is purged. This way you can clean the cluster environment when testing different setups.
  
To prevent accidental deletion, the orchestration checks if the safety is disengaged. You can disengage the safety measures and remove the Ceph cluster by running:
root@master # ceph-salt disengage-safety
root@master # ceph-salt purge