Applies to SUSE Enterprise Storage 6

5 Deploying with DeepSea/Salt

Salt, along with DeepSea, is a stack of components that helps you deploy and manage server infrastructure. It is very scalable, fast, and relatively easy to get running. Read the following considerations before you start deploying the cluster with Salt:

  • Salt minions are the nodes controlled by a dedicated node called Salt master. Salt minions have roles, for example Ceph OSD, Ceph Monitor, Ceph Manager, Object Gateway, iSCSI Gateway, or NFS Ganesha.

  • A Salt master runs its own Salt minion. It is required for running privileged tasks—for example creating, authorizing, and copying keys to minions—so that remote minions never need to run privileged tasks.

    Tip: Sharing Multiple Roles per Server

    You will get the best performance from your Ceph cluster when each role is deployed on a separate node. But real deployments sometimes require sharing one node for multiple roles. To avoid trouble with performance and the upgrade procedure, do not deploy the Ceph OSD, Metadata Server, or Ceph Monitor role to the Admin Node.

  • Salt minions need to correctly resolve the Salt master's host name over the network. By default, they look for the salt host name, but you can specify any other network-reachable host name in the /etc/salt/minion file, see Section 5.3, “Cluster Deployment”.

5.1 Read the Release Notes

In the release notes you can find additional information on changes since the previous release of SUSE Enterprise Storage. Check the release notes to see whether:

  • your hardware needs special considerations.

  • any used software packages have changed significantly.

  • special precautions are necessary for your installation.

The release notes also provide information that could not make it into the manual on time. They also contain notes about known issues.

After having installed the package release-notes-ses, find the release notes locally in the directory /usr/share/doc/release-notes or online at https://www.suse.com/releasenotes/.

5.2 Introduction to DeepSea

The goal of DeepSea is to save the administrator time and to perform complex operations on a Ceph cluster with confidence.

Ceph is a very configurable software solution. It increases both the freedom and responsibility of system administrators.

The minimal Ceph setup is good for demonstration purposes, but does not show the interesting features of Ceph that become apparent with a large number of nodes.

DeepSea collects and stores data about individual servers, such as addresses and device names. For a distributed storage system such as Ceph, there can be hundreds of such items to collect and store. Collecting the information and entering the data manually into a configuration management tool is exhausting and error prone.

The steps necessary to prepare the servers, collect the configuration, and configure and deploy Ceph are mostly the same. However, this does not address managing the separate functions. For day-to-day operations, the ability to trivially add hardware to a given function and remove it gracefully is a requirement.

DeepSea addresses these observations with the following strategy: DeepSea consolidates the administrator's decisions in a single file. The decisions include cluster assignment, role assignment, and profile assignment. DeepSea also collects each set of tasks into a simple goal. Each goal is a stage:

DeepSea Stages Description
  • Stage 0—the preparation—during this stage, all required updates are applied and your system may be rebooted.

    Important: Re-run Stage 0 after the Admin Node Reboot

    If the Admin Node reboots during stage 0 to load the new kernel version, you need to run stage 0 again, otherwise minions will not be targeted.

  • Stage 1—the discovery—here all hardware in your cluster is being detected and necessary information for the Ceph configuration is being collected. For details about configuration, refer to Section 5.5, “Configuration and Customization”.

  • Stage 2—the configuration—you need to prepare configuration data in a particular format.

  • Stage 3—the deployment—creates a basic Ceph cluster with mandatory Ceph services. See Section 1.2.3, “Ceph Nodes and Daemons” for their list.

  • Stage 4—the services—additional features of Ceph like iSCSI, Object Gateway and CephFS can be installed in this stage. Each is optional.

  • Stage 5—the removal stage. This stage is not mandatory, and during the initial setup it is usually not needed. In this stage, the roles of minions and the cluster configuration are removed. You need to run this stage when you need to remove a storage node from your cluster. For details refer to Section 2.3, “Removing and Reinstalling Cluster Nodes”.
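Each stage is triggered with the salt-run state.orch command on the Salt master, as shown in Section 5.3, “Cluster Deployment”. For example, following the same naming pattern as stages 0 to 4, the removal stage corresponds to:

root@master # salt-run state.orch ceph.stage.5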

5.2.1 Organization and Important Locations

Salt has several standard locations and several naming conventions used on your master node:

/srv/pillar

The directory stores configuration data for your cluster minions. Pillar is an interface for providing global configuration values to all your cluster minions.

/srv/salt/

The directory stores Salt state files (also called sls files). State files are formatted descriptions of states in which the cluster should be.

/srv/modules/runners

The directory stores Python scripts known as runners. Runners are executed on the master node.

/srv/salt/_modules

The directory stores Python scripts that are called modules. The modules are applied to all minions in your cluster.

/srv/pillar/ceph

The directory is used by DeepSea. Collected configuration data are stored here.

/srv/salt/ceph

A directory used by DeepSea. It stores sls files that can be in different formats; each subdirectory contains only one type of sls file. For example, /srv/salt/ceph/stage contains orchestration files that are executed by salt-run state.orchestrate.
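For example, to execute the orchestration defined in /srv/salt/ceph/stage/prep, you run the following command (state.orch is the short alias for state.orchestrate used throughout this chapter):

root@master # salt-run state.orch ceph.stage.prep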

5.2.2 Targeting the Minions

DeepSea commands are executed via the Salt infrastructure. When using the salt command, you need to specify a set of Salt minions that the command will affect. We describe the set of the minions as a target for the salt command. The following sections describe possible methods to target the minions.

5.2.2.1 Matching the Minion Name

You can target a minion or a group of minions by matching their names. A minion's name is usually the short host name of the node where the minion runs. This is a general Salt targeting method, not related to DeepSea. You can use globbing, regular expressions, or lists to limit the range of minion names. The general syntax follows:

root@master # salt target example.module
Tip: Ceph-only Cluster

If all Salt minions in your environment belong to your Ceph cluster, you can safely substitute target with '*' to include all registered minions.

Match all minions in the example.net domain (assuming the minion names are identical to their "full" host names):

root@master # salt '*.example.net' test.ping

Match the 'web1' to 'web5' minions:

root@master # salt 'web[1-5]' test.ping

Match both 'web1-prod' and 'web1-devel' minions using a regular expression:

root@master # salt -E 'web1-(prod|devel)' test.ping

Match a simple list of minions:

root@master # salt -L 'web1,web2,web3' test.ping

Match all minions in the cluster:

root@master # salt '*' test.ping

5.2.2.2 Targeting with a DeepSea Grain

In a heterogeneous Salt-managed environment where SUSE Enterprise Storage 6 is deployed on a subset of nodes alongside other cluster solutions, you need to mark the relevant minions by applying a 'deepsea' grain to them before running DeepSea stage 0. This way, you can easily target DeepSea minions in environments where matching by the minion name is problematic.

To apply the 'deepsea' grain to a group of minions, run:

root@master # salt target grains.append deepsea default

To remove the 'deepsea' grain from a group of minions, run:

root@master # salt target grains.delval deepsea destructive=True

After applying the 'deepsea' grain to the relevant minions, you can target them as follows:

root@master # salt -G 'deepsea:*' test.ping

The following command is equivalent:

root@master # salt -C 'G@deepsea:*' test.ping

5.2.2.3 Set the deepsea_minions Option

Setting the deepsea_minions option's target is a requirement for DeepSea deployments. DeepSea uses it to instruct minions during the execution of stages (refer to DeepSea Stages Description for details).

To set or change the deepsea_minions option, edit the /srv/pillar/ceph/deepsea_minions.sls file on the Salt master and add or replace the following line:

deepsea_minions: target
Tip: deepsea_minions Target

As the target for the deepsea_minions option, you can use any targeting method: both Matching the Minion Name and Targeting with a DeepSea Grain.

Match all Salt minions in the cluster:

deepsea_minions: '*'

Match all minions with the 'deepsea' grain:

deepsea_minions: 'G@deepsea:*'

5.2.2.4 For More Information

You can use more advanced ways to target minions using the Salt infrastructure. The deepsea_minions manual page gives you more details about DeepSea targeting (man 7 deepsea_minions).

5.3 Cluster Deployment

The cluster deployment process has several phases. First, you need to prepare all nodes of the cluster by configuring Salt and then deploy and configure Ceph.

Tip: Deploying Monitor Nodes without Defining OSD Profiles

If you need to skip defining storage roles for OSD as described in Section 5.5.1.2, “Role Assignment” and deploy Ceph Monitor nodes first, you can do so by setting the DEV_ENV variable.

This allows deploying monitors without the presence of the role-storage/ directory, as well as deploying a Ceph cluster with at least one storage, monitor, and manager role.

To set the environment variable, either enable it globally by setting it in the /srv/pillar/ceph/stack/global.yml file, or set it for the current shell session only:

root@master # export DEV_ENV=true

As an example, /srv/pillar/ceph/stack/global.yml can be created with the following contents:

DEV_ENV: True

The following procedure describes the cluster preparation in detail.

  1. Install and register SUSE Linux Enterprise Server 15 SP1 together with the SUSE Enterprise Storage 6 extension on each node of the cluster.

  2. Verify that proper products are installed and registered by listing existing software repositories. Run zypper lr -E and compare the output with the following list:

     SLE-Product-SLES15-SP1-Pool
     SLE-Product-SLES15-SP1-Updates
     SLE-Module-Server-Applications15-SP1-Pool
     SLE-Module-Server-Applications15-SP1-Updates
     SLE-Module-Basesystem15-SP1-Pool
     SLE-Module-Basesystem15-SP1-Updates
     SUSE-Enterprise-Storage-6-Pool
     SUSE-Enterprise-Storage-6-Updates
  3. Configure network settings, including proper DNS name resolution, on each node. The Salt master and all the Salt minions need to resolve each other by their host names. For more information on configuring a network, see https://documentation.suse.com/sles/15-SP1/single-html/SLES-admin/#sec-network-yast. For more information on configuring a DNS server, see https://documentation.suse.com/sles/15-SP1/single-html/SLES-admin/#cha-dns.

    Important

    If cluster nodes are configured for multiple networks, DeepSea will use the network to which their host names (or FQDNs) resolve. Consider the following example /etc/hosts:

    192.168.100.1 ses1.example.com       ses1
    172.16.100.1  ses1clus.cluster.lan   ses1clus

    In the above example, the ses1 minion will resolve to the 192.168.100.x network and DeepSea will use this network as the public network. If the desired public network is 172.16.100.x, then the host name should be changed to ses1clus.
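    To check which address a host name resolves to on a given node, you can query the resolver with getent, for example (a quick check using the names from the example above):

    root@minion > getent hosts ses1clus
    172.16.100.1    ses1clus.cluster.lan ses1clus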

  4. Install the salt-master and salt-minion packages on the Salt master node:

    root@master # zypper in salt-master salt-minion

    Check that the salt-master service is enabled and started, and enable and start it if needed:

    root@master # systemctl enable salt-master.service
    root@master # systemctl start salt-master.service
  5. If you intend to use a firewall, verify that the Salt master node has ports 4505 and 4506 open to all Salt minion nodes. If the ports are closed, you can open them using the yast2 firewall command by allowing the SaltStack service. A command-line alternative is sketched after the following warning.

    Warning: DeepSea Stages Fail with Firewall

    DeepSea deployment stages fail when the firewall is active (even when configured). To pass the stages correctly, you need to either turn the firewall off by running

        root # systemctl stop firewalld.service

    or set the FAIL_ON_WARNING option to 'False' in /srv/pillar/ceph/stack/global.yml:

    FAIL_ON_WARNING: False
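    As an alternative to the yast2 firewall command, you can open the required ports from the command line. The following sketch assumes firewalld is the active firewall:

        root # firewall-cmd --permanent --add-port=4505-4506/tcp
        root # firewall-cmd --reload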
  6. Install the package salt-minion on all minion nodes.

    root@minion > zypper in salt-minion

    Make sure that the fully qualified domain name of each node can be resolved to the public network IP address by all other nodes.

  7. Configure all minions (including the master minion) to connect to the master. If your Salt master is not reachable by the host name salt, edit the file /etc/salt/minion or create a new file /etc/salt/minion.d/master.conf with the following content:

    master: host_name_of_salt_master

    If you performed any changes to the configuration files mentioned above, restart the Salt service on all Salt minions:

    root@minion > systemctl restart salt-minion.service
  8. Check that the salt-minion service is enabled and started on all nodes. Enable and start it if needed:

    root # systemctl enable salt-minion.service
    root # systemctl start salt-minion.service
  9. Verify each Salt minion's fingerprint, and accept all Salt keys on the Salt master if the fingerprints match.

    Note

    If the Salt minion fingerprint comes back empty, make sure the Salt minion has a Salt master configuration and it can communicate with the Salt master.
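    In that case, one way to inspect the minion's connection attempts is to run a test command with debug logging directly on the affected minion (a troubleshooting sketch, not part of the official procedure):

    root@minion > salt-call -l debug test.ping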

    View each minion's fingerprint:

    root@master # salt-call --local key.finger
    local:
    3f:a3:2f:3f:b4:d3:d9:24:49:ca:6b:2c:e1:6c:3f:c3:83:37:f0:aa:87:42:e8:ff...

    After gathering fingerprints of all the Salt minions, list fingerprints of all unaccepted minion keys on the Salt master:

    root@master # salt-key -F
    [...]
    Unaccepted Keys:
    minion1:
    3f:a3:2f:3f:b4:d3:d9:24:49:ca:6b:2c:e1:6c:3f:c3:83:37:f0:aa:87:42:e8:ff...

    If the minions' fingerprints match, accept them:

    root@master # salt-key --accept-all
  10. Verify that the keys have been accepted:

    root@master # salt-key --list-all
  11. By default, DeepSea uses the Admin Node as the time server for other cluster nodes. Therefore, if the Admin Node is not virtualized, select one or more time servers or pools, and synchronize the local time against them. Verify that the time synchronization service is enabled on each system start-up. Find more information on setting up time synchronization in https://documentation.suse.com/sles/15-SP1/html/SLES-all/cha-ntp.html#sec-ntp-yast.

    If the Admin Node is a virtual machine, provide better time sources for the cluster nodes by overriding the default NTP client configuration:

    1. Edit /srv/pillar/ceph/stack/global.yml on the Salt master node and add the following line:

      time_server: CUSTOM_NTP_SERVER

      To add multiple time servers, the format is as follows:

      time_server:
        - CUSTOM_NTP_SERVER1
        - CUSTOM_NTP_SERVER2
        - CUSTOM_NTP_SERVER3
      [...]
    2. Refresh the Salt pillar:

      root@master # salt '*' saltutil.pillar_refresh
    3. Verify the changed value:

      root@master # salt '*' pillar.items
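      To display only the relevant key instead of all pillar data, you can query it directly with the standard pillar.get function:

      root@master # salt '*' pillar.get time_server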
    4. Apply the new setting:

      root@master # salt '*' state.apply ceph.time
  12. Prior to deploying SUSE Enterprise Storage 6, manually zap all the disks. Remember to replace 'X' with the correct disk letter:

    1. Stop all processes that are using the specific disk.

    2. Verify whether any partition on the disk is mounted, and unmount if needed.

    3. If the disk is managed by LVM, deactivate and delete the whole LVM infrastructure. Refer to https://documentation.suse.com/sles/15-SP1/html/SLES-all/cha-lvm.html for more details.

    4. If the disk is part of MD RAID, deactivate the RAID. Refer to https://documentation.suse.com/sles/15-SP1/single-html/SLES-storage/#part-software-raid for more details. A combined command sketch for this and the previous sub-step follows after sub-step 6.

    5. Tip: Rebooting the Server

      If you get error messages such as 'partition in use' or 'kernel cannot be updated with the new partition table' during the following steps, reboot the server.

      Wipe data and partitions on the disk:

      cephadm@adm > ceph-volume lvm zap /dev/sdX --destroy
    6. Verify that the drive is empty (with no GPT structures). GPT stores its primary header and partition table in the first 34 sectors of the disk and a backup copy in the last 33 sectors, which is why the following commands examine exactly those regions:

      root # parted -s /dev/sdX print free

      or

      root # dd if=/dev/sdX bs=512 count=34 | hexdump -C
      root # dd if=/dev/sdX bs=512 count=33 \
        skip=$((`blockdev --getsz /dev/sdX` - 33)) | hexdump -C
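    The following minimal sketch covers the LVM and MD RAID sub-steps above. VG_NAME and /dev/mdX are placeholders for your actual volume group and RAID device; verify them before running the commands:

      root # vgchange -an VG_NAME                # deactivate the volume group
      root # vgremove -f VG_NAME                 # delete the volume group
      root # pvremove /dev/sdX                   # remove the LVM physical volume label
      root # mdadm --stop /dev/mdX               # deactivate the MD RAID array
      root # mdadm --zero-superblock /dev/sdX    # clear the RAID superblock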
  13. Optionally, if you need to preconfigure the cluster's network settings before the deepsea package is installed, create /srv/pillar/ceph/stack/ceph/cluster.yml manually and set the cluster_network: and public_network: options. Note that the file will not be overwritten after you install deepsea. Then, run:

    root@master # chown -R salt:salt /srv/pillar/ceph/stack
    Tip: Enabling IPv6

    If you need to enable IPv6 network addressing, refer to Section 7.2.1, “Enabling IPv6 for Ceph Cluster Deployment”

  14. Install DeepSea on the Salt master node:

    root@master # zypper in deepsea
  15. The value of the master_minion parameter is dynamically derived from the /etc/salt/minion_id file on the Salt master. If you need to override the discovered value, edit the file /srv/pillar/ceph/stack/global.yml and set a relevant value:

    master_minion: MASTER_MINION_NAME

    If your Salt master is reachable via multiple host names, use the Salt minion name for the storage cluster as returned by the salt-key -L command. If you used the default host name for your Salt master—salt—in the ses domain, then the file looks as follows:

    master_minion: salt.ses

Now you deploy and configure Ceph. Unless specified otherwise, all steps are mandatory.

Note: Salt Command Conventions

There are two possible ways to run salt-run state.orch—one is with 'stage.STAGE_NUMBER', the other is with the name of the stage. Both notations have the same effect, and which one you use is entirely a matter of preference.

Procedure 5.1: Running Deployment Stages
  1. Ensure the Salt minions belonging to the Ceph cluster are correctly targeted through the deepsea_minions option in /srv/pillar/ceph/deepsea_minions.sls. Refer to Section 5.2.2.3, “Set the deepsea_minions Option” for more information.

  2. By default, DeepSea deploys Ceph clusters with tuned profiles active on Ceph Monitor, Ceph Manager, and Ceph OSD nodes. In some cases, you may need to deploy without tuned profiles. To do so, put the following lines in /srv/pillar/ceph/stack/global.yml before running DeepSea stages:

    alternative_defaults:
     tuned_mgr_init: default-off
     tuned_mon_init: default-off
     tuned_osd_init: default-off
  3. Optional: create Btrfs sub-volumes for /var/lib/ceph/. This step needs to be executed before DeepSea stage 0. To migrate existing directories or for more details, see Section 33.6, “Btrfs Subvolume for /var/lib/ceph on Ceph Monitor Nodes”.

    Apply the following commands to each of the Salt minions:

    root@master # salt 'MONITOR_NODES' saltutil.sync_all
    root@master # salt 'MONITOR_NODES' state.apply ceph.subvolume
    Note

    The ceph.subvolume state creates /var/lib/ceph as a @/var/lib/ceph Btrfs subvolume.

    The new subvolume is now mounted and /etc/fstab is updated.
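    To verify the result on a minion, you can check the mount and the new /etc/fstab entry, for example (a quick sanity check, not part of the official procedure):

    root@minion > findmnt /var/lib/ceph
    root@minion > grep /var/lib/ceph /etc/fstab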

  4. Prepare your cluster. Refer to DeepSea Stages Description for more details.

    root@master # salt-run state.orch ceph.stage.0

    or

    root@master # salt-run state.orch ceph.stage.prep
    Note: Run or Monitor Stages using DeepSea CLI

    Using the DeepSea CLI, you can follow the stage execution progress in real-time, either by running the DeepSea CLI in the monitoring mode, or by running the stage directly through DeepSea CLI. For details refer to Section 5.4, “DeepSea CLI”.

  5. The discovery stage collects data from all minions and creates configuration fragments that are stored in the directory /srv/pillar/ceph/proposals. The data are stored in the YAML format in *.sls or *.yml files.

    Run the following command to trigger the discovery stage:

    root@master # salt-run state.orch ceph.stage.1

    or

    root@master # salt-run state.orch ceph.stage.discovery
  6. After the previous command finishes successfully, create a policy.cfg file in /srv/pillar/ceph/proposals. For details refer to Section 5.5.1, “The policy.cfg File”.

    Tip

    If you need to change the cluster's network setting, edit /srv/pillar/ceph/stack/ceph/cluster.yml and adjust the lines starting with cluster_network: and public_network:.

  7. The configuration stage parses the policy.cfg file and merges the included files into their final form. Cluster and role related content are placed in /srv/pillar/ceph/cluster, while Ceph specific content is placed in /srv/pillar/ceph/stack/default.

    Run the following command to trigger the configuration stage:

    root@master # salt-run state.orch ceph.stage.2

    or

    root@master # salt-run state.orch ceph.stage.configure

    The configuration step may take several seconds. After the command finishes, you can view the pillar data for the specified minions (for example, named ceph_minion1, ceph_minion2, etc.) by running:

    root@master # salt 'ceph_minion*' pillar.items
    Tip: Modifying OSD's Layout

    If you want to modify the default OSD's layout and change the drive groups configuration, follow the procedure described in Section 5.5.2, “DriveGroups”.

    Note: Overwriting Defaults

    As soon as the command finishes, you can view the default configuration and change it to suit your needs. For details refer to Chapter 7, Customizing the Default Configuration.

  8. Now you run the deployment stage. In this stage, the pillar is validated, and the Ceph Monitor and Ceph OSD daemons are started:

    root@master # salt-run state.orch ceph.stage.3

    or

    root@master # salt-run state.orch ceph.stage.deploy

    The command may take several minutes. If it fails, you need to fix the issue and run the previous stages again. After the command succeeds, run the following to check the status:

    cephadm@adm > ceph -s
  9. The last step of the Ceph cluster deployment is the services stage. Here you instantiate any of the currently supported services: iSCSI Gateway, CephFS, Object Gateway, and NFS Ganesha. In this stage, the necessary pools are created, keyrings are authorized, and services are started. To start the stage, run the following:

    root@master # salt-run state.orch ceph.stage.4

    or

    root@master # salt-run state.orch ceph.stage.services

    Depending on the setup, the command may run for several minutes.

  10. Disable insecure clients. Since Nautilus v14.2.20, a new health warning informs you that insecure clients are allowed to join the cluster. This warning is on by default. The Ceph Dashboard will show the cluster in the HEALTH_WARN status, and verifying the cluster status on the command line informs you as follows:

    cephadm@adm > ceph status
    cluster:
      id:     3fe8b35a-689f-4970-819d-0e6b11f6707c
      health: HEALTH_WARN
      mons are allowing insecure global_id reclaim
    [...]

    This warning means that the Ceph Monitors are still allowing old, unpatched clients to connect to the cluster. This ensures existing clients can still connect while the cluster is being upgraded, but warns you that there is a problem that needs to be addressed. When the cluster and all clients are upgraded to the latest version of Ceph, disallow unpatched clients by running the following command:

    cephadm@adm > ceph config set mon auth_allow_insecure_global_id_reclaim false
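    If you cannot upgrade all clients right away, the upstream Nautilus release notes also describe muting the warning temporarily instead, for example for one week (the health code name is taken from upstream, not from this guide):

    cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w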
  11. Before you continue, we strongly recommend enabling the Ceph telemetry module. See Section 21.2, “Telemetry Module” for information and instructions.

5.4 DeepSea CLI

DeepSea also provides a command line interface (CLI) tool that allows the user to monitor or run stages while visualizing the execution progress in real-time. Verify that the deepsea-cli package is installed before you run the deepsea executable.
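If it is missing, install it the same way as the other packages in this chapter:

root@master # zypper in deepsea-cli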

Two modes are supported for visualizing a stage's execution progress:

DeepSea CLI Modes
  • Monitoring mode: visualizes the execution progress of a DeepSea stage triggered by the salt-run command issued in another terminal session.

  • Stand-alone mode: runs a DeepSea stage while providing real-time visualization of its component steps as they are executed.

Important: DeepSea CLI Commands

The DeepSea CLI commands can only be run on the Salt master node with root privileges.

5.4.1 DeepSea CLI: Monitor Mode

The progress monitor provides a detailed, real-time visualization of what is happening during execution of stages using salt-run state.orch commands in other terminal sessions.

Tip: Start Monitor in a New Terminal Session

You need to start the monitor in a new terminal window before running any salt-run state.orch so that the monitor can detect the start of the stage's execution.

If you start the monitor after issuing the salt-run state.orch command, then no execution progress will be shown.

You can start the monitor mode by running the following command:

root@master # deepsea monitor

For more information about the available command line options of the deepsea monitor command, check its manual page:

root@master # man deepsea-monitor

5.4.2 DeepSea CLI: Stand-alone Mode

In the stand-alone mode, DeepSea CLI can be used to run a DeepSea stage, showing its execution in real-time.

The command to run a DeepSea stage from the DeepSea CLI has the following form:

root@master # deepsea stage run stage-name

where stage-name corresponds to the way Salt orchestration state files are referenced. For example, stage deploy, which corresponds to the directory located in /srv/salt/ceph/stage/deploy, is referenced as ceph.stage.deploy.

This command is an alternative to the Salt-based commands for running DeepSea stages (or any DeepSea orchestration state file).

The command deepsea stage run ceph.stage.0 is equivalent to salt-run state.orch ceph.stage.0.

For more information about the available command line options accepted by the deepsea stage run command, check its manual page:

root@master # man deepsea-stage run

The following figure shows an example of the output of the DeepSea CLI when running Stage 2:

Figure 5.1: DeepSea CLI stage execution progress output

5.4.2.1 DeepSea CLI stage run Alias

For advanced users of Salt, the DeepSea CLI also supports an alias that accepts the Salt command used to run a stage, for example salt-run state.orch stage-name, as a DeepSea CLI command.

Example:

root@master # deepsea salt-run state.orch stage-name

5.5 Configuration and Customization

5.5.1 The policy.cfg File

The /srv/pillar/ceph/proposals/policy.cfg configuration file is used to determine roles of individual cluster nodes. For example, which nodes act as Ceph OSDs or Ceph Monitors. Edit policy.cfg in order to reflect your desired cluster setup. The order of the sections is arbitrary, but the content of included lines overwrites matching keys from the content of previous lines.

Tip: Examples of policy.cfg

You can find several examples of complete policy files in the /usr/share/doc/packages/deepsea/examples/ directory.

5.5.1.1 Cluster Assignment

In the cluster section you select minions for your cluster. You can select all minions, or you can blacklist or whitelist minions. Examples for a cluster called ceph follow.

To include all minions, add the following lines:

cluster-ceph/cluster/*.sls

To whitelist a particular minion:

cluster-ceph/cluster/abc.domain.sls

or a group of minions—you can use shell glob matching:

cluster-ceph/cluster/mon*.sls

To blacklist minions, set them to unassigned:

cluster-unassigned/cluster/client*.sls

5.5.1.2 Role Assignment

This section provides you with details on assigning 'roles' to your cluster nodes. A 'role' in this context means the service you need to run on the node, such as Ceph Monitor, Object Gateway, or iSCSI Gateway. No role is assigned automatically; only roles added to policy.cfg will be deployed.

The assignment follows this pattern:

role-ROLE_NAME/PATH/FILES_TO_INCLUDE

Where the items have the following meaning and values:

  • ROLE_NAME is any of the following: 'master', 'admin', 'mon', 'mgr', 'storage', 'mds', 'igw', 'rgw', 'ganesha', 'grafana', or 'prometheus'.

  • PATH is a relative directory path to .sls or .yml files. In the case of .sls files, it is usually cluster, while .yml files are located at stack/default/ceph/minions.

  • FILES_TO_INCLUDE are the Salt state files or YAML configuration files. They normally consist of Salt minions' host names, for example ses5min2.yml. Shell globbing can be used for more specific matching.

An example for each role follows:

  • master - the node has admin keyrings to all Ceph clusters. Currently, only a single Ceph cluster is supported. As the master role is mandatory, always add a line similar to the following:

    role-master/cluster/master*.sls
  • admin - the minion will have an admin keyring. You define the role as follows:

    role-admin/cluster/abc*.sls
  • mon - the minion will provide the monitor service to the Ceph cluster. This role requires addresses of the assigned minions. From SUSE Enterprise Storage 5, the public addresses are calculated dynamically and are no longer needed in the Salt pillar.

    role-mon/cluster/mon*.sls

    The example assigns the monitor role to a group of minions.

  • mgr - the Ceph Manager daemon, which collects all the state information from the whole cluster. Deploy it on all minions where you plan to deploy the Ceph Monitor role.

    role-mgr/cluster/mgr*.sls
  • storage - use this role to specify storage nodes.

    role-storage/cluster/data*.sls
  • mds - the minion will provide the metadata service to support CephFS.

    role-mds/cluster/mds*.sls
  • igw - the minion will act as an iSCSI Gateway. This role requires addresses of the assigned minions, thus you need to also include the files from the stack directory:

    role-igw/cluster/*.sls
  • rgw - the minion will act as an Object Gateway:

    role-rgw/cluster/rgw*.sls
  • ganesha - the minion will act as an NFS Ganesha server. The 'ganesha' role requires either an 'rgw' or an 'mds' role in the cluster, otherwise the validation will fail in Stage 3.

    role-ganesha/cluster/ganesha*.sls

    To successfully install NFS Ganesha, additional configuration is required. If you want to use NFS Ganesha, read Chapter 12, Installation of NFS Ganesha before executing stages 2 and 4. However, it is possible to install NFS Ganesha later.

    In some cases it can be useful to define custom roles for NFS Ganesha nodes. For details, see Section 30.3, “Custom NFS Ganesha Roles”.

  • grafana, prometheus - this node adds Grafana charts based on Prometheus alerting to the Ceph Dashboard. Refer to Part II, “Ceph Dashboard” for its detailed description.

    role-grafana/cluster/grafana*.sls
    role-prometheus/cluster/prometheus*.sls
Note: Multiple Roles of Cluster Nodes

You can assign several roles to a single node. For example, you can assign the 'mds' roles to the monitor nodes:

role-mds/cluster/mon[1,2]*.sls

5.5.1.3 Common Configuration

The common configuration section includes configuration files generated during the discovery (Stage 1). These configuration files store parameters like fsid or public_network. To include the required Ceph common configuration, add the following lines:

config/stack/default/global.yml
config/stack/default/ceph/cluster.yml

5.5.1.4 Item Filtering

Sometimes it is not practical to include all files from a given directory with *.sls globbing. The policy.cfg file parser understands the following filters:

Warning: Advanced Techniques

This section describes filtering techniques for advanced users. When not used correctly, filtering can cause problems, for example, if your node numbering changes.

slice=[start:end]

Use the slice filter to include only items start through end-1. Note that items in the given directory are sorted alphanumerically. The following line includes the third to fifth files from the role-mon/cluster/ subdirectory:

role-mon/cluster/*.sls slice[3:6]
re=regexp

Use the regular expression filter to include only items matching the given expressions. For example:

role-mon/cluster/mon*.sls re=.*1[135]\.subdomainX\.sls$

5.5.1.5 Example policy.cfg File

Following is an example of a basic policy.cfg file:

## Cluster Assignment
cluster-ceph/cluster/*.sls 1

## Roles
# ADMIN
role-master/cluster/examplesesadmin.sls 2
role-admin/cluster/sesclient*.sls 3

# MON
role-mon/cluster/ses-example-[123].sls 4

# MGR
role-mgr/cluster/ses-example-[123].sls 5

# STORAGE
role-storage/cluster/ses-example-[5678].sls 6

# MDS
role-mds/cluster/ses-example-4.sls 7

# IGW
role-igw/cluster/ses-example-4.sls 8

# RGW
role-rgw/cluster/ses-example-4.sls 9

# COMMON
config/stack/default/global.yml 10
config/stack/default/ceph/cluster.yml 11

1

Indicates that all minions are included in the Ceph cluster. If you have minions you do not want to include in the Ceph cluster, use:

cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/ses-example-*.sls

The first line marks all minions as unassigned. The second line overrides minions matching 'ses-example-*.sls', and assigns them to the Ceph cluster.

2

The minion called 'examplesesadmin' has the 'master' role. This also means that it will get admin keys to the cluster.

3

All minions matching 'sesclient*' will get admin keys as well.

4

All minions matching 'ses-example-[123]' (presumably three minions: ses-example-1, ses-example-2, and ses-example-3) will be set up as MON nodes.

5

All minions matching 'ses-example-[123]' (all MON nodes in the example) will be set up as MGR nodes.

6

All minions matching 'ses-example-[5678]' will be set up as storage nodes.

7

Minion 'ses-example-4' will have the MDS role.

8

Minion 'ses-example-4' will have the IGW role.

9

Minion 'ses-example-4' will have the RGW role.

10, 11

Means that we accept the default values for common configuration parameters such as fsid and public_network.

5.5.2 DriveGroups

DriveGroups specify the layouts of OSDs in the Ceph cluster. They are defined in a single file /srv/salt/ceph/configuration/files/drive_groups.yml.

An administrator should manually specify a group of OSDs that are interrelated (for example, hybrid OSDs that are deployed on solid state drives and spinners) or that share identical deployment options (for example, the same object store or encryption option, or stand-alone OSDs). To avoid explicitly listing devices, DriveGroups use a list of filter items that correspond to a few selected fields of ceph-volume's inventory reports. In the simplest case, this could be the 'rotational' flag (all solid-state drives are to be db_devices, all rotating ones data devices), or something more involved such as 'model' strings or sizes. DeepSea provides code that translates these DriveGroups into actual device lists for inspection by the user.

Note

Note that the filters use an OR gate to match against the drives.

Following is a simple procedure that demonstrates the basic workflow when configuring DriveGroups:

  1. Inspect your disks' properties as seen by the ceph-volume command. Only these properties are accepted by DriveGroups:

    root@master # salt-run disks.details
  2. Open the /srv/salt/ceph/configuration/files/drive_groups.yml YAML file and adjust to your needs. Refer to Section 5.5.2.1, “Specification”. Remember to use spaces instead of tabs. Find more advanced examples in Section 5.5.2.4, “Examples”. The following example includes all drives available to Ceph as OSDs:

    default_drive_group_name:
      target: '*'
      data_devices:
        all: true
  3. Verify new layouts:

    root@master # salt-run disks.list

    This runner returns a structure of matching disks based on your DriveGroups. If you are not happy with the result, repeat the previous step.

    Tip: Detailed Report

    In addition to the disks.list runner, there is a disks.report runner that prints out a detailed report of what will happen in the next DeepSea stage 3 invocation.

    root@master # salt-run disks.report
  4. Deploy OSDs. On the next DeepSea stage 3 invocation, the OSD disks will be deployed according to your DriveGroups specification.

5.5.2.1 Specification

/srv/salt/ceph/configuration/files/drive_groups.yml can take one of two basic forms, depending on whether BlueStore or FileStore is to be used. For BlueStore setups, drive_groups.yml can be as follows:

drive_group_default_name:
  target: '*'
  data_devices:
    drive_spec: DEVICE_SPECIFICATION
  db_devices:
    drive_spec: DEVICE_SPECIFICATION
  wal_devices:
    drive_spec: DEVICE_SPECIFICATION
  block_wal_size: '5G'  # (optional, unit suffixes permitted)
  block_db_size: '5G'   # (optional, unit suffixes permitted)
  osds_per_device: 1   # number of osd daemons per device
  format:              # 'bluestore' or 'filestore' (defaults to 'bluestore')
  encryption:           # 'True' or 'False' (defaults to 'False')

For FileStore setups, drive_groups.yml can be as follows:

drive_group_default_name:
  target: '*'
  data_devices:
    drive_spec: DEVICE_SPECIFICATION
  journal_devices:
    drive_spec: DEVICE_SPECIFICATION
  format: filestore
  encryption: True
Note

If you are unsure if your OSD is encrypted, see Section 2.5, “Verify an Encrypted OSD”.

5.5.2.2 Matching Disk Devices

You can describe the specification using the following filters:

  • By a disk model:

    model: DISK_MODEL_STRING
  • By a disk vendor:

    vendor: DISK_VENDOR_STRING
    Tip: Lowercase Vendor String

    Always lowercase the DISK_VENDOR_STRING.

  • Whether a disk is rotational or not. SSDs and NVMe drives are not rotational.

    rotational: 0
  • Deploy a node using all available drives for OSDs:

    data_devices:
      all: true
  • Additionally, by limiting the number of matching disks:

    limit: 10
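These filters can be combined within a single device specification. For example, a hypothetical group that uses at most ten rotating drives as data devices could look as follows (the group name and target are illustrative placeholders):

drive_group_limited:
  target: '*'
  data_devices:
    rotational: 1
    limit: 10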

5.5.2.3 Filtering Devices by Size

You can filter disk devices by their size—either by an exact size, or a size range. The size: parameter accepts arguments in the following form:

  • '10G' - Includes disks of an exact size.

  • '10G:40G' - Includes disks whose size is within the range.

  • ':10G' - Includes disks less than or equal to 10 GB in size.

  • '40G:' - Includes disks equal to or greater than 40 GB in size.

Example 5.1: Matching by Disk Size
drive_group_default:
  target: '*'
  data_devices:
    size: '40TB:'
  db_devices:
    size: ':2TB'
Note: Quotes Required

When using the ':' delimiter, you need to enclose the size in quotes, otherwise the ':' sign will be interpreted as a new configuration hash.

Tip: Unit Shortcuts

Instead of (G)igabytes, you can specify the sizes in (M)egabytes or (T)erabytes as well.

5.5.2.4 Examples

This section includes examples of different OSD setups.

Example 5.2: Simple Setup

This example describes two nodes with the same setup:

  • 20 HDDs

    • Vendor: Intel

    • Model: SSD-123-foo

    • Size: 4 TB

  • 2 SSDs

    • Vendor: Micron

    • Model: MC-55-44-ZX

    • Size: 512 GB

The corresponding drive_groups.yml file will be as follows:

drive_group_default:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX

Such a configuration is simple and valid. The problem is that an administrator may add disks from different vendors in the future, and these will not be included. You can improve it by reducing the filters to core properties of the drives:

drive_group_default:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

In the previous example, all rotating devices are declared as 'data devices', and all non-rotating devices will be used as 'shared devices' (wal, db).

If you know that drives larger than 2 TB will always be the slower data devices, you can filter by size:

drive_group_default:
  target: '*'
  data_devices:
    size: '2TB:'
  db_devices:
    size: ':2TB'
Example 5.3: Advanced Setup

This example describes two distinct setups: 20 HDDs should share 2 SSDs, while 10 SSDs should share 2 NVMes.

  • 20 HDDs

    • Vendor: Intel

    • Model: SSD-123-foo

    • Size: 4 TB

  • 12 SSDs

    • Vendor: Micron

    • Model: MC-55-44-ZX

    • Size: 512 GB

  • 2 NVMes

    • Vendor: Samsung

    • Model: NVME-QQQQ-987

    • Size: 256 GB

Such a setup can be defined with two layouts as follows:

drive_group:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: MC-55-44-ZX
drive_group_default:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    vendor: samsung
    size: 256GB

Note that with this example, any 256 GB drive and any drive from Samsung will match as a DB device.

Example 5.4: Advanced Setup with Non-uniform Nodes

The previous examples assumed that all nodes have the same drives. However, that is not always the case:

Nodes 1-5:

  • 20 HDDs

    • Vendor: Intel

    • Model: SSD-123-foo

    • Size: 4 TB

  • 2 SSDs

    • Vendor: Micron

    • Model: MC-55-44-ZX

    • Size: 512 GB

Nodes 6-10:

  • 5 NVMes

    • Vendor: Intel

    • Model: SSD-123-foo

    • Size: 4 TB

  • 20 SSDs

    • Vendor: Micron

    • Model: MC-55-44-ZX

    • Size: 512 GB

You can use the 'target' key in the layout to target specific nodes. Salt target notation helps to keep things simple:

drive_group_node_one_to_five:
  target: 'node[1-5]'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

followed by

drive_group_the_rest:
  target: 'node[6-10]'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: SSD-123-foo
Example 5.5: Expert Setup

All previous cases assumed that the WALs and DBs use the same device. It is however possible to deploy the WAL on a dedicated device as well:

  • 20 HDDs

    • Vendor: Intel

    • Model: SSD-123-foo

    • Size: 4 TB

  • 2 SSDs

    • Vendor: Micron

    • Model: MC-55-44-ZX

    • Size: 512 GB

  • 2 NVMes

    • Vendor: Samsung

    • Model: NVME-QQQQ-987

    • Size: 256 GB

drive_group_default:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
Example 5.6: Complex (and Unlikely) Setup

In the following setup, we are trying to define:

  • 20 HDDs backed by 1 NVMe

  • 2 HDDs backed by 1 SSD(db) and 1 NVMe(wal)

  • 8 SSDs backed by 1 NVMe

  • 2 SSDs stand-alone (encrypted)

  • 1 HDD is spare and should not be deployed

The summary of used drives follows:

  • 23 HDDs

    • Vendor: Intel

    • Model: SSD-123-foo

    • Size: 4 TB

  • 10 SSDs

    • Vendor: Micron

    • Model: MC-55-44-ZX

    • Size: 512 GB

  • 1 NVMe

    • Vendor: Samsung

    • Model: NVME-QQQQ-987

    • Size: 256 GB

The DriveGroups definition will be the following:

drive_group_hdd_nvme:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: NVME-QQQQ-987
drive_group_hdd_ssd_nvme:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
drive_group_ssd_nvme:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: NVME-QQQQ-987
drive_group_ssd_standalone_encrypted:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  encryption: True

One HDD will remain unused because the file is parsed from top to bottom.

5.5.3 Adjusting ceph.conf with Custom Settings

If you need to put custom settings into the ceph.conf configuration file, see Section 2.14, “Adjusting ceph.conf with Custom Settings” for more details.
