Applies to SUSE Enterprise Storage 7.1

8 Deploying the remaining core services using cephadm

After deploying the basic Ceph cluster, deploy core services to more cluster nodes. To make the cluster data accessible to clients, deploy additional services as well.

Currently, we support deployment of Ceph services on the command line by using the Ceph orchestrator (ceph orch subcommands).

8.1 The ceph orch command

The Ceph orchestrator command ceph orch, which is an interface to the cephadm module, takes care of listing cluster components and deploying Ceph services on new cluster nodes.

8.1.1 Displaying the orchestrator status

The following command shows the current mode and status of the Ceph orchestrator.

cephuser@adm > ceph orch status
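
On a cluster managed by cephadm, the output looks similar to the following minimal sketch (the exact lines can differ between releases):

Backend: cephadm
Available: Yes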

8.1.2 Listing devices, services, and daemons

To list all disk devices, run the following:

cephuser@adm > ceph orch device ls
Hostname   Path      Type  Serial  Size   Health   Ident  Fault  Available
ses-main   /dev/vdb  hdd   0d8a... 10.7G  Unknown  N/A    N/A    No
ses-node1  /dev/vdc  hdd   8304... 10.7G  Unknown  N/A    N/A    No
ses-node1  /dev/vdd  hdd   7b81... 10.7G  Unknown  N/A    N/A    No
[...]
Tip: Services and daemons

Service is a general term for a Ceph service of a specific type, for example Ceph Manager.

Daemon is a specific instance of a service, for example, a process mgr.ses-node1.gdlcik running on a node called ses-node1.

To list all services known to cephadm, run:

cephuser@adm > ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                  IMAGE ID
mgr       1/0  5m ago     -    <no spec>  registry.example.com/[...]  5bf12403d0bd
mon       1/0  5m ago     -    <no spec>  registry.example.com/[...]  5bf12403d0bd
Tip

You can limit the list to services on a particular node with the optional --host parameter, and services of a particular type with the optional --service-type parameter. Acceptable types are mon, osd, mgr, mds, and rgw.
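
For example, to show only the MON services known to cephadm, you can filter by service type:

cephuser@adm > ceph orch ls --service-type mon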

To list all running daemons deployed by cephadm, run:

cephuser@adm > ceph orch ps
NAME             HOST      STATUS   REFRESHED AGE VERSION    IMAGE ID     CONTAINER ID
mgr.ses-node1.gd ses-node1 running  8m ago    12d 15.2.0.108 5bf12403d0bd b8104e09814c
mon.ses-node1    ses-node1 running  8m ago    12d 15.2.0.108 5bf12403d0bd a719e0087369
Tip

To query the status of a particular daemon, use --daemon_type and --daemon_id. For OSDs, the ID is the numeric OSD ID. For MDS, the ID is the file system name:

cephuser@adm > ceph orch ps --daemon_type osd --daemon_id 0
cephuser@adm > ceph orch ps --daemon_type mds --daemon_id my_cephfs

8.2 Service and placement specification

The recommended way to specify the deployment of Ceph services is to create a YAML-formatted file with the specification of the services that you intend to deploy.

8.2.1 Creating service specifications

You can create a separate specification file for each type of service, for example:

root@master # cat nfs.yml
service_type: nfs
service_id: EXAMPLE_NFS
placement:
  hosts:
  - ses-node1
  - ses-node2
spec:
  pool: EXAMPLE_POOL
  namespace: EXAMPLE_NAMESPACE

Alternatively, you can specify multiple (or all) service types in one file—for example, cluster.yml—that describes which nodes will run specific services. Remember to separate individual service types with three dashes (---):

cephuser@adm > cat cluster.yml
service_type: nfs
service_id: EXAMPLE_NFS
placement:
  hosts:
  - ses-node1
  - ses-node2
spec:
  pool: EXAMPLE_POOL
  namespace: EXAMPLE_NAMESPACE
---
service_type: rgw
service_id: REALM_NAME.ZONE_NAME
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3
---
[...]

The aforementioned properties have the following meaning:

service_type

The type of the service. It can be either a Ceph service (mon, mgr, mds, crash, osd, or rbd-mirror), a gateway (nfs or rgw), or part of the monitoring stack (alertmanager, grafana, node-exporter, or prometheus).

service_id

The name of the service. Specifications of type mon, mgr, alertmanager, grafana, node-exporter, and prometheus do not require the service_id property.

placement

Specifies which nodes will be running the service. Refer to Section 8.2.2, “Creating placement specification” for more details.

spec

Additional specification relevant for the service type.

Tip: Applying specific services

Ceph cluster services usually have a number of properties specific to them. For examples and details of individual services' specifications, refer to Section 8.3, “Deploy Ceph services”.

8.2.2 Creating placement specification

To deploy Ceph services, cephadm needs to know on which nodes to deploy them. Use the placement property and list the short host names of the nodes that the service applies to:

cephuser@adm > cat cluster.yml
[...]
placement:
  hosts:
  - host1
  - host2
  - host3
[...]

8.2.2.1 Placement by labels

You can limit the placement of Ceph services to hosts that match a specific label. In the following example, we create a label prometheus, assign it to specific hosts, and apply the service Prometheus to the group of hosts with the prometheus label.

  1. Add the prometheus label to host1, host2, and host3:

    cephuser@adm > ceph orch host label add host1 prometheus
    cephuser@adm > ceph orch host label add host2 prometheus
    cephuser@adm > ceph orch host label add host3 prometheus
  2. Verify that the labels were correctly assigned:

    cephuser@adm > ceph orch host ls
    HOST   ADDR   LABELS      STATUS
    host1         prometheus
    host2         prometheus
    host3         prometheus
    [...]
  3. Create a placement specification using the label property:

    cephuser@adm > cat cluster.yml
    [...]
    placement:
      label: "prometheus"
    [...]
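
After applying a specification that uses label-based placement (see Section 8.2.3, “Applying cluster specification”), you can check on which hosts the daemons were actually started by filtering the daemon list, for example:

cephuser@adm > ceph orch ps --daemon_type prometheus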

8.2.3 Applying cluster specification

After you have created a full cluster.yml file with specifications of all services and their placement, you can apply the cluster by running the following command:

cephuser@adm > ceph orch apply -i cluster.yml

To view the status of the cluster, run the ceph orch status command. For more details, see Section 8.1.1, “Displaying the orchestrator status”.

8.2.4 Exporting the specification of a running cluster

Although you deployed services to the Ceph cluster by using the specification files as described in Section 8.2, “Service and placement specification”, the configuration of the cluster may diverge from the original specification during its operation. Also, you may have removed the specification files accidentally.

To retrieve a complete specification of a running cluster, run:

cephuser@adm > ceph orch ls --export
placement:
  hosts:
  - hostname: ses-node1
    name: ''
    network: ''
service_id: my_cephfs
service_name: mds.my_cephfs
service_type: mds
---
placement:
  count: 2
service_name: mgr
service_type: mgr
---
[...]
Tip

You can append the --format option to change the default yaml output format. You can select from json, json-pretty, or yaml. For example:

ceph orch ls --export --format json
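
If you want to keep a copy of the exported specification for later re-use, redirect it to a file and apply it again with the command from Section 8.2.3, “Applying cluster specification”. The file name cluster-export.yml is an arbitrary example:

cephuser@adm > ceph orch ls --export > cluster-export.yml
cephuser@adm > ceph orch apply -i cluster-export.yml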

8.3 Deploy Ceph services

After the basic cluster is running, you can deploy Ceph services to additional nodes.

8.3.1 Deploying Ceph Monitors and Ceph Managers

A Ceph cluster has three or five MONs deployed across different nodes. If there are five or more nodes in the cluster, we recommend deploying five MONs. A good practice is to deploy MGRs on the same nodes as MONs.

Important: Include Bootstrap MON

When deploying MONs and MGRs, remember to include the first MON that you added when configuring the basic cluster in Section 7.2.5, “Specifying first MON/MGR node”.

To deploy MONs, apply the following specification:

service_type: mon
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3
Note

If you need to add another node, append the host name to the same YAML list. For example:

service_type: mon
placement:
 hosts:
 - ses-node1
 - ses-node2
 - ses-node3
 - ses-node4

Similarly, to deploy MGRs, apply the following specification:

Important

Ensure that your deployment has at least three Ceph Managers.

service_type: mgr
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3
Tip

If MONs or MGRs are not on the same subnet, you need to append the subnet addresses. For example:

service_type: mon
placement:
  hosts:
  - ses-node1:10.1.2.0/24
  - ses-node2:10.1.5.0/24
  - ses-node3:10.1.10.0/24

8.3.2 Deploying Ceph OSDs

Important: When Storage Device is Available

A storage device is considered available if all of the following conditions are met:

  • The device has no partitions.

  • The device does not have any LVM state.

  • The device is not mounted.

  • The device does not contain a file system.

  • The device does not contain a BlueStore OSD.

  • The device is larger than 5 GB.

If the above conditions are not met, Ceph refuses to provision such OSDs.

There are two ways you can deploy OSDs:

  • Tell Ceph to consume all available and unused storage devices:

    cephuser@adm > ceph orch apply osd --all-available-devices
  • Use DriveGroups (see Section 13.4.3, “Adding OSDs using DriveGroups specification”) to create an OSD specification describing the devices to be deployed based on their properties, such as device type (SSD or HDD), device model names, size, or the nodes on which the devices exist (a rough sketch of such a specification follows this list). Then apply the specification by running the following command:

    cephuser@adm > ceph orch apply osd -i drive_groups.yml
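
As a rough sketch only, a DriveGroups specification that places OSD data on rotational drives and their DB devices on non-rotational drives of all hosts could look similar to the following; EXAMPLE_DRIVE_GROUP is a placeholder, and the authoritative syntax is described in Section 13.4.3, “Adding OSDs using DriveGroups specification”:

service_type: osd
service_id: EXAMPLE_DRIVE_GROUP
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1   # rotational drives (HDDs) store the OSD data
  db_devices:
    rotational: 0   # non-rotational drives (SSDs) store the DB/WAL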

8.3.3 Deploying Metadata Servers

CephFS requires one or more Metadata Server (MDS) services. To create a CephFS, first create MDS servers by applying the following specification:

Note

Ensure you have at least two pools, one for CephFS data and one for CephFS metadata, created before applying the following specification.

service_type: mds
service_id: CEPHFS_NAME
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3

After MDSs are functional, create the CephFS:

ceph fs new CEPHFS_NAME metadata_pool data_pool
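
For illustration, assuming hypothetical pool names cephfs_metadata and cephfs_data and a file system called my_cephfs (matching the CEPHFS_NAME used as service_id in the specification above), the complete sequence could look like this:

cephuser@adm > ceph osd pool create cephfs_metadata
cephuser@adm > ceph osd pool create cephfs_data
cephuser@adm > ceph fs new my_cephfs cephfs_metadata cephfs_data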

8.3.4 Deploying Object Gateways

cephadm deploys an Object Gateway as a collection of daemons that manage a particular realm and zone.

You can either relate an Object Gateway service to an already existing realm and zone (refer to Section 21.13, “Multisite Object Gateways” for more details), or you can specify a non-existing REALM_NAME and ZONE_NAME, and they will be created automatically after you apply the following configuration:

service_type: rgw
service_id: REALM_NAME.ZONE_NAME
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3
spec:
  rgw_realm: RGW_REALM
  rgw_zone: RGW_ZONE

8.3.4.1 Using secure SSL access

To use a secure SSL connection to the Object Gateway, you need a file containing both the valid SSL certificate and the key (see Section 21.7, “Enable HTTPS/SSL for Object Gateways” for more details). You need to enable SSL, specify a port number for SSL connections, and specify the SSL certificate file.

To enable SSL and specify the port number, include the following in your specification:

spec:
  ssl: true
  rgw_frontend_port: 443

To specify the SSL certificate and key, you can paste their contents directly into the YAML specification file. The pipe sign (|) at the end of a line tells the parser to expect a multi-line string as a value. For example:

spec:
  ssl: true
  rgw_frontend_port: 443
  rgw_frontend_ssl_certificate: |
   -----BEGIN CERTIFICATE-----
   MIIFmjCCA4KgAwIBAgIJAIZ2n35bmwXTMA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
   BAYTAkFVMQwwCgYDVQQIDANOU1cxHTAbBgNVBAoMFEV4YW1wbGUgUkdXIFNTTCBp
   [...]
   -----END CERTIFICATE-----
   -----BEGIN PRIVATE KEY-----
   MIIJRAIBADANBgkqhkiG9w0BAQEFAASCCS4wggkqAgEAAoICAQDLtFwg6LLl2j4Z
   BDV+iL4AO7VZ9KbmWIt37Ml2W6y2YeKX3Qwf+3eBz7TVHR1dm6iPpCpqpQjXUsT9
   [...]
   -----END PRIVATE KEY-----
Tip

Instead of pasting the content of an SSL certificate file, you can omit the rgw_frontend_ssl_certificate: directive and upload the certificate file to the configuration database:

cephuser@adm > ceph config-key set rgw/cert/REALM_NAME/ZONE_NAME.crt \
 -i SSL_CERT_FILE
8.3.4.1.1 Configure the Object Gateway to listen on both ports 443 and 80

To configure the Object Gateway to listen on both ports 443 (HTTPS) and 80 (HTTP), follow these steps:

Note

The commands in this procedure use a realm and a zone that are both named default.

  1. Deploy the Object Gateway by supplying a specification file. Refer to Section 8.3.4, “Deploying Object Gateways” for more details on the Object Gateway specification. Use the following command:

    cephuser@adm > ceph orch apply -i SPEC_FILE
  2. If SSL certificates are not supplied in the specification file, add them by using the following command:

    cephuser@adm > ceph config-key set rgw/cert/default/default.crt -i certificate.pem
    cephuser@adm > ceph config-key set rgw/cert/default/default.key -i key.pem
  3. Change the default value of the rgw_frontends option:

    cephuser@adm > ceph config set client.rgw.default.default rgw_frontends \
     "beast port=80 ssl_port=443"
  4. Remove the specific configuration created by cephadm. Identify for which target the rgw_frontends option was configured by running the command:

    cephuser@adm > ceph config dump | grep rgw

    For example, the target is client.rgw.default.default.node4.yiewdu. Remove the current specific rgw_frontends value:

    cephuser@adm > ceph config rm client.rgw.default.default.node4.yiewdu rgw_frontends
    Tip

    Instead of removing a value for rgw_frontends, you can specify it. For example:

    cephuser@adm > ceph config set client.rgw.default.default.node4.yiewdu \
     rgw_frontends "beast port=80 ssl_port=443"
  5. Restart Object Gateways:

    cephuser@adm > ceph orch restart rgw.default.default

8.3.4.2 Deploying with a subcluster

Subclusters help you organize the nodes in your clusters to isolate workloads and make elastic scaling easier. If you are deploying with a subcluster, apply the following configuration:

service_type: rgw
service_id: REALM_NAME.ZONE_NAME.SUBCLUSTER
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3
spec:
  rgw_realm: RGW_REALM
  rgw_zone: RGW_ZONE
  subcluster: SUBCLUSTER

8.3.4.3 Deploying High Availability for the Object Gateway

You can create high availability (HA) endpoints for the Object Gateway service by deploying Ingress. Ingress combines HAProxy and Keepalived to provide load balancing on a floating virtual IP address. The virtual IP address is automatically configured on only one of the Ingress hosts at a time.

Each Keepalived daemon checks every few seconds whether the HAProxy daemon on the same host is responding. Keepalived also checks whether the master Keepalived daemon is running without problems. If the master Keepalived or the active HAProxy daemon is not responding, one of the remaining Keepalived daemons running in backup mode is elected as the master daemon, and the virtual IP address will be assigned to that node.

Figure 8.1: High Availability setup for Object Gateway
Important

If your Object Gateway deployment uses an SSL connection, disable SSL on the Object Gateway nodes and configure Ingress to use the SSL certificate instead.

  1. Create specifications for the Object Gateway and Ingress services, for example:

    > cat rgw.yaml
    service_type: rgw
    service_id: myrealm.myzone
    placement:
      hosts:
        - ses-node1
        - ses-node2
        - ses-node3
    [...]
    > cat ingress.yaml
    service_type: ingress
    service_id: rgw.myrealm.myzone
    placement:
      hosts:
        - ingress-host1
        - ingress-host2
        - ingress-host3
    spec:
      backend_service: rgw.myrealm.myzone
      virtual_ip: 192.168.20.1/24
      frontend_port: 8080
      monitor_port: 1967     # used by HAProxy for load balancer status
      ssl_cert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
  2. Apply the specifications:

    cephuser@adm > ceph orch apply -i rgw.yaml
    cephuser@adm > ceph orch apply -i ingress.yaml
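
    The Ingress service deploys an HAProxy and a Keepalived daemon on each of the listed hosts. You can check that they have started by filtering the daemon list, for example:

    cephuser@adm > ceph orch ps --daemon_type haproxy
    cephuser@adm > ceph orch ps --daemon_type keepalived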

8.3.5 Deploying iSCSI Gateways

cephadm deploys an iSCSI Gateway. iSCSI is a storage area network (SAN) protocol that allows clients (called initiators) to send SCSI commands to SCSI storage devices (targets) on remote servers.

To deploy an iSCSI Gateway, apply the following configuration. Ensure that trusted_ip_list contains the IP addresses of all iSCSI Gateway and Ceph Manager nodes (see the example specification below).

Note

Ensure the pool is created before applying the following specification.

service_type: iscsi
service_id: EXAMPLE_ISCSI
placement:
  hosts:
  - ses-node1
  - ses-node2
  - ses-node3
spec:
  pool: EXAMPLE_POOL
  api_user: EXAMPLE_USER
  api_password: EXAMPLE_PASSWORD
  trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2"
Note

Ensure that the IP addresses in trusted_ip_list are separated by commas only, with no space after each comma.

8.3.5.1 Secure SSL configuration

To use a secure SSL connection between the Ceph Dashboard and the iSCSI target API, you need a pair of valid SSL certificate and key files. These can be either CA-issued or self-signed (see Section 10.1.1, “Creating self-signed certificates”). To enable SSL, include the api_secure: true setting in your specification file:

spec:
  api_secure: true

To specify the SSL certificate and key, you can paste the content directly into the YAML specification file. The pipe sign (|) at the end of a line tells the parser to expect a multi-line string as a value. For example:

spec:
  pool: EXAMPLE_POOL
  api_user: EXAMPLE_USER
  api_password: EXAMPLE_PASSWORD
  trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2"
  api_secure: true
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    MIIDtTCCAp2gAwIBAgIYMC4xNzc1NDQxNjEzMzc2MjMyXzxvQ7EcMA0GCSqGSIb3
    DQEBCwUAMG0xCzAJBgNVBAYTAlVTMQ0wCwYDVQQIDARVdGFoMRcwFQYDVQQHDA5T
    [...]
    -----END CERTIFICATE-----
  ssl_key: |
    -----BEGIN PRIVATE KEY-----
    MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC5jdYbjtNTAKW4
    /CwQr/7wOiLGzVxChn3mmCIF3DwbL/qvTFTX2d8bDf6LjGwLYloXHscRfxszX/4h
    [...]
    -----END PRIVATE KEY-----

8.3.6 Deploying NFS Ganesha

Important

NFS Ganesha supports NFS version 4.1 and newer. It does not support NFS version 3.

cephadm deploys NFS Ganesha using the standard .nfs pool instead of a user-defined pool. To deploy NFS Ganesha, apply the following specification:

service_type: nfs
service_id: EXAMPLE_NFS
placement:
  hosts:
  - ses-node1
  - ses-node2

EXAMPLE_NFS should be replaced with an arbitrary string that identifies the NFS export.

Enable the nfs module in the Ceph Manager daemon. This allows defining NFS exports through the NFS section in the Ceph Dashboard:

cephuser@adm > ceph mgr module enable nfs

8.3.6.1 Deploying High Availability for NFS Ganesha

You can create high availability (HA) endpoints for the NFS Ganesha service by deploying Ingress. Ingress combines HAProxy and Keepalived to provide load balancing on a floating virtual IP address. The virtual IP address is automatically configured on only one of the Ingress hosts at a time.

Each Keepalived daemon checks every few seconds whether the HAProxy daemon on the same host is responding. Keepalived also checks whether the master Keepalived daemon is running without problems. If the master Keepalived or the active HAProxy daemon is not responding, one of the remaining Keepalived daemons running in backup mode is elected as the master daemon, and the virtual IP address will be assigned to that node.

Figure 8.2: High Availability setup for NFS Ganesha
  1. Ensure that the following sysctl variables are set on each Ingress host:

    net.ipv4.ip_nonlocal_bind=1
    net.ipv4.ip_forward=1
    Tip

    You can persistently set the sysctl variables by creating a file /etc/sysctl.d/90-ingress.conf and including the above variables. To apply the changes, run sysctl --system or reboot the host.

  2. Create specifications for the Ingress and NFS Ganesha services, for example:

    > cat nfs.yaml
    service_type: nfs
    service_id: nfs-ha
    placement:
      hosts:
      - nfs-node1
      - nfs-node2
      - nfs-node3
    spec:
      port: 12049
    > cat ingress.yaml
    service_type: ingress
    service_id: nfs.nfs-ha
    placement:
      hosts:
      - ingress-node1
      - ingress-node2
      - ingress-node3
    spec:
      backend_service: nfs.nfs-ha
      frontend_port: 2049
      monitor_port: 9049
      virtual_ip: 10.20.27.111
    Warning

    We recommend running at least three NFS Ganesha daemons; otherwise, mounted clients may experience excessively long hangs during failovers.

  3. Apply the customized service specifications:

    cephuser@adm > ceph orch apply -i nfs.yaml
    cephuser@adm > ceph orch apply -i ingress.yaml
  4. Create an NFS Ganesha export. Find more details in Section 7.1, “Creating NFS exports”. Remember to select the High Availability NFS Ganesha cluster and an appropriate storage back-end and pseudo path. A worked example with hypothetical values follows this procedure.

    Alternatively, create the export from the command line on the MGR node:

    cephuser@adm > ceph nfs export create \
     STORAGE_BACKEND \
     NFS_HA_CLUSTER \
     NFS_PSEUDO_PATH \
     STORAGE_BACKEND_INSTANCE \
     EXPORT_PATH
  5. Mount the NFS Ganesha share on your clients:

    # mount -t nfs \
    VIRTUAL_IP:PORT/NFS_PSEUDO_PATH \
    /mnt/nfs
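
For illustration only, this is how the export-creation command from step 4 could look with hypothetical values substituted into the placeholders above: a CephFS storage back-end, the nfs-ha cluster from the example specification, a pseudo path /nfs-share, a file system called my_cephfs, and the export path /:

cephuser@adm > ceph nfs export create cephfs nfs-ha /nfs-share my_cephfs /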

8.3.7 Deploying rbd-mirror

The rbd-mirror service takes care of synchronizing RADOS Block Device images between two Ceph clusters (for more details, see Section 20.4, “RBD image mirrors”). To deploy rbd-mirror, use the following specification:

service_type: rbd-mirror
service_id: EXAMPLE_RBD_MIRROR
placement:
  hosts:
  - ses-node3
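
After applying the specification, you can verify that the daemon runs on the intended node by filtering the daemon list (see Section 8.1.2, “Listing devices, services, and daemons”):

cephuser@adm > ceph orch ps --daemon_type rbd-mirror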

8.3.8 Deploying the monitoring stack

The monitoring stack consists of Prometheus, Prometheus exporters, Prometheus Alertmanager, and Grafana. Ceph Dashboard makes use of these components to store and visualize detailed metrics on cluster usage and performance.

Tip

If your deployment requires custom or locally served container images of the monitoring stack services, refer to Section 16.1, “Configuring custom or local images”.

To deploy the monitoring stack, follow these steps:

  1. Enable the prometheus module in the Ceph Manager daemon. This exposes the internal Ceph metrics so that Prometheus can read them:

    cephuser@adm > ceph mgr module enable prometheus
    Note

    Ensure this command is run before Prometheus is deployed. If the command was not run before the deployment, you must redeploy Prometheus to update Prometheus' configuration:

    cephuser@adm > ceph orch redeploy prometheus
  2. Create a specification file (for example monitoring.yaml) with a content similar to the following:

    service_type: prometheus
    placement:
      hosts:
      - ses-node2
    ---
    service_type: node-exporter
    ---
    service_type: alertmanager
    placement:
      hosts:
      - ses-node4
    ---
    service_type: grafana
    placement:
      hosts:
      - ses-node3
  3. Apply monitoring services by running:

    cephuser@adm > ceph orch apply -i monitoring.yaml

    It may take a minute or two for the monitoring services to be deployed.
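
    Once the deployment has finished, you can check that the individual monitoring daemons are running by filtering the daemon list, for example:

    cephuser@adm > ceph orch ps --daemon_type prometheus
    cephuser@adm > ceph orch ps --daemon_type grafana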

Important

Prometheus, Grafana, and the Ceph Dashboard are all automatically configured to talk to each other, resulting in a fully functional Grafana integration in the Ceph Dashboard when deployed as described above.

The only exception to this rule is monitoring with RBD images. See Section 16.5.4, “Enabling RBD-image monitoring” for more information.