Copyright © 2022 SUSE LLC
Copyright © 2016, Red Hat, Inc., and contributors.
The text of and illustrations in this document are licensed under a Creative Commons Attribution-Share Alike 4.0 International ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/4.0/legalcode. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux® is the registered trademark of Linus Torvalds in the United States and other countries. Java® is a registered trademark of Oracle and/or its affiliates. XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. All other trademarks are the property of their respective owners.
For SUSE trademarks, see http://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.
SUSE Enterprise Storage 5.5 is an extension to SUSE Linux Enterprise Server 12 SP3. It combines the capabilities of the Ceph (http://ceph.com/) storage project with the enterprise engineering and support of SUSE. SUSE Enterprise Storage 5.5 provides IT organizations with the ability to deploy a distributed storage architecture that can support a number of use cases using commodity hardware platforms.
This guide helps you understand the concepts of SUSE Enterprise Storage 5.5, with the main focus on managing and administering the Ceph infrastructure. It also demonstrates how to use Ceph with other related solutions, such as OpenStack or KVM.
Many chapters in this manual contain links to additional documentation resources. These include additional documentation that is available on the system as well as documentation available on the Internet.
For an overview of the documentation available for your product and the latest documentation updates, refer to https://documentation.suse.com.
The following manuals are available for this product:
The guide describes various administration tasks that are typically performed after the installation. The guide also introduces steps to integrate Ceph with virtualization solutions such as libvirt, Xen, or KVM, and ways to access objects stored in the cluster via iSCSI and RADOS gateways.
Guides you through the installation steps of the Ceph cluster and all services related to Ceph. The guide also illustrates a basic Ceph cluster structure and provides you with related terminology.
HTML versions of the product manuals can be found in the installed system under /usr/share/doc/manual. Find the latest documentation updates at https://documentation.suse.com where you can download the manuals for your product in multiple formats.
Several feedback channels are available:
For services and support options available for your product, refer to http://www.suse.com/support/.
To report bugs for a product component, log in to the Novell Customer Center from http://www.suse.com/support/ and select the appropriate menu entry.
We want to hear your comments and suggestions for this manual and the other documentation included with this product. If you have questions, suggestions, or corrections, contact doc-team@suse.com, or click the Report Documentation Bug link beside each chapter or section heading.
For feedback on the documentation of this product, you can also send a mail to doc-team@suse.de. Make sure to include the document title, the product version, and the publication date of the documentation. To report errors or suggest enhancements, provide a concise description of the problem and refer to the respective section number and page (or URL).
The following typographical conventions are used in this manual:
/etc/passwd: directory names and file names
placeholder: replace placeholder with the actual value
PATH: the environment variable PATH
ls, --help: commands, options, and parameters
user: users or groups
Alt, Alt–F1: a key to press or a key combination; keys are shown in uppercase as on a keyboard
File, File › Save As: menu items, buttons
Dancing Penguins (Chapter Penguins, ↑Another Manual): This is a reference to a chapter in another manual.
This book is written in GeekoDoc, a subset of DocBook (see http://www.docbook.org). The XML source files were validated by xmllint, processed by xsltproc, and converted into XSL-FO using a customized version of Norman Walsh's stylesheets. The final PDF can be formatted through FOP from Apache or through XEP from RenderX. The authoring and publishing tools used to produce this manual are available in the package daps. The DocBook Authoring and Publishing Suite (DAPS) is developed as open source software. For more information, see http://daps.sf.net/.
The Ceph project and its documentation are the result of hundreds of contributors and organizations. See https://ceph.com/contributors/ for more details.
SUSE Enterprise Storage 5.5 is a distributed storage system designed for scalability, reliability and performance which is based on the Ceph technology. A Ceph cluster can be run on commodity servers in a common network like Ethernet. The cluster scales up well to thousands of servers (later on referred to as nodes) and into the petabyte range. As opposed to conventional systems which have allocation tables to store and fetch data, Ceph uses a deterministic algorithm to allocate storage for data and has no centralized information structure. Ceph assumes that in storage clusters the addition or removal of hardware is the rule, not the exception. The Ceph cluster automates management tasks such as data distribution and redistribution, data replication, failure detection and recovery. Ceph is both self-healing and self-managing which results in a reduction of administrative and budget overhead.
This chapter provides a high level overview of SUSE Enterprise Storage 5.5 and briefly describes the most important components.
Since SUSE Enterprise Storage 5.5, the only cluster deployment method is DeepSea. Refer to Chapter 4, Deploying with DeepSea/Salt for details about the deployment process.
The Ceph environment has the following features:
Ceph can scale to thousands of nodes and manage storage in the range of petabytes.
No special hardware is required to run a Ceph cluster. For details, see Chapter 2, Hardware Requirements and Recommendations.
The Ceph cluster is self-managing. When nodes are added, removed or fail, the cluster automatically redistributes the data. It is also aware of overloaded disks.
No node in a cluster stores important information alone. The number of redundancies can be configured.
Ceph is an open source software solution and independent of specific hardware or vendors.
To make full use of Ceph's power, it is necessary to understand some of the basic components and concepts. This section introduces some parts of Ceph that are often referenced in other chapters.
The basic component of Ceph is called RADOS (Reliable Autonomic Distributed Object Store). It is responsible for managing the data stored in the cluster. Data in Ceph is usually stored as objects. Each object consists of an identifier and the data.
RADOS provides the following access methods to the stored objects that cover many use cases:
Object Gateway is an HTTP REST gateway for the RADOS object store. It enables direct access to objects stored in the Ceph cluster.
RADOS Block Devices (RBD) can be accessed like any other block device. These can be used, for example, in combination with libvirt for virtualization purposes.
The Ceph File System is a POSIX-compliant file system.
librados is a library that can be used with many programming languages to create an application capable of directly interacting with the storage cluster. librados is used by Object Gateway and RBD, while CephFS directly interfaces with RADOS (see Figure 1.1, “Interfaces to the Ceph Object Store”).
At the core of a Ceph cluster is the CRUSH algorithm. CRUSH is the acronym for Controlled Replication Under Scalable Hashing. CRUSH is a function that handles the storage allocation and needs comparably few parameters. That means only a small amount of information is necessary to calculate the storage position of an object. The parameters are a current map of the cluster including the health state, some administrator-defined placement rules and the name of the object that needs to be stored or retrieved. With this information, all nodes in the Ceph cluster are able to calculate where an object and its replicas are stored. This makes writing or reading data very efficient. CRUSH tries to evenly distribute data over all nodes in the cluster.
The CRUSH map contains all storage nodes and administrator-defined placement rules for storing objects in the cluster. It defines a hierarchical structure that usually corresponds to the physical structure of the cluster. For example, the data-containing disks are in hosts, hosts are in racks, racks in rows and rows in data centers. This structure can be used to define failure domains. Ceph then ensures that replications are stored on different branches of a specific failure domain.
If the failure domain is set to rack, replications of objects are distributed over different racks. This can mitigate outages caused by a failed switch in a rack. If one power distribution unit supplies a row of racks, the failure domain can be set to row. When the power distribution unit fails, the replicated data is still available on other rows.
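For illustration, an abridged excerpt of a decompiled CRUSH map might look roughly like the following sketch (host, rack, and rule names are hypothetical, and bucket IDs, algorithm settings, and other required fields are omitted for brevity); the chooseleaf step is what pins the failure domain to the rack level:
host host-1 {
  item osd.1 weight 1.000
  item osd.2 weight 1.000
}
rack rack-1 {
  item host-1 weight 2.000
}
root default {
  item rack-1 weight 2.000
}
rule replicated_rule {
  type replicated
  step take default
  step chooseleaf firstn 0 type rack
  step emit
}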
In Ceph, nodes are servers working for the cluster. They can run several different types of daemons. It is recommended to run only one type of daemon on each node, except for MGR daemons which can be collocated with MONs. Each cluster requires at least MON, MGR and OSD daemons:
Ceph Monitor (often abbreviated as MON) nodes maintain information about the cluster health state, a map of all nodes and data distribution rules (see Section 1.2.2, “CRUSH”).
If failures or conflicts occur, the Ceph Monitor nodes in the cluster decide by majority which information is correct. To form a qualified majority, it is recommended to have an odd number of Ceph Monitor nodes, and at least three of them.
If more than one site is used, the Ceph Monitor nodes should be distributed over an odd number of sites. The number of Ceph Monitor nodes per site should be such that more than 50% of the Ceph Monitor nodes remain functional if one site fails.
The Ceph manager (MGR) collects the state information from the whole cluster. The Ceph manager daemon runs alongside the monitor daemons. It provides additional monitoring, and interfaces with external monitoring and management systems.
The Ceph manager requires no additional configuration, beyond ensuring it is running. You can deploy it as a separate role using DeepSea.
A Ceph OSD is a daemon handling an Object Storage Device, which is a physical or logical storage unit (a hard disk or partition, or a logical volume). The daemon additionally takes care of data replication and rebalancing in case of added or removed nodes.
Ceph OSD daemons communicate with monitor daemons and provide them with the state of the other OSD daemons.
To use CephFS, Object Gateway, NFS Ganesha, or iSCSI Gateway, additional nodes are required:
The metadata servers store metadata for the CephFS. By using an MDS you can execute basic file system commands such as ls without overloading the cluster.
The Ceph Object Gateway provided by Object Gateway is an HTTP REST gateway for the RADOS object store. It is compatible with OpenStack Swift and Amazon S3 and has its own user management.
NFS Ganesha provides NFS access to either the Object Gateway or the CephFS. It runs in user space instead of kernel space and directly interacts with the Object Gateway or CephFS.
iSCSI is a storage network protocol that allows clients to send SCSI commands to SCSI storage devices (targets) on remote servers.
As a Ceph cluster administrator, you will be configuring and adjusting the cluster behavior by running specific commands. There are several types of commands you will need:
These commands help you to deploy or upgrade the Ceph cluster, run commands on several (or all) cluster nodes at the same time, or assist you when adding or removing cluster nodes. The most frequently used are salt, salt-run, and deepsea. You need to run Salt commands on the Salt master node (refer to Section 4.2, “Introduction to DeepSea” for details) as root. These commands are introduced with the following prompt:
root@master #
For example:
root@master # salt '*.example.net' test.ping
These are lower level commands to configure and fine tune all aspects of the cluster and its gateways on the command line: ceph, rbd, radosgw-admin, or crushtool, to name some of them.
To run Ceph related commands, you need to have read access to a Ceph key. The key's capabilities then define your privileges within the Ceph environment. One option is to run Ceph commands as root (or via sudo) and use the unrestricted default keyring 'ceph.client.admin.key'.
The safer and recommended option is to create a more restrictive individual key for each administrator user and put it in a directory where the users can read it, for example:
~/.ceph/ceph.client.USERNAME.keyring
To use a custom admin user and keyring, you need to specify the user name and path to the key each time you run the ceph command, using the -n client.USER_NAME and --keyring PATH/TO/KEYRING options.
To avoid this, include these options in the CEPH_ARGS variable in the individual users' ~/.bashrc files.
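For example, a line like the following in ~/.bashrc would do it; the user name and keyring path are placeholders that need to match the key you created:
export CEPH_ARGS="-n client.USER_NAME --keyring ~/.ceph/ceph.client.USER_NAME.keyring"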
Although you can run Ceph related commands on any cluster node, we recommend running them on the node with the 'admin' role (see Section 4.5.1.2, “Role Assignment” for details). This documentation uses the cephadm user to run the commands, therefore they are introduced with the following prompt:
cephadm >
For example:
cephadm > ceph auth list
Linux commands not related to Ceph or DeepSea, such as mount, cat, or openssl, are introduced either with the cephadm > or root # prompts, depending on which privileges the related command requires.
For more information on Ceph key management, refer to Book “Administration Guide”, Chapter 6 “Authentication with cephx”, Section 6.2 “Key Management”.
Objects that are stored in a Ceph cluster are put into pools. Pools represent logical partitions of the cluster to the outside world. For each pool a set of rules can be defined, for example, how many replications of each object must exist. The standard configuration of pools is called replicated pool.
Pools usually contain objects but can also be configured to act similar to a RAID 5. In this configuration, objects are stored in chunks along with additional coding chunks. The coding chunks contain the redundant information. The number of data and coding chunks can be defined by the administrator. In this configuration, pools are referred to as erasure coded pools.
Placement Groups (PGs) are used for the distribution of data within a pool. When creating a pool, a certain number of placement groups is set. The placement groups are used internally to group objects and are an important factor for the performance of a Ceph cluster. The PG for an object is determined by the object's name.
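For illustration, the following commands (run from an admin node) would create a replicated pool and an erasure coded pool; the pool names and the placement group count of 128 are examples only:
cephadm > ceph osd pool create pool-a 128 128 replicated
cephadm > ceph osd pool create pool-b 128 128 erasure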
This section provides a simplified example of how Ceph manages data (see Figure 1.2, “Small Scale Ceph Example”). This example does not represent a recommended configuration for a Ceph cluster. The hardware setup consists of three storage nodes or Ceph OSDs (Host 1, Host 2, Host 3). Each node has three hard disks which are used as OSDs (osd.1 to osd.9). The Ceph Monitor nodes are neglected in this example.
While Ceph OSD or Ceph OSD daemon refers to a daemon that is run on a node, the word OSD refers to the logical disk that the daemon interacts with.
The cluster has two pools, Pool A and Pool B. While Pool A replicates objects only two times, resilience for Pool B is more important and it has three replications for each object.
When an application puts an object into a pool, for example via the REST API, a Placement Group (PG1 to PG4) is selected based on the pool and the object name. The CRUSH algorithm then calculates on which OSDs the object is stored, based on the Placement Group that contains the object.
In this example the failure domain is set to host. This ensures that replications of objects are stored on different hosts. Depending on the replication level set for a pool, the object is stored on two or three OSDs that are used by the Placement Group.
An application that writes an object only interacts with one Ceph OSD, the primary Ceph OSD. The primary Ceph OSD takes care of replication and confirms the completion of the write process after all other OSDs have stored the object.
If osd.5 fails, all objects in PG1 are still available on osd.1. As soon as the cluster recognizes that an OSD has failed, another OSD takes over. In this example osd.4 is used as a replacement for osd.5. The objects stored on osd.1 are then replicated to osd.4 to restore the replication level.
If a new node with new OSDs is added to the cluster, the cluster map is going to change. The CRUSH function then returns different locations for objects. Objects that receive new locations will be relocated. This process results in a balanced usage of all OSDs.
BlueStore is the default storage back end for Ceph since SUSE Enterprise Storage 5. It has better performance than FileStore, full data check-summing, and built-in compression.
bluestore_cache_autotune
Since Ceph version 12.2.10, a new setting bluestore_cache_autotune was introduced that disables all bluestore_cache options for manual cache sizing. To keep the old behavior, you need to set bluestore_cache_autotune=false. Refer to Book “Administration Guide”, Chapter 12 “Ceph Cluster Configuration”, Section 12.2 “Ceph OSD and BlueStore” for more details.
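A minimal sketch of what this could look like in the [osd] section of ceph.conf, assuming you also want to pin the cache size manually (the 1 GB value shown is only an example):
[osd]
bluestore_cache_autotune = false
bluestore_cache_size = 1073741824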
BlueStore manages either one, two, or three storage devices. In the simplest case, BlueStore consumes a single primary storage device. The storage device is normally partitioned into two parts:
A small partition named BlueFS that implements file system-like functionalities required by RocksDB.
The rest of the device is normally a large partition occupying the rest of the device. It is managed directly by BlueStore and contains all of the actual data. This primary device is normally identified by a block symbolic link in the data directory.
It is also possible to deploy BlueStore across two additional devices:
A WAL device can be used for BlueStore’s internal journal or write-ahead log. It is identified by the block.wal symbolic link in the data directory. It is only useful to use a separate WAL device if the device is faster than the primary device or the DB device, for example when:
The WAL device is an NVMe, and the DB device is an SSD, and the data device is either SSD or HDD.
Both the WAL and DB devices are separate SSDs, and the data device is an SSD or HDD.
A DB device can be used for storing BlueStore’s internal metadata. BlueStore (or rather, the embedded RocksDB) will put as much metadata as it can on the DB device to improve performance. Again, it is only helpful to provision a shared DB device if it is faster than the primary device.
Plan thoroughly for a sufficient size of the DB device. If the DB device fills up, metadata spills over to the primary device, which badly degrades the OSD's performance.
You can check if a WAL/DB partition is getting full and spilling over with the ceph daemon osd.ID perf dump command. The slow_used_bytes value shows the amount of data being spilled out:
cephadm > ceph daemon osd.ID perf dump | jq '.bluefs'
"db_total_bytes": 1073741824,
"db_used_bytes": 33554432,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 554432,
"slow_used_bytes": 554432,
Ceph as a community project has its own extensive online documentation. For topics not found in this manual, refer to http://docs.ceph.com/docs/master/.
The original publication CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data by S.A. Weil, S.A. Brandt, E.L. Miller, C. Maltzahn provides helpful insight into the inner workings of Ceph. Especially when deploying large scale clusters it is a recommended reading. The publication can be found at http://www.ssrc.ucsc.edu/papers/weil-sc06.pdf.
SUSE Enterprise Storage can be used with non-SUSE OpenStack distributions. The Ceph clients need to be at a level that is compatible with SUSE Enterprise Storage.
SUSE supports the server component of the Ceph deployment and the client is supported by the OpenStack distribution vendor.
The hardware requirements of Ceph are heavily dependent on the IO workload. The following hardware requirements and recommendations should be considered as a starting point for detailed planning.
In general, the recommendations given in this section are on a per-process basis. If several processes are located on the same machine, the CPU, RAM, disk and network requirements need to be added up.
At least 4 OSD nodes, with 8 OSD disks each, are required.
For OSDs that do not use BlueStore, 1 GB of RAM per terabyte of raw OSD capacity is minimally required for each OSD storage node. 1.5 GB of RAM per terabyte of raw OSD capacity is recommended. During recovery, 2 GB of RAM per terabyte of raw OSD capacity may be optimal.
For OSDs that use BlueStore, first calculate the RAM size recommended for OSDs that do not use BlueStore, then calculate 2 GB plus the size of the BlueStore cache for each OSD process, and choose the bigger of the two results. Note that the default BlueStore cache is 1 GB for HDD and 3 GB for SSD drives. In summary, pick the greater of:
[1 GB * OSD count * OSD size (in TB)]
or
[(2 GB + BlueStore cache) * OSD count]
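As a worked example (assuming a hypothetical node with 8 BlueStore OSDs of 4 TB each on HDDs, so a 1 GB cache per OSD): the first formula gives 1 GB * 8 * 4 = 32 GB, the second gives (2 GB + 1 GB) * 8 = 24 GB, so plan roughly 32 GB of RAM for that node.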
1.5 GHz of a logical CPU core per OSD is minimally required for each OSD daemon process. 2 GHz per OSD daemon process is recommended. Note that Ceph runs one OSD daemon process per storage disk; do not count disks reserved solely for use as OSD journals, WAL journals, omap metadata, or any combination of these three cases.
10 Gb Ethernet (two network interfaces bonded to multiple switches).
OSD disks in JBOD configurations.
OSD disks should be exclusively used by SUSE Enterprise Storage 5.5.
Dedicated disk/SSD for the operating system, preferably in a RAID 1 configuration.
If this OSD host will host part of a cache pool used for cache tiering, allocate at least an additional 4 GB of RAM.
For disk performance reasons, we recommend using bare metal for OSD nodes and not virtual machines.
There are two types of disk space needed to run on OSD: the space for the disk journal (for FileStore) or WAL/DB device (for BlueStore), and the primary space for the stored data. The minimum (and default) value for the journal/WAL/DB is 6 GB. The minimum space for data is 5 GB, as partitions smaller than 5 GB are automatically assigned the weight of 0.
So although the minimum disk space for an OSD is 11 GB, we do not recommend a disk smaller than 20 GB, even for testing purposes.
The following are several rules for WAL/DB device sizing. When using DeepSea to deploy OSDs with BlueStore, DeepSea applies the recommended rules automatically and notifies the administrator about the fact.
10 GB of DB device for each terabyte of OSD capacity (1/100th of the OSD).
Between 500 MB and 2 GB for the WAL device. The WAL size depends on the data traffic and workload, not on the OSD size. If you know that an OSD is physically able to handle small writes and overwrites at a very high throughput, more WAL is preferred rather than less. A 1 GB WAL device is a good compromise that fulfills most deployments.
If you intend to put the WAL and DB device on the same disk, then we recommend using a single partition for both devices, rather than having a separate partition for each. This allows Ceph to use the DB device for the WAL operation as well. Management of the disk space is therefore more effective as Ceph uses the DB partition for the WAL only if there is a need for it. Another advantage is that the probability that the WAL partition gets full is very small, and when it is not entirely used then its space is not wasted but used for DB operation.
To share the DB device with the WAL, do not specify the WAL device, and specify only the DB device:
bluestore_block_db_path = "/path/to/db/device"
bluestore_block_db_size = 10737418240
bluestore_block_wal_path = ""
bluestore_block_wal_size = 0
Alternatively, you can put the WAL on its own separate device. In such case, we recommend the fastest device for the WAL operation.
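A minimal sketch of that opposite case, assuming a dedicated WAL device (the paths and the 1 GB WAL size are placeholders):
bluestore_block_db_path = "/path/to/db/device"
bluestore_block_db_size = 10737418240
bluestore_block_wal_path = "/path/to/wal/device"
bluestore_block_wal_size = 1073741824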
You can have as many disks in one server as the server allows. There are a few things to consider when planning the number of disks per server:
Network bandwidth. The more disks you have in a server, the more data must be transferred via the network card(s) for the disk write operations.
Memory. For optimum performance, reserve at least 2 GB of RAM per terabyte of disk space installed.
Fault tolerance. If the complete server fails, the more disks it has, the more OSDs the cluster temporarily loses. Moreover, to keep the replication rules running, you need to copy all the data from the failed server among the other nodes in the cluster.
At least three Ceph Monitor nodes are required. The number of monitors should always be odd (1+2n).
4 GB of RAM.
Processor with four logical cores.
An SSD or other sufficiently fast storage type is highly recommended for monitors, specifically for the /var/lib/ceph path on each monitor node, as quorum may be unstable with high disk latencies. Two disks in RAID 1 configuration are recommended for redundancy. It is recommended that separate disks or at least separate disk partitions are used for the monitor processes to protect the monitor's available disk space from things like log file creep.
There must only be one monitor process per node.
Mixing OSD, monitor, or Object Gateway nodes is only supported if sufficient hardware resources are available. That means that the requirements for all services need to be added up.
Two network interfaces bonded to multiple switches.
Object Gateway nodes should have six to eight CPU cores and 32 GB of RAM (64 GB recommended). When other processes are co-located on the same machine, their requirements need to be added up.
Proper sizing of the Metadata Server nodes depends on the specific use case. Generally, the more open files the Metadata Server is to handle, the more CPU and RAM it needs. Following are the minimal requirements:
3 GB of RAM per Metadata Server daemon.
Bonded network interface.
2.5 GHz CPU with at least 2 cores.
At least 4 GB of RAM and a quad-core CPU are required. This includes running openATTIC on the Salt master. For large clusters with hundreds of nodes, 6 GB of RAM is suggested.
iSCSI nodes should have six to eight CPU cores and 16 GB of RAM.
The network environment where you intend to run Ceph should ideally be a bonded set of at least two network interfaces that is logically split into a public part and a trusted internal part using VLANs. The bonding mode is recommended to be 802.3ad if possible to provide maximum bandwidth and resiliency.
The public VLAN serves to provide the service to the customers, while the internal part provides for the authenticated Ceph network communication. The main reason for this is that although Ceph provides authentication and protection against attacks once secret keys are in place, the messages used to configure these keys may be transferred openly and are vulnerable.
Additional administration network setup—that enables for example separating SSH, Salt, or DNS networking—is neither tested nor supported.
If your storage nodes are configured via DHCP, the default timeouts may not be sufficient for the network to be configured correctly before the various Ceph daemons start. If this happens, the Ceph MONs and OSDs will not start correctly (running systemctl status ceph\* will result in "unable to bind" errors). To avoid this issue, we recommend increasing the DHCP client timeout to at least 30 seconds on each node in your storage cluster. This can be done by changing the following settings on each node:
In /etc/sysconfig/network/dhcp, set DHCLIENT_WAIT_AT_BOOT="30"
In /etc/sysconfig/network/config, set WAIT_FOR_INTERFACES="60"
If you do not specify a cluster network during Ceph deployment, it assumes a single public network environment. While Ceph operates fine with a public network, its performance and security improves when you set a second private cluster network. To support two networks, each Ceph node needs to have at least two network cards.
You need to apply the following changes to each Ceph node. It is relatively quick to do for a small cluster, but can be very time consuming if you have a cluster consisting of hundreds or thousands of nodes.
Stop Ceph related services on each cluster node.
Add a line to /etc/ceph/ceph.conf to define the cluster network, for example:
cluster network = 10.0.0.0/24
If you need to specifically assign static IP addresses or override cluster network settings, you can do so with the optional cluster addr setting.
Check that the private cluster network works as expected on the OS level.
Start Ceph related services on each cluster node.
root # systemctl start ceph.target
If the monitor nodes are on multiple subnets, for example they are located in different rooms and served by different switches, you need to adjust the ceph.conf file accordingly. For example, if the nodes have IP addresses 192.168.123.12, 1.2.3.4, and 242.12.33.12, add the following lines to its global section:
[global]
[...]
mon host = 192.168.123.12, 1.2.3.4, 242.12.33.12
mon initial members = MON1, MON2, MON3
[...]
Additionally, if you need to specify a per-monitor public address or network, you need to add a [mon.X] section for each monitor:
[mon.MON1]
public network = 192.168.123.0/24
[mon.MON2]
public network = 1.2.3.0/24
[mon.MON3]
public network = 242.12.33.12/0
Ceph does not generally support non-ASCII characters in configuration files, pool names, user names and so forth. When configuring a Ceph cluster we recommend using only simple alphanumeric characters (A-Z, a-z, 0-9) and minimal punctuation ('.', '-', '_') in all Ceph object/configuration names.
Four Object Storage Nodes
10 Gb Ethernet (two networks bonded to multiple switches)
32 OSDs per storage cluster
OSD journal can reside on OSD disk
Dedicated OS disk for each Object Storage Node
1 GB of RAM per TB of raw OSD capacity for each Object Storage Node
1.5 GHz per OSD for each Object Storage Node
Ceph Monitors, gateway and Metadata Servers can reside on Object Storage Nodes
Three Ceph Monitor nodes (requires SSD for dedicated OS drive)
Ceph Monitors, Object Gateways and Metadata Servers nodes require redundant deployment
iSCSI Gateways, Object Gateways and Metadata Servers require incremental 4 GB RAM and four cores
Separate management node with 4 GB RAM, four cores, 1 TB capacity
Seven Object Storage Nodes
No single node exceeds ~15% of total storage
10 Gb Ethernet (four physical networks bonded to multiple switches)
56+ OSDs per storage cluster
RAID 1 OS disks for each OSD storage node
SSDs for Journal with 6:1 ratio SSD journal to OSD
1.5 GB of RAM per TB of raw OSD capacity for each Object Storage Node
2 GHz per OSD for each Object Storage Node
Dedicated physical infrastructure nodes
Three Ceph Monitor nodes: 4 GB RAM, 4 core processor, RAID 1 SSDs for disk
One SES management node: 4 GB RAM, 4 core processor, RAID 1 SSDs for disk
Redundant physical deployment of gateway or Metadata Server nodes:
Object Gateway nodes: 32 GB RAM, 8 core processor, RAID 1 SSDs for disk
iSCSI Gateway nodes: 16 GB RAM, 4 core processor, RAID 1 SSDs for disk
Metadata Server nodes (one active/one hot standby): 32 GB RAM, 8 core processor, RAID 1 SSDs for disk
This section contains important information about integrating SUSE Enterprise Storage 5.5 with other SUSE products.
SUSE Manager and SUSE Enterprise Storage are not integrated, therefore SUSE Manager cannot currently manage a SUSE Enterprise Storage 5.5 cluster.
Ceph admin node is a Ceph cluster node where the Salt master service is running. The admin node is a central point of the Ceph cluster because it manages the rest of the cluster nodes by querying and instructing their Salt minion services. It usually includes other services as well, for example the openATTIC Web UI with the Grafana dashboard backed by the Prometheus monitoring toolkit.
In case of Ceph admin node failure, you usually need to provide new working hardware for the node and restore the complete cluster configuration stack from a recent backup. Such a method is time consuming and causes cluster outage.
To prevent Ceph cluster performance downtime caused by an admin node failure, we recommend making use of a High Availability (HA) cluster for the Ceph admin node.
The idea of an HA cluster is that in case of one cluster node failure, the other node automatically takes over its role including the virtualized Ceph admin node. This way other Ceph cluster nodes do not notice that the Ceph admin node failed.
The minimal HA solution for the Ceph admin node requires the following hardware:
Two bare metal servers able to run SUSE Linux Enterprise with the High Availability extension and virtualize the Ceph admin node.
Two or more redundant network communication paths, for example via Network Device Bonding.
Shared storage to host the disk image(s) of the Ceph admin node virtual machine. The shared storage needs to be accessible from both servers. It can be, for example, an NFS export, a Samba share, or an iSCSI target.
Find more details on the cluster requirements at https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-install-quick/#sec-ha-inst-quick-req.
The following procedure summarizes the most important steps of building the HA cluster for virtualizing the Ceph admin node. For details, refer to the indicated links.
Set up a basic 2-node HA cluster with shared storage as described in https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-install-quick/#art-sleha-install-quick.
On both cluster nodes, install all packages required for running the KVM hypervisor and the libvirt toolkit as described in https://documentation.suse.com/sles/12-SP5/single-html/SLES-virtualization/#sec-vt-installation-kvm.
On the first cluster node, create a new KVM virtual machine (VM) making use of libvirt as described in https://documentation.suse.com/sles/12-SP5/single-html/SLES-virtualization/#sec-libvirt-inst-virt-install.
Use the preconfigured shared storage to store the disk images of the VM.
After the VM setup is complete, export its configuration to an XML file on the shared storage. Use the following syntax:
root # virsh dumpxml VM_NAME > /path/to/shared/vm_name.xml
Create a resource for the Admin Node VM. Refer to https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-guide/#cha-conf-hawk2 for general info on creating HA resources. Detailed info on creating resource for a KVM virtual machine is described in http://www.linux-ha.org/wiki/VirtualDomain_%28resource_agent%29.
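As a non-authoritative sketch, such a VirtualDomain resource might be added from the crm shell roughly as follows; the resource name and the XML path are placeholders that must match the file exported in the previous step:
root # crm configure primitive admin-node-vm ocf:heartbeat:VirtualDomain \
  params config="/path/to/shared/vm_name.xml" hypervisor="qemu:///system" \
  op monitor interval=30s timeout=60s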
On the newly created VM guest, deploy the Ceph admin node including the additional services you need there. Follow relevant steps in Section 4.3, “Cluster Deployment”. At the same time, deploy the remaining Ceph cluster nodes on the non-HA cluster servers.
ceph-deploy Removed in SUSE Enterprise Storage 5.5
The ceph-deploy cluster deployment tool was deprecated in SUSE Enterprise Storage 4 and is completely removed in favor of DeepSea as of SUSE Enterprise Storage 5.
Salt along with DeepSea is a stack of components that help you deploy and manage server infrastructure. It is very scalable, fast, and relatively easy to get running. Read the following considerations before you start deploying the cluster with Salt:
Salt minions are the nodes controlled by a dedicated node called Salt master. Salt minions have roles, for example Ceph OSD, Ceph Monitor, Ceph Manager, Object Gateway, iSCSI Gateway, or NFS Ganesha.
A Salt master runs its own Salt minion. It is required for running privileged tasks—for example creating, authorizing, and copying keys to minions—so that remote minions never need to run privileged tasks.
You will get the best performance from your Ceph cluster when each role is deployed on a separate node. But real deployments sometimes require sharing one node for multiple roles. To avoid troubles with performance and upgrade procedure, do not deploy the Ceph OSD, Metadata Server, or Ceph Monitor role to the Salt master.
Salt minions need to correctly resolve the Salt master's host name over the network. By default, they look for the salt host name, but you can specify any other network-reachable host name in the /etc/salt/minion file, see Section 4.3, “Cluster Deployment”.
In the release notes you can find additional information on changes since the previous release of SUSE Enterprise Storage. Check the release notes to see whether:
your hardware needs special considerations.
any used software packages have changed significantly.
special precautions are necessary for your installation.
The release notes also provide information that could not make it into the manual on time. They also contain notes about known issues.
After having installed the package release-notes-ses, find the release notes locally in the directory /usr/share/doc/release-notes or online at https://www.suse.com/releasenotes/.
The goal of DeepSea is to save the administrator time and confidently perform complex operations on a Ceph cluster.
Ceph is a very configurable software solution. It increases both the freedom and responsibility of system administrators.
The minimal Ceph setup is good for demonstration purposes, but does not show interesting features of Ceph that you can see with a big number of nodes.
DeepSea collects and stores data about individual servers, such as addresses and device names. For a distributed storage system such as Ceph, there can be hundreds of such items to collect and store. Collecting the information and entering the data manually into a configuration management tool is exhausting and error prone.
The steps necessary to prepare the servers, collect the configuration, and configure and deploy Ceph are mostly the same. However, this does not address managing the separate functions. For day to day operations, the ability to trivially add hardware to a given function and remove it gracefully is a requirement.
DeepSea addresses these observations with the following strategy: DeepSea consolidates the administrator's decisions in a single file. The decisions include cluster assignment, role assignment and profile assignment. And DeepSea collects each set of tasks into a simple goal. Each goal is a stage:
Stage 0—the preparation—during this stage, all required updates are applied and your system may be rebooted.
If, during Stage 0, the Salt master reboots to load the new kernel version, you need to run Stage 0 again, otherwise minions will not be targeted.
Stage 1—the discovery—here you detect all hardware in your cluster and collect necessary information for the Ceph configuration. For details about configuration, refer to Section 4.5, “Configuration and Customization”.
Stage 2—the configuration—you need to prepare configuration data in a particular format.
Stage 3—the deployment—creates a basic Ceph cluster with mandatory Ceph services. See Section 1.2.3, “Ceph Nodes and Daemons” for their list.
Stage 4—the services—additional features of Ceph like iSCSI, Object Gateway and CephFS can be installed in this stage. Each is optional.
Stage 5—the removal stage. This stage is not mandatory and during the initial setup it is usually not needed. In this stage the roles of minions and also the cluster configuration are removed. You need to run this stage when you need to remove a storage node from your cluster. For details refer to Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.3 “Removing and Reinstalling Cluster Nodes”.
You can find a more detailed introduction into DeepSea at https://github.com/suse/deepsea/wiki.
Salt has several standard locations and several naming conventions used on your master node:
/srv/pillar
The directory stores configuration data for your cluster minions. Pillar is an interface for providing global configuration values to all your cluster minions.
/srv/salt/
The directory stores Salt state files (also called sls files). State files are formatted descriptions of states in which the cluster should be. For more information, refer to the Salt documentation.
/srv/module/runners
The directory stores Python scripts known as runners. Runners are executed on the master node.
/srv/salt/_modules
The directory stores Python scripts that are called modules. The modules are applied to all minions in your cluster.
/srv/pillar/ceph
The directory is used by DeepSea. Collected configuration data are stored here.
/srv/salt/ceph
A directory used by DeepSea. It stores sls files that can be in
different formats, but each subdirectory contains sls files. Each
subdirectory contains only one type of sls file. For example,
/srv/salt/ceph/stage
contains orchestration files
that are executed by salt-run state.orchestrate
.
DeepSea commands are executed via the Salt infrastructure. When using the salt command, you need to specify a set of Salt minions that the command will affect. We describe the set of minions as a target for the salt command. The following sections describe possible methods to target the minions.
You can target a minion or a group of minions by matching their names. A minion's name is usually the short host name of the node where the minion runs. This is a general Salt targeting method, not related to DeepSea. You can use globbing, regular expressions, or lists to limit the range of minion names. The general syntax follows:
root@master # salt target example.module
If all Salt minions in your environment belong to your Ceph cluster, you can safely substitute target with '*' to include all registered minions.
Match all minions in the example.net domain (assuming the minion names are identical to their "full" host names):
root@master # salt '*.example.net' test.ping
Match the 'web1' to 'web5' minions:
root@master # salt 'web[1-5]' test.ping
Match both 'web1-prod' and 'web1-devel' minions using a regular expression:
root@master # salt -E 'web1-(prod|devel)' test.ping
Match a simple list of minions:
root@master # salt -L 'web1,web2,web3' test.ping
Match all minions in the cluster:
root@master # salt '*' test.ping
In a heterogeneous Salt-managed environment where SUSE Enterprise Storage 5.5 is deployed on a subset of nodes alongside other cluster solution(s), it is a good idea to 'mark' the relevant minions by applying a 'deepsea' grain to them. This way you can easily target DeepSea minions in environments where matching by the minion name is problematic.
To apply the 'deepsea' grain to a group of minions, run:
root@master # salt target grains.append deepsea default
To remove the 'deepsea' grain from a group of minions, run:
root@master # salt target grains.delval deepsea destructive=True
After applying the 'deepsea' grain to the relevant minions, you can target them as follows:
root@master # salt -G 'deepsea:*' test.ping
The following command is an equivalent:
root@master # salt -C 'G@deepsea:*' test.ping
The deepsea_minions Option
Setting the deepsea_minions option's target is a requirement for DeepSea deployments. DeepSea uses it to instruct minions during stage execution (refer to DeepSea Stages Description for details).
To set or change the deepsea_minions option, edit the /srv/pillar/ceph/deepsea_minions.sls file on the Salt master and add or replace the following line:
deepsea_minions: target
deepsea_minions Target
As the target for the deepsea_minions option, you can use any targeting method: both Matching the Minion Name and Targeting with a 'deepsea' Grain.
Match all Salt minions in the cluster:
deepsea_minions: '*'
Match all minions with the 'deepsea' grain:
deepsea_minions: 'G@deepsea:*'
You can use more advanced ways to target minions using the Salt infrastructure. Refer to https://docs.saltstack.com/en/latest/topics/targeting/ for a description of all targeting techniques.
Also, the 'deepsea-minions' manual page gives you more detail about DeepSea targeting (man 7 deepsea_minions).
The cluster deployment process has several phases. First, you need to prepare all nodes of the cluster by configuring Salt and then deploy and configure Ceph.
If you need to skip defining OSD profiles and deploy the monitor nodes first, you can do so by setting the DEV_ENV variable. It allows deploying monitors without the presence of the profile/ directory, as well as deploying a cluster with at least one storage, monitor, and manager node.
To set the environment variable, either enable it globally by setting it in the /srv/pillar/ceph/stack/global.yml file, or set it for the current shell session only:
root@master # export DEV_ENV=true
The following procedure describes the cluster preparation in detail.
Install and register SUSE Linux Enterprise Server 12 SP3 together with SUSE Enterprise Storage 5.5 extension on each node of the cluster.
SUSE Linux Enterprise Server 12 SP4 is not a supported base operating system for SUSE Enterprise Storage 5.5.
Verify that proper products are installed and registered by listing existing software repositories. The list will be similar to this output:
root # zypper lr -E
# | Alias | Name | Enabled | GPG Check | Refresh
---+---------+-----------------------------------+---------+-----------+--------
4 | [...] | SUSE-Enterprise-Storage-5-Pool | Yes | (r ) Yes | No
6 | [...] | SUSE-Enterprise-Storage-5-Updates | Yes | (r ) Yes | Yes
9 | [...] | SLES12-SP3-Pool | Yes | (r ) Yes | No
11 | [...] | SLES12-SP3-Updates | Yes | (r ) Yes | Yes
LTSS updates for SUSE Linux Enterprise Server are delivered as part of the SUSE Enterprise Storage 5.5 repositories. Therefore, no LTSS repositories need to be added.
Configure network settings including proper DNS name resolution on each node. The Salt master and all the Salt minions need to resolve each other by their host names. For more information on configuring a network, see https://documentation.suse.com/sles/12-SP5/single-html/SLES-admin/#sec-basicnet-yast. For more information on configuring a DNS server, see https://documentation.suse.com/sles/12-SP5/single-html/SLES-admin/#cha-dns.
Select one or more time servers/pools, and synchronize the local time against them. Verify that the time synchronization service is enabled on each system start-up. You can use the yast ntp-client command found in the yast2-ntp-client package to configure time synchronization.
Virtual machines are not reliable NTP sources.
Find more information on setting up NTP in https://documentation.suse.com/sles/12-SP5/single-html/SLES-admin/#sec-ntp-yast.
Install the salt-master and salt-minion packages on the Salt master node:
root@master # zypper in salt-master salt-minion
Check that the salt-master service is enabled and started, and enable and start it if needed:
root@master # systemctl enable salt-master.service
root@master # systemctl start salt-master.service
If you intend to use a firewall, verify that the Salt master node has ports 4505 and 4506 open to all Salt minion nodes. If the ports are closed, you can open them using the yast2 firewall command by allowing the service.
DeepSea deployment stages fail when the firewall is active (and even when it is merely configured). To pass the stages correctly, you need to either turn the firewall off by running
root # systemctl stop SuSEfirewall2.service
or set the FAIL_ON_WARNING option to 'False' in /srv/pillar/ceph/stack/global.yml:
FAIL_ON_WARNING: False
Install the package salt-minion on all minion nodes.
root # zypper in salt-minion
Make sure that the fully qualified domain name of each node can be resolved to the public network IP address by all other nodes.
Configure all minions (including the master minion) to connect to the master. If your Salt master is not reachable by the host name salt, edit the file /etc/salt/minion or create a new file /etc/salt/minion.d/master.conf with the following content:
master: host_name_of_salt_master
If you performed any changes to the configuration files mentioned above, restart the Salt service on all Salt minions:
root@minion > systemctl restart salt-minion.service
Check that the salt-minion service is enabled and started on all nodes. Enable and start it if needed:
root # systemctl enable salt-minion.service
root # systemctl start salt-minion.service
Verify each Salt minion's fingerprint and accept all salt keys on the Salt master if the fingerprints match.
View each minion's fingerprint:
root@minion > salt-call --local key.finger
local:
3f:a3:2f:3f:b4:d3:d9:24:49:ca:6b:2c:e1:6c:3f:c3:83:37:f0:aa:87:42:e8:ff...
After gathering fingerprints of all the Salt minions, list fingerprints of all unaccepted minion keys on the Salt master:
root@master # salt-key -F
[...]
Unaccepted Keys:
minion1:
3f:a3:2f:3f:b4:d3:d9:24:49:ca:6b:2c:e1:6c:3f:c3:83:37:f0:aa:87:42:e8:ff...
If the minions' fingerprints match, accept them:
root@master # salt-key --accept-all
Verify that the keys have been accepted:
root@master # salt-key --list-all
Prior to deploying SUSE Enterprise Storage 5.5, manually zap all the disks. Remember to replace 'X' with the correct disk letter:
Stop all processes that are using the specific disk.
Verify whether any partition on the disk is mounted, and unmount if needed.
If the disk is managed by LVM, deactivate and delete the whole LVM infrastructure. Refer to https://documentation.suse.com/sles/12-SP5/single-html/SLES-storage/#cha-lvm for more details.
If the disk is part of MD RAID, deactivate the RAID. Refer to https://documentation.suse.com/sles/12-SP5/single-html/SLES-storage/#part-software-raid for more details.
If you get error messages such as 'partition in use' or 'kernel can not be updated with the new partition table' during the following steps, reboot the server.
Wipe the beginning of each partition (as root):
for partition in /dev/sdX[0-9]*
do
  dd if=/dev/zero of=$partition bs=4096 count=1 oflag=direct
done
Wipe the beginning of the drive:
root # dd if=/dev/zero of=/dev/sdX bs=512 count=34 oflag=direct
Wipe the end of the drive:
root # dd if=/dev/zero of=/dev/sdX bs=512 count=33 \
  seek=$((`blockdev --getsz /dev/sdX` - 33)) oflag=direct
Create a new GPT partition table:
root # sgdisk -Z --clear -g /dev/sdX
Verify the result with:
root # parted -s /dev/sdX print free
or
root # dd if=/dev/sdX bs=512 count=34 | hexdump -C
root # dd if=/dev/sdX bs=512 count=33 \
  skip=$((`blockdev --getsz /dev/sdX` - 33)) | hexdump -C
Optionally, if you need to preconfigure the cluster's network settings before the deepsea package is installed, create /srv/pillar/ceph/stack/ceph/cluster.yml manually and set the cluster_network: and public_network: options. Note that the file will not be overwritten after you install deepsea.
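A minimal sketch of such a file, assuming example subnets (adjust both ranges to your environment):
cluster_network: 10.0.0.0/24
public_network: 192.168.100.0/24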
Install DeepSea on the Salt master node:
root@master # zypper in deepsea
Check that the file /srv/pillar/ceph/master_minion.sls on the Salt master points to your Salt master. If your Salt master is reachable via more host names, use the one suitable for the storage cluster. If you used the default host name for your Salt master—salt—in the ses domain, then the file looks as follows:
master_minion: salt.ses
Now you deploy and configure Ceph. Unless specified otherwise, all steps are mandatory.
There are two possible ways how to run salt-run state.orch—one is with stage.<stage number>, the other is with the name of the stage. Both notations have the same impact and it is fully your preference which command you use.
Include the Salt minions belonging to the Ceph cluster that you are currently deploying. Refer to Section 4.2.2.1, “Matching the Minion Name” for more information on targeting the minions.
Prepare your cluster. Refer to DeepSea Stages Description for more details.
root@master # salt-run state.orch ceph.stage.0
or
root@master # salt-run state.orch ceph.stage.prep
Using the DeepSea CLI, you can follow the stage execution progress in real-time, either by running the DeepSea CLI in the monitoring mode, or by running the stage directly through DeepSea CLI. For details refer to Section 4.4, “DeepSea CLI”.
Optional: create Btrfs sub-volumes for /var/lib/ceph/. This step should only be executed before the next stages of DeepSea have been executed. To migrate existing directories or for more details, see Book “Administration Guide”, Chapter 20 “Hints and Tips”, Section 20.6 “Btrfs Sub-volume for /var/lib/ceph”.
root@master # salt-run state.orch ceph.migrate.subvolume
The discovery stage collects data from all minions and creates configuration fragments that are stored in the directory /srv/pillar/ceph/proposals. The data are stored in the YAML format in *.sls or *.yml files.
root@master # salt-run state.orch ceph.stage.1
or
root@master # salt-run state.orch ceph.stage.discovery
After the previous command finishes successfully, create a policy.cfg file in /srv/pillar/ceph/proposals. For details refer to Section 4.5.1, “The policy.cfg File”.
If you need to change the cluster's network setting, edit /srv/pillar/ceph/stack/ceph/cluster.yml and adjust the lines starting with cluster_network: and public_network:.
The configuration stage parses the policy.cfg file and merges the included files into their final form. Cluster and role related content are placed in /srv/pillar/ceph/cluster, while Ceph specific content is placed in /srv/pillar/ceph/stack/default.
Run the following command to trigger the configuration stage:
root@master # salt-run state.orch ceph.stage.2
or
root@master # salt-run state.orch ceph.stage.configure
The configuration step may take several seconds. After the command finishes, you can view the pillar data for the specified minions (for example, named ceph_minion1, ceph_minion2, etc.) by running:
root@master # salt 'ceph_minion*' pillar.items
As soon as the command finishes, you can view the default configuration and change it to suit your needs. For details refer to Chapter 7, Customizing the Default Configuration.
Now you run the deployment stage. In this stage, the pillar is validated, and monitor and OSD daemons are started on the storage nodes. Run the following to start the stage:
root@master # salt-run state.orch ceph.stage.3
or
root@master # salt-run state.orch ceph.stage.deploy
The command may take several minutes. If it fails, you need to fix the issue and run the previous stages again. After the command succeeds, run the following to check the status:
cephadm >
ceph -s
The last step of the Ceph cluster deployment is the services stage. Here you instantiate any of the currently supported services: iSCSI Gateway, CephFS, Object Gateway, openATTIC, and NFS Ganesha. In this stage, the necessary pools are created, keyrings are authorized, and services are started. To start the stage, run the following:
root@master #
salt-run state.orch ceph.stage.4
or
root@master #
salt-run state.orch ceph.stage.services
Depending on the setup, the command may run for several minutes.
DeepSea also provides a CLI tool that allows the user to monitor or run stages while visualizing the execution progress in real-time.
Two modes are supported for visualizing a stage's execution progress:
Monitoring mode: visualizes the execution
progress of a DeepSea stage triggered by the salt-run
command issued in another terminal session.
Stand-alone mode: runs a DeepSea stage while providing real-time visualization of its component steps as they are executed.
The DeepSea CLI commands can only be run on the Salt master node with
root
privileges.
The progress monitor provides a detailed, real-time visualization of what
is happening during execution of stages using salt-run
state.orch
commands in other terminal sessions.
You need to start the monitor in a new terminal window
before running any salt-run
state.orch
so that the monitor can detect the start of the
stage's execution.
If you start the monitor after issuing the salt-run
state.orch
command, then no execution progress will be shown.
You can start the monitor mode by running the following command:
root@master #
deepsea monitor
For more information about the available command line options of the
deepsea monitor
command check its manual page:
cephadm >
man deepsea-monitor
In the stand-alone mode, DeepSea CLI can be used to run a DeepSea stage, showing its execution in real-time.
The command to run a DeepSea stage from the DeepSea CLI has the following form:
root@master #
deepsea stage run stage-name
where stage-name corresponds to the way Salt
orchestration state files are referenced. For example, stage
deploy, which corresponds to the directory
located in /srv/salt/ceph/stage/deploy
, is referenced
as ceph.stage.deploy.
This command is an alternative to the Salt-based commands for running DeepSea stages (or any DeepSea orchestration state file).
The command deepsea stage run ceph.stage.0
is equivalent
to salt-run state.orch ceph.stage.0
.
For more information about the available command line options accepted by
the deepsea stage run
command check its manual page:
root@master #
man deepsea-stage run
The following figure shows an example of the output of the DeepSea CLI when running Stage 2.
stage run Alias
For advanced users of Salt, we also support an alias for running a
DeepSea stage that takes the Salt command used to run a stage, for
example, salt-run state.orch
stage-name
, as a command of the
DeepSea CLI.
Example:
root@master #
deepsea salt-run state.orch stage-name
The policy.cfg File
The /srv/pillar/ceph/proposals/policy.cfg
configuration file is used to determine roles of individual cluster nodes.
For example, which node acts as an OSD or which as a monitor node. Edit
policy.cfg
in order to reflect your desired cluster
setup. The order of the sections is arbitrary, but the content of included
lines overwrites matching keys from the content of previous lines.
policy.cfg
You can find several examples of complete policy files in the
/usr/share/doc/packages/deepsea/examples/
directory.
In the cluster section you select minions for your cluster. You can select all minions, or you can blacklist or whitelist minions. Examples for a cluster called ceph follow.
To include all minions, add the following lines:
cluster-ceph/cluster/*.sls
To whitelist a particular minion:
cluster-ceph/cluster/abc.domain.sls
or a group of minions—you can use shell glob matching:
cluster-ceph/cluster/mon*.sls
To blacklist minions, set them to
unassigned
:
cluster-unassigned/cluster/client*.sls
This section provides you with details on assigning 'roles' to your
cluster nodes. A 'role' in this context means the service you need to run
on the node, such as Ceph Monitor, Object Gateway, iSCSI Gateway, or openATTIC. No role is assigned
automatically; only roles added to policy.cfg
will be
deployed.
The assignment follows this pattern:
role-ROLE_NAME/PATH/FILES_TO_INCLUDE
Where the items have the following meaning and values:
ROLE_NAME is any of the following: 'master', 'admin', 'mon', 'mgr', 'mds', 'igw', 'rgw', 'ganesha', or 'openattic'.
PATH is a relative directory path to .sls or
.yml files. In case of .sls files, it usually is
cluster
, while .yml files are located at
stack/default/ceph/minions
.
FILES_TO_INCLUDE are the Salt state files
or YAML configuration files. They normally consist of Salt minions host
names, for example ses5min2.yml
. Shell globbing can
be used for more specific matching.
An example for each role follows:
master - the node has admin keyrings to all Ceph clusters. Currently, only a single Ceph cluster is supported. As the master role is mandatory, always add a similar line to the following:
role-master/cluster/master*.sls
admin - the minion will have an admin keyring. You define the role as follows:
role-admin/cluster/abc*.sls
mon - the minion will provide the monitoring service to the Ceph cluster. This role requires addresses of the assigned minions. As of SUSE Enterprise Storage 5.5, the public addresses are calculated dynamically and are no longer needed in the Salt pillar.
role-mon/cluster/mon*.sls
The example assigns the monitoring role to a group of minions.
mgr - the Ceph manager daemon which collects all the state information from the whole cluster. Deploy it on all minions where you plan to deploy the Ceph monitor role.
role-mgr/cluster/mgr*.sls
mds - the minion will provide the metadata service to support CephFS.
role-mds/cluster/mds*.sls
igw - the minion will act as an iSCSI Gateway. This role
requires addresses of the assigned minions, thus you need to also
include the files from the stack
directory:
role-igw/cluster/*.sls
rgw - the minion will act as an Object Gateway:
role-rgw/cluster/rgw*.sls
openattic - the minion will act as an openATTIC server:
role-openattic/cluster/openattic*.sls
For more information, see Book “Administration Guide”, Chapter 17 “openATTIC”.
ganesha - the minion will act as an NFS Ganesha server. The 'ganesha' role requires either an 'rgw' or 'mds' role in cluster, otherwise the validation will fail in Stage 3.
To successfully install NFS Ganesha, additional configuration is required. If you want to use NFS Ganesha, read Chapter 12, Installation of NFS Ganesha before executing stages 2 and 4. However, it is possible to install NFS Ganesha later.
In some cases it can be useful to define custom roles for NFS Ganesha nodes. For details, see Book “Administration Guide”, Chapter 16 “NFS Ganesha: Export Ceph Data via NFS”, Section 16.3 “Custom NFS Ganesha Roles”.
You can assign several roles to a single node. For example, you can assign the mds roles to the monitor nodes:
role-mds/cluster/mon[1,2]*.sls
The common configuration section includes configuration files generated
during the discovery (Stage 1). These configuration
files store parameters like fsid
or
public_network
. To include the required Ceph common
configuration, add the following lines:
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
In Ceph, a single storage role would be insufficient to describe the
many disk configurations available with the same hardware. DeepSea stage
1 will generate a default storage profile proposal. By default this
proposal will be a bluestore
profile and will try to
propose the highest performing configuration for the given hardware setup.
For example, external journals will be preferred over a single disk
containing objects and metadata. Solid state storage will be prioritized
over spinning disks. Profiles are assigned in the
policy.cfg
similar to roles.
The default proposal can be found in the profile-default directory tree.
To include it, add the following two lines to your
policy.cfg
:
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*.yml
You can also create a customized storage profile to your liking by using the proposal runner. This runner offers three methods: help, peek, and populate.
salt-run proposal.help
prints the runner help text
about the various arguments it accepts.
salt-run proposal.peek
shows the generated proposal
according to the arguments passed.
salt-run proposal.populate
writes the proposal to the
/srv/pillar/ceph/proposals
subdirectory. Pass
name=myprofile
to name the storage profile. This will
result in a profile-myprofile subdirectory.
For all other arguments, consult the output of salt-run
proposal.help
.
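As a sketch, creating a custom profile named myprofile (the name is only an example) and then including it instead of the default one could look like this:
root@master # salt-run proposal.populate name=myprofile
Then reference the resulting subdirectory in policy.cfg:
profile-myprofile/cluster/*.sls
profile-myprofile/stack/default/ceph/minions/*.yml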
If you have a Salt minion with multiple disk devices assigned and the device
names do not seem to be consistent or persistent, you can override the
default search behavior by editing
/srv/pillar/ceph/stack/global.yml
:
Edit global.yml
and make the necessary changes:
To override the default match expression of -name ata* -o -name
scsi* -o -name nvme*
for the find
command
with for example -name wwn*
, add the following:
ceph:
  modules:
    cephdisks:
      device:
        match: '-name wwn*'
To override the default pathname of /dev/disk/by-id
with for example /dev/disk/by-label
, add the
following:
ceph:
  modules:
    cephdisks:
      device:
        pathname: '/dev/disk/by-label'
Refresh the Pillar:
root@master #
salt 'DEEPSEA_MINIONS' saltutil.pillar_refresh
Try a query for a device that was previously wrongly assigned:
root@master #
salt 'SPECIFIC_MINION' cephdisks.device PATH_TO_DEVICE
If the command returns 'module not found', be sure to synchronize:
root@master #
salt '*' saltutil.sync_all
Since SUSE Enterprise Storage 5.5, OSDs are by default deployed using BlueStore instead of FileStore. Although BlueStore supports encryption, Ceph OSDs are deployed unencrypted by default. The following procedure describes steps to encrypt OSDs during the upgrade process. Let us assume that both data and WAL/DB disks to be used for OSD deployment are clean with no partitions. If the disks were previously used, wipe them following the procedure described in Step 12.
To use encrypted OSDs for your new deployment, first wipe the disks
following the procedure described in Step 12,
then use the proposal.populate
runner with the
encryption=dmcrypt
argument:
root@master #
salt-run proposal.populate encryption=dmcrypt
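If you also want to store the encrypted proposal under its own name, you can pass the name argument described earlier in addition to encryption, for example (the profile name is arbitrary):
root@master # salt-run proposal.populate encryption=dmcrypt name=encrypted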
Encrypted OSDs require longer boot and activation times compared to the default unencrypted ones.
Determine the bluestore block db size
and
bluestore block wal size
values for your deployment and
add them to the
/srv/salt/ceph/configuration/files/ceph.conf.d/global.conf
file on the Salt master. The values need to be specified in bytes.
[global]
bluestore block db size = 48318382080
bluestore block wal size = 2147483648
For more information on customizing the ceph.conf
file, refer to Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf
with Custom Settings”.
Run DeepSea Stage 3 to distribute the changes:
root@master #
salt-run state.orch ceph.stage.3
Verify that the ceph.conf
file is updated on the
relevant OSD nodes:
root@minion >
cat /etc/ceph/ceph.conf
Edit the *.yml files in the
/srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions
directory that are relevant to the OSDs you are encrypting. Double check
their path with the one defined in the
/srv/pillar/ceph/proposals/policy.cfg
file to
ensure that you modify the correct *.yml files.
When identifying OSD disks in the
/srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions/*.yml
files, use long disk identifiers.
An example of an OSD configuration follows. Note that because we need
encryption, the db_size
and wal_size
options are removed:
ceph:
  storage:
    osds:
      /dev/disk/by-id/scsi-SDELL_PERC_H730_Mini_007027b1065faa972100d34d7aa06d86:
        format: bluestore
        encryption: dmcrypt
        db: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
        wal: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
      /dev/disk/by-id/scsi-SDELL_PERC_H730_Mini_00d146b1065faa972100d34d7aa06d86:
        format: bluestore
        encryption: dmcrypt
        db: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
        wal: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
Deploy the new Block Storage OSDs with encryption by running DeepSea Stages 2 and 3:
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
You can watch the progress with ceph -s
or
ceph osd tree
. It is critical that you let the
cluster rebalance before repeating the process on the next OSD node.
Sometimes it is not practical to include all files from a given directory
with *.sls globbing. The policy.cfg
file parser
understands the following filters:
This section describes filtering techniques for advanced users. When not used correctly, filtering can cause problems, for example when your node numbering changes.
Use the slice filter to include only items start
through end-1. Note that items in the given
directory are sorted alphanumerically. The following line includes the
third to fifth files from the role-mon/cluster/
subdirectory:
role-mon/cluster/*.sls slice[3:6]
Use the regular expression filter to include only items matching the given expressions. For example:
role-mon/cluster/mon*.sls re=.*1[135]\.subdomainX\.sls$
Example policy.cfg File
Following is an example of a basic policy.cfg
file:
## Cluster Assignment
cluster-ceph/cluster/*.sls 1

## Roles
# ADMIN
role-master/cluster/examplesesadmin.sls 2
role-admin/cluster/sesclient*.sls 3

# MON
role-mon/cluster/ses-example-[123].sls 4

# MGR
role-mgr/cluster/ses-example-[123].sls 5

# MDS
role-mds/cluster/ses-example-4.sls 6

# IGW
role-igw/cluster/ses-example-4.sls 7

# RGW
role-rgw/cluster/ses-example-4.sls 8

# openATTIC
role-openattic/cluster/openattic*.sls 9

# COMMON
config/stack/default/global.yml 10
config/stack/default/ceph/cluster.yml 11

## Profiles
profile-default/cluster/*.sls 12
profile-default/stack/default/ceph/minions/*.yml 13
Indicates that all minions are included in the Ceph cluster. If you have minions you do not want to include in the Ceph cluster, use:
cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/ses-example-*.sls
The first line marks all minions as unassigned. The second line overrides minions matching 'ses-example-*.sls', and assigns them to the Ceph cluster.
The minion called 'examplesesadmin' has the 'master' role. This also means it will get admin keys to the cluster.
All minions matching 'sesclient*' will get admin keys as well.
All minions matching 'ses-example-[123]' (presumably three minions: ses-example-1, ses-example-2, and ses-example-3) will be set up as MON nodes.
All minions matching 'ses-example-[123]' (all MON nodes in the example) will be set up as MGR nodes.
Minion 'ses-example-4' will have the MDS role.
Minion 'ses-example-4' will have the IGW role.
Minion 'ses-example-4' will have the RGW role.
Specifies to deploy the openATTIC user interface to administer the Ceph cluster. See Book “Administration Guide”, Chapter 17 “openATTIC” for more details.
Means that we accept the default values for common configuration parameters such as fsid or public_network.
Means that we accept the default values for common configuration parameters such as fsid or public_network.
We are telling DeepSea to use the default hardware profile for each minion. Choosing the default hardware profile means that we want all additional disks (other than the root disk) as OSDs.
We are telling DeepSea to use the default hardware profile for each minion. Choosing the default hardware profile means that we want all additional disks (other than the root disk) as OSDs.
Adjusting ceph.conf with Custom Settings
If you need to put custom settings into the ceph.conf
configuration file, see Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf
with Custom Settings” for more
details.
This chapter introduces steps to upgrade SUSE Enterprise Storage from the previous release(s) to version 5.5.
In the release notes you can find additional information on changes since the previous release of SUSE Enterprise Storage. Check the release notes to see whether:
your hardware needs special considerations.
any used software packages have changed significantly.
special precautions are necessary for your installation.
The release notes also provide information that could not make it into the manual on time. They also contain notes about known issues.
After having installed the package release-notes-ses,
find the release notes locally in the directory
/usr/share/doc/release-notes
or online at
https://www.suse.com/releasenotes/.
Consider the following items before starting the upgrade procedure:
Before upgrading the Ceph cluster, you need to have both the underlying SUSE Linux Enterprise Server and SUSE Enterprise Storage correctly registered against SCC or SMT. You can upgrade daemons in your cluster while the cluster is online and in service. Certain types of daemons depend upon others. For example Ceph Object Gateways depend upon Ceph monitors and Ceph OSD daemons. We recommend upgrading in this order:
Ceph Monitors
Ceph Managers
Ceph OSDs
Metadata Servers
Object Gateways
iSCSI Gateways
NFS Ganesha
Remove unneeded file system snapshots on the operating system partitions of nodes. This ensures that there is enough free disk space during the upgrade.
We recommend checking the cluster health before starting the upgrade procedure.
We recommend upgrading all the daemons of a specific type—for example all monitor daemons or all OSD daemons—one by one to ensure that they are all on the same release. We also recommend that you upgrade all the daemons in your cluster before you try to exercise new functionality in a release.
After all the daemons of a specific type are upgraded, check their status.
Ensure each monitor has rejoined the quorum after all monitors are upgraded:
cephadm >
ceph mon stat
Ensure each Ceph OSD daemon has rejoined the cluster after all OSDs are upgraded:
cephadm >
ceph osd stat
The require-osd-release luminous Flag
When the last OSD is upgraded to SUSE Enterprise Storage 5.5, the
monitor nodes will detect that all OSDs are running the 'luminous'
version of Ceph and they may complain that the
require-osd-release luminous
osdmap flag is not set. In
that case, you need to set this flag manually to acknowledge
that—now that the cluster has been upgraded to 'luminous'—it
cannot be downgraded back to Ceph 'jewel'. Set the flag by running the
following command:
cephadm >
ceph osd require-osd-release luminous
After the command completes, the warning disappears.
On fresh installs of SUSE Enterprise Storage 5.5, this flag is set automatically when the Ceph monitors create the initial osdmap, so no end user action is needed.
Since SUSE Enterprise Storage 5.5, OSDs are by default deployed using BlueStore instead of FileStore. Although BlueStore supports encryption, Ceph OSDs are deployed unencrypted by default. The following procedure describes steps to encrypt OSDs during the upgrade process. Let us assume that both data and WAL/DB disks to be used for OSD deployment are clean with no partitions. If the disks were previously used, wipe them following the procedure described in Step 12.
You need to deploy encrypted OSDs one by one, not simultaneously. The reason is that each OSD's data is drained, and the cluster goes through several iterations of rebalancing.
Determine the bluestore block db size
and
bluestore block wal size
values for your deployment and
add them to the
/srv/salt/ceph/configuration/files/ceph.conf.d/global.conf
file on the Salt master. The values need to be specified in bytes.
[global]
bluestore block db size = 48318382080
bluestore block wal size = 2147483648
For more information on customizing the ceph.conf
file, refer to Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf
with Custom Settings”.
Run DeepSea Stage 3 to distribute the changes:
root@master #
salt-run state.orch ceph.stage.3
Verify that the ceph.conf
file is updated on the
relevant OSD nodes:
root@minion >
cat /etc/ceph/ceph.conf
Edit the *.yml files in the
/srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions
directory that are relevant to the OSDs you are encrypting. Double check
their path with the one defined in the
/srv/pillar/ceph/proposals/policy.cfg
file to ensure
that you modify the correct *.yml files.
When identifying OSD disks in the
/srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions/*.yml
files, use long disk identifiers.
An example of an OSD configuration follows. Note that because we need
encryption, the db_size
and wal_size
options are removed:
ceph:
  storage:
    osds:
      /dev/disk/by-id/scsi-SDELL_PERC_H730_Mini_007027b1065faa972100d34d7aa06d86:
        format: bluestore
        encryption: dmcrypt
        db: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
        wal: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
      /dev/disk/by-id/scsi-SDELL_PERC_H730_Mini_00d146b1065faa972100d34d7aa06d86:
        format: bluestore
        encryption: dmcrypt
        db: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
        wal: /dev/disk/by-id/nvme-INTEL_SSDPEDMD020T4D_HHHL_NVMe_2000GB_PHFT642400HV2P0EGN
Deploy the new Block Storage OSDs with encryption by running DeepSea Stages 2 and 3:
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
You can watch the progress with ceph -s
or
ceph osd tree
. It is critical that you let the cluster
rebalance before repeating the process on the next OSD node.
You need to have the following software installed and updated to the latest package versions on all the Ceph nodes you want to upgrade before you can start with the upgrade procedure:
SUSE Linux Enterprise Server 12 SP2
SUSE Enterprise Storage 4
Although the cluster is fully functional during the upgrade, DeepSea sets the 'noout' flag which prevents Ceph from rebalancing data during downtime and therefore avoids unnecessary data transfers.
To optimize the upgrade process, DeepSea upgrades your nodes in order based on their assigned role, as recommended by Ceph upstream: MONs, MGRs, OSDs, MDS, RGW, IGW, and NFS Ganesha.
Note that DeepSea cannot prevent the prescribed order from being violated if a node runs multiple services.
Although the Ceph cluster is operational during the upgrade, nodes may get rebooted in order to apply, for example, new kernel versions. To reduce the number of I/O operations left waiting, we recommend declining incoming requests for the duration of the upgrade process.
The cluster upgrade may take a very long time—approximately the time it takes to upgrade one machine multiplied by the number of cluster nodes.
Since Ceph Luminous, the osd crush location
configuration option is no longer supported. Update your DeepSea
configuration files to use crush location
before
upgrading.
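For example, if your configuration contains a line such as the first one below (the CRUSH bucket values are placeholders), replace it with the second:
# no longer supported since Luminous:
osd crush location = rack=rack1 host=node1
# supported replacement:
crush location = rack=rack1 host=node1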
There are two ways to obtain SUSE Linux Enterprise Server and SUSE Enterprise Storage 5.5 update repositories:
If your cluster nodes are registered with SUSEConnect and use SCC/SMT,
you will use the zypper migration
method and the
update repositories will be assigned automatically.
If you are not using SCC/SMT but a
Media-ISO or other package source, you will use the zypper
dup
method. In this case, you need to add the following
repositories to all cluster nodes manually: SLE12-SP3 Base, SLE12-SP3
Update, SES5 Base, and SES5 Update. You can do so using the
zypper
command. First remove all existing software
repositories, then add the required new ones, and finally refresh the
repositories sources:
root # zypper sd {0..99}
root # zypper ar \
 http://REPO_SERVER/repo/SUSE/Products/Storage/5/x86_64/product/ SES5-POOL
root # zypper ar \
 http://REPO_SERVER/repo/SUSE/Updates/Storage/5/x86_64/update/ SES5-UPDATES
root # zypper ar \
 http://REPO_SERVER/repo/SUSE/Products/SLE-SERVER/12-SP3/x86_64/product/ SLES12-SP3-POOL
root # zypper ar \
 http://REPO_SERVER/repo/SUSE/Updates/SLE-SERVER/12-SP3/x86_64/update/ SLES12-SP3-UPDATES
root # zypper ref
To upgrade the SUSE Enterprise Storage 4 cluster to version 5, follow these steps:
Upgrade the Salt master node to SUSE Linux Enterprise Server 12 SP3 and SUSE Enterprise Storage
5.5. Depending on your upgrade method, use either
zypper migration
or zypper dup
.
Using rpm -q deepsea
, verify that the version of the
DeepSea package on the Salt master node starts with at least
0.7
. For example:
root #
rpm -q deepsea
deepsea-0.7.27+git.0.274c55d-5.1
If the DeepSea package version number starts with 0.6, double check whether you successfully migrated the Salt master node to SUSE Linux Enterprise Server 12 SP3 and SUSE Enterprise Storage 5.5.
To set the new internal object sort order, run:
cephadm >
ceph osd set sortbitwise
To verify that the command was successful, we recommend running
cephadm >
ceph osd dump --format json-pretty | grep sortbitwise
"flags": "sortbitwise,recovery_deletes,purged_snapdirs",
If your cluster nodes are not registered
with SUSEConnect and do not use SCC/SMT, you will use the zypper
dup
method. Change your Pillar data in order to use the
different strategy. Edit
/srv/pillar/ceph/stack/name_of_cluster/cluster.yml
and add the following line:
upgrade_init: zypper-dup
Update your Pillar:
root@master #
salt target saltutil.sync_all
See Section 4.2.2, “Targeting the Minions” for details about Salt minions targeting.
Verify that you successfully wrote to the Pillar:
root@master #
salt target pillar.get upgrade_init
The command's output should mirror the entry you added.
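For example, with a target of '*' the output might look similar to the following (host names are illustrative):
root@master # salt '*' pillar.get upgrade_init
node1.example.com:
    zypper-dup
node2.example.com:
    zypper-dup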
Upgrade Salt minions:
root@master #
salt target state.apply ceph.updates.salt
Verify that all Salt minions are upgraded:
root@master #
salt target test.version
Include the cluster's Salt minions. Refer to Section 4.2.2, “Targeting the Minions” of Procedure 4.1, “Running Deployment Stages” for more details.
Start the upgrade of SUSE Linux Enterprise Server and Ceph:
root@master #
salt-run state.orch ceph.maintenance.upgrade
Refer to Section 5.4.2, “Details on the salt target ceph.maintenance.upgrade
Command” for more
information.
If the process results in a reboot of the Salt master, re-run the command to start the upgrade process for the Salt minions again.
After the upgrade, the Ceph Managers are not installed yet. To reach a healthy cluster state, do the following:
Run Stage 0 to enable the Salt REST API:
root@master #
salt-run state.orch ceph.stage.0
Run Stage 1 to create the role-mgr/
subdirectory:
root@master #
salt-run state.orch ceph.stage.1
Edit Section 4.5.1, “The policy.cfg
File” and add a Ceph Manager role to the nodes
where Ceph Monitors are deployed, or uncomment the 'role-mgr' lines if you
followed the steps of Section 5.5, “Upgrade from SUSE Enterprise Storage 4 (ceph-deploy
Deployment) to 5” until
here. Also, add the openATTIC role to one of the cluster nodes. Refer to
Book “Administration Guide”, Chapter 17 “openATTIC” for more details.
Run Stage 2 to update the Pillar:
root@master #
salt-run state.orch ceph.stage.2
DeepSea uses a different approach to generate the
ceph.conf
configuration file now, refer to
Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf
with Custom Settings” for more details.
Set any of the three AppArmor states on all DeepSea minions. For example, to disable them, run:
root@master #
salt 'TARGET' state.apply ceph.apparmor.default-disable
For more information, refer to Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.13 “Enabling AppArmor Profiles”.
Run Stage 3 to deploy Ceph Managers:
root@master #
salt-run state.orch ceph.stage.3
Run Stage 4 to configure openATTIC properly:
root@master #
salt-run state.orch ceph.stage.4
If ceph.stage.3
fails with "Error EINVAL: entity
client.bootstrap-osd exists but caps do not match", it means the key
capabilities (caps) for the existing cluster's
client.bootstrap.osd
key do not match the caps that
DeepSea is trying to set. Above the error message, in red text, you can
see a dump of the ceph auth
command that failed. Look
at this command to check the key ID and file being used. In the case of
client.bootstrap-osd
, the command will be
cephadm >
ceph auth add client.bootstrap-osd \
-i /srv/salt/ceph/osd/cache/bootstrap.keyring
To fix mismatched key caps, check the content of the keyring file DeepSea is trying to deploy, for example:
cephadm >
cat /srv/salt/ceph/osd/cache/bootstrap.keyring
[client.bootstrap-osd]
key = AQD6BpVZgqVwHBAAQerW3atANeQhia8m5xaigw==
caps mgr = "allow r"
caps mon = "allow profile bootstrap-osd"
Compare this with the output of ceph auth get
client.bootstrap-osd
:
cephadm >
ceph auth get client.bootstrap-osd
exported keyring for client.bootstrap-osd
[client.bootstrap-osd]
key = AQD6BpVZgqVwHBAAQerW3atANeQhia8m5xaigw==
caps mon = "allow profile bootstrap-osd"
Note how the latter key is missing caps mgr = "allow
r"
. To fix this, run:
cephadm >
ceph auth caps client.bootstrap-osd mgr \
"allow r" mon "allow profile bootstrap-osd"
Running ceph.stage.3
should now succeed.
The same issue can occur with other daemon and gateway keyrings when
running ceph.stage.3
and
ceph.stage.4
. The same procedure as above applies:
check the command that failed, the keyring file being deployed, and the
capabilities of the existing key. Then run ceph auth
caps
to update the existing key capabilities to match to what
is being deployed by DeepSea. The keyring files that DeepSea tries to
deploy are typically placed under the
/srv/salt/ceph/DAEMON_OR_GATEWAY_NAME/cache
directory.
If the cluster is in 'HEALTH_ERR' state for more than 300 seconds, or one of the services for each assigned role is down for more than 900 seconds, the upgrade failed. In that case, try to find the problem, resolve it, and re-run the upgrade procedure. Note that in virtualized environments, the timeouts are shorter.
After upgrading to SUSE Enterprise Storage 5.5, FileStore OSDs need approximately five minutes longer to start as the OSD will do a one-off conversion of its on-disk files.
When you need to find out the versions of individual cluster components and nodes—for example to find out if all your nodes are actually on the same patch level after the upgrade—you can run
root@master #
salt-run status.report
The command goes through the connected Salt minions and scans for the version numbers of Ceph, Salt, and SUSE Linux Enterprise Server, and gives you a report displaying the version that the majority of nodes have and showing nodes whose version is different from the majority.
OSD BlueStore is a new back end for the OSD daemons. It is the default option since SUSE Enterprise Storage 5.5. Compared to FileStore, which stores objects as files in an XFS file system, BlueStore can deliver increased performance because it stores objects directly on the underlying block device. BlueStore also enables other features, such as built-in compression and EC overwrites, that are unavailable with FileStore.
Specifically for BlueStore, an OSD has a 'wal' (Write Ahead Log) device and a 'db' (RocksDB database) device. The RocksDB database holds the metadata for a BlueStore OSD. These two devices will reside on the same device as an OSD by default, but either can be placed on faster/different media.
In SES5, both FileStore and BlueStore are supported and it is possible for FileStore and BlueStore OSDs to co-exist in a single cluster. During the SUSE Enterprise Storage upgrade procedure, FileStore OSDs are not automatically converted to BlueStore. Be aware that the BlueStore-specific features will not be available on OSDs that have not been migrated to BlueStore.
Before converting to BlueStore, the OSDs need to be running SUSE Enterprise Storage 5.5. The conversion is a slow process as all data gets re-written twice. Though the migration process can take a long time to complete, there is no cluster outage and all clients can continue accessing the cluster during this period. However, do expect lower performance for the duration of the migration. This is caused by rebalancing and backfilling of cluster data.
Use the following procedure to migrate FileStore OSDs to BlueStore:
Salt commands needed for running the migration are blocked by safety measures. In order to turn these precautions off, run the following command:
root@master #
salt-run disengage.safety
Migrate hardware profiles:
root@master #
salt-run state.orch ceph.migrate.policy
This runner migrates any hardware profiles currently in use by the
policy.cfg
file. It processes
policy.cfg
, finds any hardware profile using the
original data structure, and converts it to the new data structure. The
result is a new hardware profile named
'migrated-original_name'.
policy.cfg
is updated as well.
If the original configuration had separate journals, the BlueStore configuration will use the same device for the 'wal' and 'db' for that OSD.
DeepSea migrates OSDs by setting their weight to 0 which 'vacuums' the data until the OSD is empty. You can either migrate OSDs one by one, or all OSDs at once. In either case, when the OSD is empty, the orchestration removes it and then re-creates it with the new configuration.
Use ceph.migrate.nodes
if you have a large number of
physical storage nodes or almost no data. If one node represents less
than 10% of your capacity, then
ceph.migrate.nodes
may be marginally faster, moving
all the data from those OSDs in parallel.
If you are not sure about which method to use, or the site has few
storage nodes (for example each node has more than 10% of the cluster
data), then select ceph.migrate.osds
.
To migrate OSDs one at a time, run:
root@master #
salt-run state.orch ceph.migrate.osds
To migrate all OSDs on each node in parallel, run:
root@master #
salt-run state.orch ceph.migrate.nodes
As the orchestration gives no feedback about the migration progress, periodically run
cephadm > ceph osd tree
to see which OSDs have a weight of zero.
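For example, a simple way to repeat the check every minute is the standard watch utility (the interval is arbitrary):
cephadm > watch -n 60 "ceph osd tree"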
After the migration to BlueStore, the object count will remain the same and disk usage will be nearly the same.
Details on the salt target ceph.maintenance.upgrade Command
During an upgrade via salt target ceph.maintenance.upgrade,
DeepSea applies all available updates/patches on all servers in the
cluster in parallel without rebooting them. After these updates/patches are
applied, the actual upgrade begins:
The admin node is upgraded to SUSE Linux Enterprise Server 12 SP3. This also upgrades the salt-master and deepsea packages.
All Salt minions are upgraded to a version that corresponds to the Salt master.
The migration is performed sequentially on all cluster nodes in the recommended order (the Ceph Monitors first, see Upgrade Order) using the preferred method. As a consequence, the ceph package is upgraded.
After updating all Ceph Monitors, their services are restarted but the nodes are not rebooted. This way we ensure that all running Ceph Monitors are on the same version.
If the cluster monitor nodes also host OSDs, do not reboot those nodes during this stage, because the OSDs they host will not join the cluster after the reboot.
All the remaining cluster nodes are updated and rebooted in the recommended order.
After all nodes are on the same patch-level, the following command is run:
ceph osd require-osd-release RELEASE
If this process is interrupted, whether by accident or intentionally by the administrator, never reboot the nodes manually: after the first OSD node is rebooted, its OSD daemons will not be able to join the cluster anymore.
Upgrade from SUSE Enterprise Storage 4 (ceph-deploy Deployment) to 5
You need to have the following software installed and updated to the latest package versions on all the Ceph nodes you want to upgrade before you can start with the upgrade procedure:
SUSE Linux Enterprise Server 12 SP2
SUSE Enterprise Storage 4
Choose the Salt master for your cluster. If your cluster has Calamari
deployed, then the Calamari node already is the
Salt master. Alternatively, the admin node from which you ran the
ceph-deploy
command will become the Salt master.
Before starting the procedure below, you need to upgrade the Salt master node
to SUSE Linux Enterprise Server 12 SP3 and SUSE Enterprise Storage 5.5 by running
zypper migration
(or your preferred way of upgrading).
To upgrade the SUSE Enterprise Storage 4 cluster which was deployed with
ceph-deploy
to version 5, follow these steps:
Install the salt
package from SLE-12-SP2/SES4:
root #
zypper install salt
Install the salt-minion
package from
SLE-12-SP2/SES4, then enable and start the related service:
root # zypper install salt-minion
root # systemctl enable salt-minion
root # systemctl start salt-minion
Ensure that the host name 'salt' resolves to the IP address of the
Salt master node. If your Salt master is not reachable by the host name
salt
, edit the file
/etc/salt/minion
or create a new file
/etc/salt/minion.d/master.conf
with the following
content:
master: host_name_of_salt_master
The existing Salt minions have the master:
option already
set in /etc/salt/minion.d/calamari.conf
. The
configuration file name does not matter; the
/etc/salt/minion.d/
directory is important.
If you performed any changes to the configuration files mentioned above, restart the Salt service on all Salt minions:
root@minion >
systemctl restart salt-minion.service
If you registered your systems with SUSEConnect and use SCC/SMT, no further actions need to be taken.
If you are not using SCC/SMT but a
Media-ISO or other package source, add the following repositories
manually: SLE12-SP3 Base, SLE12-SP3 Update, SES5 Base, and SES5 Update.
You can do so using the zypper
command. First remove
all existing software repositories, then add the required new ones, and
finally refresh the repositories sources:
root # zypper sd {0..99}
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Products/Storage/5/x86_64/product/ SES5-POOL
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Updates/Storage/5/x86_64/update/ SES5-UPDATES
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Products/SLE-SERVER/12-SP3/x86_64/product/ SLES12-SP3-POOL
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Updates/SLE-SERVER/12-SP3/x86_64/update/ SLES12-SP3-UPDATES
root # zypper ref
To set the new internal object sort order, run:
cephadm >
ceph osd set sortbitwise
To verify that the command was successful, we recommend running
cephadm >
ceph osd dump --format json-pretty | grep sortbitwise
"flags": "sortbitwise,recovery_deletes,purged_snapdirs",
Upgrade the Salt master node to SUSE Linux Enterprise Server 12 SP3 and SUSE Enterprise Storage 5.5.
For SCC-registered systems, use zypper migration
. If
you provide the required software repositories manually, use
zypper dup
. After the upgrade, ensure that only
repositories for SUSE Linux Enterprise Server 12 SP3 and SUSE Enterprise Storage 5.5 are active
(and refreshed) on the Salt master node before proceeding.
If not already present, install the salt-master
package, then enable and start the related service:
root@master # zypper install salt-master
root@master # systemctl enable salt-master
root@master # systemctl start salt-master
Verify the presence of all Salt minions by listing their keys:
root@master #
salt-key -L
Add all Salt minion keys to the Salt master, including the key of the minion running on the Salt master itself:
root@master #
salt-key -A -y
Ensure that all Salt minions' keys were accepted:
root@master #
salt-key -L
Make sure that the software on your Salt master node is up to date:
root@master #
zypper migration
Install the deepsea
package:
root@master #
zypper install deepsea
Include the cluster's Salt minions. Refer to Section 4.2.2, “Targeting the Minions” of Procedure 4.1, “Running Deployment Stages” for more details.
Import the existing ceph-deploy
installed cluster:
root@master #
salt-run populate.engulf_existing_cluster
The command will do the following:
Distribute all the required Salt and DeepSea modules to all the Salt minions.
Inspect the running Ceph cluster and populate
/srv/pillar/ceph/proposals
with a layout of the
cluster.
/srv/pillar/ceph/proposals/policy.cfg
will be
created with roles matching all detected running Ceph services. If no
ceph-mgr
daemons are detected, a
'role-mgr' is added for every node with 'role-mon'. View this file to
verify that each of your existing MON, OSD, RGW and MDS nodes have the
appropriate roles. OSD nodes will be imported into the
profile-import/
subdirectory, so you can examine
the files in
/srv/pillar/ceph/proposals/profile-import/cluster/
and
/srv/pillar/ceph/proposals/profile-import/stack/default/ceph/minions/
to confirm that the OSDs were correctly picked up.
The generated policy.cfg
will only apply roles for
detected Ceph services 'role-mon', 'role-mds', 'role-rgw',
'role-admin', and 'role-master' for the Salt master node. Any other
desired roles will need to be added to the file manually (see
Section 4.5.1.2, “Role Assignment”).
The existing cluster's ceph.conf
will be saved to
/srv/salt/ceph/configuration/files/ceph.conf.import
.
/srv/pillar/ceph/proposals/config/stack/default/ceph/cluster.yml
will include the cluster's fsid, cluster and public networks, and also
specifies the configuration_init: default-import
option, which makes DeepSea use the
ceph.conf.import
configuration file mentioned
previously, rather than using DeepSea's default
/srv/salt/ceph/configuration/files/ceph.conf.j2
template.
ceph.conf
If you need to integrate the ceph.conf
file with
custom changes, wait until the engulf/upgrade process successfully
finishes. Then edit the
/srv/pillar/ceph/proposals/config/stack/default/ceph/cluster.yml
file and comment the following line:
configuration_init: default-import
Save the file and follow the information in
Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf
with Custom Settings”.
The cluster's various keyrings will be saved to the following directories:
/srv/salt/ceph/admin/cache/ /srv/salt/ceph/mon/cache/ /srv/salt/ceph/osd/cache/ /srv/salt/ceph/mds/cache/ /srv/salt/ceph/rgw/cache/
Verify that the keyring files exist, and that there is no keyring file in the following directory (the Ceph Manager did not exist before SUSE Enterprise Storage 5.5):
/srv/salt/ceph/mgr/cache/
If the salt-run populate.engulf_existing_cluster
command cannot detect ceph-mgr
daemons, the policy.cfg
file will contain a 'mgr'
role for each node that has the 'role-mon' assigned. This will deploy
ceph-mgr
daemons together with the
monitor daemons in a later step. Since there are no
ceph-mgr
daemons running at this
time, please edit
/srv/pillar/ceph/proposals/policy.cfg
and comment out
all lines starting with 'role-mgr' by prepending a '#' character.
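For example, a commented-out line might look like this (the actual file names depend on what the engulf runner generated):
# role-mgr/cluster/mon1.example.com.sls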
The salt-run populate.engulf_existing_cluster
command
does not handle importing the openATTIC configuration. You need to manually
edit the policy.cfg
file and add a
role-openattic
line. Refer to
Section 4.5.1, “The policy.cfg
File” for more details.
The salt-run populate.engulf_existing_cluster
command
does not handle importing the iSCSI Gateways configurations. If your cluster
includes iSCSI Gateways, import their configurations manually:
On one of iSCSI Gateway nodes, export the current lrbd.conf
and copy it to the Salt master node:
root@minion > lrbd -o > /tmp/lrbd.conf
root@minion > scp /tmp/lrbd.conf admin:/srv/salt/ceph/igw/cache/lrbd.conf
On the Salt master node, add the default iSCSI Gateway configuration to the DeepSea setup:
root@master # mkdir -p /srv/pillar/ceph/stack/ceph/
root@master # echo 'igw_config: default-ui' >> /srv/pillar/ceph/stack/ceph/cluster.yml
root@master # chown salt:salt /srv/pillar/ceph/stack/ceph/cluster.yml
Add the iSCSI Gateway roles to policy.cfg
and save the
file:
role-igw/stack/default/ceph/minions/ses-1.ses.suse.yml
role-igw/cluster/ses-1.ses.suse.sls
[...]
Run Stages 0 and 1 to update packages and create all possible roles:
root@master # salt-run state.orch ceph.stage.0
root@master # salt-run state.orch ceph.stage.1
Generate required subdirectories under
/srv/pillar/ceph/stack
:
root@master #
salt-run push.proposal
Verify that there is a working DeepSea-managed cluster with correctly assigned roles:
root@master #
salt target pillar.get roles
Compare the output with the actual layout of the cluster.
Calamari leaves a scheduled Salt job running to check the cluster status. Remove the job:
root@master #
salt target schedule.delete ceph.heartbeat
From this point on, follow the procedure described in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5”.
You need to have the following software installed and updated to the latest package versions on all the Ceph nodes you want to upgrade before you can start with the upgrade procedure:
SUSE Linux Enterprise Server 12 SP2
SUSE Enterprise Storage 4
To upgrade SUSE Enterprise Storage 4 deployed using Crowbar to version 5, follow these steps:
For each Ceph node (including the Calamari node), stop and disable all Crowbar-related services:
root@minion > systemctl stop chef-client
root@minion > systemctl disable chef-client
root@minion > systemctl disable crowbar_join
root@minion > systemctl disable crowbar_notify_shutdown
For each Ceph node (including the Calamari node), verify that the software repositories point to SUSE Enterprise Storage 5.5 and SUSE Linux Enterprise Server 12 SP3 products. If repositories pointing to older product versions are still present, disable them.
For each Ceph node (including the Calamari node), verify that the salt-minion is installed. If not, install it:
root@minion >
zypper in salt salt-minion
For the Ceph nodes that did not have the salt-minion
package installed, create the file
/etc/salt/minion.d/master.conf
with the
master
option pointing to the full Calamari node host
name:
master: full_calamari_hostname
The existing Salt minions have the master:
option already
set in /etc/salt/minion.d/calamari.conf
. The
configuration file name does not matter; the
/etc/salt/minion.d/
directory is important.
Enable and start the salt-minion
service:
root@minion > systemctl enable salt-minion
root@minion > systemctl start salt-minion
On the Calamari node, accept any remaining salt minion keys:
root@master # salt-key -L
[...]
Unaccepted Keys:
d52-54-00-16-45-0a.example.com
d52-54-00-70-ac-30.example.com
[...]
root@master # salt-key -A
The following keys are going to be accepted:
Unaccepted Keys:
d52-54-00-16-45-0a.example.com
d52-54-00-70-ac-30.example.com
Proceed? [n/Y] y
Key for minion d52-54-00-16-45-0a.example.com accepted.
Key for minion d52-54-00-70-ac-30.example.com accepted.
If Ceph was deployed on the public network and no VLAN interface is present, add a VLAN interface on Crowbar's public network to the Calamari node.
Upgrade the Calamari node to SUSE Linux Enterprise Server 12 SP3 and SUSE Enterprise Storage
5.5, either by using zypper migration
or
your favorite method. From here onward, the Calamari node becomes the
Salt master. After the upgrade, reboot the Salt master.
Install DeepSea on the Salt master:
root@master #
zypper in deepsea
Specify the deepsea_minions
option to include the correct
group of Salt minions into deployment stages. Refer to
Section 4.2.2.3, “Set the deepsea_minions
Option” for more details.
DeepSea expects all Ceph nodes to have an identical
/etc/ceph/ceph.conf
. Crowbar deploys a slightly
different ceph.conf
to each node, so you need to
consolidate them:
Remove the osd crush location hook
option; it was
included by Calamari.
Remove the public addr
option from the
[mon]
section.
Remove the port numbers from the mon host
option.
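For illustration, a Crowbar-generated mon host line with port numbers (first line, addresses are placeholders) would be consolidated into the second line:
mon host = 192.168.1.10:6789, 192.168.1.11:6789, 192.168.1.12:6789
mon host = 192.168.1.10, 192.168.1.11, 192.168.1.12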
If you were running the Object Gateway, Crowbar deployed a separate
/etc/ceph/ceph.conf.radosgw
file to keep the keystone
secrets separated from the regular ceph.conf
file.
Crowbar also added a custom
/etc/systemd/system/ceph-radosgw@.service
file.
Because DeepSea does not support it, you need to remove it:
Append all [client.rgw....]
sections from the
ceph.conf.radosgw
file to
/etc/ceph/ceph.conf
on all nodes.
On the Object Gateway node, run the following:
root@minion > rm /etc/systemd/system/ceph-radosgw@.service
root@minion > systemctl reenable ceph-radosgw@rgw.public.$hostname
Double check that ceph status
works when run from the
Salt master:
root@master #
ceph status
cluster a705580c-a7ae-4fae-815c-5cb9c1ded6c2
health HEALTH_OK
[...]
Import the existing cluster:
root@master # salt-run populate.engulf_existing_cluster
root@master # salt-run state.orch ceph.stage.1
root@master # salt-run push.proposal
The salt-run populate.engulf_existing_cluster
command
does not handle importing the iSCSI Gateways configurations. If your cluster
includes iSCSI Gateways, import their configurations manually:
On one of iSCSI Gateway nodes, export the current lrbd.conf
and copy it to the Salt master node:
root@minion > lrbd -o > /tmp/lrbd.conf
root@minion > scp /tmp/lrbd.conf admin:/srv/salt/ceph/igw/cache/lrbd.conf
On the Salt master node, add the default iSCSI Gateway configuration to the DeepSea setup:
root@master # mkdir -p /srv/pillar/ceph/stack/ceph/
root@master # echo 'igw_config: default-ui' >> /srv/pillar/ceph/stack/ceph/cluster.yml
root@master # chown salt:salt /srv/pillar/ceph/stack/ceph/cluster.yml
Add the iSCSI Gateway roles to policy.cfg
and save the
file:
role-igw/stack/default/ceph/minions/ses-1.ses.suse.yml
role-igw/cluster/ses-1.ses.suse.sls
[...]
If you registered your systems with SUSEConnect and use SCC/SMT, no further actions need to be taken.
If you are not using SCC/SMT but a
Media-ISO or other package source, add the following repositories
manually: SLE12-SP3 Base, SLE12-SP3 Update, SES5 Base, and SES5 Update.
You can do so using the zypper
command. First remove
all existing software repositories, then add the required new ones, and
finally refresh the repositories sources:
root # zypper sd {0..99}
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Products/Storage/5/x86_64/product/ SES5-POOL
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Updates/Storage/5/x86_64/update/ SES5-UPDATES
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Products/SLE-SERVER/12-SP3/x86_64/product/ SLES12-SP3-POOL
root # zypper ar \
 http://172.17.2.210:82/repo/SUSE/Updates/SLE-SERVER/12-SP3/x86_64/update/ SLES12-SP3-UPDATES
root # zypper ref
Then change your Pillar data in order to use a different strategy. Edit
/srv/pillar/ceph/stack/name_of_cluster/cluster.yml
and add the following line:
upgrade_init: zypper-dup
The zypper-dup
strategy requires you to manually add
the latest software repositories, while the default
zypper-migration
relies on the repositories provided
by SCC/SMT.
Fix host grains to make DeepSea use short host names on the public
network for the Ceph daemon instance IDs. For each node, you need to run
grains.set
with the new (short) host name. Before
running grains.set
, verify the current monitor
instances by running ceph status
. A before and after
example follows:
root@master #
salt target grains.get host
d52-54-00-16-45-0a.example.com:
d52-54-00-16-45-0a
d52-54-00-49-17-2a.example.com:
d52-54-00-49-17-2a
d52-54-00-76-21-bc.example.com:
d52-54-00-76-21-bc
d52-54-00-70-ac-30.example.com:
d52-54-00-70-ac-30
root@master # salt d52-54-00-16-45-0a.example.com grains.set \
 host public.d52-54-00-16-45-0a
root@master # salt d52-54-00-49-17-2a.example.com grains.set \
 host public.d52-54-00-49-17-2a
root@master # salt d52-54-00-76-21-bc.example.com grains.set \
 host public.d52-54-00-76-21-bc
root@master # salt d52-54-00-70-ac-30.example.com grains.set \
 host public.d52-54-00-70-ac-30
root@master #
salt target grains.get host
d52-54-00-76-21-bc.example.com:
public.d52-54-00-76-21-bc
d52-54-00-16-45-0a.example.com:
public.d52-54-00-16-45-0a
d52-54-00-49-17-2a.example.com:
public.d52-54-00-49-17-2a
d52-54-00-70-ac-30.example.com:
public.d52-54-00-70-ac-30
Run the upgrade:
root@master # salt target state.apply ceph.updates
root@master # salt target test.version
root@master # salt-run state.orch ceph.maintenance.upgrade
Every node will reboot. The cluster will come back up complaining that there is no active Ceph Manager instance. This is normal. Calamari should not be installed/running anymore at this point.
Run all the required deployment stages to get the cluster to a healthy state:
root@master # salt-run state.orch ceph.stage.0
root@master # salt-run state.orch ceph.stage.1
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
To deploy openATTIC (see Book “Administration Guide”, Chapter 17 “openATTIC”), add an appropriate
role-openattic
(see
Section 4.5.1.2, “Role Assignment”) line to
/srv/pillar/ceph/proposals/policy.cfg
, then run:
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.4
During the upgrade, you may receive "Error EINVAL: entity [...] exists but caps do not match" errors. To fix them, refer to Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5”.
Do the remaining cleanup:
Crowbar creates entries in /etc/fstab
for each OSD.
They are not necessary, so delete them.
Calamari leaves a scheduled Salt job running to check the cluster status. Remove the job:
root@master #
salt target schedule.delete ceph.heartbeat
There are still some unnecessary packages installed, mostly Ruby gems and Chef-related packages. Their removal is not required, but you may want to
delete them by running zypper rm
pkg_name
.
You need to have the following software installed and updated to the latest package versions on all the Ceph nodes you want to upgrade before you can start with the upgrade procedure:
SUSE Linux Enterprise Server 12 SP1
SUSE Enterprise Storage 3
To upgrade the SUSE Enterprise Storage 3 cluster to version 5, follow the steps described in Procedure 5.1, “Steps to Apply to All Cluster Nodes (including the Calamari Node)” and then Procedure 5.2, “Steps to Apply to the Salt master Node”.
This chapter explains which files on the admin node should be backed up. As soon as you are finished with your cluster deployment or migration, create a backup of these directories.
Back up the /etc/ceph
directory. It contains crucial
cluster configuration. You will need the backup of
/etc/ceph
for example when you need to replace the
Admin Node.
You need to back up the /etc/salt/
directory. It
contains the Salt configuration files, for example the Salt master key and
accepted client keys.
The Salt files are not strictly required for backing up the admin node, but make redeploying the Salt cluster easier. If there is no backup of these files, the Salt minions need to be registered again at the new admin node.
Make sure that the backup of the Salt master private key is stored in a safe location. The Salt master key can be used to manipulate all cluster nodes.
After restoring the /etc/salt
directory from a backup,
restart the Salt services:
root@master # systemctl restart salt-master
root@master # systemctl restart salt-minion
All files required by DeepSea are stored in
/srv/pillar/
, /srv/salt/
and
/etc/salt/master.d
.
If you need to redeploy the admin node, install the DeepSea package on the new node and move the backed up data back into the directories. DeepSea can then be used again without any further changes being required. Before using DeepSea again, make sure that all Salt minions are correctly registered on the admin node.
You can change the default cluster configuration generated in Stage 2 (refer
to DeepSea Stages Description). For example, you may need to
change network settings, or change the software that is installed on the Salt master by
default. You can perform the former by modifying the pillar updated
after Stage 2, while the latter is usually done by creating a custom
sls
file and adding it to the pillar. Details are
described in the following sections.
This section lists several tasks that require adding/changing your own
sls
files. Such a procedure is typically used when you
need to change the default deployment process.
Your custom .sls files belong to the same subdirectory as DeepSea's .sls
files. To prevent overwriting your .sls files with the possibly newly added
ones from the DeepSea package, prefix their name with the
custom-
string.
If you address a specific task outside of the DeepSea deployment process and therefore need to skip it, create a 'no-operation' file following this example:
Create /srv/salt/ceph/time/disabled.sls
with the
following content and save it:
disable time setting: test.nop
Edit /srv/pillar/ceph/stack/global.yml
, add the
following line, and save it:
time_init: disabled
Verify by refreshing the pillar and running the step:
root@master # salt target saltutil.pillar_refresh
root@master # salt 'admin.ceph' state.apply ceph.time
admin.ceph:
  Name: disable time setting - Function: test.nop - Result: Clean

Summary for admin.ceph
------------
Succeeded: 1
Failed:    0
------------
Total states run: 1
The task ID 'disable time setting' may be any message unique within an
sls
file. Prevent ID collisions by specifying unique
descriptions.
If you need to replace the default behavior of a specific step with a
custom one, create a custom sls
file with replacement
content.
By default /srv/salt/ceph/pool/default.sls
creates an
rbd image called 'demo'. In our example, we do not want this image to be
created, but we need two images: 'archive1' and 'archive2'.
Create /srv/salt/ceph/pool/custom.sls
with the
following content and save it:
wait:
  module.run:
    - name: wait.out
    - kwargs:
        'status': "HEALTH_ERR"
    - fire_event: True

archive1:
  cmd.run:
    - name: "rbd -p rbd create archive1 --size=1024"
    - unless: "rbd -p rbd ls | grep -q archive1$"
    - fire_event: True

archive2:
  cmd.run:
    - name: "rbd -p rbd create archive2 --size=768"
    - unless: "rbd -p rbd ls | grep -q archive2$"
    - fire_event: True
The wait module will pause until the Ceph cluster does not have a status of HEALTH_ERR. The rbd create commands are not idempotent, so each one is guarded by an unless condition that skips creation when the image already exists.
To call the newly created custom file instead of the default, you need to
edit /srv/pillar/ceph/stack/ceph/cluster.yml
, add
the following line, and save it:
pool_init: custom
Verify by refreshing the pillar and running the step:
root@master # salt target saltutil.pillar_refresh
root@master # salt 'admin.ceph' state.apply ceph.pool
The creation of pools or images requires sufficient authorization. The
admin.ceph
minion has an admin keyring.
Another option is to change the variable in
/srv/pillar/ceph/stack/ceph/roles/master.yml
instead.
Using this file will reduce the clutter of pillar data for other minions.
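For example, a hedged sketch of setting the same variable in the master role file instead of cluster.yml could look like this:
# /srv/pillar/ceph/stack/ceph/roles/master.yml
pool_init: custom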
Sometimes you may need a specific step to do some additional tasks. We do not recommend modifying the related state file as it may complicate a future upgrade. Instead, create a separate file to carry out the additional tasks identical to what was described in Section 7.1.2, “Replacing a Deployment Step”.
Name the new sls
file descriptively. For example, if you
need to create two rbd images in addition to the demo image, name the file
archive.sls
.
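The content of archive.sls is not shown here; a minimal sketch, reusing the image-creation states from the earlier example, might look like this:
archive1:
  cmd.run:
    - name: "rbd -p rbd create archive1 --size=1024"
    - unless: "rbd -p rbd ls | grep -q archive1$"
    - fire_event: True

archive2:
  cmd.run:
    - name: "rbd -p rbd create archive2 --size=768"
    - unless: "rbd -p rbd ls | grep -q archive2$"
    - fire_event: True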
Create /srv/salt/ceph/pool/custom.sls
with the
following content and save it:
include:
  - .archive
  - .default
In this example, Salt will create the archive
images and then create the demo image. The order
does not matter in this example. To change the order, reverse the lines
after the include:
directive.
You can add the include line directly to
archive.sls
and all the images will get created as
well. However, regardless of where the include line is placed, Salt
processes the steps in the included file first. Although this behavior
can be overridden with requires and
order statements, a separate file that includes the
others guarantees the order and reduces the chances of confusion.
Edit /srv/pillar/ceph/stack/ceph/cluster.yml
, add
the following line, and save it:
pool_init: custom
Verify by refreshing the pillar and running the step:
root@master # salt target saltutil.pillar_refresh
root@master # salt 'admin.ceph' state.apply ceph.pool
If you need to add a completely separate deployment step, create three new
files—an sls
file that performs the command, an
orchestration file, and a custom file which aligns the new step with the
original deployment steps.
For example, if you need to run logrotate
on all minions
as part of the preparation stage:
First create an sls
file and include the
logrotate
command.
logrotate on all Salt minions
Create a directory such as /srv/salt/ceph/logrotate
.
Create /srv/salt/ceph/logrotate/init.sls
with the
following content and save it:
rotate logs:
  cmd.run:
    - name: "/usr/sbin/logrotate /etc/logrotate.conf"
Verify that the command works on a minion:
root@master #
salt 'admin.ceph' state.apply ceph.logrotate
Because the orchestration file needs to run before all other preparation steps, add it to the Prep stage 0:
Create /srv/salt/ceph/stage/prep/logrotate.sls
with
the following content and save it:
logrotate:
  salt.state:
    - tgt: '*'
    - sls: ceph.logrotate
Verify that the orchestration file works:
root@master #
salt-run state.orch ceph.stage.prep.logrotate
The last file is the custom one which includes the additional step with the original steps:
Create /srv/salt/ceph/stage/prep/custom.sls
with the
following content and save it:
include:
  - .logrotate
  - .master
  - .minion
Override the default behavior. Edit
/srv/pillar/ceph/stack/global.yml
, add the following
line, and save the file:
stage_prep: custom
Verify that Stage 0 works:
root@master #
salt-run state.orch ceph.stage.0
Why global.yml?
The global.yml file is chosen over cluster.yml because during the prep stage, no minion yet belongs to the Ceph cluster and therefore has no access to any settings in cluster.yml.
During Stage 0 (refer to DeepSea Stages Description for more information on DeepSea stages), the Salt master and Salt minions may optionally reboot because newly updated packages, for example kernel, require rebooting the system.
The default behavior is to install available new updates and not reboot the nodes even in case of kernel updates.
You can change the default update/reboot behavior of DeepSea Stage 0 by
adding/changing the stage_prep_master
and
stage_prep_minion
options in the
/srv/pillar/ceph/stack/global.yml
file.
stage_prep_master
sets the behavior of the Salt master, and
stage_prep_minion
sets the behavior of all minions. All
available parameters are:
default
Install updates without rebooting.
default-update-reboot
Install updates and reboot after updating.
default-no-update-reboot
Reboot without installing updates.
default-no-update-no-reboot
Do not install updates or reboot.
For example, to prevent the cluster nodes from installing updates and
rebooting, edit /srv/pillar/ceph/stack/global.yml
and
add the following lines:
stage_prep_master: default-no-update-no-reboot
stage_prep_minion: default-no-update-no-reboot
The values of stage_prep_master
correspond to file names
located in /srv/salt/ceph/stage/0/master
, while
values of stage_prep_minion
correspond to files in
/srv/salt/ceph/stage/0/minion
:
cephadm > ls -l /srv/salt/ceph/stage/0/master
default-no-update-no-reboot.sls
default-no-update-reboot.sls
default-update-reboot.sls
[...]
cephadm > ls -l /srv/salt/ceph/stage/0/minion
default-no-update-no-reboot.sls
default-no-update-reboot.sls
default-update-reboot.sls
[...]
After you completed Stage 2, you may want to change the discovered configuration. To view the current settings, run:
root@master #
salt target pillar.items
The output of the default configuration for a single minion is usually similar to the following:
----------
available_roles:
    - admin
    - mon
    - storage
    - mds
    - igw
    - rgw
    - client-cephfs
    - client-radosgw
    - client-iscsi
    - mds-nfs
    - rgw-nfs
    - master
cluster:
    ceph
cluster_network:
    172.16.22.0/24
fsid:
    e08ec63c-8268-3f04-bcdb-614921e94342
master_minion:
    admin.ceph
mon_host:
    - 172.16.21.13
    - 172.16.21.11
    - 172.16.21.12
mon_initial_members:
    - mon3
    - mon1
    - mon2
public_address:
    172.16.21.11
public_network:
    172.16.21.0/24
roles:
    - admin
    - mon
    - mds
time_server:
    admin.ceph
time_service:
    ntp
The settings mentioned above are distributed across several configuration files. The directory structure with these files is defined in the /srv/pillar/ceph/stack/stack.cfg file. The following files usually describe your cluster:
/srv/pillar/ceph/stack/global.yml
- the file affects
all minions in the Salt cluster.
/srv/pillar/ceph/stack/ceph/cluster.yml
- the file affects all minions in the Ceph cluster called
ceph
.
/srv/pillar/ceph/stack/ceph/roles/role.yml
- affects all minions that are assigned the specific role in the
ceph
cluster.
/srv/pillar/ceph/stack/ceph/minions/MINION_ID.yml - affects the individual minion.
There is a parallel directory tree that stores the default configuration
setup in /srv/pillar/ceph/stack/default
. Do not change
values here, as they are overwritten.
The typical procedure for changing the collected configuration is the following:
Find the location of the configuration item you need to change. For example, if you need to change a cluster-related setting such as the cluster network, edit the file /srv/pillar/ceph/stack/ceph/cluster.yml.
Save the file.
Verify the changes by running:
root@master #
salt target saltutil.pillar_refresh
and then
root@master #
salt target pillar.items
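For illustration, assuming you wanted to change the cluster network shown in the pillar output above, the relevant line in /srv/pillar/ceph/stack/ceph/cluster.yml might read (the value is an example only):
cluster_network: 172.16.22.0/24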
After you deploy your SUSE Enterprise Storage 5.5 cluster you may need to install additional software for accessing your data, such as the Object Gateway or the iSCSI Gateway, or you can deploy a clustered file system on top of the Ceph cluster. This chapter mainly focuses on manual installation. If you have a cluster deployed using Salt, refer to Chapter 4, Deploying with DeepSea/Salt for a procedure on installing particular gateways or the CephFS.
Ceph Object Gateway is an object storage interface built on top of
librgw
to provide applications with a RESTful gateway to
Ceph clusters. It supports two interfaces:
S3-compatible: Provides object storage functionality with an interface that is compatible with a large subset of the Amazon S3 RESTful API.
Swift-compatible: Provides object storage functionality with an interface that is compatible with a large subset of the OpenStack Swift API.
The Object Gateway daemon uses an embedded HTTP server (CivetWeb) for interacting with the Ceph cluster. Since it provides interfaces compatible with OpenStack Swift and Amazon S3, the Object Gateway has its own user management. Object Gateway can store data in the same cluster that is used to store data from CephFS clients or RADOS Block Device clients. The S3 and Swift APIs share a common name space, so you may write data with one API and retrieve it with the other.
Since SUSE Enterprise Storage 5.5, the Object Gateway is installed as a DeepSea role, therefore you do not need to install it manually.
To install the Object Gateway during the cluster deployment, see Section 4.3, “Cluster Deployment”.
To add a new node with Object Gateway to the cluster, see Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.2 “Adding New Roles to Nodes”.
Install Object Gateway on a node that is not using port 80. For example, a node running openATTIC already occupies port 80. The following command installs all required components:
root #
zypper ref && zypper in ceph-radosgw
If the Apache server from the previous Object Gateway instance is running, stop it and disable the relevant service:
root # systemctl stop apache2.service
root # systemctl disable apache2.service
Edit /etc/ceph/ceph.conf
and add the following lines:
[client.rgw.gateway_host]
rgw frontends = "civetweb port=80"
If you want to configure Object Gateway/CivetWeb for use with SSL encryption, modify the line accordingly:
rgw frontends = civetweb port=7480s ssl_certificate=path_to_certificate.pem
Restart the Object Gateway service.
root #
systemctl restart ceph-radosgw@rgw.gateway_host
Several steps are required to configure an Object Gateway.
Configuring a Ceph Object Gateway requires a running Ceph Storage Cluster. The Ceph Object Gateway is a client of the Ceph Storage Cluster. As a Ceph Storage Cluster client, it requires:
A host name for the gateway instance, for example
gateway
.
A storage cluster user name with appropriate permissions and a keyring.
Pools to store its data.
A data directory for the gateway instance.
An instance entry in the Ceph configuration file.
Each instance must have a user name and key to communicate with a Ceph storage cluster. In the following steps, we use a monitor node to create a bootstrap keyring, then create the Object Gateway instance user keyring based on the bootstrap one. Then, we create a client user name and key. Next, we add the key to the Ceph Storage Cluster. Finally, we distribute the keyring to the node containing the gateway instance.
Create a keyring for the gateway:
root # ceph-authtool --create-keyring /etc/ceph/ceph.client.rgw.keyring
root # chmod +r /etc/ceph/ceph.client.rgw.keyring
Generate a Ceph Object Gateway user name and key for each instance. As an example, we will use the name gateway after client.rgw:
root #
ceph-authtool /etc/ceph/ceph.client.rgw.keyring \
-n client.rgw.gateway --gen-key
Add capabilities to the key:
root #
ceph-authtool -n client.rgw.gateway --cap osd 'allow rwx' \
--cap mon 'allow rwx' /etc/ceph/ceph.client.rgw.keyring
Once you have created a keyring and key to enable the Ceph Object Gateway with access to the Ceph Storage Cluster, add the key to your Ceph Storage Cluster. For example:
root #
ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.rgw.gateway \
-i /etc/ceph/ceph.client.rgw.keyring
Distribute the keyring to the node with the gateway instance:
root # scp /etc/ceph/ceph.client.rgw.keyring ceph@hostname:/home/ceph
cephadm > ssh hostname
root # mv ceph.client.rgw.keyring /etc/ceph/ceph.client.rgw.keyring
An alternative way is to create the Object Gateway bootstrap keyring, and then create the Object Gateway keyring from it:
Create an Object Gateway bootstrap keyring on one of the monitor nodes:
root #
ceph \
auth get-or-create client.bootstrap-rgw mon 'allow profile bootstrap-rgw' \
--connect-timeout=25 \
--cluster=ceph \
--name mon. \
--keyring=/var/lib/ceph/mon/ceph-node_host/keyring \
-o /var/lib/ceph/bootstrap-rgw/keyring
Create the
/var/lib/ceph/radosgw/ceph-rgw_name
directory for storing the bootstrap keyring:
root #
mkdir \
/var/lib/ceph/radosgw/ceph-rgw_name
Create an Object Gateway keyring from the newly created bootstrap keyring:
root #
ceph \
auth get-or-create client.rgw.rgw_name osd 'allow rwx' mon 'allow rw' \
--connect-timeout=25 \
--cluster=ceph \
--name client.bootstrap-rgw \
--keyring=/var/lib/ceph/bootstrap-rgw/keyring \
-o /var/lib/ceph/radosgw/ceph-rgw_name/keyring
Copy the Object Gateway keyring to the Object Gateway host:
root #
scp \
/var/lib/ceph/radosgw/ceph-rgw_name/keyring \
rgw_host:/var/lib/ceph/radosgw/ceph-rgw_name/keyring
Ceph Object Gateways require Ceph Storage Cluster pools to store specific gateway data. If the user you created has proper permissions, the gateway will create the pools automatically. However, ensure that you have set an appropriate default number of placement groups per pool in the Ceph configuration file.
The pool names follow the
ZONE_NAME.POOL_NAME
syntax. When configuring a gateway with the default region and zone, the
default zone name is 'default' as in our example:
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data
To create the pools manually, see Book “Administration Guide”, Chapter 8 “Managing Storage Pools”, Section 8.2.2 “Create a Pool”.
Only the default.rgw.buckets.data
pool can be
erasure-coded. All other pools need to be replicated, otherwise the
gateway is not accessible.
Add the Ceph Object Gateway configuration to the Ceph Configuration file. The Ceph Object Gateway configuration requires you to identify the Ceph Object Gateway instance. Then, specify the host name where you installed the Ceph Object Gateway daemon, a keyring (for use with cephx), and optionally a log file. For example:
[client.rgw.instance-name]
host = hostname
keyring = /etc/ceph/ceph.client.rgw.keyring
To override the default Object Gateway log file, include the following:
log file = /var/log/radosgw/client.rgw.instance-name.log
The [client.rgw.*]
portion of the gateway instance
identifies this portion of the Ceph configuration file as configuring a
Ceph Storage Cluster client where the client type is a Ceph Object Gateway
(radosgw). The instance name follows. For example:
[client.rgw.gateway]
host = ceph-gateway
keyring = /etc/ceph/ceph.client.rgw.keyring
The host must be your machine host name, excluding the domain name.
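If you are unsure what the short host name is, you can print it on the gateway node; the output below is only an example:
root # hostname -s
ceph-gateway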
Then turn off print continue
. If you have it set to
true, you may encounter problems with PUT operations:
rgw print continue = false
To use a Ceph Object Gateway with subdomain S3 calls (for example
http://bucketname.hostname
), you must add the Ceph
Object Gateway DNS name under the [client.rgw.gateway]
section
of the Ceph configuration file:
[client.rgw.gateway]
...
rgw dns name = hostname
You should also consider installing a DNS server such as Dnsmasq on your
client machine(s) when using the
http://bucketname.hostname
syntax. The dnsmasq.conf
file should include the
following settings:
address=/hostname/host-ip-address
listen-address=client-loopback-ip
Then, add the client-loopback-ip IP address as the first DNS server on the client machine(s).
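On a typical Linux client this means placing the loopback address first in /etc/resolv.conf; a minimal sketch (the address is a placeholder) is:
nameserver client-loopback-ip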
Deployment scripts may not create the default Ceph Object Gateway data directory.
Create data directories for each instance of a radosgw daemon if not
already done. The host
variables in the Ceph
configuration file determine which host runs each instance of a radosgw
daemon. The typical form specifies the radosgw daemon, the cluster name,
and the daemon ID.
root #
mkdir -p /var/lib/ceph/radosgw/cluster-id
Using the example ceph.conf
settings above, you would
execute the following:
root #
mkdir -p /var/lib/ceph/radosgw/ceph-radosgw.gateway
To ensure that all components have reloaded their configurations, we
recommend restarting your Ceph Storage Cluster service. Then, start up
the radosgw
service. For more information, see
Book “Administration Guide”, Chapter 2 “Introduction” and
Book “Administration Guide”, Chapter 13 “Ceph Object Gateway”, Section 13.3 “Operating the Object Gateway Service”.
When the service is up and running, you can make an anonymous GET request to see if the gateway returns a response. A simple HTTP request to the domain name should return the following:
<ListAllMyBucketsResult>
  <Owner>
    <ID>anonymous</ID>
    <DisplayName/>
  </Owner>
  <Buckets/>
</ListAllMyBucketsResult>
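One way to issue such an anonymous request from any client with curl installed is the following (hostname is a placeholder for the gateway's DNS name):
cephadm > curl http://hostname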
iSCSI is a storage area network (SAN) protocol that allows clients (called
initiators) to send SCSI commands to SCSI storage
devices (targets) on remote servers. SUSE Enterprise Storage
5.5 includes a facility that opens Ceph storage management to
heterogeneous clients, such as Microsoft Windows* and VMware* vSphere, through the
iSCSI protocol. Multipath iSCSI access enables availability and scalability
for these clients, and the standardized iSCSI protocol also provides an
additional layer of security isolation between clients and the SUSE Enterprise Storage
5.5 cluster. The configuration facility is named
lrbd
. Using lrbd
, Ceph
storage administrators can define thin-provisioned, replicated,
highly-available volumes supporting read-only snapshots, read-write clones,
and automatic resizing with Ceph RADOS Block Device (RBD). Administrators
can then export volumes either via a single lrbd
gateway host, or via multiple gateway hosts supporting multipath failover.
Linux, Microsoft Windows, and VMware hosts can connect to volumes using the iSCSI
protocol, which makes them available like any other SCSI block device. This
means SUSE Enterprise Storage 5.5 customers can effectively run a complete
block-storage infrastructure subsystem on Ceph that provides all the
features and benefits of a conventional SAN, enabling future growth.
This chapter introduces detailed information to set up a Ceph cluster infrastructure together with an iSCSI gateway so that the client hosts can use remotely stored data as local storage devices using the iSCSI protocol.
iSCSI is an implementation of the Small Computer System Interface (SCSI) command set using the Internet Protocol (IP), specified in RFC 3720. iSCSI is implemented as a service where a client (the initiator) talks to a server (the target) via a session on TCP port 3260. An iSCSI target's IP address and port are called an iSCSI portal, where a target can be exposed through one or more portals. The combination of a target and one or more portals is called the target portal group (TPG).
The underlying data link layer protocol for iSCSI is commonly Ethernet. More specifically, modern iSCSI infrastructures use 10 Gigabit Ethernet or faster networks for optimal throughput. 10 Gigabit Ethernet connectivity between the iSCSI gateway and the back-end Ceph cluster is strongly recommended.
The Linux kernel iSCSI target was originally named LIO for linux-iscsi.org, the project's original domain and Web site. For some time, no fewer than four competing iSCSI target implementations were available for the Linux platform, but LIO ultimately prevailed as the single iSCSI reference target. The mainline kernel code for LIO uses the simple, but somewhat ambiguous name "target", distinguishing between "target core" and a variety of front-end and back-end target modules.
The most commonly used front-end module is arguably iSCSI. However, LIO also supports Fibre Channel (FC), Fibre Channel over Ethernet (FCoE) and several other front-end protocols. At this time, only the iSCSI protocol is supported by SUSE Enterprise Storage.
The most frequently used target back-end module is one that is capable of simply re-exporting any available block device on the target host. This module is named iblock. However, LIO also has an RBD-specific back-end module supporting parallelized multipath I/O access to RBD images.
This section introduces brief information on iSCSI initiators used on Linux, Microsoft Windows, and VMware platforms.
The standard initiator for the Linux platform is
open-iscsi
. open-iscsi
launches a daemon, iscsid
, which the user can
then use to discover iSCSI targets on any given portal, log in to targets,
and map iSCSI volumes. iscsid
communicates with
the SCSI mid layer to create in-kernel block devices that the kernel can
then treat like any other SCSI block device on the system. The
open-iscsi
initiator can be deployed in
conjunction with the Device Mapper Multipath
(dm-multipath
) facility to provide a highly
available iSCSI block device.
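As an illustrative sketch, discovering targets on a portal and logging in with open-iscsi typically looks like this (PORTAL_IP is a placeholder for the portal address):
root # iscsiadm -m discovery -t sendtargets -p PORTAL_IP
root # iscsiadm -m node -p PORTAL_IP --login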
The default iSCSI initiator for the Microsoft Windows operating system is the Microsoft iSCSI initiator. The iSCSI service can be configured via a graphical user interface (GUI), and supports multipath I/O for high availability.
The default iSCSI initiator for VMware vSphere and ESX is the VMware
ESX software iSCSI initiator, vmkiscsi
. When
enabled, it can be configured either from the vSphere client, or using the
vmkiscsi-tool
command. You can then format storage
volumes connected through the vSphere iSCSI storage adapter with VMFS, and
use them like any other VM storage device. The VMware initiator also
supports multipath I/O for high availability.
lrbd
combines the benefits of RADOS Block Devices
with the ubiquitous versatility of iSCSI. By employing
lrbd
on an iSCSI target host (known as the
lrbd
gateway), any application that needs to make
use of block storage can benefit from Ceph, even if it does not speak any
Ceph client protocol. Instead, users can use iSCSI or any other target
front-end protocol to connect to an LIO target, which translates all target
I/O to RBD storage operations.
lrbd
is inherently highly-available and supports
multipath operations. Thus, downstream initiator hosts can use multiple
iSCSI gateways for both high availability and scalability. When
communicating with an iSCSI configuration with more than one gateway,
initiators may load-balance iSCSI requests across multiple gateways. In the
event of a gateway failing, being temporarily unreachable, or being disabled
for maintenance, I/O will transparently continue via another gateway.
A minimum configuration of SUSE Enterprise Storage 5.5 with
lrbd
consists of the following components:
A Ceph storage cluster. The Ceph cluster consists of a minimum of four physical servers hosting at least eight object storage daemons (OSDs) each. In such a configuration, three OSD nodes also double as a monitor (MON) host.
An iSCSI target server running the LIO iSCSI target, configured via
lrbd
.
An iSCSI initiator host, running open-iscsi
(Linux), the Microsoft iSCSI Initiator (Microsoft Windows), or any other compatible
iSCSI initiator implementation.
A recommended production configuration of SUSE Enterprise Storage 5.5 with
lrbd
consists of:
A Ceph storage cluster. A production Ceph cluster consists of any number of (typically more than 10) OSD nodes, each typically running 10-12 object storage daemons (OSDs), with no fewer than three dedicated MON hosts.
Several iSCSI target servers running the LIO iSCSI target, configured via
lrbd
. For iSCSI fail-over and load-balancing,
these servers must run a kernel supporting the
target_core_rbd
module. Update packages are
available from the SUSE Linux Enterprise Server maintenance channel.
Any number of iSCSI initiator hosts, running
open-iscsi
(Linux), the Microsoft iSCSI Initiator
(Microsoft Windows), or any other compatible iSCSI initiator implementation.
This section describes steps to install and configure an iSCSI Gateway on top of SUSE Enterprise Storage.
You can deploy the iSCSI Gateway either during Ceph cluster deployment process, or add it to an existing cluster using DeepSea.
To include the iSCSI Gateway during the cluster deployment process, refer to Section 4.5.1.2, “Role Assignment”.
To add the iSCSI Gateway to an existing cluster, refer to Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.2 “Adding New Roles to Nodes”.
RBD images are created in the Ceph store and subsequently exported to
iSCSI. We recommend that you use a dedicated RADOS pool for this purpose.
You can create a volume from any host that is able to connect to your
storage cluster using the Ceph rbd
command line
utility. This requires the client to have at least a minimal ceph.conf
configuration file, and appropriate CephX authentication credentials.
To create a new volume for subsequent export via iSCSI, use the
rbd create
command, specifying the volume size in
megabytes. For example, in order to create a 100 GB volume named
testvol
in the pool named iscsi
, run:
cephadm >
rbd --pool iscsi create --size=102400 testvol
The above command creates an RBD volume in the default format 2.
Since SUSE Enterprise Storage 3, the default volume format is 2, and format 1 is
deprecated. However, you can still create the deprecated format 1 volumes
with the --image-format 1
option.
To export RBD images via iSCSI, use the lrbd
utility. lrbd
allows you to create, review, and
modify the iSCSI target configuration, which uses a JSON format.
Any changes made to the iSCSI Gateway configuration using the
lrbd
command are not visible to DeepSea and openATTIC. To
import your manual changes, you need to export the iSCSI Gateway configuration to
a file:
root@minion >
lrbd -o /tmp/lrbd.conf
Then copy it to the Salt master so that DeepSea and openATTIC can see it:
root@minion >
scp /tmp/lrbd.conf ses5master:/srv/salt/ceph/igw/cache/lrbd.conf
Finally, edit /srv/pillar/ceph/stack/global.yml
and
set:
igw_config: default-ui
In order to edit the configuration, use lrbd -e
or
lrbd --edit
. This command will invoke the default
editor, as defined by the EDITOR
environment variable.
You may override this behavior by setting the -E
option in
addition to -e
.
Below is an example configuration for
two iSCSI gateway hosts named iscsi1.example.com
and
iscsi2.example.com
,
defining a single iSCSI target with an iSCSI Qualified Name (IQN) of
iqn.2003-01.org.linux-iscsi.iscsi.x86:testvol
,
with a single iSCSI Logical Unit (LU),
backed by an RBD image named testvol
in the RADOS pool
rbd
,
and exporting the target via two portals named "east" and "west":
{ "auth": [ { "target": "iqn.2003-01.org.linux-iscsi.iscsi.x86:testvol", "authentication": "none" } ], "targets": [ { "target": "iqn.2003-01.org.linux-iscsi.iscsi.x86:testvol", "hosts": [ { "host": "iscsi1.example.com", "portal": "east" }, { "host": "iscsi2.example.com", "portal": "west" } ] } ], "portals": [ { "name": "east", "addresses": [ "192.168.124.104" ] }, { "name": "west", "addresses": [ "192.168.124.105" ] } ], "pools": [ { "pool": "rbd", "gateways": [ { "target": "iqn.2003-01.org.linux-iscsi.iscsi.x86:testvol", "tpg": [ { "image": "testvol" } ] } ] } ] }
Note that whenever you refer to a host name in the configuration, this host
name must match the iSCSI gateway's uname -n
command
output.
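To check the value to use, run the command on each gateway; the output below is only an example:
root # uname -n
iscsi1.example.com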
The edited JSON is stored in the extended attributes (xattrs) of a single
RADOS object per pool. This object is available to the gateway hosts where
the JSON is edited, as well as to all gateway hosts connected to the same
Ceph cluster. No configuration information is stored locally on the
lrbd
gateway.
To activate the configuration, store it in the Ceph cluster, and do one
of the following things (as root
):
Run the lrbd
command (without additional options) from
the command line,
or
Restart the lrbd
service with service
lrbd restart
.
The lrbd
"service" does not operate any background
daemon. Instead, it simply invokes the lrbd
command.
This type of service is known as a "one-shot" service.
You should also enable lrbd
to auto-configure on
system start-up. To do so, run the systemctl enable lrbd
command.
The configuration above reflects a simple, one-gateway setup.
lrbd
configuration can be much more complex and
powerful. The lrbd
RPM package comes with an
extensive set of configuration examples, which you may refer to by checking
the content of the
/usr/share/doc/packages/lrbd/samples
directory after
installation. The samples are also available from
https://github.com/SUSE/lrbd/tree/master/samples.
iSCSI authentication is flexible and covers many possibilities. The five
possible top level settings are none
,
tpg
, acls
,
tpg+identified
and identified
.
'No authentication' means that no initiator will require a user name and password to access any LUNs for a specified host or target. 'No authentication' can be set explicitly or implicitly. Specify a value of 'none' for authentication to be set explicitly:
{ "host": "igw1", "authentication": none }
Removing the entire auth
section from the configuration
will use no authentication implicitly.
For common credentials or a shared user name/password, set authentication
to tpg
. This setting will apply to all initiators for the
associated host or target. In the following example, the same user name
and password are used for the redundant target and a target local to
igw1
:
{ "target": "iqn.2003-01.org.linux-iscsi.igw.x86:sn.redundant", "authentication": tpg, "tpg": { "userid": "common1", "password": "pass1" } }, { "host": "igw1", "authentication": tpg, "tpg": { "userid": "common1", "password": "pass1" } }
Redundant configurations will have the same credentials across gateways but are independent of other configurations. In other words, LUNs configured specifically for a host and multiple redundant configurations can have a unique user name and password for each.
One caveat is that any initiator setting will be ignored when using
tpg
authentication. Using common credentials does not
restrict which initiators may connect. This configuration may be suitable
in isolated network environments.
For unique credentials for each initiator, set authentication to
acls
. Additionally, only defined initiators are allowed
to connect.
{ "host": "igw1", "authentication": acls, "acls": [ { "initiator": "iqn.1996-04.de.suse:01:e6ca28cc9f20", "userid": "initiator1", "password": "pass1", } ] },
The previous two authentication settings pair two independent features: TPG pairs common credentials with unidentified initiators, while ACLs pair unique credentials with identified initiators.
Setting authentication to tpg+identified
pairs common
credentials with identified initiators. Although you can imitate the same
behavior choosing acls
and repeating the same credentials
with each initiator, the configuration would grow huge and harder to
maintain.
The following configuration uses the tpg
configuration
with only the authentication keyword changing.
{ "target": "iqn.2003-01.org.linux-iscsi.igw.x86:sn.redundant", "authentication": tpg+identified, "tpg": { "userid": "common1", "password": "pass1" } }, { "host": "igw1", "authentication": tpg+identified, "tpg": { "userid": "common1", "password": "pass1" } }
The list of initiators is gathered from those defined in the pools for the given hosts and targets in the authentication section.
This type of authentication does not use any credentials. In secure
environments where only assignment of initiators is needed, set the
authentication to identified
. All initiators will connect
but only have access to the images listed in the pools
section.
{ "target": "iqn.2003-01.org.linux-iscsi:igw.x86:sn.redundant", "authentication": "identified", }, { "host": "igw1", "authentication": "identified", }
Discovery authentication is independent of the previous authentication methods. It requires credentials for browsing.
Authentication of type tpg
,
tpg+identified
, acls
, and
discovery
support mutual authentication. Specifying the
mutual settings requires that the target authenticates against the
initiator.
Discovery and mutual authentications are optional. These options can be present, but disabled allowing experimentation with a particular configuration. After you decide, you can remove the disabled entries without breaking the configuration.
Refer to the examples in
/usr/share/doc/packages/lrbd/samples
. You can combine
excerpts from one file with others to create unique configurations.
The following settings may be useful for some environments. For images,
there are uuid
, lun
,
retries
, sleep
, and
retry_errors
attributes. The first
two—uuid
and lun
—allow
hardcoding of the 'uuid' or 'lun' for a specific image. You can specify
either of them for an image. The retries
,
sleep
and retry_errors
affect attempts to
map an rbd image.
If a site needs statically assigned LUNs, then assign numbers to each LUN.
"pools": [ { "pool": "rbd", "gateways": [ { "host": "igw1", "tpg": [ { "image": "archive", "uuid": "12345678-abcd-9012-efab-345678901234", "lun": "2", "retries": "3", "sleep": "4", "retry_errors": [ 95 ], [...] } ] } ] } ]
lrbd
can be configured with advanced parameters
which are subsequently passed on to the LIO I/O target. The parameters are
divided up into iSCSI and backing store components, which can then be
specified in the "targets" and "tpg" sections, respectively, of the
lrbd
configuration.
Unless otherwise noted, changing these parameters from the default setting is not recommended.
"targets": [ { [...] "tpg_default_cmdsn_depth": "64", "tpg_default_erl": "0", "tpg_login_timeout": "10", "tpg_netif_timeout": "2", "tpg_prod_mode_write_protect": "0", } ]
A description of the options follows:
tpg_default_cmdsn_depth
Default CmdSN (Command Sequence Number) depth. Limits the number of requests that an iSCSI initiator can have outstanding at any moment.
tpg_default_erl
Default error recovery level.
tpg_login_timeout
Login timeout value in seconds.
tpg_netif_timeout
NIC failure timeout in seconds.
tpg_prod_mode_write_protect
If set to 1, prevents writes to LUNs.
"pools": [ { "pool": "rbd", "gateways": [ { "host": "igw1", "tpg": [ { "image": "archive", "backstore_block_size": "512", "backstore_emulate_3pc": "1", "backstore_emulate_caw": "1", "backstore_emulate_dpo": "0", "backstore_emulate_fua_read": "0", "backstore_emulate_fua_write": "1", "backstore_emulate_model_alias": "0", "backstore_emulate_pr": "1", "backstore_emulate_rest_reord": "0", "backstore_emulate_tas": "1", "backstore_emulate_tpu": "0", "backstore_emulate_tpws": "0", "backstore_emulate_ua_intlck_ctrl": "0", "backstore_emulate_write_cache": "0", "backstore_enforce_pr_isids": "1", "backstore_fabric_max_sectors": "8192", "backstore_hw_block_size": "512", "backstore_hw_max_sectors": "8192", "backstore_hw_pi_prot_type": "0", "backstore_hw_queue_depth": "128", "backstore_is_nonrot": "1", "backstore_max_unmap_block_desc_count": "1", "backstore_max_unmap_lba_count": "8192", "backstore_max_write_same_len": "65535", "backstore_optimal_sectors": "8192", "backstore_pi_prot_format": "0", "backstore_pi_prot_type": "0", "backstore_queue_depth": "128", "backstore_unmap_granularity": "8192", "backstore_unmap_granularity_alignment": "4194304" } ] } ] } ]
A description of the options follows:
backstore_block_size
Block size of the underlying device.
backstore_emulate_3pc
If set to 1, enables Third Party Copy.
backstore_emulate_caw
If set to 1, enables Compare and Write.
backstore_emulate_dpo
If set to 1, turns on Disable Page Out.
backstore_emulate_fua_read
If set to 1, enables Force Unit Access read.
backstore_emulate_fua_write
If set to 1, enables Force Unit Access write.
backstore_emulate_model_alias
If set to 1, uses the back-end device name for the model alias.
backstore_emulate_pr
If set to 0, support for SCSI Reservations, including Persistent Group Reservations, is disabled. While disabled, the SES iSCSI Gateway can ignore reservation state, resulting in improved request latency.
Setting backstore_emulate_pr to 0 is recommended if iSCSI initiators do not require SCSI Reservation support.
backstore_emulate_rest_reord
If set to 0, the Queue Algorithm Modifier has Restricted Reordering.
backstore_emulate_tas
If set to 1, enables Task Aborted Status.
backstore_emulate_tpu
If set to 1, enables Thin Provisioning Unmap.
backstore_emulate_tpws
If set to 1, enables Thin Provisioning Write Same.
backstore_emulate_ua_intlck_ctrl
If set to 1, enables Unit Attention Interlock.
backstore_emulate_write_cache
If set to 1, turns on Write Cache Enable.
backstore_enforce_pr_isids
If set to 1, enforces persistent reservation ISIDs.
backstore_fabric_max_sectors
Maximum number of sectors the fabric can transfer at once.
backstore_hw_block_size
Hardware block size in bytes.
backstore_hw_max_sectors
Maximum number of sectors the hardware can transfer at once.
backstore_hw_pi_prot_type
If non-zero, DIF protection is enabled on the underlying hardware.
backstore_hw_queue_depth
Hardware queue depth.
backstore_is_nonrot
If set to 1, the backstore is a non-rotational device.
backstore_max_unmap_block_desc_count
Maximum number of block descriptors for UNMAP.
backstore_max_unmap_lba_count
Maximum number of LBAs for UNMAP.
backstore_max_write_same_len
Maximum length for WRITE_SAME.
backstore_optimal_sectors
Optimal request size in sectors.
backstore_pi_prot_format
DIF protection format.
backstore_pi_prot_type
DIF protection type.
backstore_queue_depth
Queue depth.
backstore_unmap_granularity
UNMAP granularity.
backstore_unmap_granularity_alignment
UNMAP granularity alignment.
For targets, the tpg
attributes allow tuning of kernel
parameters. Use with caution.
"targets": [ { "host": "igw1", "target": "iqn.2003-01.org.linux-iscsi.generic.x86:sn.abcdefghijk", "tpg_default_cmdsn_depth": "64", "tpg_default_erl": "0", "tpg_login_timeout": "10", "tpg_netif_timeout": "2", "tpg_prod_mode_write_protect": "0", "tpg_t10_pi": "0" }
For initiators, the attrib
and param
settings allow the tuning of kernel parameters. Use with caution. These are
set in the authentication section. If the authentication is
tpg+identified
or identified
, then the
subsection is identified.
"auth": [ { "authentication": "tpg+identified", "identified": [ { "initiator": "iqn.1996-04.de.suse:01:e6ca28cc9f20", "attrib_dataout_timeout": "3", "attrib_dataout_timeout_retries": "5", "attrib_default_erl": "0", "attrib_nopin_response_timeout": "30", "attrib_nopin_timeout": "15", "attrib_random_datain_pdu_offsets": "0", "attrib_random_datain_seq_offsets": "0", "attrib_random_r2t_offsets": "0", "param_DataPDUInOrder": "1", "param_DataSequenceInOrder": "1", "param_DefaultTime2Retain": "0", "param_DefaultTime2Wait": "2", "param_ErrorRecoveryLevel": "0", "param_FirstBurstLength": "65536", "param_ImmediateData": "1", "param_InitialR2T": "1", "param_MaxBurstLength": "262144", "param_MaxConnections": "1", "param_MaxOutstandingR2T": "1" } ] } ]
If the authentication is acls
, then the settings are
included in the acls
subsection. One caveat is that
settings are only applied for active initiators. If an initiator is absent
from the pools section, the acl
entry is not created and
settings cannot be applied.
tcmu-runner
Since version 5, SUSE Enterprise Storage ships a user space RBD back-end for
tcmu-runner
(see man 8
tcmu-runner
for details).
tcmu-runner
based iSCSI Gateway deployments are currently
a technology preview. See Chapter 10, Installation of iSCSI Gateway for
instructions on kernel-based iSCSI Gateway deployment with
lrbd
.
Unlike kernel-based lrbd
iSCSI Gateway deployments,
tcmu-runner
based iSCSI Gateways do not offer support for
multipath I/O or SCSI Persistent Reservations.
As DeepSea and openATTIC do not currently support
tcmu-runner
deployments, you need to manage the
installation, deployment, and monitoring manually.
On your iSCSI Gateway node, install the
tcmu-runner-handler-rbd
package from the
SUSE Enterprise Storage 5 media, together with the libtcmu1
and tcmu-runner
package dependencies. Install the
targetcli-fb
package for configuration purposes.
Note that the targetcli-fb
package is incompatible
with the 'non-fb' version of the targetcli
package.
Confirm that the tcmu-runner
systemd
service is
running:
root #
systemctl enable tcmu-runner
tcmu-gw:~ # systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor
preset: disabled)
Active: active (running) since ...
Create a RADOS Block Device image on your existing Ceph cluster. In the following example, we will use a 10G image called 'tcmu-lu' located in the 'rbd' pool.
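A minimal sketch of creating such an image with the rbd command (size in megabytes, names as in this example) is:
root # rbd --pool rbd create --size=10240 tcmu-lu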
Following RADOS Block Device image creation, run targetcli
, and
ensure that the tcmu-runner RBD handler (plug-in) is available:
root #
targetcli
targetcli shell version 2.1.fb46
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
/> ls
o- / ................................... [...]
o- backstores ........................ [...]
...
| o- user:rbd ......... [Storage Objects: 0]
Create a backstore configuration entry for the RBD image:
/> cd backstores/user:rbd
/backstores/user:rbd> create tcmu-lu 10G /rbd/tcmu-lu
Created user-backed storage object tcmu-lu size 10737418240.
Create an iSCSI transport configuration entry. In the following example,
the target IQN "iqn.2003-01.org.linux-iscsi.tcmu-gw.x8664:sn.cb3d2a3a" is
automatically generated by targetcli
for use as a unique
iSCSI target identifier:
/backstores/user:rbd> cd /iscsi
/iscsi> create
Created target iqn.2003-01.org.linux-iscsi.tcmu-gw.x8664:sn.cb3d2a3a.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
Create an ACL entry for the iSCSI initiator(s) that you want to connect to the target. In the following example, an initiator IQN of "iqn.1998-01.com.vmware:esxi-872c4888" is used:
/iscsi> cd iqn.2003-01.org.linux-iscsi.tcmu-gw.x8664:sn.cb3d2a3a/tpg1/acls/
/iscsi/iqn.20...a3a/tpg1/acls> create iqn.1998-01.com.vmware:esxi-872c4888
Finally, link the previously created RBD backstore configuration to the iSCSI target:
/iscsi/iqn.20...a3a/tpg1/acls> cd ../luns
/iscsi/iqn.20...a3a/tpg1/luns> create /backstores/user:rbd/tcmu-lu
Created LUN 0.
Created LUN 0->0 mapping in node ACL iqn.1998-01.com.vmware:esxi-872c4888
Exit the shell to save the existing configuration:
/iscsi/iqn.20...a3a/tpg1/luns> exit
Global pref auto_save_on_exit=true
Last 10 configs saved in /etc/target/backup.
Configuration saved to /etc/target/saveconfig.json
From your iSCSI initiator (client) node, connect to your newly provisioned iSCSI target using the IQN and host name configured above.
"auth": [ { "host": "igw1", "authentication": "acls", "acls": [ { "initiator": "iqn.1996-04.de.suse:01:e6ca28cc9f20", "userid": "initiator1", "password": "pass1", "attrib_dataout_timeout": "3", "attrib_dataout_timeout_retries": "5", "attrib_default_erl": "0", "attrib_nopin_response_timeout": "30", "attrib_nopin_timeout": "15", "attrib_random_datain_pdu_offsets": "0", "attrib_random_datain_seq_offsets": "0", "attrib_random_r2t_offsets": "0", "param_DataPDUInOrder": "1", "param_DataSequenceInOrder": "1", "param_DefaultTime2Retain": "0", "param_DefaultTime2Wait": "2", "param_ErrorRecoveryLevel": "0", "param_FirstBurstLength": "65536", "param_ImmediateData": "1", "param_InitialR2T": "1", "param_MaxBurstLength": "262144", "param_MaxConnections": "1", "param_MaxOutstandingR2T": "1" } ] }, ]
The Ceph file system (CephFS) is a POSIX-compliant file system that uses
a Ceph storage cluster to store its data. CephFS uses the same cluster
system as Ceph block devices, Ceph object storage with its S3 and Swift
APIs, or native bindings (librados
).
To use CephFS, you need to have a running Ceph storage cluster, and at least one running Ceph metadata server.
With SUSE Enterprise Storage, SUSE introduces official support for many scenarios in which the scale-out and distributed component CephFS is used. This section describes hard limits and provides guidance for the suggested use cases.
A supported CephFS deployment must meet these requirements:
A minimum of one Metadata Server. SUSE recommends deploying several nodes with the MDS role. Only one will be 'active' and the rest will be 'passive'.
Remember to mention all the MON nodes in the mount
command when mounting the CephFS from a client.
CephFS snapshots are disabled (default) and not supported in this version.
Clients are SUSE Linux Enterprise Server 12 SP2 or SP3 based, using the
cephfs
kernel module driver. The FUSE module is not
supported.
CephFS quotas are not supported in SUSE Enterprise Storage, as support for quotas is implemented in the FUSE client only.
CephFS supports file layout changes as documented in
http://docs.ceph.com/docs/jewel/cephfs/file-layouts/.
However, while the file system is mounted by any client, new data pools
may not be added to an existing CephFS file system (ceph mds
add_data_pool
). They may only be added while the file system is
unmounted.
Ceph metadata server (MDS) stores metadata for the CephFS. Ceph block
devices and Ceph object storage do not use MDS. MDSs
make it possible for POSIX file system users to execute basic
commands—such as ls
or
find
—without placing an enormous burden on the
Ceph storage cluster.
You can deploy MDS either during the initial cluster deployment process as described in Section 4.3, “Cluster Deployment”, or add it to an already deployed cluster as described in Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.1 “Adding New Cluster Nodes”.
After you deploy your MDS, allow the Ceph OSD/MDS service in the firewall setting of the server where MDS is deployed: start yast, navigate to the firewall module's list of allowed services, and select the Ceph OSD/MDS service in the drop-down menu. If the Ceph MDS node is not allowed full traffic, mounting of a file system fails, even though other operations may work properly.
You can fine-tune the MDS behavior by inserting relevant options in the
ceph.conf
configuration file.
mds cache memory limit
The soft memory limit (in bytes) that the MDS will enforce for its
cache. Administrators should use this instead of the old mds
cache size
setting. Defaults to 1GB.
mds cache reservation
The cache reservation (memory or inodes) for the MDS cache to maintain. When the MDS begins touching its reservation, it will recall client state until its cache size shrinks to restore the reservation. Defaults to 0.05.
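For illustration, a hedged ceph.conf fragment setting both options (the values are examples only, not recommendations) might be:
[mds]
mds cache memory limit = 2147483648
mds cache reservation = 0.05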
For a detailed list of MDS related configuration options, see http://docs.ceph.com/docs/master/cephfs/mds-config-ref/.
For a detailed list of MDS journaler configuration options, see http://docs.ceph.com/docs/master/cephfs/journaler/.
When you have a healthy Ceph storage cluster with at least one Ceph metadata server, you can create and mount your Ceph file system. Ensure that your client has network connectivity and a proper authentication keyring.
A CephFS requires at least two RADOS pools: one for data and one for metadata. When configuring these pools, you might consider:
Using a higher replication level for the metadata pool, as any data loss in this pool can render the whole file system inaccessible.
Using lower-latency storage such as SSDs for the metadata pool, as this will improve the observed latency of file system operations on clients.
When assigning a role-mds
in the
policy.cfg
, the required pools are automatically
created. You can manually create the pools cephfs_data
and cephfs_metadata
for manual performance tuning before
setting up the Metadata Server. DeepSea will not create these pools if they already
exist.
For more information on managing pools, see Book “Administration Guide”, Chapter 8 “Managing Storage Pools”.
To create the two required pools—for example, 'cephfs_data' and 'cephfs_metadata'—with default settings for use with CephFS, run the following commands:
cephadm > ceph osd pool create cephfs_data pg_num
cephadm > ceph osd pool create cephfs_metadata pg_num
It is possible to use EC pools instead of replicated pools. We recommend using EC pools only for low performance requirements and infrequent random access, for example cold storage, backups, or archiving. CephFS on EC pools requires BlueStore to be enabled, and the pool must have the allow_ec_overwrites option set. This option can be set by running ceph osd pool set ec_pool allow_ec_overwrites true.
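As a sketch, assuming BlueStore is enabled and using a placeholder pool name cephfs_data_ec with example placement group counts, the two steps might look like this:
cephadm > ceph osd pool create cephfs_data_ec 64 64 erasure
cephadm > ceph osd pool set cephfs_data_ec allow_ec_overwrites true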
Erasure coding adds significant overhead to file system operations, especially small updates. This overhead is inherent to using erasure coding as a fault tolerance mechanism. This penalty is the trade off for significantly reduced storage space overhead.
When the pools are created, you may enable the file system with the
ceph fs new
command:
cephadm >
ceph fs new fs_name metadata_pool_name data_pool_name
For example:
cephadm >
ceph fs new cephfs cephfs_metadata cephfs_data
You can check that the file system was created by listing all available CephFSs:
cephadm > ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
When the file system has been created, your MDS will be able to enter an active state. For example, in a single MDS system:
cephadm > ceph mds stat
e5: 1/1/1 up
You can find more information of specific tasks—for example mounting, unmounting, and advanced CephFS setup—in Book “Administration Guide”, Chapter 15 “Clustered File System”.
A CephFS instance can be served by multiple active MDS daemons. All active MDS daemons that are assigned to a CephFS instance will distribute the file system's directory tree between themselves, and thus spread the load of concurrent clients. In order to add an active MDS daemon to a CephFS instance, a spare standby is needed. Either start an additional daemon or use an existing standby instance.
The following command will display the current number of active and passive MDS daemons.
cephadm >
ceph mds stat
The following command sets the number of active MDS's to two in a file system instance.
cephadm >
ceph fs set fs_name max_mds 2
In order to shrink the MDS cluster prior to an update, two steps are
necessary. First set max_mds
so that only one instance
remains:
cephadm >
ceph fs set fs_name max_mds 1
and after that explicitly deactivate the other active MDS daemons:
cephadm >
ceph mds deactivate fs_name:rank
where rank is the number of an active MDS daemon
of a file system instance, ranging from 0 to max_mds
-1.
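For example, with the file system named cephfs from the previous section, after setting max_mds to 1 you would deactivate rank 1:
cephadm > ceph mds deactivate cephfs:1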
See
http://docs.ceph.com/docs/luminous/cephfs/multimds/
for additional information.
During Ceph updates, the feature flags on a file system instance may change (usually by adding new features). Incompatible daemons (such as the older versions) are not able to function with an incompatible feature set and will refuse to start. This means that updating and restarting one daemon can cause all other not yet updated daemons to stop and refuse to start. For this reason, we recommend shrinking the active MDS cluster to size one and stopping all standby daemons before updating Ceph. The manual steps for this update procedure are as follows:
Update the Ceph related packages using zypper
.
Shrink the active MDS cluster as described above to 1 instance and stop
all standby MDS daemons using their systemd
units on all other nodes:
root #
systemctl stop ceph-mds\*.service ceph-mds.target
Only then restart the single remaining MDS daemon, causing it to restart using the updated binary.
root #
systemctl restart ceph-mds\*.service ceph-mds.target
Restart all other MDS daemons and re-set the desired
max_mds
setting.
root #
systemctl start ceph-mds.target
If you use DeepSea, it will follow this procedure in case the ceph package was updated during Stages 0 and 4. It is possible to perform this procedure while clients have the CephFS instance mounted and I/O is ongoing. Note however that there will be a very brief I/O pause while the active MDS restarts. Clients will recover automatically.
It is good practice to reduce the I/O load as much as possible before updating an MDS cluster. An idle MDS cluster will go through this update procedure quicker. Conversely, on a heavily loaded cluster with multiple MDS daemons it is essential to reduce the load in advance to prevent a single MDS daemon from being overwhelmed by ongoing I/O.
NFS Ganesha provides NFS access to either the Object Gateway or the CephFS. In SUSE Enterprise Storage 5.5, NFS versions 3 and 4 are supported. NFS Ganesha runs in the user space instead of the kernel space and directly interacts with the Object Gateway or CephFS.
Native CephFS and NFS clients are not restricted by file locks obtained via Samba, and vice-versa. Applications that rely on cross protocol file locking may experience data corruption if CephFS backed Samba share paths are accessed via other means.
To successfully deploy NFS Ganesha, you need to add a
role-ganesha
to your
/srv/pillar/ceph/proposals/policy.cfg
. For details,
see Section 4.5.1, “The policy.cfg
File”. NFS Ganesha also needs either a
role-rgw
or a role-mds
present in the
policy.cfg
.
Although it is possible to install and run the NFS Ganesha server on an already existing Ceph node, we recommend running it on a dedicated host with access to the Ceph cluster. The client hosts are typically not part of the cluster, but they need to have network access to the NFS Ganesha server.
To enable the NFS Ganesha server at any point after the initial installation,
add the role-ganesha
to the
policy.cfg
and re-run at least DeepSea stages 2 and
4. For details, see Section 4.3, “Cluster Deployment”.
NFS Ganesha is configured via the file
/etc/ganesha/ganesha.conf
that exists on the NFS Ganesha
node. However, this file is overwritten each time DeepSea stage 4 is
executed. Therefore we recommend to edit the template used by Salt, which
is the file
/srv/salt/ceph/ganesha/files/ganesha.conf.j2
on the
Salt master. For details about the configuration file, see
Book “Administration Guide”, Chapter 16 “NFS Ganesha: Export Ceph Data via NFS”, Section 16.2 “Configuration”.
The following requirements need to be met before DeepSea stages 2 and 4 can be executed to install NFS Ganesha:
At least one node needs to be assigned the
role-ganesha
.
You can define only one role-ganesha
per minion.
NFS Ganesha needs either an Object Gateway or CephFS to work.
If NFS Ganesha is supposed to use the Object Gateway to interface with the cluster,
the /srv/pillar/ceph/rgw.sls
on the Salt master needs
to be populated.
The kernel based NFS needs to be disabled on minions with the
role-ganesha
role.
This procedure provides an example installation that uses both the Object Gateway and CephFS File System Abstraction Layers (FSAL) of NFS Ganesha.
If you have not done so, execute DeepSea stages 0 and 1 before continuing with this procedure.
root@master # salt-run state.orch ceph.stage.0
root@master # salt-run state.orch ceph.stage.1
After having executed stage 1 of DeepSea, edit the
/srv/pillar/ceph/proposals/policy.cfg
and add the
line
role-ganesha/cluster/NODENAME
Replace NODENAME with the name of a node in your cluster.
Also make sure that a role-mds
and a
role-rgw
are assigned.
Create a file with '.yml' extension in the
/srv/salt/ceph/rgw/users/users.d
directory and insert
the following content:
- { uid: "demo", name: "Demo", email: "demo@demo.nil" } - { uid: "demo1", name: "Demo1", email: "demo1@demo.nil" }
These users are later created as Object Gateway users, and API keys are generated.
On the Object Gateway node, you can later run radosgw-admin user list to list all created users and radosgw-admin user info --uid=demo to obtain details about a single user.
DeepSea makes sure that the Object Gateway and NFS Ganesha both receive the credentials of all users listed in the rgw section of the rgw.sls.
The exported NFS uses these user names on the first level of the file system; in this example, the paths /demo and /demo1 would be exported.
Execute at least stages 2 and 4 of DeepSea. Running stage 3 in between is recommended.
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3 # optional but recommended
root@master # salt-run state.orch ceph.stage.4
Verify that NFS Ganesha is working by mounting the NFS share from a client node:
root # mount -o sync -t nfs GANESHA_NODE:/ /mnt
root # ls /mnt
cephfs  demo  demo1
/mnt should contain all exported paths. Directories for CephFS and both Object Gateway users should exist. For each bucket a user owns, a path /mnt/USERNAME/BUCKETNAME would be exported.
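To illustrate, if the demo user owned a bucket with the hypothetical name bucket1, a listing could look as follows:
root # ls /mnt/demo
bucket1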
This section provides an example of how to set up a two-node active-passive configuration of NFS Ganesha servers. The setup requires the SUSE Linux Enterprise High Availability Extension. The two nodes are called earth and mars.
For details about SUSE Linux Enterprise High Availability Extension, see https://documentation.suse.com/sle-ha/12-SP5/.
In this setup, earth has the IP address 192.168.1.1 and mars has the address 192.168.1.2.
Additionally, two floating virtual IP addresses are used, allowing clients to connect to the service independently of which physical node it is running on. 192.168.1.10 is used for cluster administration with Hawk2 and 192.168.2.1 is used exclusively for the NFS exports. This makes it easier to apply security restrictions later.
The following procedure describes the example installation. More details can be found at https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-install-quick/.
Prepare the NFS Ganesha nodes on the Salt master:
Run DeepSea stages 0 and 1.
root@master # salt-run state.orch ceph.stage.0
root@master # salt-run state.orch ceph.stage.1
Assign the nodes earth and mars the role-ganesha in the /srv/pillar/ceph/proposals/policy.cfg:
role-ganesha/cluster/earth*.sls
role-ganesha/cluster/mars*.sls
Run DeepSea stages 2 to 4.
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
Register the SUSE Linux Enterprise High Availability Extension on earth and mars.
root # SUSEConnect -r ACTIVATION_CODE -e E_MAIL
Install ha-cluster-bootstrap on both nodes:
root # zypper in ha-cluster-bootstrap
Initialize the cluster on earth:
root@earth # ha-cluster-init
Let mars join the cluster:
root@mars # ha-cluster-join -c earth
Check the status of the cluster. You should see two nodes added to the cluster:
root@earth # crm status
On both nodes, disable the automatic start of the NFS Ganesha service at boot time:
root # systemctl disable nfs-ganesha
Start the crm shell on earth:
root@earth # crm configure
The next commands are executed in the crm shell.
On earth, run the crm shell and execute the following commands to configure the resource for the NFS Ganesha daemons as a clone of the systemd resource type:
crm(live)configure# primitive nfs-ganesha-server systemd:nfs-ganesha \
        op monitor interval=30s
crm(live)configure# clone nfs-ganesha-clone nfs-ganesha-server meta interleave=true
crm(live)configure# commit
crm(live)configure# status
    2 nodes configured
    2 resources configured
    Online: [ earth mars ]
    Full list of resources:
         Clone Set: nfs-ganesha-clone [nfs-ganesha-server]
              Started: [ earth mars ]
Create a primitive IPAddr2 with the crm shell:
crm(live)configure# primitive ganesha-ip IPaddr2 \
        params ip=192.168.2.1 cidr_netmask=24 nic=eth0 \
        op monitor interval=10 timeout=20
crm(live)# status
    Online: [ earth mars ]
    Full list of resources:
         Clone Set: nfs-ganesha-clone [nfs-ganesha-server]
              Started: [ earth mars ]
         ganesha-ip    (ocf::heartbeat:IPaddr2):    Started earth
To set up a relationship between the NFS Ganesha server and the floating Virtual IP, we use collocation and ordering.
crm(live)configure# colocation ganesha-ip-with-nfs-ganesha-server inf: ganesha-ip nfs-ganesha-clone
crm(live)configure# order ganesha-ip-after-nfs-ganesha-server Mandatory: nfs-ganesha-clone ganesha-ip
Use the mount command from the client to ensure that the cluster setup is complete:
root # mount -t nfs -v -o sync,nfsvers=4 192.168.2.1:/ /mnt
In the event of an NFS Ganesha failure on one of the nodes, for example earth, fix the issue and clean up the resource. Only after the resource is cleaned up can the resource fail back to earth in case NFS Ganesha fails on mars.
To clean up the resource:
root@earth # crm resource cleanup nfs-ganesha-clone earth
root@earth # crm resource cleanup ganesha-ip earth
It may happen that the server is unable to reach the client because of a network issue. A ping resource can detect and mitigate this problem. Configuring this resource is optional.
Define the ping resource:
crm(live)configure# primitive ganesha-ping ocf:pacemaker:ping \
params name=ping dampen=3s multiplier=100 host_list="CLIENT1 CLIENT2" \
op monitor interval=60 timeout=60 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60
host_list is a list of IP addresses separated by space characters. The IP addresses are pinged regularly to check for network outages. If a client must always have access to the NFS server, add it to host_list.
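For example, with two hypothetical client addresses the resource definition could look like this:
crm(live)configure# primitive ganesha-ping ocf:pacemaker:ping \
        params name=ping dampen=3s multiplier=100 host_list="192.168.1.100 192.168.1.101" \
        op monitor interval=60 timeout=60 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60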
Create a clone:
crm(live)configure# clone ganesha-ping-clone ganesha-ping \
meta interleave=true
The following command creates a constraint for the NFS Ganesha service. It forces the service to move to another node when host_list is unreachable.
crm(live)configure# location nfs-ganesha-server-with-ganesha-ping nfs-ganesha-clone \
rule -inf: not_defined ping or ping lte 0
When a service goes down, the TCP connection in use by NFS Ganesha must be closed; otherwise it persists until a system-specific timeout occurs, which can take upwards of 3 minutes. To shorten the timeout, the TCP connection needs to be reset.
We recommend configuring portblock to reset stale TCP connections.
You can use portblock with or without the tickle_dir parameter, which can unblock and reconnect clients to the new service faster. We recommend setting tickle_dir to a CephFS mount shared between the two HA nodes (where the NFS Ganesha services are running). Configuring the following resource is optional.
On earth, run the crm shell to execute the following commands to configure the resource for the NFS Ganesha daemons:
root@earth # crm configure
Configure the block action for portblock and omit the tickle_dir option if you have not configured a shared directory:
crm(live)configure# primitive nfs-ganesha-block ocf:portblock \
        params protocol=tcp portno=2049 action=block ip=192.168.2.1 tickle_dir="/tmp/ganesha/tickle/" \
        op monitor depth="0" timeout="10" interval="10"
Configure the unblock action for portblock and omit the reset_local_on_unblock_stop option if you have not configured a shared directory:
crm(live)configure# primitive nfs-ganesha-unblock ocf:portblock \
        params protocol=tcp portno=2049 action=unblock ip=192.168.2.1 reset_local_on_unblock_stop=true tickle_dir="/tmp/ganesha/tickle/" \
        op monitor depth="0" timeout="10" interval="10"
Configure the IPAddr2 resource with portblock:
crm(live)configure# colocation ganesha-portblock inf: ganesha-ip nfs-ganesha-block nfs-ganesha-unblock
crm(live)configure# edit ganesha-ip-after-nfs-ganesha-server
order ganesha-ip-after-nfs-ganesha-server Mandatory: nfs-ganesha-block nfs-ganesha-clone ganesha-ip nfs-ganesha-unblock
Save your changes:
crm(live)configure# commit
Your configuration should look like this:
crm(live)configure# show
node 1084782956: nfs1
node 1084783048: nfs2
primitive ganesha-ip IPaddr2 \
        params ip=192.168.2.1 cidr_netmask=24 nic=eth0 \
        op monitor interval=10 timeout=20
primitive nfs-ganesha-block portblock \
        params protocol=tcp portno=2049 action=block ip=192.168.2.1 \
        tickle_dir="/tmp/ganesha/tickle/" op monitor timeout=10 interval=10 depth=0
primitive nfs-ganesha-server systemd:nfs-ganesha \
        op monitor interval=30s
primitive nfs-ganesha-unblock portblock \
        params protocol=tcp portno=2049 action=unblock ip=192.168.2.1 \
        reset_local_on_unblock_stop=true tickle_dir="/tmp/ganesha/tickle/" \
        op monitor timeout=10 interval=10 depth=0
clone nfs-ganesha-clone nfs-ganesha-server \
        meta interleave=true
location cli-prefer-ganesha-ip ganesha-ip role=Started inf: nfs1
order ganesha-ip-after-nfs-ganesha-server Mandatory: nfs-ganesha-block nfs-ganesha-clone ganesha-ip nfs-ganesha-unblock
colocation ganesha-ip-with-nfs-ganesha-server inf: ganesha-ip nfs-ganesha-clone
colocation ganesha-portblock inf: ganesha-ip nfs-ganesha-block nfs-ganesha-unblock
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-6.5.1-77ea74d \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=false \
        placement-strategy=balanced \
        last-lrm-refresh=1544793779
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true
In this example, /tmp/ganesha/ is the CephFS mount on both nodes (nfs1 and nfs2):
172.16.1.11:6789:/ganesha on /tmp/ganesha type ceph (rw,relatime,name=admin,secret=...hidden...,acl,wsize=16777216)
The tickle directory has been created in advance.
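A minimal sketch of preparing such a shared directory on both nodes, assuming the monitor address from the mount output above and a hypothetical admin secret file, could be:
root # mount -t ceph 172.16.1.11:6789:/ganesha /tmp/ganesha \
  -o name=admin,secretfile=/etc/ceph/admin.secret
root # mkdir -p /tmp/ganesha/tickle/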
DeepSea does not support configuring NFS Ganesha HA. To prevent DeepSea from failing after NFS Ganesha HA was configured, exclude starting and stopping the NFS Ganesha service from DeepSea Stage 4:
Copy /srv/salt/ceph/ganesha/default.sls to /srv/salt/ceph/ganesha/ha.sls.
Remove the .service entry from /srv/salt/ceph/ganesha/ha.sls so that it looks as follows:
include:
- .keyring
- .install
- .configure
Add the following line to /srv/pillar/ceph/stack/global.yml:
ganesha_init: ha
To prevent DeepSea from restarting the NFS Ganesha service on stage 4:
Copy /srv/salt/ceph/stage/ganesha/default.sls to /srv/salt/ceph/stage/ganesha/ha.sls.
Remove the line - ...restart.ganesha.lax from /srv/salt/ceph/stage/ganesha/ha.sls so that it looks as follows:
include:
- .migrate
- .core
Add the following line to /srv/pillar/ceph/stack/global.yml:
stage_ganesha: ha
More information can be found in Book “Administration Guide”, Chapter 16 “NFS Ganesha: Export Ceph Data via NFS”.
This chapter describes how to export data stored in a Ceph cluster via a Samba/CIFS share so that you can easily access it from Windows* client machines. It also includes information that helps you configure a Ceph Samba gateway to join Active Directory in the Windows* domain to authenticate and authorize users.
Because of increased protocol overhead and additional latency caused by extra network hops between the client and the storage, accessing CephFS via a Samba Gateway may significantly reduce application performance when compared to native Ceph clients.
Native CephFS and NFS clients are not restricted by file locks obtained via Samba, and vice versa. Applications that rely on cross protocol file locking may experience data corruption if CephFS backed Samba share paths are accessed via other means.
To configure and export a Samba share, the following packages need to be installed: samba-ceph and samba-winbind. If these packages are not installed, install them:
cephadm@smb > zypper install samba-ceph samba-winbind
In preparation for exporting a Samba share, choose an appropriate node to act as a Samba Gateway. The node needs to have access to the Ceph client network, as well as sufficient CPU, memory, and networking resources.
Failover functionality can be provided with CTDB and the SUSE Linux Enterprise High Availability Extension. Refer to Section 13.1.3, “High Availability Configuration” for more information on HA setup.
Make sure that a working CephFS already exists in your cluster. For details, see Chapter 11, Installation of CephFS.
Create a Samba Gateway specific keyring on the Ceph admin node and copy it to both Samba Gateway nodes:
cephadm > ceph auth get-or-create client.samba.gw mon 'allow r' \
  osd 'allow *' mds 'allow *' -o ceph.client.samba.gw.keyring
cephadm > scp ceph.client.samba.gw.keyring SAMBA_NODE:/etc/ceph/
Replace SAMBA_NODE with the name of the Samba gateway node.
The following steps are executed on the Samba Gateway node. Install Samba together with the Ceph integration package:
cephadm@smb > sudo zypper in samba samba-ceph
Replace the default contents of the /etc/samba/smb.conf file with the following:
[global]
        netbios name = SAMBA-GW
        clustering = no
        idmap config * : backend = tdb2
        passdb backend = tdbsam
        # disable print server
        load printers = no
        smbd: backgroundqueue = no
[SHARE_NAME]
        path = /
        vfs objects = ceph
        ceph: config_file = /etc/ceph/ceph.conf
        ceph: user_id = samba.gw
        read only = no
        oplocks = no
        kernel share modes = no
oplocks (also known as SMB2+ leases) allow for improved performance through aggressive client caching, but are currently unsafe when Samba is deployed together with other CephFS clients, such as kernel mount.ceph, FUSE, or NFS Ganesha.
Currently, kernel share modes needs to be disabled in a share running with the CephFS vfs module for file serving to work properly.
Since vfs_ceph does not require a file system mount, the share path is interpreted as an absolute path within the Ceph file system on the attached Ceph cluster. For successful share I/O, the path's access control list (ACL) needs to permit access from the mapped user for the given Samba client. You can modify the ACL by temporarily mounting via the CephFS kernel client and using the chmod, chown, or setfacl utilities against the share path. For example, to permit access for all users, run:
root # chmod 777 MOUNTED_SHARE_PATH
Start and enable the Samba daemon:
cephadm@smb > sudo systemctl start smb.service
cephadm@smb > sudo systemctl enable smb.service
cephadm@smb > sudo systemctl start nmb.service
cephadm@smb > sudo systemctl enable nmb.service
Although a multi-node Samba + CTDB deployment is more highly available than a single-node deployment (see Chapter 13, Exporting Ceph Data via Samba), client-side transparent failover is not supported. Applications will likely experience a short outage on Samba Gateway node failure.
This section provides an example of how to set up a two-node high availability configuration of Samba servers. The setup requires the SUSE Linux Enterprise High Availability Extension. The two nodes are called earth (192.168.1.1) and mars (192.168.1.2).
For details about SUSE Linux Enterprise High Availability Extension, see https://documentation.suse.com/sle-ha/12-SP5/.
Additionally, two floating virtual IP addresses allow clients to connect to the service no matter which physical node it is running on. 192.168.1.10 is used for cluster administration with Hawk2 and 192.168.2.1 is used exclusively for the CIFS exports. This makes it easier to apply security restrictions later.
The following procedure describes the example installation. More details can be found at https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-install-quick/.
Create a Samba Gateway specific keyring on the Admin Node and copy it to both nodes:
cephadm > ceph auth get-or-create client.samba.gw mon 'allow r' \
  osd 'allow *' mds 'allow *' -o ceph.client.samba.gw.keyring
cephadm > scp ceph.client.samba.gw.keyring earth:/etc/ceph/
cephadm > scp ceph.client.samba.gw.keyring mars:/etc/ceph/
SLE-HA setup requires a fencing device to avoid a split brain situation when active cluster nodes become unsynchronized. For this purpose, you can use a Ceph RBD image with Stonith Block Device (SBD). Refer to https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-guide/#sec-ha-storage-protect-fencing-setup for more details.
If it does not yet exist, create an RBD pool called rbd (see Book “Administration Guide”, Chapter 8 “Managing Storage Pools”, Section 8.2.2 “Create a Pool”) and associate it with the rbd application (see Book “Administration Guide”, Chapter 8 “Managing Storage Pools”, Section 8.1 “Associate Pools with an Application”). Then create a related RBD image called sbd01:
cephadm > ceph osd pool create rbd PG_NUM PGP_NUM replicated
cephadm > ceph osd pool application enable rbd rbd
cephadm > rbd -p rbd create sbd01 --size 64M --image-shared
Prepare earth and mars to host the Samba service:
Make sure the following packages are installed before you proceed: ctdb, tdb-tools, and samba (needed for smb.service and nmb.service).
cephadm@smb > zypper in ctdb tdb-tools samba samba-ceph
Make sure the services ctdb, smb, and nmb are stopped and disabled:
cephadm@smb > sudo systemctl disable ctdb
cephadm@smb > sudo systemctl disable smb
cephadm@smb > sudo systemctl disable nmb
cephadm@smb > sudo systemctl stop smb
cephadm@smb > sudo systemctl stop nmb
Open port 4379 in your firewall on all nodes. This is needed for CTDB to communicate with the other cluster nodes.
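For example, on a system that uses SuSEfirewall2 (the default on SUSE Linux Enterprise Server 12), a possible approach is to add the port to FW_SERVICES_EXT_TCP in /etc/sysconfig/SuSEfirewall2 and restart the firewall. Adapt this sketch to your firewall solution and to any ports already listed there:
root # grep FW_SERVICES_EXT_TCP /etc/sysconfig/SuSEfirewall2
FW_SERVICES_EXT_TCP="4379"
root # systemctl restart SuSEfirewall2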
On earth, create the configuration files for Samba. They will later automatically synchronize to mars.
Insert a list of private IP addresses of the Samba Gateway nodes in the /etc/ctdb/nodes file. Find more details in the ctdb manual page (man 7 ctdb).
192.168.1.1
192.168.1.2
Configure Samba. Add the following lines in the [global] section of /etc/samba/smb.conf. Use the host name of your choice in place of CTDB-SERVER (all nodes in the cluster will appear as one big node with this name). Add a share definition as well; consider SHARE_NAME as an example:
[global]
        netbios name = SAMBA-HA-GW
        clustering = yes
        idmap config * : backend = tdb2
        passdb backend = tdbsam
        ctdbd socket = /var/lib/ctdb/ctdb.socket
        # disable print server
        load printers = no
        smbd: backgroundqueue = no
[SHARE_NAME]
        path = /
        vfs objects = ceph
        ceph: config_file = /etc/ceph/ceph.conf
        ceph: user_id = samba.gw
        read only = no
        oplocks = no
        kernel share modes = no
Note that the /etc/ctdb/nodes and /etc/samba/smb.conf files need to match on all Samba Gateway nodes.
Install and bootstrap the SUSE Linux Enterprise High Availability cluster.
Register the SUSE Linux Enterprise High Availability Extension on earth and mars:
root@earth # SUSEConnect -r ACTIVATION_CODE -e E_MAIL
root@mars # SUSEConnect -r ACTIVATION_CODE -e E_MAIL
Install ha-cluster-bootstrap on both nodes:
root@earth # zypper in ha-cluster-bootstrap
root@mars # zypper in ha-cluster-bootstrap
Map the RBD image sbd01 on both Samba Gateways via rbdmap.service.
Edit /etc/ceph/rbdmap and add an entry for the SBD image:
rbd/sbd01 id=samba.gw,keyring=/etc/ceph/ceph.client.samba.gw.keyring
Enable and start rbdmap.service:
root@earth # systemctl enable rbdmap.service && systemctl start rbdmap.service
root@mars # systemctl enable rbdmap.service && systemctl start rbdmap.service
The /dev/rbd/rbd/sbd01 device should be available on both Samba Gateways.
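You can verify this on each gateway, for example:
root@earth # ls -l /dev/rbd/rbd/sbd01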
Initialize the cluster on earth and let mars join it.
root@earth # ha-cluster-init
root@mars # ha-cluster-join -c earth
During the process of initializing and joining the cluster, you will be interactively asked whether to use SBD. Confirm with y and then specify /dev/rbd/rbd/sbd01 as the path to the storage device.
Check the status of the cluster. You should see two nodes added in the cluster:
root@earth # crm status
2 nodes configured
1 resource configured
Online: [ earth mars ]
Full list of resources:
 admin-ip       (ocf::heartbeat:IPaddr2):       Started earth
Execute the following commands on earth to configure the CTDB resource:
root@earth # crm configure
crm(live)configure# primitive ctdb ocf:heartbeat:CTDB params \
        ctdb_manages_winbind="false" \
        ctdb_manages_samba="false" \
        ctdb_recovery_lock="!/usr/lib64/ctdb/ctdb_mutex_ceph_rados_helper ceph client.samba.gw cephfs_metadata ctdb-mutex" \
        ctdb_socket="/var/lib/ctdb/ctdb.socket" \
        op monitor interval="10" timeout="20" \
        op start interval="0" timeout="200" \
        op stop interval="0" timeout="100"
crm(live)configure# primitive nmb systemd:nmb \
        op start timeout="100" interval="0" \
        op stop timeout="100" interval="0" \
        op monitor interval="60" timeout="100"
crm(live)configure# primitive smb systemd:smb \
        op start timeout="100" interval="0" \
        op stop timeout="100" interval="0" \
        op monitor interval="60" timeout="100"
crm(live)configure# group g-ctdb ctdb nmb smb
crm(live)configure# clone cl-ctdb g-ctdb meta interleave="true"
crm(live)configure# commit
The binary /usr/lib64/ctdb/ctdb_mutex_ceph_rados_helper in the configuration option ctdb_recovery_lock has the parameters CLUSTER_NAME, CEPHX_USER, RADOS_POOL, and RADOS_OBJECT, in this order.
An extra lock-timeout parameter can be appended to override the default value used (10 seconds). A higher value will increase the CTDB recovery master failover time, whereas a lower value may result in the recovery master being incorrectly detected as down, triggering flapping failovers.
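For example, a hypothetical variant of the recovery lock setting with a 30-second lock timeout appended would look like this:
ctdb_recovery_lock="!/usr/lib64/ctdb/ctdb_mutex_ceph_rados_helper ceph client.samba.gw cephfs_metadata ctdb-mutex 30"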
Add a clustered IP address:
crm(live)configure# primitive ip ocf:heartbeat:IPaddr2 params ip=192.168.2.1 \
        unique_clone_address="true" \
        op monitor interval="60" \
        meta resource-stickiness="0"
crm(live)configure# clone cl-ip ip \
        meta interleave="true" clone-node-max="2" globally-unique="true"
crm(live)configure# colocation col-with-ctdb 0: cl-ip cl-ctdb
crm(live)configure# order o-with-ctdb 0: cl-ip cl-ctdb
crm(live)configure# commit
If unique_clone_address is set to true, the IPaddr2 resource agent adds a clone ID to the specified address, leading to three different IP addresses. These are usually not needed, but they help with load balancing. For further information about this topic, see https://documentation.suse.com/sle-ha/15-SP1/single-html/SLE-HA-guide/#cha-ha-lb.
Check the result:
root@earth # crm status
Clone Set: base-clone [dlm]
     Started: [ factory-1 ]
     Stopped: [ factory-0 ]
 Clone Set: cl-ctdb [g-ctdb]
     Started: [ factory-1 ]
     Started: [ factory-0 ]
 Clone Set: cl-ip [ip] (unique)
     ip:0       (ocf:heartbeat:IPaddr2):       Started factory-0
     ip:1       (ocf:heartbeat:IPaddr2):       Started factory-1
Test from a client machine. On a Linux client, run the following command to see if you can copy files from and to the system:
root # smbclient //192.168.2.1/myshare
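A sketch of a quick read/write test, assuming a hypothetical Samba user USERNAME and a local file named testfile, could look like this:
root # smbclient //192.168.2.1/myshare -U USERNAME \
  -c 'put testfile; ls; get testfile /tmp/testfile.copy'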
This chapter lists content changes for this document since the initial release of SUSE Enterprise Storage 4. You can find changes related to the cluster deployment that apply to previous versions in https://documentation.suse.com/ses/5.5/single-html/ses-deployment/#ap-deploy-docupdate.
The document was updated on the following dates:
Added an important box stating that encrypted OSDs take longer to boot than unencrypted ones in Section 4.5.1.6, “Deploying Encrypted OSDs” (https://bugzilla.suse.com/show_bug.cgi?id=1124813).
Added a step to disable AppArmor profiles in Important: Software Requirements (https://bugzilla.suse.com/show_bug.cgi?id=1127297).
Added Section 10.4.4, “Authentication and Access Control” and extended Section 10.4.6, “Advanced Settings” (https://bugzilla.suse.com/show_bug.cgi?id=1114705).
Added DeepSea stage 2 to the deployment process in Procedure 12.0, “” (https://bugzilla.suse.com/show_bug.cgi?id=1119167).
Added Section 1.3, “User Privileges and Command Prompts” (https://bugzilla.suse.com/show_bug.cgi?id=1116537).
Added Section 5.4.2, “Details on the salt target ceph.maintenance.upgrade Command” (https://bugzilla.suse.com/show_bug.cgi?id=1104794).
Added Chapter 3, Ceph Admin Node HA Setup (Fate#325622).
Encrypted OSDs during deployment and upgrade in Section 4.5.1.6, “Deploying Encrypted OSDs” and Section 5.3, “Encrypting OSDs during Upgrade” (Fate#321665).
Cleaned the update repository sections in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1109377).
Added Section 4.5.1.5, “Overriding Default Search for Disk Devices” (https://bugzilla.suse.com/show_bug.cgi?id=1105967).
Prepended DeepSea Stage 0 execution in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1110440).
Fixed creation of demo users in Section 12.2, “Example Installation” (https://bugzilla.suse.com/show_bug.cgi?id=1105739).
Removed the default stage_prep_master and stage_prep_minion values in Section 7.1.5, “Updates and Reboots during Stage 0” (https://bugzilla.suse.com/show_bug.cgi?id=1103242).
Updated various parts of Chapter 13, Exporting Ceph Data via Samba (https://bugzilla.suse.com/show_bug.cgi?id=1101478).
Added information on preconfiguring network settings by a custom cluster.yml in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1099448).
Removed spare role-igw definitions (https://bugzilla.suse.com/show_bug.cgi?id=1099687).
Added a tip on running a second terminal session for the monitor mode in Section 4.4.1, “DeepSea CLI: Monitor Mode” (https://bugzilla.suse.com/show_bug.cgi?id=1099453).
Non-data Object Gateway pools need to be replicated in Section 9.1.1.2, “Create Pools (Optional)” (https://bugzilla.suse.com/show_bug.cgi?id=1095743).
FQDN of all nodes must be resolvable to the public network IP. See Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1067113).
Added a tip on sharing multiple roles in Chapter 4, Deploying with DeepSea/Salt (https://bugzilla.suse.com/show_bug.cgi?id=1093824).
Added Section 2.4, “Metadata Server Nodes” (https://bugzilla.suse.com/show_bug.cgi?id=1047230).
Manually edit policy.cfg for openATTIC (https://bugzilla.suse.com/show_bug.cgi?id=1073331).
Recommended using SSD in Section 2.2, “Monitor Nodes” (https://bugzilla.suse.com/show_bug.cgi?id=1056322).
Added Section 1.5, “BlueStore” and Section 2.1.3, “Recommended Size for the BlueStore's WAL and DB Device” (https://bugzilla.suse.com/show_bug.cgi?id=1072502).
Extended the deployment of encrypted OSDs in Section 4.5.1.6, “Deploying Encrypted OSDs” (https://bugzilla.suse.com/show_bug.cgi?id=1093003).
Increased the number of bytes to erase to 4M in Step 12 (https://bugzilla.suse.com/show_bug.cgi?id=1093331).
Firewall breaks DeepSea stages, in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1090683).
Added the list of repositories in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1088170).
Added instructions to manually add repositories using zypper in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5”, Section 5.5, “Upgrade from SUSE Enterprise Storage 4 (ceph-deploy Deployment) to 5”, and Section 5.6, “Upgrade from SUSE Enterprise Storage 4 (Crowbar Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1073308).
Added a list of upgrade repositories and a brief explanation of DeepSea's upgrade_init option in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1073372).
Added Section 7.1.5, “Updates and Reboots during Stage 0” (https://bugzilla.suse.com/show_bug.cgi?id=1081524).
Fixed prompts in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1084307).
Added Section 2.12, “SUSE Enterprise Storage 5.5 and Other SUSE Products” (https://bugzilla.suse.com/show_bug.cgi?id=1089717).
Added a note on the Object Gateway configuration sections in Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf with Custom Settings” (https://bugzilla.suse.com/show_bug.cgi?id=1089300).
Added WAL/DB snippets to Section 2.1.2, “Minimum Disk Size” (https://bugzilla.suse.com/show_bug.cgi?id=1057797).
MONs' public addresses are calculated dynamically (https://bugzilla.suse.com/show_bug.cgi?id=1089151).
Fixed keyrings location in Section 5.5, “Upgrade from SUSE Enterprise Storage 4 (ceph-deploy Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1073368).
Provided several helper snippets in Section 4.5.1.2, “Role Assignment” (https://bugzilla.suse.com/show_bug.cgi?id=1061629).
Engulfing custom ceph.conf in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1085443).
Updated RAM value recommended for BlueStore deployment in Section 2.1.1, “Minimum Requirements” (https://bugzilla.suse.com/show_bug.cgi?id=1076385).
Added manual steps to upgrade the iSCSI Gateways after the engulf command in Section 5.5, “Upgrade from SUSE Enterprise Storage 4 (ceph-deploy Deployment) to 5” and Section 5.6, “Upgrade from SUSE Enterprise Storage 4 (Crowbar Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1073327).
The iSCSI Gateway deployment updated to the DeepSea way in Section 10.4, “Installation and Configuration” (https://bugzilla.suse.com/show_bug.cgi?id=1073327).
CephFS quotas are not supported. See Section 11.1, “Supported CephFS Scenarios and Guidance” (https://bugzilla.suse.com/show_bug.cgi?id=1077269).
Include partitions with numbers higher than 9 in the zeroing step; see Step 12 (https://bugzilla.suse.com/show_bug.cgi?id=1050230).
Enhanced the disk wiping strategy in Step 12 of Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1073897).
Added tip on disengaging safety measures in Section 5.4.1, “OSD Migration to BlueStore” (https://bugzilla.suse.com/show_bug.cgi?id=1073720).
Referred to Section 4.2.2.1, “Matching the Minion Name” in Procedure 4.1, “Running Deployment Stages” and Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.1 “Adding New Cluster Nodes” to unify the information source (https://bugzilla.suse.com/show_bug.cgi?id=1073374).
In Section 5.2, “General Upgrade Procedure”, only update the Salt master and minions and not all packages. Therefore, replaced salt target state.apply ceph.updates with salt target state.apply ceph.updates.salt (https://bugzilla.suse.com/show_bug.cgi?id=1073373).
Added Section 5.6, “Upgrade from SUSE Enterprise Storage 4 (Crowbar Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1073317 and https://bugzilla.suse.com/show_bug.cgi?id=1073701).
Replaced '*' with target and extended the targeting introduction in Section 4.2.2, “Targeting the Minions” (https://bugzilla.suse.com/show_bug.cgi?id=1068956).
Added verification of Salt minions' fingerprints in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1064045).
Removed advice to copy the example refactor.conf file in Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.9 “Automated Installation via Salt” (https://bugzilla.suse.com/show_bug.cgi?id=1065926).
Fixed path to network configuration YAML file in Procedure 4.1, “Running Deployment Stages” (https://bugzilla.suse.com/show_bug.cgi?id=1067730).
Verify the cluster layout in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1067189).
Added ceph osd set sortbitwise to Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” and Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1067146).
osd crush location is gone; ceph.conf is customized differently in Important: Software Requirements (https://bugzilla.suse.com/show_bug.cgi?id=1067381).
Fixed 'role-admin' from 'role-master' in Section 4.5.1.2, “Role Assignment” (https://bugzilla.suse.com/show_bug.cgi?id=1064056).
Fixed path to cluster.yml in Procedure 4.1, “Running Deployment Stages” (https://bugzilla.suse.com/show_bug.cgi?id=1066711).
Added Section 12.3.5, “NFS Ganesha HA and DeepSea” (https://bugzilla.suse.com/show_bug.cgi?id=1058313).
Re-added Section 2.7.2, “Monitor Nodes on Different Subnets” (https://bugzilla.suse.com/show_bug.cgi?id=1050115).
The file deepsea_minions.sls must contain only one deepsea_minions entry. See Procedure 4.1, “Running Deployment Stages” (https://bugzilla.suse.com/show_bug.cgi?id=1065403).
Changed order of steps in first procedure of Section 4.3, “Cluster Deployment”. (https://bugzilla.suse.com/show_bug.cgi?id=1064770)
Clarified Section 5.4.1, “OSD Migration to BlueStore”. (https://bugzilla.suse.com/show_bug.cgi?id=1063250)
Added Section 5.7, “Upgrade from SUSE Enterprise Storage 3 to 5” (Fate #323072).
Removed the obsolete Crowbar installation tool in favor of DeepSea.
Removed the obsolete ceph-deploy tool in favor of DeepSea.
Updated Chapter 2, Hardware Requirements and Recommendations (https://bugzilla.suse.com/show_bug.cgi?id=1029544 and https://bugzilla.suse.com/show_bug.cgi?id=1042283).
Updated Chapter 12, Installation of NFS Ganesha (https://bugzilla.suse.com/show_bug.cgi?id=1036495, https://bugzilla.suse.com/show_bug.cgi?id=1031444, FATE#322464).
DeepSea naming schema of profiles changed. See Section 4.5.1.4, “Profile Assignment” (https://bugzilla.suse.com/show_bug.cgi?id=1046108).
CephFS can be used on EC pools, see Section 11.3.1, “Creating CephFS” (FATE#321617).
Added Section 10.5, “Exporting RADOS Block Device Images using tcmu-runner” (https://bugzilla.suse.com/show_bug.cgi?id=1064467).
Improved the upgrade procedure to include the openATTIC role in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1064621).
Added a reference to Procedure 4.1, “Running Deployment Stages” in Step 4 (https://bugzilla.suse.com/show_bug.cgi?id=1064276).
Modified the upgrade procedure in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1061608 and https://bugzilla.suse.com/show_bug.cgi?id=1048959).
Added rgw.conf in Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf with Custom Settings” (https://bugzilla.suse.com/show_bug.cgi?id=1062109).
Moved DeepSea installation to the very end in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1056292).
Added Section 4.5.1.6, “Deploying Encrypted OSDs” (https://bugzilla.suse.com/show_bug.cgi?id=1061751).
Updated and simplified the upgrade procedure in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1059362).
Check for DeepSea version before upgrade in Important: Software Requirements (https://bugzilla.suse.com/show_bug.cgi?id=1059331).
Prefixing custom .sls files with custom- in Section 7.1, “Using Customized Configuration Files” (https://bugzilla.suse.com/show_bug.cgi?id=1048568).
Added a note about key caps mismatch in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1054186).
Merged redundant list items in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1055140).
Added a note about the long time the cluster upgrade may take in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1054079).
Salt minion targeting with deepsea_minions: is mandatory (https://bugzilla.suse.com/show_bug.cgi?id=1054229).
Inserted Stage 1 after engulfing in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1054155).
Added Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.12 “Adjusting ceph.conf with Custom Settings” (https://bugzilla.suse.com/show_bug.cgi?id=1052806).
Added missing steps in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1052597).
Fixed radosgw-admin command syntax in Section 12.2, “Example Installation” (https://bugzilla.suse.com/show_bug.cgi?id=1052698).
'salt' is not the required host name of the Salt master during the upgrade in Procedure 5.1, “Steps to Apply to All Cluster Nodes (including the Calamari Node)” (https://bugzilla.suse.com/show_bug.cgi?id=1052907).
Better wording and text flow in the 'important' section of Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1052147).
Added a note about manual role assignment during engulfment in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1050554).
Added Section 5.4.1, “OSD Migration to BlueStore” (https://bugzilla.suse.com/show_bug.cgi?id=1052210).
Explained salt-run populate.engulf_existing_cluster in detail in Procedure 5.2, “Steps to Apply to the Salt master Node” (https://bugzilla.suse.com/show_bug.cgi?id=1051258).
Added openATTIC role in Section 4.5.1.8, “Example policy.cfg File” (https://bugzilla.suse.com/show_bug.cgi?id=1052076).
Fixed profile-default paths in Section 4.5.1.8, “Example policy.cfg File” (https://bugzilla.suse.com/show_bug.cgi?id=1051760).
Detached previous section into a new chapter Chapter 7, Customizing the Default Configuration (https://bugzilla.suse.com/show_bug.cgi?id=1050238).
Referencing to Section 1.2.3, “Ceph Nodes and Daemons” from DeepSea Stages Description to keep the list of Ceph services up-to-date (https://bugzilla.suse.com/show_bug.cgi?id=1050221).
Improved Salt master description and wording in Chapter 4, Deploying with DeepSea/Salt (https://bugzilla.suse.com/show_bug.cgi?id=1050214).
Added optional node roles description in Section 1.2.3, “Ceph Nodes and Daemons” (https://bugzilla.suse.com/show_bug.cgi?id=1050085).
Updated the upgrade procedure in general (https://bugzilla.suse.com/show_bug.cgi?id=1048436, https://bugzilla.suse.com/show_bug.cgi?id=1048959, and https://bugzilla.suse.com/show_bug.cgi?id=1047085).
Added a new DeepSea role Ceph Manager (https://bugzilla.suse.com/show_bug.cgi?id=1047472).
Added Section 5.5, “Upgrade from SUSE Enterprise Storage 4 (ceph-deploy Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1048436).
Made Stage 0 fully optional in DeepSea Stages Description (https://bugzilla.suse.com/show_bug.cgi?id=1045845).
Updated the list of default pools in Section 9.1.1, “Object Gateway Configuration” (https://bugzilla.suse.com/show_bug.cgi?id=1034039).
Added an 'important' snippet about Object Gateway being deployed by DeepSea now in Chapter 9, Ceph Object Gateway (https://bugzilla.suse.com/show_bug.cgi?id=1044928).
Fixed shell script in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1044684).
Added "Set require-osd-release luminous Flag" to Section 5.2, “General Upgrade Procedure” (https://bugzilla.suse.com/show_bug.cgi?id=1040750).
Added annotation to the example policy.cfg in Section 4.5.1.8, “Example policy.cfg File” (https://bugzilla.suse.com/show_bug.cgi?id=1042691).
Improved commands for OSD disk zapping in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1042074).
Removed advice to install salt-minion on the Salt master in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1041590).
Added firewall recommendation to Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1039344).
Removed XML-RPC references from openATTIC systemd command lines in Section 7.1, “Using Customized Configuration Files” (https://bugzilla.suse.com/show_bug.cgi?id=1037371).
Fixed YAML syntax in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1035498).
Added the 'ganesha' role explanation in Section 4.5.1.2, “Role Assignment” (https://bugzilla.suse.com/show_bug.cgi?id=1037365).
Clarified and improved text flow in Section 4.5.1, “The policy.cfg File” (https://bugzilla.suse.com/show_bug.cgi?id=1037360).
Added the SUSE Enterprise Storage 4 to 5 upgrade procedure in Section 5.4, “Upgrade from SUSE Enterprise Storage 4 (DeepSea Deployment) to 5” (https://bugzilla.suse.com/show_bug.cgi?id=1036266).
Replaced the term 'provisioning' with 'preparation' in Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1036400 and https://bugzilla.suse.com/show_bug.cgi?id=1036492).
Added warning about advanced techniques in Section 4.5.1.7, “Item Filtering” (https://bugzilla.suse.com/show_bug.cgi?id=1036278).
Replaced redundant role-admin assignment in Section 4.5.1.8, “Example policy.cfg File” (https://bugzilla.suse.com/show_bug.cgi?id=1036506).
Improved DeepSea Stages Description and Section 4.3, “Cluster Deployment” (https://bugzilla.suse.com/show_bug.cgi?id=1036278).
Added deployment steps modifications in Section 7.1, “Using Customized Configuration Files” (https://bugzilla.suse.com/show_bug.cgi?id=1026782).
Clarified and enhanced Chapter 4, Deploying with DeepSea/Salt as suggested in https://bugzilla.suse.com/show_bug.cgi?id=1020920.
Recommended enabling custom openATTIC services in Section 7.1, “Using Customized Configuration Files” (https://bugzilla.suse.com/show_bug.cgi?id=989349).
Moved network recommendations to Chapter 2, Hardware Requirements and Recommendations and included Section 2.7.1, “Adding a Private Network to a Running Cluster” (https://bugzilla.suse.com/show_bug.cgi?id=1026569).