Several key packages in SUSE Enterprise Storage 6 are based on the Nautilus release series of Ceph. When the Ceph project (https://github.com/ceph/ceph) publishes new point releases in the Nautilus series, SUSE Enterprise Storage 6 is updated to ensure that the product benefits from the latest upstream bugfixes and feature backports.
This chapter contains summaries of notable changes contained in each upstream point release that has been—or is planned to be—included in the product.
This release includes a security fix that ensures the global_id value (a numeric value that should be unique for every authenticated client or daemon in the cluster) is reclaimed after a network disconnect or ticket renewal in a secure fashion. Two new health alerts may appear during the upgrade, indicating that there are clients or daemons that are not yet patched with the appropriate fix.
To temporarily mute the health alerts around insecure clients for the duration of the upgrade, you may want to run:
cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1h
cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1h
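To check whether any unpatched clients or daemons are still connected, you can, for example, inspect the detailed health output:
cephadm@adm > ceph health detail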
When all clients have been updated, enable the new secure behavior, which does not allow old, insecure clients to join the cluster:
cephadm@adm > ceph config set mon auth_allow_insecure_global_id_reclaim false
For more details, refer to https://docs.ceph.com/en/latest/security/CVE-2021-20288/.
This release fixes a regression introduced in 14.2.17 in which the manager module tries to use a couple of Python modules that do not exist in some environments.
This release fixes issues loading the dashboard and volumes manager modules in some environments.
This release includes the following fixes:
$pid expansion in configuration paths such as admin_socket will now properly expand to the daemon PID for commands such as ceph-mds or ceph-osd. Previously, only ceph-fuse and rbd-nbd expanded $pid with the actual daemon PID.
RADOS: PG removal has been optimized.
RADOS: Memory allocations are tracked in finer detail in BlueStore and displayed as part of the dump_mempools command.
CephFS: clients which acquire capabilities too quickly are throttled to prevent instability. See the new config option mds_session_cap_acquisition_throttle to control this behavior.
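As a sketch only, such a config option can be adjusted with the generic ceph config set syntax; the value shown below is illustrative, not a recommendation:
cephadm@adm > ceph config set mds mds_session_cap_acquisition_throttle 500000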
This release fixes a security flaw in CephFS.
CVE-2020-27781: OpenStack Manila's use of the ceph_volume_client.py library allowed tenant access to any Ceph credential's secret.
This release fixes a ceph-volume regression introduced in v14.2.13 and includes a few other fixes.
ceph-volume: Fixes lvm batch --auto, which broke backward compatibility when using only non-rotational devices (SSD and/or NVMe).
BlueStore: Fixes a bug in collection_list_legacy which makes PGs inconsistent during scrub when running OSDs older than 14.2.12 with newer ones.
MGR: The progress module can now be turned on or off using the commands ceph progress on and ceph progress off.
This release fixes a security flaw affecting Messenger V2 for Octopus and Nautilus, among other fixes across components.
CVE-2020-25660: Fixed a regression in Messenger V2 that allowed replay attacks.
This release fixes a regression introduced in v14.2.12, along with a few ceph-volume and RGW fixes.
Fixed a regression that caused breakage in clusters that referred to ceph-mon hosts using DNS names instead of IP addresses in the mon_host parameter in ceph.conf.
ceph-volume: The lvm batch subcommand received a major rewrite.
In addition to bug fixes, this major upstream release brought a number of notable changes:
The ceph df command now lists the number of PGs in each pool.
MONs now have a config option mon_osd_warn_num_repaired, 10 by default. If any OSD has repaired more than this many I/O errors in stored data, an OSD_TOO_MANY_REPAIRS health warning is generated. To allow clearing of the warning, a new command ceph tell osd.SERVICE_ID clear_shards_repaired COUNT has been added. By default, it sets the repair count to 0. If you want to be warned again when additional repairs are performed, you can provide a value to the command equal to mon_osd_warn_num_repaired. This command will be replaced in future releases by the health mute/unmute feature. An illustrative invocation appears after this list of changes.
It is now possible to specify the initial MON to contact for Ceph tools and daemons using the mon_host_override config option or the --mon-host-override IP command-line switch. This generally should only be used for debugging and only affects initial communication with Ceph’s MON cluster. See the example after this list of changes.
Fix an issue with osdmaps not being trimmed in a healthy cluster.
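As an illustration of the clear_shards_repaired command mentioned above, the following resets the repair counter of a single OSD; osd.0 is a placeholder, and omitting the count sets the counter to 0:
cephadm@adm > ceph tell osd.0 clear_shards_repaired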
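Similarly, the MON override switch described above might be used as follows when debugging; the IP address is a placeholder:
cephadm@adm > ceph status --mon-host-override 192.168.100.1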
In addition to bug fixes, this major upstream release brought a number of notable changes:
RGW: The radosgw-admin sub-commands dealing with orphans (radosgw-admin orphans find, radosgw-admin orphans finish, radosgw-admin orphans list-jobs) have been deprecated. They have not been actively maintained, and they store intermediate results on the cluster, which could fill a nearly-full cluster. They have been replaced by a tool, currently considered experimental, rgw-orphan-list.
Now, when the noscrub and/or nodeep-scrub flags are set globally or per pool, scheduled scrubs of the disabled type will be aborted. User-initiated scrubs are not interrupted. Example commands follow this list of changes.
Fixed a ceph-osd crash in _committed_osd_maps when there is a failure to encode the first incremental map.
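As an illustration of the scrub flags mentioned above, they can be set either cluster-wide or on an individual pool. The per-pool form below assumes the standard ceph osd pool set interface; POOL_NAME is a placeholder:
cephadm@adm > ceph osd set noscrub
cephadm@adm > ceph osd pool set POOL_NAME noscrub 1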
This upstream release patched one security flaw:
CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration’s ExposeHeader
In addition to security flaws, this major upstream release brought a number of notable changes:
The pool parameter target_size_ratio, used by the PG autoscaler, has changed meaning. It is now normalized across pools, rather than specifying an absolute ratio. If you have set target size ratios on any pools, you may want to set these pools to autoscale warn mode to avoid data movement during the upgrade:
cephadm@adm > ceph osd pool set POOL_NAME pg_autoscale_mode warn
The behaviour of the -o argument to the RADOS tool has been reverted to its original behaviour of indicating an output file, which is more consistent with other tools. Specifying the object size is now accomplished by using an upper-case O (-O).
The format of MDSs in ceph fs dump has changed.
Ceph will issue a health warning if a RADOS pool's size is set to 1 or, in other words, the pool is configured with no redundancy. This can be fixed by setting the pool size to the minimum recommended value with:
cephadm@adm > ceph osd pool set POOL_NAME size NUM_REPLICAS
The warning can be silenced with:
cephadm@adm > ceph config set global mon_warn_on_pool_no_redundancy false
RGW: Bucket listing performance on sharded bucket indexes has been notably improved by heuristically reducing, in many cases significantly, the number of entries requested from each bucket index shard.
This upstream release patched two security flaws:
CVE-2020-1759: Fixed nonce reuse in msgr V2 secure mode
CVE-2020-1760: Fixed XSS due to RGW GetObject header-splitting
In SES 6, these flaws were patched in Ceph version 14.2.5.389+gb0f23ac248.
In addition to bug fixes, this major upstream release brought a number of notable changes:
The default value of bluestore_min_alloc_size_ssd has been changed to 4K to improve performance across all workloads.
The following OSD memory config options related to BlueStore cache autotuning can now be configured during runtime (a concrete example appears at the end of this list of changes):
osd_memory_base (default: 768 MB)
osd_memory_cache_min (default: 128 MB)
osd_memory_expected_fragmentation (default: 0.15)
osd_memory_target (default: 4 GB)
You can set the above options by running:
cephadm@adm > ceph config set osd OPTION VALUE
The Ceph Manager now accepts profile rbd and profile rbd-read-only user capabilities. You can use these capabilities to provide users access to MGR-based RBD functionality such as rbd perf image iostat and rbd perf image iotop. A sketch of creating such a user appears at the end of this list of changes.
The configuration value osd_calc_pg_upmaps_max_stddev used for upmap balancing has been removed. Instead, use the Ceph Manager balancer configuration option upmap_max_deviation, which now is an integer number of PGs of deviation from the target PGs per OSD. You can set it with the following command:
cephadm@adm > ceph config set mgr mgr/balancer/upmap_max_deviation 2
The default upmap_max_deviation is 5. There are situations where CRUSH rules would not allow a pool to ever have completely balanced PGs, for example if CRUSH requires one replica on each of three racks but there are fewer OSDs in one of the racks. In those cases, the configuration value can be increased.
CephFS: Forward scrub with multiple active Metadata Servers is now rejected. Scrub is currently only permitted on a file system with a single rank. Reduce the ranks to one via ceph fs set FS_NAME max_mds 1.
Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. This can be fixed by adjusting the pool to an adjacent power of two:
cephadm@adm > ceph osd pool set POOL_NAME pg_num NEW_PG_NUM
Alternatively, you can silence the warning with:
cephadm@adm > ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
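As an example of changing one of the OSD memory options listed above at runtime (the 8 GB value is purely illustrative, not a recommendation):
cephadm@adm > ceph config set osd osd_memory_target 8589934592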
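The following is a sketch of creating a user with the new Ceph Manager RBD capabilities; the user name and pool name are placeholders, and the capability set shown is only an example:
cephadm@adm > ceph auth get-or-create client.rbd-monitor mon 'profile rbd' osd 'profile rbd pool=rbd' mgr 'profile rbd'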
This upstream release patched two security flaws:
CVE-2020-1699: a path traversal flaw in Ceph Dashboard that could allow for potential information disclosure.
CVE-2020-1700: a flaw in the RGW beast front-end that could lead to denial of service from an unauthenticated client.
In SES 6, these flaws were patched in Ceph version 14.2.5.382+g8881d33957b.
This release fixed a Ceph Manager bug that caused MGRs to become unresponsive on larger clusters. SES users were never exposed to the bug.
Health warnings are now issued if daemons have recently crashed. Ceph has been collecting crash reports since the initial Nautilus release, but the health alerts are new. To view new crashes (or all crashes, if you have just upgraded), run:
cephadm@adm > ceph crash ls-new
To acknowledge a particular crash (or all crashes) and silence the health warning, run:
cephadm@adm > ceph crash archive CRASH-ID
cephadm@adm > ceph crash archive-all
pg_num must be a power of two, otherwise HEALTH_WARN is reported. Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. You can fix this by adjusting the pool to a nearby power of two:
cephadm@adm > ceph osd pool set POOL-NAME pg_num NEW-PG-NUM
Alternatively, you can silence the warning with:
cephadm@adm > ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
Pool size needs to be greater than 1, otherwise HEALTH_WARN is reported. Ceph will issue a health warning if a RADOS pool's size is set to 1 or if the pool is configured with no redundancy. Ceph will stop issuing the warning if the pool size is set to the minimum recommended value:
cephadm@adm > ceph osd pool set POOL-NAME size NUM-REPLICAS
You can silence the warning with:
cephadm@adm > ceph config set global mon_warn_on_pool_no_redundancy false
A health warning is now reported if the average OSD heartbeat ping time exceeds a configurable threshold for any of the computed intervals. The OSD computes 1-minute, 5-minute, and 15-minute intervals with average, minimum, and maximum values.
A new configuration option, mon_warn_on_slow_ping_ratio, specifies a percentage of osd_heartbeat_grace to determine the threshold. A value of zero disables the warning.
A new configuration option, mon_warn_on_slow_ping_time, specified in milliseconds, overrides the computed value and causes a warning when OSD heartbeat pings take longer than the specified amount.
A new command, ceph daemon mgr.MGR-NUMBER dump_osd_network THRESHOLD, lists all connections with a ping time longer than the specified threshold, or than the value determined by the configuration options, for the average of any of the three intervals.
A new command, ceph daemon osd.# dump_osd_network THRESHOLD, does the same as the previous one, but only includes heartbeats initiated by the specified OSD.
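For example, to list connections whose heartbeat times exceed 1000 ms as seen by a specific OSD (both the daemon ID and the threshold value are illustrative):
cephadm@adm > ceph daemon osd.0 dump_osd_network 1000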
Changes in the telemetry MGR module.
A new 'device' channel (enabled by default) will report anonymized hard disk and SSD health metrics to telemetry.ceph.com in order to build and improve device failure prediction algorithms.
Telemetry reports information about CephFS file systems, including:
How many MDS daemons (in total and per file system).
Which features are (or have been) enabled.
How many data pools.
Approximate file system age (year and the month of creation).
How many files, bytes, and snapshots.
How much metadata is being cached.
Other miscellaneous information:
Which Ceph release the monitors are running.
Whether msgr v1 or v2 addresses are used for the monitors.
Whether IPv4 or IPv6 addresses are used for the monitors.
Whether RADOS cache tiering is enabled (and the mode).
Whether pools are replicated or erasure coded, and which erasure code profile plug-in and parameters are in use.
How many hosts are in the cluster, and how many hosts have each type of daemon.
Whether a separate OSD cluster network is being used.
How many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled.
How many RGW daemons, zones, and zonegroups are present and which RGW frontends are in use.
Aggregate stats about the CRUSH Map, such as which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use.
If you had telemetry enabled before 14.2.5, you will need to re-opt-in with:
cephadm@adm > ceph telemetry on
If you are not comfortable sharing device metrics, you can disable that channel first before re-opting-in:
cephadm@adm > ceph config set mgr mgr/telemetry/channel_device false
cephadm@adm > ceph telemetry on
You can first view exactly what information will be reported by running:
cephadm@adm > ceph telemetry show          # see everything
cephadm@adm > ceph telemetry show device   # just the device info
cephadm@adm > ceph telemetry show basic    # basic cluster info
New OSD daemon command dump_recovery_reservations. It reveals the recovery locks held (in_progress) and waiting in priority queues. Usage:
cephadm@adm > ceph daemon osd.ID dump_recovery_reservations
New OSD daemon command dump_scrub_reservations. It reveals the scrub reservations that are held for local (primary) and remote (replica) PGs. Usage:
cephadm@adm > ceph daemon osd.ID dump_scrub_reservations
RGW now supports the S3 Object Lock set of APIs, allowing a WORM model for storing objects. Six new APIs have been added: PUT/GET bucket object lock, PUT/GET object retention, and PUT/GET object legal hold.
RGW now supports List Objects V2, as specified at https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html.
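As an illustration only, a List Objects V2 request can be sent to an Object Gateway endpoint with any S3 client, for example the AWS CLI; the bucket name, endpoint, and port below are placeholders:
cephadm@adm > aws s3api list-objects-v2 --bucket mybucket --endpoint-url http://rgw.example.com:7480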
This point release fixes a serious regression that found its way into the 14.2.3 point release. This regression did not affect SUSE Enterprise Storage customers because we did not ship a version based on 14.2.3.
Fixed a denial of service vulnerability where an unauthenticated client of Ceph Object Gateway could trigger a crash from an uncaught exception.
Nautilus-based librbd clients can now open images on Jewel clusters.
The Object Gateway option num_rados_handles has been removed. If you were using a value of num_rados_handles greater than 1, multiply your current objecter_inflight_ops and objecter_inflight_op_bytes parameters by the old num_rados_handles value to get the same throttle behavior.
The secure mode of Messenger v2 protocol is no longer experimental with this release. This mode is now the preferred mode of connection for monitors.
osd_deep_scrub_large_omap_object_key_threshold has been lowered to detect an object with a large number of omap keys more easily.
The Ceph Dashboard now supports silencing Prometheus notifications.
The no{up,down,in,out} related commands have been revamped. There are now two ways to set the no{up,down,in,out} flags: the old command ceph osd [un]set FLAG, which sets cluster-wide flags; and the new command ceph osd [un]set-group FLAGS WHO, which sets flags in batch at the granularity of any CRUSH node or device class.
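As a sketch of the new batch syntax (the flag names, OSD IDs, and host name are examples only):
cephadm@adm > ceph osd set-group noup,noout osd.0 osd.1
cephadm@adm > ceph osd unset-group noup,noout HOST_NAME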
radosgw-admin introduces two subcommands that allow managing expire-stale objects that might be left behind after a bucket reshard in earlier versions of Object Gateway. Expire-stale objects are expired objects that should have been automatically erased but still exist and need to be listed and removed manually. One subcommand lists such objects and the other deletes them.
Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where deploying a single new Nautilus BlueStore OSD on an upgraded cluster (i.e. one that was originally deployed pre-Nautilus) breaks the pool utilization statistics reported by ceph df. Until all OSDs have been reprovisioned or updated (via ceph-bluestore-tool repair), the pool statistics will show values that are lower than the true value. This is resolved in 14.2.2, such that the cluster only switches to using the more accurate per-pool stats after all OSDs are 14.2.2 or later, are BlueStore, and have been updated via the repair function if they were created prior to Nautilus.
The default value for mon_crush_min_required_version has been changed from firefly to hammer, which means the cluster will issue a health warning if your CRUSH tunables are older than Hammer. There is generally a small (but non-zero) amount of data that will be re-balanced after making the switch to Hammer tunables. If possible, we recommend that you set the oldest allowed client to hammer or later. To display what the current oldest allowed client is, run:
cephadm@adm > ceph osd dump | grep min_compat_client
If the current value is older than hammer, run the following command to determine whether it is safe to make this change by verifying that there are no clients older than Hammer currently connected to the cluster:
cephadm@adm > ceph features
The newer straw2 CRUSH bucket type was introduced in Hammer. If you verify that all clients are Hammer or newer, it allows new features only supported for straw2 buckets to be used, including the crush-compat mode for the Balancer (Section 21.1, “Balancer”).
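If ceph features confirms that no pre-Hammer clients are connected, you could, for example, raise the minimum client requirement and then enable the crush-compat balancer mode; the commands below are shown for illustration:
cephadm@adm > ceph osd set-require-min-compat-client hammer
cephadm@adm > ceph balancer mode crush-compat
cephadm@adm > ceph balancer on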
Find detailed information about the patch at https://download.suse.com/Download?buildid=D38A7mekBz4~
This was the first point release following the original Nautilus release (14.2.0). The original ('General Availability' or 'GA') version of SUSE Enterprise Storage 6 was based on this point release.