Applies to SUSE Enterprise Storage 7

A Ceph maintenance updates based on upstream 'Octopus' point releases #

Several key packages in SUSE Enterprise Storage 7 are based on the Octopus release series of Ceph. When the Ceph project (https://github.com/ceph/ceph) publishes new point releases in the Octopus series, SUSE Enterprise Storage 7 is updated to ensure that the product benefits from the latest upstream bug fixes and feature backports.

This chapter contains summaries of notable changes contained in each upstream point release that has been—or is planned to be—included in the product.

Octopus 15.2.11 Point Release#

This release includes a security fix that ensures the global_id value (a numeric value that should be unique for every authenticated client or daemon in the cluster) is reclaimed after a network disconnect or ticket renewal in a secure fashion. Two new health alerts may appear during the upgrade indicating that there are clients or daemons that are not yet patched with the appropriate fix.

To temporarily mute the health alerts around insecure clients for the duration of the upgrade, you may want to run:

cephuser@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1h
cephuser@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1h

When all clients are updated, enable the new secure behavior, not allowing old insecure clients to join the cluster:

cephuser@adm > ceph config set mon auth_allow_insecure_global_id_reclaim false

For more details, refer to https://docs.ceph.com/en/latest/security/CVE-2021-20288/.
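Before disabling insecure reclaim, it can help to confirm that the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer raised. The following is a minimal sketch rather than a supported tool: the helper name insecure_clients_present is hypothetical, the sample text is illustrative, and the sketch assumes that the alert code appears verbatim in the output of ceph health detail. Whether muted alerts are still listed there should be verified on your cluster.

```shell
# Hypothetical helper: reads health text on stdin and succeeds (exit 0)
# if unpatched clients or daemons are still reported.
# The trailing pattern excludes AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED,
# which only reports that the insecure setting is still enabled.
insecure_clients_present() {
    grep -Eq 'AUTH_INSECURE_GLOBAL_ID_RECLAIM([^_A-Z]|$)'
}

# Illustrative canned text instead of a live cluster:
sample='HEALTH_WARN mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim'

if printf '%s\n' "$sample" | insecure_clients_present; then
    echo "unpatched clients remain"
else
    echo "no unpatched clients reported"
fi
```

On a live cluster, the same check could be fed from ceph health detail, for example: ceph health detail | insecure_clients_present.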

Octopus 15.2.10 Point Release#

This backport release includes the following fixes:

The containers include an updated tcmalloc that avoids crashes seen on 15.2.9.
RADOS: BlueStore handling of huge (>4GB) writes from RocksDB to BlueFS has been fixed.

When upgrading from a previous cephadm release, systemctl may hang when trying to start or restart the monitoring containers. This is caused by a change in the systemd unit to use type=forking. After the upgrade, run:

cephuser@adm > ceph orch redeploy nfs
cephuser@adm > ceph orch redeploy iscsi
cephuser@adm > ceph orch redeploy node-exporter
cephuser@adm > ceph orch redeploy prometheus
cephuser@adm > ceph orch redeploy grafana
cephuser@adm > ceph orch redeploy alertmanager
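The six redeploy commands can also be issued from a loop. This is a minimal sketch under stated assumptions: the helper name redeploy_all and the CEPH_CMD variable are hypothetical, and by default the sketch only prints each command (a dry run) instead of contacting a cluster.

```shell
# Services listed in the commands above.
services="nfs iscsi node-exporter prometheus grafana alertmanager"

# Hypothetical helper: CEPH_CMD defaults to a dry run that only prints
# each command; set CEPH_CMD=ceph to run against a real cluster.
redeploy_all() {
    for svc in $services; do
        ${CEPH_CMD:-echo ceph} orch redeploy "$svc" || return 1
    done
}

redeploy_all
```

Keeping the dry run as the default makes it possible to review the generated commands before running them with CEPH_CMD=ceph.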

Octopus 15.2.9 Point Release#

This backport release includes the following fixes:

MGR: The progress module can now be turned on or off using the commands ceph progress on and ceph progress off.
OSD: PG removal has been optimized in this release.

Octopus 15.2.8 Point Release#

This release fixes a security flaw in CephFS and includes a number of bug fixes:

OpenStack Manila use of ceph_volume_client.py library allowed tenant access to any Ceph credential’s secret.
ceph-volume: The lvm batch subcommand received a major rewrite. This closes a number of bugs and improves usability in terms of size specification and calculation, as well as idempotency behaviour and the disk replacement process. Refer to https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for more detailed information.
MON: The cluster log now logs health detail every mon_health_to_clog_interval, whose default has been changed from 1hr to 10min. Logging of health detail is skipped if the health summary has not changed since it was last logged.
The ceph df command now lists the number of PGs in each pool.
The bluefs_preextend_wal_files option has been removed.
It is now possible to specify the initial monitor to contact for Ceph tools and daemons using the mon_host_override config option or --mon-host-override command line switch. This generally should only be used for debugging and only affects initial communication with Ceph's monitor cluster.

Octopus 15.2.7 Point Release#

This release fixes a serious bug in RGW that has been shown to cause data loss when a read of a large RGW object (for example, one with at least one tail segment) takes longer than one half the time specified in the configuration option rgw_gc_obj_min_wait. The bug causes the tail segments of that read object to be added to the RGW garbage collection queue, which will in turn cause them to be deleted after a period of time.
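To make the "one half" condition concrete, the following sketch computes the read duration beyond which the bug could be triggered for a given rgw_gc_obj_min_wait. The helper name risky_read_threshold is hypothetical, and the value 7200 seconds is an assumption used for illustration; check the value actually configured on your cluster.

```shell
# Hypothetical helper: prints the read duration (in seconds) beyond which
# the bug described above could be triggered, given rgw_gc_obj_min_wait
# in seconds as the first argument.
risky_read_threshold() {
    echo $(( $1 / 2 ))
}

# 7200 is an assumed value for rgw_gc_obj_min_wait, used for illustration only.
risky_read_threshold 7200
```

With the assumed value, any single object read lasting longer than 3600 seconds would fall into the affected range.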

Octopus 15.2.6 Point Release#

This release fixes a security flaw affecting Messenger V2 for Octopus and Nautilus.

Octopus 15.2.5 Point Release#

The Octopus point release 15.2.5 brought the following fixes and other changes:

CephFS: Automatic static sub-tree partitioning policies may now be configured using the new distributed and random ephemeral pinning extended attributes on directories. See the following documentation for more information: https://docs.ceph.com/docs/master/cephfs/multimds/
Monitors now have a configuration option mon_osd_warn_num_repaired, which is set to 10 by default. If any OSD has repaired more than this many I/O errors in stored data, an OSD_TOO_MANY_REPAIRS health warning is generated.
Now, when the noscrub and/or nodeep-scrub flags are set globally or per pool, scheduled scrubs of the disabled type are aborted. User-initiated scrubs are not interrupted.
Fixed an issue with osdmaps not being trimmed in a healthy cluster.

Octopus 15.2.4 Point Release#

The Octopus point release 15.2.4 brought the following fixes and other changes:

CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration’s ExposeHeader
Object Gateway: The radosgw-admin sub-commands dealing with orphans—radosgw-admin orphans find, radosgw-admin orphans finish, and radosgw-admin orphans list-jobs—have been deprecated. They had not been actively maintained, and since they store intermediate results on the cluster, they could potentially fill a nearly-full cluster. They have been replaced by a tool, rgw-orphan-list, which is currently considered experimental.
RBD: The name of the RBD pool object that is used to store the RBD trash purge schedule has changed from rbd_trash_trash_purge_schedule to rbd_trash_purge_schedule. Users that have already started using the RBD trash purge schedule functionality and have per pool or name space schedules configured should copy the rbd_trash_trash_purge_schedule object to rbd_trash_purge_schedule before the upgrade and then remove rbd_trash_trash_purge_schedule, using the following commands in every RBD pool and name space where a trash purge schedule was previously configured:
```
rados -p pool-name [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
rados -p pool-name [-N namespace] rm rbd_trash_trash_purge_schedule
```
Alternatively, use any other convenient way to restore the schedule after the upgrade.
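Where several pools or name spaces are affected, the two rados commands can be generated per pool. The following is a minimal sketch: the helper name migrate_schedule_cmds and the pool and name space names rbd and ns1 are placeholders, and the sketch only prints the commands so that they can be reviewed before being run.

```shell
# Hypothetical helper: prints the two rados commands for one pool and an
# optional name space. Nothing is executed; review the output first.
migrate_schedule_cmds() {
    pool=$1
    ns=${2:-}
    old=rbd_trash_trash_purge_schedule
    new=rbd_trash_purge_schedule
    if [ -n "$ns" ]; then
        opts="-p $pool -N $ns"
    else
        opts="-p $pool"
    fi
    echo "rados $opts cp $old $new"
    echo "rados $opts rm $old"
}

# Placeholder pool and name space names:
migrate_schedule_cmds rbd
migrate_schedule_cmds rbd ns1
```

Printing the copy before the removal keeps the order of the original commands, so the old object is only removed after the new one exists.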

Octopus 15.2.3 Point Release#

The Octopus point release 15.2.3 was a hot-fix release to address an issue where WAL corruption was seen when bluefs_preextend_wal_files and bluefs_buffered_io were enabled at the same time. The fix in 15.2.3 is only a temporary measure (changing the default value of bluefs_preextend_wal_files to false). The permanent fix is to remove the bluefs_preextend_wal_files option completely. At the time of 15.2.3, this was expected to arrive in the 15.2.6 point release; the removal is listed under the 15.2.8 point release in this chapter.

Octopus 15.2.2 Point Release#

The Octopus point release 15.2.2 patched one security vulnerability:

CVE-2020-10736: Fixed an authorization bypass in MONs and MGRs

Octopus 15.2.1 Point Release#

The Octopus point release 15.2.1 fixed an issue where upgrading quickly from Luminous (SES5.5) to Nautilus (SES6) to Octopus (SES7) caused OSDs to crash. In addition, it patched two security vulnerabilities that were present in the initial Octopus (15.2.0) release:

CVE-2020-1759: Fixed nonce reuse in msgr V2 secure mode
CVE-2020-1760: Fixed XSS because of RGW GetObject header-splitting