Applies to SUSE Enterprise Storage 6

A Ceph Maintenance Updates Based on Upstream 'Nautilus' Point Releases

Several key packages in SUSE Enterprise Storage 6 are based on the Nautilus release series of Ceph. When the Ceph project (https://github.com/ceph/ceph) publishes new point releases in the Nautilus series, SUSE Enterprise Storage 6 is updated to ensure that the product benefits from the latest upstream bugfixes and feature backports.

This chapter summarizes the notable changes in each upstream point release that has been, or is planned to be, included in the product.

Nautilus 14.2.20 Point Release

This release includes a security fix that ensures the global_id value (a numeric value that should be unique for every authenticated client or daemon in the cluster) is reclaimed after a network disconnect or ticket renewal in a secure fashion. Two new health alerts may appear during the upgrade indicating that there are clients or daemons that are not yet patched with the appropriate fix.
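
To see which clients or daemons are triggering these alerts, you can inspect the detailed health output, for example:

cephadm@adm > ceph health detail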

To temporarily mute the health alerts around insecure clients for the duration of the upgrade, you may want to run:

cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1h
cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1h

When all clients are updated, enable the new secure behavior, which prevents old, insecure clients from joining the cluster:

cephadm@adm > ceph config set mon auth_allow_insecure_global_id_reclaim false

For more details, refer to https://docs.ceph.com/en/latest/security/CVE-2021-20288/.

Nautilus 14.2.18 Point Release

This release fixes a regression introduced in 14.2.17 in which the manager module tries to use a couple of Python modules that do not exist in some environments.

  • This release fixes issues loading the dashboard and volumes manager modules in some environments.

Nautilus 14.2.17 Point Release

This release includes the following fixes:

  • $pid expansion in configuration paths such as admin_socket will now properly expand to the daemon PID for commands like ceph-mds or ceph-osd. Previously, only ceph-fuse and rbd-nbd expanded $pid with the actual daemon PID.
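
    For illustration, a hypothetical admin_socket path in ceph.conf that uses the $pid metavariable (the path itself is an assumption, not a recommendation):

    [osd]
    admin_socket = /var/run/ceph/$cluster-$name.$pid.asok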

  • RADOS: PG removal has been optimized.

  • RADOS: Memory allocations are tracked in finer detail in BlueStore and displayed as a part of the dump_mempools command.
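
    For example, to inspect the memory pools of a running OSD via its admin socket:

    cephadm@adm > ceph daemon osd.ID dump_mempools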

  • CephFS: clients which acquire capabilities too quickly are throttled to prevent instability. See new config option mds_session_cap_acquisition_throttle to control this behavior.
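
    For example, to adjust the throttle at runtime (the value shown is illustrative):

    cephadm@adm > ceph config set mds mds_session_cap_acquisition_throttle 500000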

Nautilus 14.2.16 Point Release

This release fixes a security flaw in CephFS.

  • CVE-2020-27781: OpenStack Manila's use of the ceph_volume_client.py library allowed tenant access to any Ceph credential's secret.

Nautilus 14.2.15 Point Release

This release fixes a ceph-volume regression introduced in v14.2.13 and includes a few other fixes.

  • ceph-volume: Fixes lvm batch --auto, which broke backward compatibility when using only non-rotational devices (SSD and/or NVMe).

  • BlueStore: Fixes a bug in collection_list_legacy which made PGs inconsistent during scrubs when OSDs older than 14.2.12 were run alongside newer ones.

  • MGR: the progress module can now be turned on or off using the commands ceph progress on and ceph progress off.
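
    For example, to disable the module temporarily and enable it again:

    cephadm@adm > ceph progress off
    cephadm@adm > ceph progress on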

Nautilus 14.2.14 Point Release

This release fixes a security flaw affecting Messenger V2 for Octopus and Nautilus, among other fixes across components.

  • CVE-2020-25660: Fix a regression in Messenger V2 replay attacks.

Nautilus 14.2.13 Point Release

This release fixes a regression introduced in v14.2.12 and includes a few ceph-volume and RGW fixes.

  • Fixed a regression that caused breakage in clusters that referred to ceph-mon hosts using DNS names instead of IP addresses in the mon_host parameter in ceph.conf.

  • ceph-volume: the lvm batch subcommand received a major rewrite.

Nautilus 14.2.12 Point Release

In addition to bug fixes, this major upstream release brought a number of notable changes:

  • The ceph df command now lists the number of PGs in each pool.

  • MONs now have a config option mon_osd_warn_num_repaired, 10 by default. If any OSD has repaired more than this many I/O errors in stored data, an OSD_TOO_MANY_REPAIRS health warning is generated. To allow clearing of the warning, a new command ceph tell osd.SERVICE_ID clear_shards_repaired COUNT has been added. By default, it sets the repair count to 0. If you want to be warned again when additional repairs are performed, pass the value of mon_osd_warn_num_repaired to the command. This command will be replaced in future releases by the health mute/unmute feature.
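
    For example, to reset the repair count to zero, or to set it to the default warning threshold of 10 so that any further repair triggers the warning again:

    cephadm@adm > ceph tell osd.SERVICE_ID clear_shards_repaired
    cephadm@adm > ceph tell osd.SERVICE_ID clear_shards_repaired 10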

  • It is now possible to specify the initial MON to contact for Ceph tools and daemons using the mon_host_override config option or --mon-host-override IP command-line switch. This generally should only be used for debugging and only affects initial communication with Ceph’s MON cluster.
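
    A minimal sketch, assuming MON_IP stands for the address of the MON to contact first:

    cephadm@adm > ceph status --mon-host-override MON_IP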

  • Fix an issue with osdmaps not being trimmed in a healthy cluster.

Nautilus 14.2.11 Point Release

In addition to bug fixes, this major upstream release brought a number of notable changes:

  • RGW: The radosgw-admin sub-commands dealing with orphans – radosgw-admin orphans find, radosgw-admin orphans finish, radosgw-admin orphans list-jobs – have been deprecated. They have not been actively maintained and they store intermediate results on the cluster, which could fill a nearly-full cluster. They have been replaced by a tool, currently considered experimental, rgw-orphan-list.

  • When the noscrub and/or nodeep-scrub flags are set globally or per pool, scheduled scrubs of the disabled type are now aborted. User-initiated scrubs are not interrupted.
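
    A sketch of setting the flag globally or for a single pool (POOL_NAME is a placeholder):

    cephadm@adm > ceph osd set noscrub
    cephadm@adm > ceph osd pool set POOL_NAME noscrub 1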

  • Fixed a ceph-osd crash in committed OSD maps when there is a failure to encode the first incremental map.

Nautilus 14.2.10 Point Release

This upstream release patched one security flaw:

  • CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration’s ExposeHeader

In addition to security flaws, this major upstream release brought a number of notable changes:

  • The pool parameter target_size_ratio, used by the PG autoscaler, has changed meaning. It is now normalized across pools, rather than specifying an absolute ratio. If you have set target size ratios on any pools, you may want to set these pools to autoscale warn mode to avoid data movement during the upgrade:

    cephadm@adm > ceph osd pool set POOL_NAME pg_autoscale_mode warn
  • The behaviour of the -o argument to the RADOS tool has been reverted to its original behaviour of indicating an output file, which is more consistent with other tools. Object size is now specified using an upper-case O option (-O).

  • The format of MDSs in ceph fs dump has changed.

  • Ceph will issue a health warning if a RADOS pool’s size is set to 1 or, in other words, the pool is configured with no redundancy. This can be fixed by setting the pool size to the minimum recommended value with:

    cephadm@adm > ceph osd pool set POOL_NAME size NUM_REPLICAS

    The warning can be silenced with:

    cephadm@adm > ceph config set global mon_warn_on_pool_no_redundancy false
  • RGW: bucket listing performance on sharded bucket indexes has been notably improved by heuristically – and significantly, in many cases – reducing the number of entries requested from each bucket index shard.

Nautilus 14.2.9 Point Release

This upstream release patched two security flaws:

  • CVE-2020-1759: Fixed nonce reuse in msgr V2 secure mode

  • CVE-2020-1760: Fixed XSS due to RGW GetObject header-splitting

In SES 6, these flaws were patched in Ceph version 14.2.5.389+gb0f23ac248.

Nautilus 14.2.8 Point Release

In addition to bug fixes, this major upstream release brought a number of notable changes:

  • The default value of bluestore_min_alloc_size_ssd has been changed to 4K to improve performance across all workloads.
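
    To check the currently configured value on a running OSD, you can query its admin socket on the node hosting it, for example:

    cephadm@adm > ceph daemon osd.ID config get bluestore_min_alloc_size_ssd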

  • The following OSD memory config options related to BlueStore cache autotuning can now be configured during runtime:

    osd_memory_base (default: 768 MB)
    osd_memory_cache_min (default: 128 MB)
    osd_memory_expected_fragmentation (default: 0.15)
    osd_memory_target (default: 4 GB)

    You can set the above options by running:

    cephadm@adm > ceph config set osd OPTION VALUE
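
    For example, to raise the memory target to 8 GB (an illustrative value, given in bytes):

    cephadm@adm > ceph config set osd osd_memory_target 8589934592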
  • The Ceph Manager now accepts profile rbd and profile rbd-read-only user capabilities. You can use these capabilities to provide users access to MGR-based RBD functionality such as rbd perf image iostat and rbd perf image iotop.
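
    A minimal sketch of creating a user with these capabilities (the user name client.monitoring is a hypothetical example):

    cephadm@adm > ceph auth get-or-create client.monitoring mon 'profile rbd' mgr 'profile rbd'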

  • The configuration value osd_calc_pg_upmaps_max_stddev used for upmap balancing has been removed. Instead, use the Ceph Manager balancer configuration option upmap_max_deviation, which is now an integer number of PGs of deviation from the target PGs per OSD. You can set it with the following command:

    cephadm@adm > ceph config set mgr mgr/balancer/upmap_max_deviation 2

    The default upmap_max_deviation is 5. There are situations where CRUSH rules do not allow a pool to ever have completely balanced PGs, for example if CRUSH requires one replica on each of three racks but there are fewer OSDs in one of the racks. In such cases, the configuration value can be increased.

  • CephFS: forward scrub with multiple active Metadata Servers is now rejected. Scrub is currently only permitted on a file system with a single rank. Reduce the number of ranks to one via ceph fs set FS_NAME max_mds 1.
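
    For example, to reduce the file system to a single rank:

    cephadm@adm > ceph fs set FS_NAME max_mds 1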

  • Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. This can be fixed by adjusting the pool to an adjacent power of two:

    cephadm@adm > ceph osd pool set POOL_NAME pg_num NEW_PG_NUM

    Alternatively, you can silence the warning with:

    cephadm@adm > ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

Nautilus 14.2.7 Point Release

This upstream release patched two security flaws:

  • CVE-2020-1699: a path traversal flaw in Ceph Dashboard that could allow for potential information disclosure.

  • CVE-2020-1700: a flaw in the RGW beast front-end that could lead to denial of service from an unauthenticated client.

In SES 6, these flaws were patched in Ceph version 14.2.5.382+g8881d33957b.

Nautilus 14.2.6 Point Release

This release fixed a Ceph Manager bug that caused MGRs to become unresponsive on larger clusters. SES users were never exposed to this bug.

Nautilus 14.2.5 Point Release

  • Health warnings are now issued if daemons have recently crashed. Ceph has been collecting crash reports since the initial Nautilus release, but the health alerts are new. To view new crashes (or all crashes, if you have just upgraded), run:

    cephadm@adm > ceph crash ls-new

    To acknowledge a particular crash (or all crashes) and silence the health warning, run:

    cephadm@adm > ceph crash archive CRASH-ID
    cephadm@adm > ceph crash archive-all
  • pg_num must be a power of two, otherwise HEALTH_WARN is reported. Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. You can fix this by adjusting the pool to a nearby power of two:

    cephadm@adm > ceph osd pool set POOL-NAME pg_num NEW-PG-NUM

    Alternatively, you can silence the warning with:

    cephadm@adm > ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
  • Pool size needs to be greater than 1, otherwise HEALTH_WARN is reported. Ceph will issue a health warning if a RADOS pool's size is set to 1, that is, if the pool is configured with no redundancy. Ceph will stop issuing the warning if the pool size is set to the minimum recommended value:

    cephadm@adm > ceph osd pool set POOL-NAME size NUM-REPLICAS

    You can silence the warning with:

    cephadm@adm > ceph config set global mon_warn_on_pool_no_redundancy false
  • A health warning is reported if the average OSD heartbeat ping time exceeds a configurable threshold. The warning is generated if the average exceeds the threshold for any of the computed intervals. The OSD computes 1-minute, 5-minute, and 15-minute intervals with average, minimum, and maximum values.

    A new configuration option, mon_warn_on_slow_ping_ratio, specifies a percentage of osd_heartbeat_grace to determine the threshold. A value of zero disables the warning.

    A new configuration option, mon_warn_on_slow_ping_time, specified in milliseconds, overrides the computed value and causes a warning when OSD heartbeat pings take longer than the specified amount.
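
    For example, to warn when OSD heartbeat pings exceed one second (the value is illustrative):

    cephadm@adm > ceph config set global mon_warn_on_slow_ping_time 1000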

    A new command, ceph daemon mgr.MGR-NUMBER dump_osd_network THRESHOLD, lists all connections whose average ping time over any of the three intervals exceeds the specified threshold, or the value determined by the configuration options if no threshold is given.

    A new command, ceph daemon osd.# dump_osd_network THRESHOLD, does the same as the previous one, but includes only heartbeats initiated by the specified OSD.

  • Changes in the telemetry MGR module.

    A new 'device' channel (enabled by default) will report anonymized hard disk and SSD health metrics to telemetry.ceph.com in order to build and improve device failure prediction algorithms.

    Telemetry reports information about CephFS file systems, including:

    • How many MDS daemons (in total and per file system).

    • Which features are (or have been) enabled.

    • How many data pools.

    • Approximate file system age (year and the month of creation).

    • How many files, bytes, and snapshots.

    • How much metadata is being cached.

    Other miscellaneous information:

    • Which Ceph release the monitors are running.

    • Whether msgr v1 or v2 addresses are used for the monitors.

    • Whether IPv4 or IPv6 addresses are used for the monitors.

    • Whether RADOS cache tiering is enabled (and the mode).

    • Whether pools are replicated or erasure coded, and which erasure code profile plug-in and parameters are in use.

    • How many hosts are in the cluster, and how many hosts have each type of daemon.

    • Whether a separate OSD cluster network is being used.

    • How many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled.

    • How many RGW daemons, zones, and zonegroups are present and which RGW frontends are in use.

    • Aggregate stats about the CRUSH Map, such as which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use.

    If you had telemetry enabled before 14.2.5, you will need to re-opt-in with:

    cephadm@adm > ceph telemetry on

    If you are not comfortable sharing device metrics, you can disable that channel first before re-opting-in:

    cephadm@adm > ceph config set mgr mgr/telemetry/channel_device false
    cephadm@adm > ceph telemetry on

    You can view exactly what information will be reported first with:

    cephadm@adm > ceph telemetry show        # see everything
    cephadm@adm > ceph telemetry show device # just the device info
    cephadm@adm > ceph telemetry show basic  # basic cluster info
  • New OSD daemon command dump_recovery_reservations. It reveals the recovery locks held (in_progress) and waiting in priority queues. Usage:

    cephadm@adm > ceph daemon osd.ID dump_recovery_reservations
  • New OSD daemon command dump_scrub_reservations. It reveals the scrub reservations that are held for local (primary) and remote (replica) PGs. Usage:

    cephadm@adm > ceph daemon osd.ID dump_scrub_reservations
  • RGW now supports the S3 Object Lock set of APIs, allowing for a WORM model for storing objects. Six new APIs have been added: PUT/GET bucket object lock, PUT/GET object retention, and PUT/GET object legal hold.

  • RGW now supports List Objects V2, as specified at https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html.

Nautilus 14.2.4 Point Release

This point release fixes a serious regression that found its way into the 14.2.3 point release. This regression did not affect SUSE Enterprise Storage customers because we did not ship a version based on 14.2.3.

Nautilus 14.2.3 Point Release

  • Fixed a denial of service vulnerability where an unauthenticated client of Ceph Object Gateway could trigger a crash from an uncaught exception.

  • Nautilus-based librbd clients can now open images on Jewel clusters.

  • The Object Gateway option num_rados_handles has been removed. If you were using a value of num_rados_handles greater than 1, multiply your current objecter_inflight_ops and objecter_inflight_op_bytes parameters by the old num_rados_handles value to get the same throttle behavior.
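
    For example, assuming you previously ran with num_rados_handles = 2 and in-flight limits of 1024 operations and 100 MB (illustrative values), you would double both; a sketch applying the change to the generic client section:

    cephadm@adm > ceph config set client objecter_inflight_ops 2048
    cephadm@adm > ceph config set client objecter_inflight_op_bytes 209715200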

  • The secure mode of Messenger v2 protocol is no longer experimental with this release. This mode is now the preferred mode of connection for monitors.

  • osd_deep_scrub_large_omap_object_key_threshold has been lowered to detect an object with a large number of omap keys more easily.

  • The Ceph Dashboard now supports silencing Prometheus notifications.

Nautilus 14.2.2 Point Release

  • The no{up,down,in,out} related commands have been revamped. There are now two ways to set the no{up,down,in,out} flags: the old command

    ceph osd [un]set FLAG

    which sets cluster-wide flags; and the new command

    ceph osd [un]set-group FLAGS WHO

    which sets flags in batch at the granularity of any crush node or device class.
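
    For example, a sketch that sets and clears the noout and noup flags for two OSDs and for a CRUSH host named host1 (the names are placeholders):

    cephadm@adm > ceph osd set-group noout,noup osd.0 osd.1
    cephadm@adm > ceph osd unset-group noout,noup host1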

  • radosgw-admin introduces two subcommands for managing expire-stale objects that might be left behind after a bucket reshard in earlier versions of Object Gateway. Expire-stale objects are expired objects that should have been automatically erased but still exist, and therefore need to be listed and removed manually. One subcommand lists such objects and the other deletes them.

  • Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where deploying a single new Nautilus BlueStore OSD on an upgraded cluster (that is, one that was originally deployed pre-Nautilus) breaks the pool utilization statistics reported by ceph df. Until all OSDs have been reprovisioned or updated (via ceph-bluestore-tool repair), the pool statistics will show values that are lower than the true value. This is resolved in 14.2.2, such that the cluster only switches to using the more accurate per-pool stats after all OSDs are 14.2.2 or later, are BlueStore, and have been updated via the repair function if they were created prior to Nautilus.
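
    A sketch of the repair invocation for a single OSD, run on the node hosting it while that OSD is stopped (the data path assumes the default layout):

    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-ID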

  • The default value for mon_crush_min_required_version has been changed from firefly to hammer, which means the cluster will issue a health warning if your CRUSH tunables are older than Hammer. There is generally a small (but non-zero) amount of data that will be re-balanced after making the switch to Hammer tunables.

    If possible, we recommend that you set the oldest allowed client to hammer or later. To display what the current oldest allowed client is, run:

    cephadm@adm > ceph osd dump | grep min_compat_client

    If the current value is older than hammer, run the following command to determine whether it is safe to make this change by verifying that there are no clients older than Hammer currently connected to the cluster:

    cephadm@adm > ceph features
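
    If no pre-Hammer clients are connected, you can raise the setting, for example:

    cephadm@adm > ceph osd set-require-min-compat-client hammer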

    The newer straw2 CRUSH bucket type was introduced in Hammer. Verifying that all clients are Hammer or newer allows new features that are only supported for straw2 buckets to be used, including the crush-compat mode for the Balancer (Section 21.1, “Balancer”).

Find detailed information about the patch at https://download.suse.com/Download?buildid=D38A7mekBz4~

Nautilus 14.2.1 Point Release

This was the first point release following the original Nautilus release (14.2.0). The original ('General Availability' or 'GA') version of SUSE Enterprise Storage 6 was based on this point release.
