SUSE Enterprise Storage 7

Troubleshooting Guide

Authors: Tomáš Bažant, Alexandra Settle, and Liam Proven
Publication Date: 09/21/2021
About this guide
Available documentation
Giving feedback
Documentation conventions
Product life cycle and support
Ceph contributors
Commands and command prompts used in this guide
1 Reporting software problems
2 Troubleshooting logging and debugging
2.1 Accessing configuration settings at runtime
2.2 Activating Ceph debugging at boot time
2.3 Accelerating log rotation
2.4 Monitoring memory utilization
2.5 Enable system, log, and debug settings
2.6 Logging kernel RBD and CephFS clients
2.7 Per-service and per-daemon events
3 Troubleshooting cephadm
3.1 Pausing or disabling cephadm
3.2 Checking cephadm logs
3.3 Accessing Ceph daemon logs
3.4 Collecting systemd status
3.5 Listing configured container images
3.6 Listing all downloaded container images
3.7 Running containers manually
3.8 Failing to infer CIDR network error
3.9 Accessing the admin socket
3.10 Deploying a Ceph Manager manually
3.11 Distributing a program temporary fix (PTF)
3.12 Failure when adding hosts with cephadm
3.13 Disabling automatic deployment of daemons
4 Troubleshooting OSDs
4.1 Obtain OSD data
4.2 Stopping without rebalancing
4.3 OSDs not running
4.4 Unresponsive or slow OSDs
4.5 OSD weight is 0
4.6 OSD is down
4.7 Finding slow OSDs
4.8 Flapping OSDs
5 Troubleshooting placement groups (PGs)
5.1 Identifying troubled placement groups
5.2 Placement groups never get clean
5.3 Stuck placement groups
5.4 Peering failure of placement groups
5.5 Failing unfound objects
5.6 Identifying homeless placement groups
5.7 Only a few OSDs receive data
5.8 Unable to write data
5.9 Identifying inconsistent placement groups
5.10 Identifying inactive erasure coded PGs
6 Troubleshooting Ceph Monitors and Ceph Managers
6.1 Initial troubleshooting
6.2 Using the monitor's admin socket
6.3 Understanding mons_status
6.4 Restoring the MONs quorum
6.5 Most common monitor issues
6.6 Monitor store failures
6.7 Next steps
6.8 Manually deploying a MGR daemon
7 Troubleshooting networking
7.1 Identifying OSD networking issues
8 Troubleshooting NFS Ganesha
8.1 Debugging NFS Ganesha logs
8.2 Changing the default port
9 Troubleshooting Ceph health status
10 Troubleshooting the Ceph Dashboard
10.1 Locating the Ceph Dashboard
10.2 Accessing the Ceph Dashboard
10.3 Troubleshooting logging into the Ceph Dashboard
10.4 Determining if a Ceph Dashboard feature is not working
10.5 Ceph Dashboard logs
11 Troubleshooting Object Gateway
11.1 Running a basic health check
11.2 Identifying gateway issues
11.3 Diagnosing crashed Object Gateway process
11.4 Identifying blocked Object Gateway requests
11.5 Large OMAP issues
12 Troubleshooting CephFS
12.1 Slow or stuck operations
12.2 Checking RADOS health
12.3 MDS
12.4 Kernel mount debugging
12.5 Disconnecting and remounting the file system
12.6 Mounting
12.7 Mounting CephFS using old kernel clients
13 Hints and tips
13.1 Identifying orphaned volumes
13.2 Adjusting scrubbing
13.3 Stopping OSDs without rebalancing
13.4 Checking for unbalanced data writing
13.5 Increasing file descriptors
13.6 Integration with virtualization software
13.7 Firewall settings for Ceph
13.8 Testing network performance
13.9 Locating physical disks using LED lights
13.10 Sending large objects with rados fails with full OSD
13.11 Managing the 'Too Many PGs per OSD' status message
13.12 Managing the 'nn pg stuck inactive' status message
13.13 Fixing clock skew warnings
13.14 Determining poor cluster performance caused by network problems
13.15 Managing /var running out of space
14 Frequently asked questions
14.1 How does the number of placement groups affect the cluster performance?
14.2 Can I use SSDs and hard disks on the same cluster?
14.3 What are the trade-offs of using a journal on SSD?
14.4 What happens when a disk fails?
14.5 What happens when a journal disk fails?
A Ceph maintenance updates based on upstream 'Octopus' point releases

Copyright © 2020–2021 SUSE LLC and contributors.

Licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/legalcode.

For SUSE trademarks, see http://www.suse.com/company/legal/. All third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.

