
17 Troubleshooting

This chapter provides guidance on troubleshooting SUSE Manager, beginning with the registration of cloned systems. This covers both Salt and Traditional clients. For more information, see https://www.novell.com/support/kb/doc.php?id=7012170.

17.1 Registering Cloned Salt Minions

Procedure: Registering a Cloned Salt Minion with SUSE Manager
  1. Clone your system (for example, using the existing cloning mechanism of your favorite hypervisor).

    Note: Quick Tips

    Each step in this section is performed on the cloned system; this procedure does not manipulate the original system, which will still be registered to SUSE Manager. The cloned virtual machine must have a different UUID from the original (this UUID is generated by your hypervisor), or SUSE Manager will overwrite the original system data with the new one.

  2. Make sure your machines have different hostnames and IP addresses, and check that /etc/hosts contains the changes you made and the correct host entries, as shown in the sketch below.
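A minimal sketch of this check on a systemd-based clone (the hostname clone01.example.com is a placeholder):

    # hostnamectl set-hostname clone01.example.com
    # grep clone01 /etc/hosts

The first command sets a unique hostname on the clone; the second verifies that /etc/hosts maps the new name to the clone's own IP address.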

The next step depends on the operating system of the clone.

The following scenario can occur after on-boarding cloned Salt minions: if, after accepting all cloned minion keys on the onboarding page, you see only one minion on the System Overview page, this is most likely because the machines are clones of the original and share the same machine ID. Perform the following steps, depending on the operating system, to resolve the conflict.

Procedure: SLES 12 Registering Salt Clones
  1. SLES 12: If your machines have the same machine IDs, delete the files on each minion and recreate them (a verification sketch follows this procedure):

    # rm /etc/machine-id
    # rm /var/lib/dbus/machine-id
    # dbus-uuidgen --ensure
    # systemd-machine-id-setup
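To verify the result, compare the regenerated IDs with the values on the original system (a sketch; both files must now differ from the original's):

    # cat /etc/machine-id
    # cat /var/lib/dbus/machine-id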
Procedure: SLES 11 Registering Salt Clones
  1. SLES 11: As there is no systemd machine ID, generate one from D-Bus:

    # rm /var/lib/dbus/machine-id
    # dbus-uuidgen --ensure

If your machines still have the same Salt minion ID, delete the minion_id file on each minion (the FQDN will be used when it is regenerated on minion restart):

# rm /etc/salt/minion_id

Finally, delete the accepted keys on the onboarding page and the system profile from SUSE Manager, and restart the minion with:

# systemctl restart salt-minion
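If you prefer the command line to the onboarding page, the accepted keys can also be inspected and removed with salt-key on the SUSE Manager server (a sketch; the minion name is a placeholder):

    # salt-key -L
    # salt-key -d clone01.example.com

salt-key -L lists all accepted, rejected, and pending keys; salt-key -d deletes the stale key of the cloned minion.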

You should now be able to register the minions again; each minion will use a different /etc/machine-id and should be displayed correctly on the System Overview page.

17.2 Registering Cloned Traditional Systems

This section provides guidance on troubleshooting cloned traditional systems registered via bootstrap.

Procedure: Registering a Cloned System with SUSE Manager (Traditional Systems)
  1. Clone your system (using your favorite hypervisor).

    Note: Quick Tips

    Each step in this section is performed on the cloned system; this procedure does not manipulate the original system, which will still be registered to SUSE Manager. The cloned virtual machine must have a different UUID from the original (this UUID is generated by your hypervisor), or SUSE Manager will overwrite the original system data with the new one.

  2. Change the hostname and IP addresses, and make sure /etc/hosts contains the changes you made and the correct host entries.

  3. Stop the rhnsd daemon. On Red Hat Enterprise Linux Server 6 and SUSE Linux Enterprise 11, use:

    # /etc/init.d/rhnsd stop

    or, on newer systemd-based systems, with:

    # service rhnsd stop
  4. Stop osad with:

    # /etc/init.d/osad stop

    or alternatively:

    # rcosad stop
  5. Remove the osad authentication configuration file and the system ID with:

    # rm -f /etc/sysconfig/rhn/{osad-auth.conf,systemid}

The next step you take will depend on the Operating System of the clone.

Procedure: SLES 12 Registering A Cloned Traditional System
  1. If your machines have the same machine IDs, delete the files on each client and recreate them:

    # rm /etc/machine-id
    # rm /var/lib/dbus/machine-id
    # dbus-uuidgen --ensure
    # systemd-machine-id-setup
  2. Remove the following credential files:

    # rm -f /etc/zypp/credentials.d/{SCCcredentials,NCCcredentials}
  3. Re-run the bootstrap script. You should now see the cloned system in SUSE Manager without overwriting the system it was cloned from.

Procedure: SLES 11 Registering A Cloned Traditional System
  1. Continuing from step 5 of the first procedure in this section:

    # suse_register -E

    (-E is short for --erase-local-regdata; it erases all local files created by a previously executed registration, making the system look as if it had never been registered.)

  2. Re-run the bootstrap script. You should now see the cloned system in SUSE Manager without overwriting the system it was cloned from.

Procedure: SLES 10 Registering A Cloned Traditional System
  1. Continuing from step 5 of the first procedure in this section:

    # rm -rf /etc/{zmd,zypp}
  2. Remove everything in /var/lib/zypp/ except /var/lib/zypp/db/products/. Check whether the following command works for you (see the sketch after this procedure for an alternative):

    # rm -rf /var/lib/zypp/!(db)
  3. Remove the zmd state directory:

    # rm -rf /var/lib/zmd/
  4. Re-run the bootstrap script. You should now see the cloned system in SUSE Manager without overwriting the system it was cloned from.
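The !(db) pattern in step 2 relies on Bash's extended globbing, which is not enabled by default in scripts. A sketch of both variants, assuming a Bash shell; the find command is an alternative that works without extended globbing:

    # shopt -s extglob
    # rm -rf /var/lib/zypp/!(db)

    # find /var/lib/zypp -mindepth 1 -maxdepth 1 ! -name db -exec rm -rf {} +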

Procedure: RHEL 5, 6, and 7 Registering A Cloned Traditional System
  1. Continuing from step 5 of the first procedure in this section:

    # rm -f /etc/NCCcredentials
  2. Re-run the bootstrap script. You should now see the cloned system in SUSE Manager without overwriting the system it was cloned from.

17.3 Typical OSAD/jabberd Challenges

This section provides answers for typical issues regarding OSAD and jabberd.

17.3.1 Open File Count Exceeded

SYMPTOMS: OSAD clients cannot contact the SUSE Manager Server, and jabberd requires long periods of time to respond on port 5222.

CAUSE: The maximum number of files that the jabber user can open is lower than the number of connected clients. Each client requires one permanently open TCP connection, and each connection requires one file handle. The result is that jabberd begins to queue and refuse connections.

CURE: Edit /etc/security/limits.conf to contain entries similar to the following:

jabber soft nofile <#clients + 100>
jabber hard nofile <#clients + 1000>

The values will vary according to your setup. For example, in the case of 5000 clients:

jabber soft nofile 5100
jabber hard nofile 6000

Ensure you update the max_fds parameter in /etc/jabberd/c2s.xml as well. For example: <max_fds>6000</max_fds>

EXPLANATION: The soft file limit is the limit on the maximum number of open files for a single process. In SUSE Manager the highest-consuming process is c2s, which opens one connection per client. 100 additional files are added here to accommodate any non-connection files that c2s requires to work correctly. The hard limit applies to all processes belonging to the jabber user, and additionally accounts for open files from the router, s2s, and sm processes.
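To confirm that the new limits have taken effect after restarting the services, you can inspect the running c2s process (a sketch; it assumes c2s runs as the jabber user):

    # cat /proc/$(pgrep -u jabber -o c2s)/limits | grep 'open files'
    # ls /proc/$(pgrep -u jabber -o c2s)/fd | wc -l

The first command shows the soft and hard open-file limits that actually apply to c2s; the second counts the file descriptors it currently holds.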

17.3.2 jabberd Database Corruption

SYMPTOMS: After a disk full error or a disk crash, the jabberd database may become corrupted. jabberd may then fail to start during spacewalk-service start:

Starting spacewalk services...
   Initializing jabberd processes...
       Starting router                                                                   done
       Starting sm startproc:  exit status of parent of /usr/bin/sm: 2                   failed
   Terminating jabberd processes...

/var/log/messages shows more details:

jabberd/sm[31445]: starting up
jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[31445]: loading 'db' storage module
jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover
jabberd/router[31437]: shutting down

CURE: Remove the jabberd database and restart. Jabberd will automatically re-create the database:

spacewalk-service stop
 rm -Rf /var/lib/jabberd/db/*
 spacewalk-service start

An alternative approach would be to test another database backend, such as SQLite, although SUSE Manager does not deliver drivers for it:

rcosa-dispatcher stop
 rcjabberd stop
 cd /var/lib/jabberd/db
 rm *
 cp /usr/share/doc/packages/jabberd/db-setup.sqlite .
 sqlite3 sqlite.db < db-setup.sqlite
 chown jabber:jabber *
 rcjabberd start
 rcosa-dispatcher start

17.3.3 Capturing XMPP Network Data for Debugging Purposes

If you are experiencing bugs regarding OSAD, it can be useful to dump network messages in order to help with debugging. The following procedures provide information on capturing data from both the client and server side.

Procedure: Server Side Capture
  1. Install the tcpdump package on the SUSE Manager Server as root: zypper in tcpdump

  2. Stop the OSA dispatcher and Jabber processes with rcosa-dispatcher stop and rcjabberd stop.

  3. Start data capture on port 5222: tcpdump -s 0 port 5222 -w server_dump.pcap

  4. Open a second terminal and start the OSA dispatcher and Jabber processes: rcosa-dispatcher start and rcjabberd start.

  5. Operate the SUSE Manager server and clients so the bug you formerly experienced is reproduced.

  6. Once you have finished your capture, re-open the first terminal and stop the data capture with Ctrl+C.

Procedure: Client Side Capture
  1. Install the tcpdump package on your client as root: zypper in tcpdump

  2. Stop the OSA process: rcosad stop.

  3. Begin data capture on port 5222: tcpdump -s 0 port 5222 -w client_dump.pcap

  4. Open a second terminal and start the OSA process: rcosad start

  5. Operate the SUSE Manager server and clients so the bug you formerly experienced is reproduced.

  6. Once you have finished your capture, re-open the first terminal and stop the data capture with Ctrl+C.
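Either capture file can be inspected directly with tcpdump, or copied to a workstation and opened in Wireshark (see the next section). A minimal sketch:

    # tcpdump -nn -r client_dump.pcap | head

This prints the first packets of the capture without resolving names or ports.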

17.3.4 Engineering Notes: Analyzing Captured Data

This section provides information on analyzing the previously captured data from client and server.

  1. Obtain the certificate file from your SUSE Manager server: /etc/pki/spacewalk/jabberd/server.pem

  2. Edit the certificate file, removing all lines before -----BEGIN RSA PRIVATE KEY-----, and save it as key.pem.

  3. Install Wireshark as root with: zypper in wireshark

  4. Open the capture file in Wireshark.

  5. From Edit › Preferences, select SSL from the left pane.

  6. Next to RSA keys list, select Edit › New and enter:

    • IP Address: any

    • Port: 5222

    • Protocol: xmpp

    • Key File: open the key.pem file previously edited.

    • Password: leave blank


17.4 Gathering Information with spacewalk-report

The spacewalk-report command is used to produce a variety of reports for system administrators. These reports can be helpful for taking inventory of your entitlements, subscribed systems, users, and organizations. Using reports is often simpler than gathering information manually from the SUSE Manager Web UI, especially if you have many systems under management.

Note
Note: spacewalk-reports Package

To use spacewalk-report, you must have the spacewalk-reports package installed.

spacewalk-report allows administrators to organize and display reports about content, systems, and user resources across SUSE Manager. Using spacewalk-report, you can receive reports on:

  1. System Inventory: lists all of the systems registered to SUSE Manager.

  2. Entitlements: lists all organizations on SUSE Manager, sorted by system or channel entitlements.

  3. Patches: lists all the patches relevant to the registered systems, sorted by severity, as well as the systems to which a particular patch applies.

  4. Users: lists all the users registered to SUSE Manager and any systems associated with a particular user.

To get a report in CSV format, run the following at the command line of your SUSE Manager server:

spacewalk-report report_name
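For example, to save the system inventory to a CSV file (a sketch; inventory is one of the report names from the table below, and running spacewalk-report without arguments prints the full list):

spacewalk-report inventory > inventory.csv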

The following reports are available:

Table 17.1: spacewalk-report Reports
Report                              Invoked as                   Description
Channel Packages                    channel-packages             List of packages in a channel.
Channel Report                      channels                     Detailed report of a given channel.
Cloned Channel Report               cloned-channels              Detailed report of cloned channels.
Custom Info                         custom-info                  System custom information.
Entitlements                        entitlements                 Lists all organizations on SUSE Manager with their system or channel entitlements.
Patches in Channels                 errata-channels              Lists of patches in channels.
Patches Details                     errata-list                  Lists all patches that affect systems registered to SUSE Manager.
All Patches                         errata-list-all              Complete list of all patches.
Patches for Systems                 errata-systems               Lists applicable patches and any registered systems that are affected.
Host Guests                         host-guests                  List of host-guest mappings.
Inactive Systems                    inactive-systems             List of inactive systems.
System Inventory                    inventory                    List of systems registered to the server, together with hardware and software information.
Kickstart Trees                     kickstartable-trees          List of kickstartable trees.
All Upgradable Versions             packages-updates-all         List of all newer package versions that can be upgraded.
Newest Upgradable Version           packages-updates-newest      List of only the newest package versions that can be upgraded.
Result of SCAP                      scap-scan                    Result of OpenSCAP xccdf eval.
Result of SCAP                      scap-scan-results            Result of OpenSCAP xccdf eval, in a different format.
System Data                         splice-export                System data needed for Splice integration.
System Groups                       system-groups                List of system groups.
Activation Keys for System Groups   system-groups-keys           List of activation keys for system groups.
Systems in System Groups            system-groups-systems        List of systems in system groups.
System Groups Users                 system-groups-users          Report of system group users.
Installed Packages                  system-packages-installed    List of packages installed on systems.
Users in the System                 users                        Lists all users registered to SUSE Manager.
Systems Administered                users-systems                List of systems that individual users can administer.

For more information about an individual report, run spacewalk-report with the option --info or --list-fields-info and the report name. The description and list of possible fields in the report will be shown.
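For example (a sketch, using the inventory report):

spacewalk-report --info inventory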

For further information on program invocations and options, see the spacewalk-report(8) man page as well as the --help parameter of spacewalk-report.

17.5 RPC Connection Timeout Settings

RPC connection timeouts are configurable on the SUSE Manager server, SUSE Manager Proxy server, and the clients. For example, if package downloads take longer than expected, you can increase the timeout values. On the proxy, run spacewalk-proxy restart after the setting is added or modified.

Set the following variables to a value in seconds specifying how long an RPC connection may take at maximum:

Server - /etc/rhn/rhn.conf:

server.timeout = number

Proxy Server - /etc/rhn/rhn.conf:

proxy.timeout = number

SUSE Linux Enterprise Server Clients (using zypp-plugin-spacewalk) - /etc/zypp/zypp.conf:

## Valid values:  [0,3600]
## Default value: 180
download.transfer_timeout = 180

This is the maximum time in seconds that a transfer operation is allowed to take. This is useful for preventing batch jobs from hanging for hours due to slow networks or links going down. If limiting operations to less than a few minutes, you risk aborting perfectly normal operations.

Red Hat Enterprise Linux Clients (using yum-rhn-plugin) - /etc/yum.conf:

timeout = number
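For example, to allow server-side RPC connections to take up to ten minutes, a minimal sketch (600 is an arbitrary value; adjust it to your environment):

# /etc/rhn/rhn.conf
server.timeout = 600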

17.6 Client/Server Package Inconsistency

In some cases, updates are available in the web interface but do not appear on the client. If you schedule an update on the client, it fails with an error stating that no updates are available. This can be caused by a metadata regeneration problem, or because update packages have been locked.

The notice that updates are available appears immediately, but new metadata is only generated on the server after synchronization. An inconsistency can occur if taskomatic crashes, or if taskomatic is still running and has not yet finished creating the new metadata.

To address this issue, check the /var/log/rhn/rhn_taskomatic_daemon.log file to determine whether any processes are still running, or look for an exception, which could indicate a crash. In the case of a crash, restart taskomatic, as sketched below.
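A minimal sketch of that check (the grep pattern is only an example; rctaskomatic is assumed to be the service wrapper on the SUSE Manager server):

    # grep -i exception /var/log/rhn/rhn_taskomatic_daemon.log | tail
    # rctaskomatic restart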

Check package locks and exclude lists to determine if packages are locked or excluded on the client:

On Expanded Support Platform, check /etc/yum.conf and search for exclude=.

On SUSE Linux Enterprise Server, use the zypper locks command.
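For example, on a SUSE Linux Enterprise Server client (a sketch; the lock number 1 is a placeholder taken from the zypper locks output):

    # zypper locks
    # zypper removelock 1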

17.7 Corrupted Repository Data

If the information in /var/cache/rhn/repodata/sles12-sp4-updates-x86_64 becomes out of date, it will cause problems with updating the server. The repository data files can be regenerated using the spacecmd command:

Procedure: Rebuilding Repodata Files
  1. Remove all files from /var/cache/rhn/repodata/sles12-sp4-updates-x86_64

  2. Regenerate the file with spacecmd softwarechannel_regenerateyumcache sles12-sp4-updates-x86_64

17.8 Unable to Get Local Issuer Certificate

Some older bootstrap scripts create a link to the local certificate in the wrong place, which can cause zypper to return an error about being unable to get the local issuer certificate. In this case, ensure that the link to the local issuer certificate has been created in /etc/ssl/certs/, and consider updating your bootstrap scripts.
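A minimal sketch of that check; the certificate name RHN-ORG-TRUSTED-SSL-CERT is the default used by SUSE Manager bootstrap scripts, but yours may differ:

    # ls -l /etc/ssl/certs/ | grep -i RHN
    # c_rehash /etc/ssl/certs/

The first command verifies that the CA certificate is linked under /etc/ssl/certs/; c_rehash recreates the hash symlinks OpenSSL uses to find it.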
