Applies to SUSE Linux Enterprise Micro 5.5

8 Health checker

Health checker is a program delivered with SLE Micro that checks whether services are running properly while your system boots.

During the boot process, systemd calls Health checker, which in turn calls its plugins. Each plugin checks a particular service or condition. If all checks pass, a status file (/var/lib/misc/health-check.state) is created. The status file marks the current root file system as correct.
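To see from a shell whether the status file has been written, a minimal sketch like the following can help. It only tests whether the file exists. The file's contents are not described here, so the sketch makes no claim about which snapshot the file refers to. The function name health_state_present and its optional path argument are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether the Health checker status file exists.
# $1 (optional) overrides the path; defaults to the documented location.
health_state_present() {
    local state_file="${1:-/var/lib/misc/health-check.state}"
    if [ -e "$state_file" ]; then
        echo "present: $state_file"
        return 0
    fi
    echo "missing: $state_file"
    return 1
}
```

Called without arguments, health_state_present checks /var/lib/misc/health-check.state and returns a non-zero status if the file is missing.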

If any of the Health checker plugins reports an error, the action taken depends on the state of the booted snapshot:

The snapshot is booted for the first time.

If the current snapshot differs from the last one that worked properly, the last change to the file system has broken the snapshot. An automatic rollback to the last working snapshot is performed.

The snapshot has already booted correctly in the past.

The problem may be temporary, so the system is rebooted automatically.

The reboot of a previously correctly booted snapshot has failed.

If a problem already occurred during boot and an automatic reboot was triggered, but the problem still persists, the system is kept running so that the administrator can fix the problem. The services tested by the Health checker plugins are stopped if possible.
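The three cases can be summarized as a small decision function. This is a hypothetical illustration of the documented behavior, not Health checker's actual code. The function name failure_action, its arguments, and the snapshot IDs used below are assumptions.

```shell
#!/bin/bash
# Hypothetical summary of the documented failure handling (not Health checker's code).
# $1: ID of the snapshot that is currently booted
# $2: ID of the last snapshot that booted correctly
# $3: "yes" if an automatic reboot was already triggered for this failure
failure_action() {
    local current="$1" last_good="$2" rebooted="$3"
    if [ "$current" != "$last_good" ]; then
        # First boot of a new snapshot failed: the last change broke it.
        echo "rollback"
    elif [ "$rebooted" != "yes" ]; then
        # A previously working snapshot failed: the problem may be temporary.
        echo "reboot"
    else
        # The reboot did not help: stop checked services, leave the system to the admin.
        echo "keep-running"
    fi
}
```

For example, failure_action 42 41 no prints rollback, because snapshot 42 differs from the last working snapshot 41.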

8.1 Adding custom plugins

You can add your own Health checker plugins to check services during the boot process. Each plugin is a bash script that must fulfill the following requirements:

  • Plugins must be placed in the /usr/libexec/health-checker directory.

  • The service checked by the plugin must be listed in the [Unit] section of the /usr/lib/systemd/system/health-checker.service file. For example, the etcd service is listed as follows:

    [Unit]
    ...
    After=etcd.service
    ...
  • Each plugin must define functions named run_checks and stop_services. The run_checks function checks whether a particular service has started properly. A service that has not been enabled in systemd should be ignored. The stop_services function is called to stop the service if it has not started properly. You can use the plugin template for reference.
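A minimal sketch of a plugin for the etcd service used in the example above might look as follows. It assumes that Health checker treats a non-zero return status from run_checks as a failed check; the plugin's file name and the exact calling convention are not covered here, so compare the sketch with the plugin template before using it. The error message is illustrative.

```shell
#!/bin/bash
# Hypothetical Health checker plugin for etcd.service (sketch only).
# Assumption: a non-zero return from run_checks marks the check as failed.

run_checks() {
    # A service that is not enabled in systemd is ignored: report success.
    if ! systemctl is-enabled --quiet etcd.service; then
        return 0
    fi

    # The service is enabled, so it must be running; otherwise report failure.
    if ! systemctl is-active --quiet etcd.service; then
        echo "etcd.service is enabled but not active" >&2
        return 1
    fi
    return 0
}

stop_services() {
    # Called when the service has not started properly: stop it.
    systemctl stop etcd.service
}
```

Returning instead of calling exit keeps the functions safe to call from another script, and checking is-enabled first implements the rule that services not enabled in systemd are ignored.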