8 Health checker
Health checker is a program delivered with SLE Micro that checks whether services are running properly while your system boots.
During the boot process, systemd calls Health checker, which in turn calls its plugins. Each plugin checks a particular service or condition. If each check passes, a status file (/var/lib/misc/health-check.state) is created. The status file marks the current root file system as correct.
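To see whether the status file described above is present on a running system, a short check along the following lines can be used. This is only an illustrative sketch: the function name snapshot_marked_good and the STATE_FILE override are hypothetical and not part of Health checker. Only the default path comes from the text above.

```shell
#!/bin/bash
# Path of the status file as documented; STATE_FILE can be overridden
# (hypothetical convenience for trying the check against another path).
STATE_FILE="${STATE_FILE:-/var/lib/misc/health-check.state}"

# Succeeds if the status file exists (hypothetical helper name).
snapshot_marked_good() {
    [ -e "$STATE_FILE" ]
}

if snapshot_marked_good; then
    echo "Health checker status file found: $STATE_FILE"
else
    echo "No Health checker status file at: $STATE_FILE"
fi
```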
If any of the health checker plugins reports an error, the action taken depends on a particular condition, as described below:
- The snapshot is booted for the first time.
If the current snapshot is different from the last one that worked properly, an automatic rollback to the last working snapshot is performed. This means that the last change made to the file system broke the snapshot.
- The snapshot has already booted correctly in the past.
The problem may be only temporary, so the system is rebooted automatically.
- The reboot of a previously correctly booted snapshot has failed.
If there was already a problem during boot and an automatic reboot has been triggered, but the problem persists, the system is kept running to enable the administrator to fix the problem. The services that are tested by the health checker plugins are stopped if possible.
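The three cases above can be summarized as a small decision function. The sketch below only models the documented behavior and is not Health checker's implementation: the function name decide_action, its two yes/no arguments, and the returned labels are all hypothetical.

```shell
#!/bin/bash
# decide_action FIRST_BOOT ALREADY_REBOOTED   (hypothetical helper)
#   FIRST_BOOT        "yes" if the failing snapshot is booted for the first time
#   ALREADY_REBOOTED  "yes" if an automatic reboot was already triggered
#                     for this failure
# Prints the action described in the list above for a failed health check.
decide_action() {
    local first_boot="$1" already_rebooted="$2"

    if [ "$first_boot" = "yes" ]; then
        # New snapshot that never worked: return to the last working one.
        echo "rollback"
    elif [ "$already_rebooted" != "yes" ]; then
        # Snapshot worked before: assume a temporary problem and retry.
        echo "reboot"
    else
        # The retry failed too: stay up so the administrator can fix it,
        # stopping the checked services where possible.
        echo "keep-running"
    fi
}
```

For example, `decide_action yes no` prints rollback, and `decide_action no yes` prints keep-running.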
8.1 Adding custom plugins
Health checker supports the addition of your own plugins to check services during the boot process. Each plugin is a bash script that must fulfill the following requirements:
- Plugins are located within a specific directory: /usr/libexec/health-checker
- The service that will be checked by the particular plugin must be defined in the Unit section of the /usr/lib/systemd/system/health-checker.service file. For example, the etcd service is defined as follows:

  [Unit]
  ...
  After=etcd.service
  ...
- Each plugin must have functions called run_checks and stop_services defined. The run_checks function checks whether a particular service has started properly. Bear in mind that a service that has not been enabled by systemd should be ignored. The stop_services function is called to stop the particular service if it has not started properly. You can use the plugin template for your reference.
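A plugin that meets the requirements above might look like the following minimal sketch. Several details are assumptions rather than documented facts: the service name example.service is a placeholder, the check function is spelled run_checks with an underscore to match stop_services, the use of `systemctl is-active` as the "started properly" test is one possible choice, and the check and stop arguments used for dispatch are assumed. The plugin template shipped with Health checker remains the authoritative reference.

```shell
#!/bin/bash
# Hypothetical plugin sketch; it would be installed under
# /usr/libexec/health-checker. SERVICE is a placeholder name.
SERVICE="${SERVICE:-example.service}"

# Check whether the service has started properly.
run_checks() {
    # A service that is not enabled by systemd is ignored.
    if ! systemctl is-enabled --quiet "$SERVICE"; then
        return 0
    fi
    # An enabled service passes only if it is currently active
    # (assumed criterion for "started properly").
    if systemctl is-active --quiet "$SERVICE"; then
        return 0
    fi
    return 1
}

# Stop the service when it has not started properly.
stop_services() {
    systemctl stop "$SERVICE"
}

# Assumed calling convention: "check" runs the checks, "stop" stops the service.
main() {
    case "$1" in
        check) run_checks ;;
        stop)  stop_services ;;
        *)     echo "Usage: $0 {check|stop}" >&2; return 2 ;;
    esac
}

# Dispatch only when an argument is given, so the file can also be
# sourced to try the functions by hand. The script's exit status is
# then the status returned by main.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Returning a non-zero status from run_checks, instead of exiting inside the function, keeps the failure decision in one place and makes the functions easy to exercise individually.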