Applies to SUSE Linux Enterprise High Availability 15 SP4

9 Managing services on remote hosts #

Revision History: SUSE Linux Enterprise High Availability Documentation

The possibilities for monitoring and managing services on remote hosts has become increasingly important during the last few years. SUSE Linux Enterprise High Availability 11 SP3 offered fine-grained monitoring of services on remote hosts via monitoring plug-ins. The recent addition of the pacemaker_remote service now allows SUSE Linux Enterprise High Availability 15 SP4 to fully manage and monitor resources on remote hosts just as if they were a real cluster node—without the need to install the cluster stack on the remote machines.

9.1 Monitoring services on remote hosts with monitoring plug-ins #

Monitoring of virtual machines can be done with the VM agent (which only checks if the guest shows up in the hypervisor), or by external scripts called from the VirtualDomain or Xen agent. Up to now, more fine-grained monitoring was only possible with a full setup of the High Availability stack within the virtual machines.

By providing support for monitoring plug-ins (formerly named Nagios plug-ins), SUSE Linux Enterprise High Availability now also allows you to monitor services on remote hosts. You can collect external statuses on the guests without modifying the guest image. For example, VM guests might run Web services or simple network resources that need to be accessible. With the Nagios resource agents, you can now monitor the Web service or the network resource on the guest. If these services are not reachable anymore, SUSE Linux Enterprise High Availability triggers a restart or migration of the respective guest.

If your guests depend on a service (for example, an NFS server to be used by the guest), the service can either be an ordinary resource, managed by the cluster, or an external service that is monitored with Nagios resources instead.

To configure the Nagios resources, the following packages must be installed on the host:

monitoring-plugins
monitoring-plugins-metadata

YaST or Zypper will resolve any dependencies on further packages, if required.

A typical use case is to configure the monitoring plug-ins as resources belonging to a resource container, which usually is a VM. The container will be restarted if any of its resources has failed. Refer to Example 9.1, “Configuring resources for monitoring plug-ins” for a configuration example. Alternatively, Nagios resource agents can also be configured as ordinary resources to use them for monitoring hosts or services via the network.

Example 9.1: Configuring resources for monitoring plug-ins #

primitive vm1 VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm1.xml" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op monitor interval="10" timeout="30"
primitive vm1-sshd nagios:check_tcp \
    params hostname="vm1" port="22" \ 1
    op start interval="0" timeout="120" \ 2
    op monitor interval="10"
group g-vm1-and-services vm1 vm1-sshd \
    meta container="vm1" 3

1	The supported parameters are the same as the long options of a monitoring plug-in. Monitoring plug-ins connect to services with the parameter `hostname`. Therefore the attribute's value must be a resolvable host name or an IP address.
2	As it takes some time to get the guest operating system up and its services running, the start timeout of the monitoring resource must be long enough.
3	A cluster resource container of type `ocf:heartbeat:Xen`, `ocf:heartbeat:VirtualDomain` or `ocf:heartbeat:lxc`. It can either be a VM or a Linux Container.

The example above contains only one resource for the check_tcpplug-in, but multiple resources for different plug-in types can be configured (for example, check_http or check_udp).

If the host names of the services are the same, the hostname parameter can also be specified for the group, instead of adding it to the individual primitives. For example:

group g-vm1-and-services vm1 vm1-sshd vm1-httpd \
     meta container="vm1" \
     params hostname="vm1"

If any of the services monitored by the monitoring plug-ins fail within the VM, the cluster will detect that and restart the container resource (the VM). Which action to take in this case can be configured by specifying the on-fail attribute for the service's monitoring operation. It defaults to restart-container.

Failure counts of services will be taken into account when considering the VM's migration-threshold.

9.2 Managing services on remote nodes with `pacemaker_remote` #

With the pacemaker_remote service, High Availability clusters can be extended to virtual nodes or remote bare-metal machines. They do not need to run the cluster stack to become members of the cluster.

SUSE Linux Enterprise High Availability can now launch virtual environments (KVM and LXC), plus the resources that live within those virtual environments without requiring the virtual environments to run Pacemaker or Corosync.

For the use case of managing both virtual machines as cluster resources plus the resources that live within the VMs, you can now use the following setup:

The “normal” (bare-metal) cluster nodes run SUSE Linux Enterprise High Availability.
The virtual machines run the pacemaker_remote service (almost no configuration required on the VM's side).
The cluster stack on the “normal” cluster nodes launches the VMs and connects to the pacemaker_remote service running on the VMs to integrate them as remote nodes into the cluster.

As the remote nodes do not have the cluster stack installed, this has the following implications:

Remote nodes do not take part in quorum.
Remote nodes cannot become the DC.
Remote nodes are not bound by the scalability limits (Corosync has a member limit of 32 nodes).

Find more information about the remote_pacemaker service, including multiple use cases with detailed setup instructions in Pacemaker Remote Quick Start.

9 Managing services on remote hosts #

9.1 Monitoring services on remote hosts with monitoring plug-ins #

9.2 Managing services on remote nodes with pacemaker_remote #

9.2 Managing services on remote nodes with `pacemaker_remote` #