9 Managing services on remote hosts
Monitoring and managing services on remote hosts has become increasingly important during the last few years. SUSE Linux Enterprise High Availability 11 SP3 offered fine-grained monitoring of services on remote hosts via monitoring plug-ins. The more recent addition of the pacemaker_remote service now allows SUSE Linux Enterprise High Availability 15 SP4 to fully manage and monitor resources on remote hosts just as if they were real cluster nodes, without the need to install the cluster stack on the remote machines.
9.1 Monitoring services on remote hosts with monitoring plug-ins
Monitoring of virtual machines can be done with the VM agent (which only checks if the guest shows up in the hypervisor), or by external scripts called from the VirtualDomain or Xen agent. Up to now, more fine-grained monitoring was only possible with a full setup of the High Availability stack within the virtual machines.
By providing support for monitoring plug-ins (formerly named Nagios plug-ins), SUSE Linux Enterprise High Availability now also allows you to monitor services on remote hosts. You can collect external statuses on the guests without modifying the guest image. For example, VM guests might run Web services or simple network resources that need to be accessible. With the Nagios resource agents, you can now monitor the Web service or the network resource on the guest. If these services are not reachable anymore, SUSE Linux Enterprise High Availability triggers a restart or migration of the respective guest.
If your guests depend on a service (for example, an NFS server to be used by the guest), the service can either be an ordinary resource, managed by the cluster, or an external service that is monitored with Nagios resources instead.
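As a sketch of the second option, an external NFS server could be monitored by an ordinary Nagios resource that is not part of a container. The resource and host names (nfs-server-check, nfs-server.example.com) are hypothetical, check_tcp against the standard NFS port 2049 is only one possible check, and the order constraint assumes that the guest vm1 from Example 9.1 should only start once the NFS service is reachable:

```crmsh
# Hypothetical ordinary (non-container) resource that probes an external
# NFS server over the network with the check_tcp plug-in
primitive nfs-server-check nagios:check_tcp \
    params hostname="nfs-server.example.com" port="2049" \
    op start interval="0" timeout="30" \
    op monitor interval="10"
# Start the guest vm1 only after the NFS check has succeeded
order o-nfs-before-vm1 Mandatory: nfs-server-check vm1
```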
To configure the Nagios resources, the following packages must be installed on the host:
monitoring-plugins
monitoring-plugins-metadata
YaST or Zypper will resolve any dependencies on further packages, if required.
A typical use case is to configure the monitoring plug-ins as resources belonging to a resource container, which usually is a VM. The container will be restarted if any of its resources has failed. Refer to Example 9.1, “Configuring resources for monitoring plug-ins” for a configuration example. Alternatively, Nagios resource agents can also be configured as ordinary resources to use them for monitoring hosts or services via the network.
Example 9.1: Configuring resources for monitoring plug-ins
primitive vm1 VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm1.xml" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op monitor interval="10" timeout="30"
primitive vm1-sshd nagios:check_tcp \
    params hostname="vm1" port="22" \
    op start interval="0" timeout="120" \
    op monitor interval="10"
group g-vm1-and-services vm1 vm1-sshd \
    meta container="vm1"
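In this example:
params hostname="vm1" port="22": The supported parameters are the same as the long options of a monitoring plug-in. Monitoring plug-ins connect to services with the parameter hostname, so its value must be a resolvable host name or an IP address.
op start interval="0" timeout="120": As it takes some time to get the guest operating system up and its services running, the start timeout of the monitoring resource must be long enough.
meta container="vm1": Marks the VirtualDomain resource vm1 as the cluster resource container of the group.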
The example above contains only one resource for the check_tcp plug-in, but multiple resources for different plug-in types can be configured (for example, check_http or check_udp).
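For instance, a second monitoring resource for a Web server on the same guest might look like the following sketch. The name vm1-httpd matches the group example in this section; port 80 is an assumption about how the guest's Web server is set up:

```crmsh
# Hypothetical check_http resource for a Web server running inside vm1
primitive vm1-httpd nagios:check_http \
    params hostname="vm1" port="80" \
    op start interval="0" timeout="120" \
    op monitor interval="10"
# The container group from Example 9.1, now also listing vm1-httpd
group g-vm1-and-services vm1 vm1-sshd vm1-httpd \
    meta container="vm1"
```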
If the host names of the services are the same, the hostname parameter can also be specified for the group instead of adding it to the individual primitives. For example:
group g-vm1-and-services vm1 vm1-sshd vm1-httpd \
    meta container="vm1" \
    params hostname="vm1"
If any of the services monitored by the monitoring plug-ins fail within the VM, the cluster detects that and restarts the container resource (the VM). Which action to take in this case can be configured by specifying the on-fail attribute for the service's monitoring operation. It defaults to restart-container.
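To make the default visible in the configuration, the monitor operation of vm1-sshd from Example 9.1 could set on-fail explicitly, as in this sketch. Replacing restart-container with another on-fail value accepted by Pacemaker would change the reaction; check the Pacemaker documentation for the values your version supports:

```crmsh
# vm1-sshd from Example 9.1 with the default failure reaction spelled out
primitive vm1-sshd nagios:check_tcp \
    params hostname="vm1" port="22" \
    op start interval="0" timeout="120" \
    op monitor interval="10" on-fail="restart-container"
```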
Failure counts of services will be taken into account when considering the VM's migration-threshold.
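A minimal sketch of setting this threshold on the VM from Example 9.1 follows; the value 3 is an arbitrary assumption. Once the failures counted against vm1 (including those of its monitored services) reach the threshold, the current node would no longer be eligible to run vm1, so the cluster would try to move it to another node:

```crmsh
# vm1 from Example 9.1 with a hypothetical migration-threshold of 3
primitive vm1 VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm1.xml" \
    meta migration-threshold="3" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op monitor interval="10" timeout="30"
```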
9.2 Managing services on remote nodes with pacemaker_remote
With the pacemaker_remote service, High Availability clusters can be extended to virtual nodes or remote bare-metal machines. They do not need to run the cluster stack to become members of the cluster.
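For a remote bare-metal machine, the connection is typically represented by a resource of the ocf:pacemaker:remote agent, whose resource ID becomes the remote node's name. The following sketch assumes a hypothetical host remote1.example.com that runs the pacemaker_remote service and shares the cluster's Pacemaker authentication key (by default /etc/pacemaker/authkey):

```crmsh
# Hypothetical connection resource for a bare-metal remote node "remote1"
primitive remote1 ocf:pacemaker:remote \
    params server="remote1.example.com" \
    op monitor interval="30s"
```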
SUSE Linux Enterprise High Availability can now launch virtual environments (KVM and LXC), plus the resources that live within those virtual environments, without requiring the virtual environments to run Pacemaker or Corosync.
For the use case of managing both virtual machines as cluster resources plus the resources that live within the VMs, you can now use the following setup:
The “normal” (bare-metal) cluster nodes run SUSE Linux Enterprise High Availability.
The virtual machines run the pacemaker_remote service (almost no configuration required on the VM's side).
The cluster stack on the “normal” cluster nodes launches the VMs and connects to the pacemaker_remote service running on the VMs to integrate them as remote nodes into the cluster.
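For the VM case, the integration as a remote (guest) node is commonly expressed with the remote-node meta attribute on the VM resource. The following sketch reuses the VirtualDomain settings from Example 9.1 for a hypothetical guest vm2. It assumes that vm2 runs pacemaker_remote and that the name vm2 resolves to the guest's address; otherwise the remote-addr meta attribute can supply the address:

```crmsh
# Hypothetical VM that the cluster launches and then integrates as a
# remote node named "vm2" through its pacemaker_remote service
primitive vm2 VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm2.xml" \
    meta remote-node="vm2" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op monitor interval="10" timeout="30"
```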
As the remote nodes do not have the cluster stack installed, this has the following implications:
Remote nodes do not take part in quorum.
Remote nodes cannot become the DC.
Remote nodes are not bound by the scalability limits of Corosync, which has a member limit of 32 nodes.
Find more information about the pacemaker_remote service, including multiple use cases with detailed setup instructions, in Pacemaker Remote Quick Start.