Applies to SUSE Linux Enterprise High Availability 15 SP4

Part II Configuration and administration #

Revision History: SUSE Linux Enterprise High Availability Documentation

5 Configuration and administration basics

The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache Web server or a database. From the user's point of view, the services do something specific when ordered to do so. To the cluster, however, they are only resources which may be started or stopped—the nature of the service is irrelevant to the cluster.

This chapter introduces some basic concepts you need to know when administering your cluster. The following chapters show you how to execute the main configuration and administration tasks with each of the management tools SUSE Linux Enterprise High Availability provides.

6 Configuring cluster resources

As a cluster administrator, you need to create cluster resources for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.

7 Configuring resource constraints

Having all the resources configured is only part of the job. Even if the cluster knows all needed resources, it still might not be able to handle them correctly. Resource constraints let you specify which cluster nodes resources can run on, what order resources will load in, and what other resources a specific resource is dependent on.

8 Managing cluster resources

After configuring the resources in the cluster, use the cluster management tools to start, stop, clean up, remove or migrate the resources. This chapter describes how to use Hawk2 or crmsh for resource management tasks.

9 Managing services on remote hosts

The possibilities for monitoring and managing services on remote hosts has become increasingly important during the last few years. SUSE Linux Enterprise High Availability 11 SP3 offered fine-grained monitoring of services on remote hosts via monitoring plug-ins. The recent addition of the pacemaker_remote service now allows SUSE Linux Enterprise High Availability 15 SP4 to fully manage and monitor resources on remote hosts just as if they were a real cluster node—without the need to install the cluster stack on the remote machines.

10 Adding or modifying resource agents

All tasks that need to be managed by a cluster must be available as a resource. There are two major groups here to consider: resource agents and STONITH agents. For both categories, you can add your own agents, extending the abilities of the cluster to your own needs.

11 Monitoring clusters

This chapter describes how to monitor a cluster's health and view its history.

12 Fencing and STONITH

Fencing is a very important concept in computer clusters for HA (High Availability). A cluster sometimes detects that one of the nodes is behaving strangely and needs to remove it. This is called fencing and is commonly done with a STONITH resource. Fencing may be defined as a method to bring an HA cluster to a known state.

Every resource in a cluster has a state attached. For example: “resource r1 is started on alice”. In an HA cluster, such a state implies that “resource r1 is stopped on all nodes except alice”, because the cluster must make sure that every resource may be started on only one node. Every node must report every change that happens to a resource. The cluster state is thus a collection of resource states and node states.

When the state of a node or resource cannot be established with certainty, fencing comes in. Even when the cluster is not aware of what is happening on a given node, fencing can ensure that the node does not run any important resources.

13 Storage protection and SBD

SBD (STONITH Block Device) provides a node fencing mechanism for Pacemaker-based clusters through the exchange of messages via shared block storage (SAN, iSCSI, FCoE, etc.). This isolates the fencing mechanism from changes in firmware version or dependencies on specific firmware controllers. SBD needs a watchdog on each node to ensure that misbehaving nodes are really stopped. Under certain conditions, it is also possible to use SBD without shared storage, by running it in diskless mode.

The cluster bootstrap scripts provide an automated way to set up a cluster with the option of using SBD as fencing mechanism. For details, see the Installation and Setup Quick Start. However, manually setting up SBD provides you with more options regarding the individual settings.

This chapter explains the concepts behind SBD. It guides you through configuring the components needed by SBD to protect your cluster from potential data corruption in case of a split brain scenario.

In addition to node level fencing, you can use additional mechanisms for storage protection, such as LVM exclusive activation or OCFS2 file locking support (resource level fencing). They protect your system against administrative or application faults.

14 QDevice and QNetd

QDevice and QNetd participate in quorum decisions. With assistance from the arbitrator corosync-qnetd, corosync-qdevice provides a configurable number of votes, allowing a cluster to sustain more node failures than the standard quorum rules allow. We recommend deploying corosync-qnetd and corosync-qdevice for clusters with an even number of nodes, and especially for two-node clusters.

15 Access control lists

The cluster administration tools like CRM Shell (crmsh) or Hawk2 can be used by root or any user in the group haclient. By default, these users have full read/write access. To limit access or assign more fine-grained access rights, you can use Access control lists (ACLs).

Access control lists consist of an ordered set of access rules. Each rule allows read or write access or denies access to a part of the cluster configuration. Rules are typically combined to produce a specific role, then users may be assigned to a role that matches their tasks.

16 Network device bonding

For many systems, it is desirable to implement network connections that comply to more than the standard data security or availability requirements of a typical Ethernet device. In these cases, several Ethernet devices can be aggregated to a single bonding device.

17 Load balancing

Load Balancing makes a cluster of servers appear as one large, fast server to outside clients. This apparent single server is called a virtual server. It consists of one or more load balancers dispatching incoming requests and several real servers running the actual services. With a load balancing setup of SUSE Linux Enterprise High Availability, you can build highly scalable and highly available network services, such as Web, cache, mail, FTP, media and VoIP services.

18 High Availability for virtualization

This chapter explains how to configure virtual machines as highly available cluster resources.

19 Geo clusters (multi-site clusters)

Apart from local clusters and metro area clusters, SUSE® Linux Enterprise High Availability 15 SP4 also supports geographically dispersed clusters (Geo clusters, sometimes also called multi-site clusters). That means you can have multiple, geographically dispersed sites with a local cluster each. Failover between these clusters is coordinated by a higher level entity, the so-called booth. For details on how to use and set up Geo clusters, refer to Geo Clustering Quick Start and Geo Clustering Guide.