SUSE Linux Enterprise High Availability 12 SP5

Geo Clustering Guide

Geo clustering allows you to have multiple, geographically dispersed sites with a local cluster each. Failover between these clusters is coordinated by a higher level entity: the booth cluster ticket manager. This document explains in detail the setup options and parameters for booth, the Csync2 setup for Geo clusters, how to configure the cluster resources, and how to transfer them to other cluster sites in case of changes. It also describes how to manage Geo clusters from the command line and with Hawk, and how to upgrade them to the latest product version.

Publication Date: December 12, 2024

Copyright © 2006–2024 SUSE LLC and contributors. All rights reserved.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled GNU Free Documentation License.

For SUSE trademarks, see https://www.suse.com/company/legal/. All third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.

1 Challenges for Geo Clusters

Typically, Geo environments are too far apart to support synchronous communication between the sites. That leads to the following challenges:

  • How to make sure that a cluster site is up and running?

  • How to make sure that resources are only started once?

  • How to make sure that quorum can be reached between the different sites and a split-brain scenario can be avoided?

  • How to keep the CIB up to date on all nodes and sites?

  • How to manage failover between the sites?

  • How to deal with high latency in case of resources that need to be stopped?

In the following sections, learn how to meet these challenges with SUSE Linux Enterprise High Availability.

2 Conceptual Overview

Geo clusters based on SUSE® Linux Enterprise High Availability can be considered overlay clusters where each cluster site corresponds to a cluster node in a traditional cluster. The overlay cluster is managed by the booth cluster ticket manager (hereafter called booth).

Each of the parties involved in a Geo cluster runs a service, the boothd. It connects to the booth daemons running at the other sites and exchanges connectivity details. For making cluster resources highly available across sites, booth relies on cluster objects called tickets. A ticket grants the right to run certain resources on a specific cluster site. Booth guarantees that every ticket is granted to no more than one site at a time.

If the communication between two booth instances breaks down, it might be because of a network breakdown between the cluster sites or because of an outage of one cluster site. In this case, you need an additional instance (a third cluster site or an arbitrator) to reach consensus about decisions (such as failover of resources across sites). Arbitrators are single machines (outside of the clusters) that run a booth instance in a special mode. Each Geo cluster can have one or multiple arbitrators.

The most common scenario is probably a Geo cluster with two sites and a single arbitrator on a third site. This requires three booth instances (see Figure 2.1, “Two-Site Cluster (2x2 Nodes + Arbitrator)”).

Figure 2.1: Two-Site Cluster (2x2 Nodes + Arbitrator)
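To check how many booth instances a configuration defines, you can count its site and arbitrator entries. The following POSIX shell sketch assumes the booth.conf syntax used in Example 4.1 (one `site =` or `arbitrator =` line per instance); the function names are hypothetical helpers, not part of booth. It also flags the two conditions discussed in this guide: more than 16 instances, or an even number of members.

```shell
# Hypothetical helpers (not part of booth): count the booth instances that a
# booth configuration file defines. Each "site =" or "arbitrator =" line is
# one instance.
count_booth_instances() {
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0.
    grep -E -c '^[[:space:]]*(site|arbitrator)[[:space:]]*=' "$1" || true
}

# Report whether the instance count respects the upper limit of 16 and is odd.
check_booth_instances() {
    n=$(count_booth_instances "$1")
    if [ "$n" -gt 16 ]; then
        echo "error: $n booth instances (upper limit is 16)"
        return 1
    fi
    if [ $((n % 2)) -eq 0 ]; then
        echo "warning: even number of booth instances ($n); add an arbitrator"
        return 2
    fi
    echo "ok: $n booth instances"
}
```

For a configuration like Example 4.1 (two sites plus one arbitrator), `check_booth_instances /etc/booth/booth.conf` would report three instances.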

The following list explains the components and mechanisms for Geo clusters in more detail.

Arbitrator

Each site runs one booth instance that is responsible for communicating with the other sites. If you have a setup with an even number of sites, you need an additional instance to reach consensus about decisions such as failover of resources across sites. In this case, add one or more arbitrators running at additional sites. Arbitrators are single machines that run a booth instance in a special mode. As all booth instances communicate with each other, arbitrators help to make more reliable decisions about granting or revoking tickets. Arbitrators cannot hold any tickets.

An arbitrator is especially important for a two-site scenario: For example, if site A can no longer communicate with site B, there are two possible causes for that:

  • A network failure between A and B.

  • Site B is down.

However, if site C (the arbitrator) can still communicate with site B, site B must still be up and running, so the problem is the network connection between A and B.

Booth Cluster Ticket Manager

Booth is the instance that manages the ticket distribution and thus the failover process between the sites of a Geo cluster. Each of the participating clusters and arbitrators runs a service, the boothd. It connects to the booth daemons running at the other sites and exchanges connectivity details. After a ticket has been granted to a site, the booth mechanism can manage the ticket automatically: If the site that holds the ticket is out of service, the booth daemons vote on which of the other sites will get the ticket. To protect against brief connection failures, sites that lose the vote (either explicitly or implicitly by being disconnected from the voting body) must relinquish the ticket after a timeout. This ensures that a ticket is only redistributed after it has been relinquished by the previous site. See also Dead Man Dependency (loss-policy="fence").

For a Geo cluster with two sites and an arbitrator, you need three booth instances: one instance per site plus the instance running on the arbitrator.

Note: Limited Number of Booth Instances

The upper limit is (currently) 16 booth instances.

Dead Man Dependency (loss-policy="fence")

After a ticket is revoked, it can take a long time until all resources depending on that ticket are stopped, especially in case of cascaded resources. To cut that process short, the cluster administrator can configure a loss-policy (together with the ticket dependencies) for the case that a ticket gets revoked from a site. If the loss-policy is set to fence, the nodes that are hosting dependent resources are fenced.

Warning: Potential Loss of Data

On the one hand, loss-policy="fence" considerably speeds up the recovery process of the cluster and makes sure that resources can be migrated more quickly.

On the other hand, it can lead to loss of all unwritten data, such as:

  • Data lying on shared storage (for example, DRBD).

  • Data in a replicating database (for example, MariaDB or PostgreSQL) that has not yet reached the other site, because of a slow network link.

Ticket

A ticket grants the right to run certain resources on a specific cluster site. A ticket can only be owned by one site at a time. Initially, none of the sites has a ticket; each ticket must be granted once by the cluster administrator. After that, booth manages the tickets for automatic failover of resources. Administrators can also intervene and grant or revoke tickets manually.

After a ticket is administratively revoked, booth no longer manages it. For booth to start managing the ticket again, the ticket must be granted to a site again.

Resources can be bound to a certain ticket by dependencies. The respective resources are started only if the defined ticket is available at a site. Conversely, if the ticket is removed, the resources depending on that ticket are automatically stopped.

The presence or absence of tickets for a site is stored in the CIB as a cluster status. With regard to a certain ticket, there are only two states for a site: true (the site has the ticket) or false (the site does not have the ticket). The absence of a certain ticket (during the initial state of the Geo cluster) is not treated differently from the situation after the ticket has been revoked. Both are reflected by the value false.

A ticket within an overlay cluster is similar to a resource in a traditional cluster. But in contrast to traditional clusters, tickets are the only type of resource in an overlay cluster. They are primitive resources that do not need to be configured or cloned.

Ticket Failover

If the ticket gets lost, which means other booth instances do not hear from the ticket owner for a sufficiently long time, one of the remaining sites acquires the ticket. This is called ticket failover. If the remaining members cannot form a majority, the ticket cannot fail over.
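The majority rule behind ticket failover can be illustrated with simple arithmetic: a group of reachable booth members can only move a ticket if it contains more than half of all members. The following sketch is a simplified model (the function name has_majority is hypothetical) and does not reproduce booth's actual election protocol.

```shell
# Simplified model, not booth's election protocol: REACHABLE booth members
# (including the local one) out of TOTAL members form a majority only if
# REACHABLE > TOTAL / 2, written as 2*REACHABLE > TOTAL for integer math.
has_majority() {
    reachable=$1
    total=$2
    [ $((2 * reachable)) -gt "$total" ]
}

# Two sites plus one arbitrator (3 members):
has_majority 2 3 && echo "site A + arbitrator: majority, ticket can fail over"
has_majority 1 3 || echo "site A alone: no majority, ticket cannot fail over"
# Two sites without an arbitrator (2 members):
has_majority 1 2 || echo "one of two sites alone: no majority"
```

This is why a two-site setup needs an arbitrator: without it, neither site alone (one of two members) can ever reach a majority.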

3 Requirements

Software Requirements
  • All machines (cluster nodes and arbitrators) that will be part of the Geo cluster must have the following software installed:

    • SUSE® Linux Enterprise Server 12 SP5

    • SUSE Linux Enterprise High Availability 12 SP5

    • Geo Clustering for SUSE Linux Enterprise High Availability 12 SP5

Network Requirements
  • The virtual IPs to be used for each cluster site must be accessible across the Geo cluster.

  • The sites must be reachable on one UDP and TCP port per booth instance. That means any firewalls or IPsec tunnels in between must be configured accordingly.

  • Other setup decisions may require opening more ports (for example, for DRBD or database replication).

Other Requirements and Recommendations
  • All cluster nodes on all sites should synchronize to an NTP server outside the cluster. For more information, see https://documentation.suse.com/sles-12/html/SLES-all/cha-netz-xntp.html.

    If nodes are not synchronized, log files and cluster reports are very hard to analyze.

  • Use an odd number of members in your Geo cluster. If the network connection breaks down, this ensures that there is still a majority of sites (to avoid a split-brain scenario). If you have an even number of cluster sites, use an arbitrator.

  • The cluster on each site should have a meaningful name, for example: amsterdam and berlin.

    The cluster names for each site are defined in the respective /etc/corosync/corosync.conf files:

    totem {
        [...]
        cluster_name: amsterdam
    }

    This can either be done manually (by editing /etc/corosync/corosync.conf) or with the YaST cluster module (by switching to the Communication Channels category and defining a Cluster Name). Afterward, stop and start the pacemaker service for the changes to take effect:

    # systemctl stop pacemaker
    # systemctl start pacemaker
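To confirm which cluster name a site uses after the restart, you can read it back from corosync.conf. A minimal sketch, assuming the `cluster_name: NAME` notation shown above (a space after the colon); get_cluster_name is a hypothetical helper.

```shell
# Hypothetical helper: print the cluster_name value from a corosync.conf
# (default path /etc/corosync/corosync.conf). Assumes the "cluster_name: NAME"
# notation, so the name is the second whitespace-separated field.
get_cluster_name() {
    awk '$1 == "cluster_name:" { print $2; exit }' "${1:-/etc/corosync/corosync.conf}"
}
```

Running `get_cluster_name` on one node per site should print a different name for each site, for example amsterdam and berlin.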

4 Setting Up the Booth Services

This chapter describes the setup and configuration options for booth, how to synchronize the booth configuration to all sites and arbitrators, how to enable and start the booth services, and how to reconfigure booth while its services are running.

4.1 Booth Configuration and Setup Options

The default booth configuration is /etc/booth/booth.conf. This file must be the same on all sites of your Geo cluster, including the arbitrator or arbitrators. To keep the booth configuration synchronized across all sites and arbitrators, use Csync2, as described in Section 4.4, “Synchronizing the Booth Configuration to All Sites and Arbitrators”.

Note: Ownership of /etc/booth and Files

The directory /etc/booth and all files therein must belong to the user hacluster and the group haclient. Whenever you copy a new file from this directory, use the option -p for the cp command to preserve the ownership. Alternatively, when you create a new file, set the user and group afterward with chown hacluster:haclient FILE.
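To spot files in /etc/booth with the wrong ownership, you can search for anything not owned by hacluster:haclient. A minimal sketch using find; list_wrong_owner is a hypothetical helper, and the user and group are parameters so the check can be tried elsewhere.

```shell
# Hypothetical helper: list every file or directory under DIR (DIR included)
# whose owner is not USER or whose group is not GROUP.
# Defaults: DIR=/etc/booth, USER=hacluster, GROUP=haclient.
list_wrong_owner() {
    find "${1:-/etc/booth}" \( ! -user "${2:-hacluster}" -o ! -group "${3:-haclient}" \) -print
}

# If anything is listed, a recursive variant of the chown command above
# fixes it (as root):
#   chown -R hacluster:haclient /etc/booth
```

An empty output means that all entries already have the required ownership.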

For setups including multiple Geo clusters, it is possible to share the same arbitrator (as of SUSE Linux Enterprise High Availability 12). By providing several booth configuration files, you can start multiple booth instances on the same arbitrator, with each booth instance running on a different port. That way, you can use one machine to serve as arbitrator for different Geo clusters. For details on how to configure booth for multiple Geo clusters, refer to Section 4.3, “Using a Multi-Tenant Booth Setup”.

To prevent malicious parties from disrupting the booth service, you can configure authentication for talking to booth, based on a shared key. For details, see callout 5 in Example 4.1, “A Booth Configuration File”. All hosts that communicate with the various booth servers need this key. Therefore, make sure to include the key file in the Csync2 configuration or to synchronize it manually across all parties.
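One possible way to create such a key is to read 64 random bytes from /dev/urandom, which stays within the 8 to 64 character limit listed under callout 5 in Example 4.1. The sketch below assumes GNU coreutils (dd, stat -c); make_authkey and check_authkey are hypothetical helpers, not booth tools.

```shell
# Hypothetical helpers, assuming GNU coreutils (dd, stat -c).
# make_authkey FILE: write a 64-byte random binary key, readable by owner only.
make_authkey() {
    # umask 077 in a subshell makes dd create the file without group/other access.
    ( umask 077 && dd if=/dev/urandom of="$1" bs=64 count=1 2>/dev/null )
    chmod 600 "$1"
    # As root on the real system, also set the required ownership:
    #   chown hacluster:haclient "$1"
}

# check_authkey FILE: succeed if the file holds 8 to 64 bytes and has mode 600.
# This counts raw bytes; it does not strip the leading/trailing white space
# and new lines that are ignored in text keys.
check_authkey() {
    size=$(wc -c < "$1")
    mode=$(stat -c %a "$1")
    [ "$size" -ge 8 ] && [ "$size" -le 64 ] && [ "$mode" = "600" ]
}
```

For example, `make_authkey /etc/booth/authkey && check_authkey /etc/booth/authkey`, followed by the chown command and synchronization to all parties.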

4.2 Using the Default Booth Setup

If you have set up your basic Geo cluster with the ha-cluster-bootstrap scripts as described in the Geo Clustering Quick Start, the scripts have created a default booth configuration on all sites with a minimal set of parameters. To extend or fine-tune the minimal booth configuration, have a look at Example 4.1 or at the examples in Section 4.3, “Using a Multi-Tenant Booth Setup”.

To add or change parameters needed for booth, either edit the booth configuration files manually or use the YaST Geo Cluster module. To access the YaST module, start it from the command line with yast2 geo-cluster (or start YaST and select High Availability › Geo Cluster).

Example 4.1: A Booth Configuration File
transport = UDP 1
port = 9929 2
arbitrator = 192.168.203.100 3
site = 192.168.201.100 4
site = 192.168.202.100 4
authfile = /etc/booth/authkey 5
ticket = "ticket-nfs" 6
     expire = 600 7
     timeout = 10 8
     retries = 5 9
     renewal-freq = 30 10
     before-acquire-handler 11 = /etc/booth/ticket-nfs 12 ms_drbd_nfs 13
     acquire-after = 60 14
ticket = "ticketA" 6
     expire = 600 7
     timeout = 10 8
     retries = 5 9
     renewal-freq = 30 10
     before-acquire-handler 11 = /etc/booth/ticket-A 12 db-1 13
     acquire-after = 60 14
ticket = "ticketB" 6
     expire = 600 7
     timeout = 10 8
     retries = 5 9
     renewal-freq = 30 10
     before-acquire-handler 11 = /etc/booth/ticket-B 12 db-8 13
     acquire-after = 60 14

1

The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. Currently, this parameter can therefore be omitted.

2

The port to be used for communication between the booth instances at each site. When not using the default port (9929), choose a port that is not already used by other services. Make sure to open the port in the nodes' and arbitrators' firewalls. The booth clients use TCP to communicate with the boothd. Booth always binds and listens to both the UDP and the TCP port.

3

The IP address of the machine to use as arbitrator. Add an entry for each arbitrator you use in your Geo cluster setup.

4

The IP address used for the boothd on a site. Add an entry for each site you use in your Geo cluster setup. Make sure to insert the correct virtual IP addresses (IPaddr2) for each site, otherwise the booth mechanism will not work correctly. Booth works with both IPv4 and IPv6 addresses.

If you have set up booth with the ha-cluster-bootstrap scripts, the virtual IPs you have specified during setup have been written to the booth configuration already (and have been added to the cluster configuration, too). To set up the cluster resources manually, see Section 6.2, “Configuring a Resource Group for boothd”.

5

Optional parameter. Enables booth authentication for clients and servers on the basis of a shared key. This parameter specifies the path to the key file.

Key Requirements
  • The key can be either binary or text.

    If it is text, the following characters are ignored: leading and trailing white space, new lines.

  • The key must be between 8 and 64 characters long.

  • The key must belong to the user hacluster and the group haclient.

  • The key must be readable only by the file owner.

6

The ticket to be managed by booth. For each ticket, add a ticket entry. For example, the ticket ticket-nfs specified here can be used for failover of NFS and DRBD as explained in https://documentation.suse.com/sbp/all/html/SBP-DRBD/index.html.

7

Optional parameter. Defines the ticket's expiry time in seconds. A site that has been granted a ticket will renew the ticket regularly. If booth does not receive any information about renewal of the ticket within the defined expiry time, the ticket will be revoked and granted to another site. If no expiry time is specified, the ticket will expire after 600 seconds by default. The parameter should not be set to a value less than 120 seconds. The default value set by the ha-cluster-init scripts is 600.

8

Optional parameter. Defines a timeout period in seconds. After that time, booth will resend packets if it did not receive a reply within this period. The timeout defined should be long enough to allow packets to reach other booth members (all arbitrators and sites).

9

Optional parameter. Defines how many times booth retries sending packets before giving up waiting for confirmation by other sites. Values smaller than 3 are invalid and will prevent booth from starting.

10

Optional parameter. Sets the ticket renewal frequency period. By default, the ticket is renewed every half expiry time. If network reliability is often reduced for prolonged periods, it is advisable to renew more often. Before every renewal, the before-acquire-handler is run.

11

Optional parameter. Specifies one or more scripts to run. With several scripts, each one can be responsible for a different check, such as cluster state, data center connectivity, or environment health sensors. Store all scripts in the directory /etc/booth.d/TICKET_NAME and make sure they have the correct ownership (user hacluster and group haclient). Assign the directory name as the value of the parameter before-acquire-handler.

The scripts in this directory are executed in alphabetical order. All scripts will be called before boothd tries to acquire or renew a ticket. For the ticket to be granted or renewed, all scripts must succeed. The semantics are the same as for a single script: On exit code other than 0, boothd relinquishes the ticket.

12

The /usr/share/booth/service-runnable script is included in the product as an example. To use it, link it into the respective ticket directory:

# ln -s /usr/share/booth/service-runnable /etc/booth.d/TICKET_NAME

Assume that the /etc/booth.d/TICKET_NAME directory contains the service-runnable script. This simple script is based on crm_simulate. It can be used to test whether a particular cluster resource can be run on the current cluster site, that is, whether the cluster is healthy enough to run the resource (all resource dependencies are fulfilled, the cluster partition has quorum, no dirty nodes, etc.). For example, if a service in the dependency chain has a failcount of INFINITY on all available nodes, the service cannot be run on that site. In that case, there is no point in claiming the ticket.

13

The resource to be tested by the before-acquire-handler (in this case, by the service-runnable script). You need to reference the resource that is protected by the respective ticket. In this example, resource db-1 is protected by ticketA whereas db-8 is protected by ticketB. The resource for DRBD (ms_drbd_nfs) is protected by the ticket ticket-nfs.

14

Optional parameter. After a ticket is lost, booth waits this additional time before acquiring the ticket. This allows the site that lost the ticket to relinquish the resources, either by stopping them or by fencing a node. A typical delay might be 60 seconds, but ultimately it depends on the protected resources and the fencing configuration. The default value is 0.

If you are unsure how long stopping or demoting the resources or fencing a node may take (depending on the loss-policy), use this parameter to prevent resources from running on two sites at the same time.
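The timing parameters in callouts 7 to 10 and 14 interact, so a quick sanity check can help before distributing a configuration. The sketch below encodes the rules stated above (retries must be at least 3, expire should be at least 120 seconds, renewal defaults to half the expiry time, acquire-after defaults to 0). check_ticket_timing is a hypothetical helper, and its takeover estimate (expire plus acquire-after) is only a rough approximation that ignores election and network delays.

```shell
# Hypothetical helper: sanity-check a ticket's timing parameters.
# Usage: check_ticket_timing EXPIRE RETRIES [RENEWAL_FREQ] [ACQUIRE_AFTER]
# Times are in seconds. Defaults follow the callouts: renewal at half the
# expiry time, acquire-after 0.
check_ticket_timing() {
    expire=$1
    retries=$2
    renewal=${3:-$((expire / 2))}
    acquire_after=${4:-0}
    if [ "$retries" -lt 3 ]; then
        echo "error: retries=$retries (values below 3 prevent booth from starting)"
        return 1
    fi
    if [ "$expire" -lt 120 ]; then
        echo "warning: expire=$expire is below the recommended minimum of 120"
    fi
    # Rough estimate only: ignores election and network delays.
    echo "renewal every ${renewal}s; earliest takeover about $((expire + acquire_after))s after the last renewal"
}
```

With the values of ticket-nfs from Example 4.1, `check_ticket_timing 600 5 30 60` reports renewal every 30 seconds and an earliest takeover about 660 seconds after the last successful renewal.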

4.2.1 Manually Editing the Booth Configuration File

  1. Log in to a cluster node as root or equivalent.

  2. If /etc/booth/booth.conf does not exist yet, copy the example booth configuration file /etc/booth/booth.conf.example to /etc/booth/booth.conf:

    # cp -p /etc/booth/booth.conf.example /etc/booth/booth.conf
  3. Edit /etc/booth/booth.conf according to Example 4.1, “A Booth Configuration File”.

  4. Verify your changes and save the file.

  5. On all cluster nodes and arbitrators, open the port in the firewall that you have configured for booth. See Example 4.1, “A Booth Configuration File”, position 2.
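Steps 2 and 4 of this procedure can be scripted. A minimal sketch with two hypothetical helpers: the first copies the example file with cp -p only if booth.conf does not exist yet, the second prints the ticket names found in a configuration file as a quick check after editing. The ticket parsing assumes the `ticket = "NAME"` notation from Example 4.1.

```shell
# Hypothetical helper: create DIR/booth.conf (DIR defaults to /etc/booth)
# from the shipped example if it does not exist yet, preserving ownership
# and mode with cp -p.
prepare_booth_conf() {
    dir=${1:-/etc/booth}
    if [ ! -e "$dir/booth.conf" ]; then
        cp -p "$dir/booth.conf.example" "$dir/booth.conf"
    fi
}

# Hypothetical helper: print the ticket names defined in a booth
# configuration file, one per line (quotes around the name are optional).
list_tickets() {
    sed -n 's/^[[:space:]]*ticket[[:space:]]*=[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1"
}
```

After editing, `list_tickets /etc/booth/booth.conf` should print every ticket you have configured.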

4.2.2 Setting Up Booth with YaST

  1. Log in to a cluster node as root or equivalent.

  2. Start the YaST Geo Cluster module.

  3. Choose to Edit an existing booth configuration file or click Add to create a new booth configuration file:

    1. In the screen that appears, configure the following parameters:

      • Configuration File.  A name for the booth configuration file. YaST suggests booth by default. This results in the booth configuration being written to /etc/booth/booth.conf. Only change this value if you need to set up multiple booth instances for different Geo clusters as described in Section 4.3, “Using a Multi-Tenant Booth Setup”.

      • Transport.  The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. See also Example 4.1, “A Booth Configuration File”, position 1.

      • Port.  The port to be used for communication between the booth instances at each site. See also Example 4.1, “A Booth Configuration File”, position 2.

      • Arbitrator.  The IP address of the machine to use as arbitrator. See also Example 4.1, “A Booth Configuration File”, position 3.

        To specify an Arbitrator, click Add. In the dialog that opens, enter the IP address of your arbitrator and click OK.

      • Site.  The IP address used for the boothd on a site. See also Example 4.1, “A Booth Configuration File”, position 4.

        To specify a Site of your Geo cluster, click Add. In the dialog that opens, enter the IP address of one site and click OK.

      • Ticket.  The ticket to be managed by booth. See also Example 4.1, “A Booth Configuration File”, position 6.

        To specify a Ticket, click Add. In the dialog that opens, enter a unique Ticket name. If you need to define multiple tickets with the same parameters and values, save configuration effort by creating a ticket template that specifies the default parameters and values for all tickets. To do so, use __default__ as Ticket name.

      • Authentication.  To enable authentication for booth, click Authentication and in the dialog that opens, activate Enable Security Auth. If you already have an existing key, specify the path and file name in Authentication file. To generate a key file for a new Geo cluster, click Generate Authentication Key File. The key will be created and written to the location specified in Authentication file.

        Additionally, you can specify optional parameters for your ticket. For an overview, see Example 4.1, “A Booth Configuration File”, positions 7 to 14.

        Click OK to confirm your changes.

      Figure 4.1: Example Ticket Dependency
    2. Click OK to close the current booth configuration screen. YaST shows the name of the booth configuration file that you have defined.

  4. Before closing the YaST module, switch to the Firewall Configuration category.

  5. To open the port you have configured for booth, enable Open Port in Firewall.

    Important: Firewall Setting for Local Machine Only

    The firewall setting is only applied to the current machine. It opens the UDP and TCP ports for all ports that have been specified in /etc/booth/booth.conf or any other booth configuration files (see Section 4.3, “Using a Multi-Tenant Booth Setup”).

    Make sure to open the respective ports on all other cluster nodes and arbitrators of your Geo cluster setup, too. Do so either manually or by synchronizing the following files with Csync2:

    • /etc/sysconfig/SuSEfirewall2

    • /etc/sysconfig/SuSEfirewall2.d/services/booth

  6. Click Finish to confirm all settings and close the YaST module. Depending on the NAME of the Configuration File specified in Step 3.a, the configuration is written to /etc/booth/NAME.conf.

4.3 Using a Multi-Tenant Booth Setup

For setups including multiple Geo clusters, it is possible to share the same arbitrator (as of SUSE Linux Enterprise High Availability 12). By providing several booth configuration files, you can start multiple booth instances on the same arbitrator, with each booth instance running on a different port. That way, you can use one machine to serve as arbitrator for different Geo clusters.

Let us assume you have two Geo clusters, one in EMEA (Europe, the Middle East and Africa), and one in the Asia-Pacific region (APAC).

To use the same arbitrator for both Geo clusters, create two configuration files in the /etc/booth directory: /etc/booth/emea.conf and /etc/booth/apac.conf. At a minimum, they must differ in the following parameters:

  • The port used for the communication of the booth instances.

  • The sites belonging to the different Geo clusters that the arbitrator is used for.
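Because every tenant needs its own port, a quick check across all configuration files can catch collisions and also lists the ports to open on the arbitrator. A minimal sketch with hypothetical helpers; it assumes one `port = N` line per file and falls back to the default port 9929 when none is set.

```shell
# Hypothetical helper: print "PORT FILE" for each booth configuration file.
list_booth_ports() {
    for f in "$@"; do
        port=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$f")
        echo "${port:-9929} $f"    # 9929 is the default booth port
    done
}

# Hypothetical helper: fail if two configuration files use the same port.
check_unique_ports() {
    dups=$(list_booth_ports "$@" | awk '{ print $1 }' | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate port(s): $dups"
        return 1
    fi
}
```

On the arbitrator, `check_unique_ports /etc/booth/*.conf` would confirm that emea.conf (9150) and apac.conf (9133) do not collide, and `list_booth_ports /etc/booth/*.conf` shows the ports to open in the firewall.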

Example 4.2: /etc/booth/apac.conf
transport = UDP 1
port = 9133 2
arbitrator = 192.168.203.100 3
site = 192.168.2.254 4
site = 192.168.1.112 4
authfile = /etc/booth/authkey-apac 5
ticket = "tkt-db-apac-intern" 6
     timeout = 10
     retries = 5
     renewal-freq = 60
     before-acquire-handler 11 = /usr/share/booth/service-runnable 12 db-apac-intern 13
ticket = "tkt-db-apac-cust" 6
     timeout = 10
     retries = 5
     renewal-freq = 60
     before-acquire-handler = /usr/share/booth/service-runnable db-apac-cust
Example 4.3: /etc/booth/emea.conf
transport = UDP 1
port = 9150 2
arbitrator = 192.168.203.100 3
site = 192.168.201.100 4
site = 192.168.202.100 4
authfile = /etc/booth/authkey-emea 5
ticket = "tkt-sap-crm" 6
     expire = 900
     renewal-freq = 60
     before-acquire-handler 11 = /usr/share/booth/service-runnable 12 sap-crm 13
ticket = "tkt-sap-prod" 6
     expire = 600
     renewal-freq = 60
     before-acquire-handler = /usr/share/booth/service-runnable sap-prod

1

The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. Currently, this parameter can therefore be omitted.

2

The port to be used for communication between the booth instances at each site. The configuration files use different ports so that multiple booth instances can be started on the same arbitrator.

3

The IP address of the machine to use as arbitrator. In the examples above, we use the same arbitrator for different Geo clusters.

4

The IP address used for the boothd on a site. The sites defined in both booth configuration files are different, because they belong to two different Geo clusters.

5

Optional parameter. Enables booth authentication for clients and servers on the basis of a shared key. This parameter specifies the path to the key file. Use different key files for different tenants.

Key Requirements
  • The key can be either binary or text.

    If it is text, the following characters are ignored: leading and trailing white space, new lines.

  • The key must be between 8 and 64 characters long.

  • The key must belong to the user hacluster and the group haclient.

  • The key must be readable only by the file owner.

6

The ticket to be managed by booth. Theoretically, the same ticket names can be defined in different booth configuration files. The tickets will not interfere because they are part of different Geo clusters that are managed by different booth instances. However, for a better overview, we recommend using distinct ticket names for each Geo cluster, as shown in the examples above.

11

Optional parameter. If set, the specified command will be called before boothd tries to acquire or renew a ticket. On exit code other than 0, boothd relinquishes the ticket.

12

The service-runnable script referenced here is included in the product as an example. It is a simple script based on crm_simulate. It can be used to test whether a particular cluster resource can be run on the current cluster site, that is, whether the cluster is healthy enough to run the resource (all resource dependencies are fulfilled, the cluster partition has quorum, no dirty nodes, etc.). For example, if a service in the dependency chain has a failcount of INFINITY on all available nodes, the service cannot be run on that site. In that case, there is no point in claiming the ticket.

13

The resource to be tested by the before-acquire-handler (in this case, by the service-runnable script). You need to reference the resource that is protected by the respective ticket.
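The handler semantics described in callout 11 (and, for a handler directory, in Section 4.2: scripts run in alphabetical order, and every script must exit with 0) can be emulated for testing your own check scripts. This is a hypothetical sketch, not booth's implementation; in particular, stopping at the first failing script is an assumption of this emulation.

```shell
# Hypothetical emulation of a before-acquire-handler directory: run every
# executable regular file in DIR in glob (alphabetical) order and stop at the
# first one that exits non-zero. A non-zero return here corresponds to the
# case in which boothd relinquishes the ticket.
run_handlers() {
    for script in "$1"/*; do
        [ -f "$script" ] && [ -x "$script" ] || continue   # skip non-executables
        if ! "$script"; then
            echo "handler failed: $(basename "$script")"
            return 1
        fi
    done
    return 0
}
```

For example, `run_handlers /etc/booth.d/TICKET_NAME` runs the scripts of one ticket directory in the same order and reports the first check that would prevent the ticket from being acquired or renewed.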

Procedure 4.1: Using the Same Arbitrator for Different Geo Clusters
  1. Create different booth configuration files in /etc/booth as shown in Example 4.2, “/etc/booth/apac.conf” and Example 4.3, “/etc/booth/emea.conf”. Do so either manually or with YaST, as outlined in Section 4.2.2, “Setting Up Booth with YaST”.

  2. On the arbitrator, open the ports that are defined in any of the booth configuration files in /etc/booth.

  3. On the nodes belonging to the individual Geo clusters that the arbitrator is used for, open the port that is used for the respective booth instance.

  4. Synchronize the respective booth configuration files across all cluster nodes and arbitrators that use the same booth configuration. For details, see Section 4.4, “Synchronizing the Booth Configuration to All Sites and Arbitrators”.

  5. On the arbitrator, start the individual booth instances as described in Starting the Booth Services on Arbitrators for multi-tenancy setups.

  6. On the individual Geo clusters, start the booth service as described in Starting the Booth Services on Cluster Sites.

4.4 Synchronizing the Booth Configuration to All Sites and Arbitrators

Note: Use the Same Booth Configuration On All Sites and Arbitrators

To make booth work correctly, all cluster nodes and arbitrators within one Geo cluster must use the same booth configuration.

You can use Csync2 to synchronize the booth configuration. For details, see Section 5.1, “Csync2 Setup for Geo Clusters” and Section 5.2, “Synchronizing Changes with Csync2”.

In case of any booth configuration changes, make sure to update the configuration files accordingly on all parties and to restart the booth services as described in Section 4.6, “Reconfiguring Booth While Running”.

4.5 Enabling and Starting the Booth Services

Starting the Booth Services on Cluster Sites

The booth service for each cluster site is managed by the booth resource group. This group has been configured either automatically (if you used the ha-cluster-init scripts for the Geo cluster setup) or manually as described in Section 6.2, “Configuring a Resource Group for boothd”. To start one instance of the booth service per site, start the respective booth resource group on each cluster site.

Starting the Booth Services on Arbitrators

Starting with SUSE Linux Enterprise 12, booth arbitrators are managed with systemd. The unit file is named booth@.service. The @ indicates that the service is run with a parameter, in this case the name of the configuration file.

To enable the booth service on an arbitrator, use the following command:

# systemctl enable booth@booth

After the service has been enabled from the command line, you can use the YaST Services Manager to manage it, as long as the service is not disabled. If it is disabled, it disappears from the service list in YaST the next time systemd is restarted.

The command to start the booth service depends on your booth setup:

  • If you are using the default setup as described in Section 4.2, only /etc/booth/booth.conf is configured. In that case, log in to each arbitrator and use the following command:

    # systemctl start booth@booth
  • If you are running booth in multi-tenancy mode as described in Section 4.3, you have configured multiple booth configuration files in /etc/booth. To start the services for the individual booth instances, use systemctl start booth@NAME, where NAME stands for the name of the respective configuration file /etc/booth/NAME.conf.

    For example, if you have the booth configuration files /etc/booth/emea.conf and /etc/booth/apac.conf, log in to your arbitrator and execute the following commands:

    # systemctl start booth@emea
    # systemctl start booth@apac

This starts the booth service in arbitrator mode. An arbitrator can communicate with all other booth daemons, but unlike the booth daemons running on the cluster sites, it cannot be granted a ticket. Booth arbitrators take part in elections only; otherwise, they are dormant.
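On a multi-tenant arbitrator, each /etc/booth/NAME.conf corresponds to one booth@NAME instance, so the enable, start, or restart commands can be derived from the directory contents instead of being typed per tenant. The following POSIX shell sketch assumes that every *.conf file in the directory is a booth configuration; the function names and the DRY_RUN variable are hypothetical helpers, not part of booth. With DRY_RUN unset or set to 1, the commands are only printed.

```shell
# Hypothetical helpers: map /etc/booth/NAME.conf files to booth@NAME units.

# Print NAME for every NAME.conf in the given directory (default /etc/booth).
booth_instances() {
    dir=${1:-/etc/booth}
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue    # no match: the glob stays unexpanded
        basename "$f" .conf
    done
}

# Run (or, with DRY_RUN unset or 1, only print) "systemctl VERB booth@NAME"
# for every instance. $1: systemctl verb, $2: booth configuration directory.
booth_arbitrator_ctl() {
    verb=$1
    dir=${2:-/etc/booth}
    booth_instances "$dir" | while IFS= read -r name; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "systemctl $verb booth@$name"
        else
            systemctl "$verb" "booth@$name"
        fi
    done
}
```

If /etc/booth contains only emea.conf and apac.conf, booth_arbitrator_ctl start prints the two systemctl start commands shown above; DRY_RUN=0 booth_arbitrator_ctl restart would restart all instances after a configuration change (see Section 4.6).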

4.6 Reconfiguring Booth While Running

In case you need to change the booth configuration while the booth services are already running, proceed as follows:

  1. Adjust the booth configuration files as desired.

  2. Synchronize the updated booth configuration files to all cluster nodes and arbitrators that are part of your Geo cluster. For details, see Chapter 5, Synchronizing Configuration Files Across All Sites and Arbitrators.

  3. Restart the booth services on the arbitrators and cluster sites as described in Section 4.5, “Enabling and Starting the Booth Services”. This does not have any effect on tickets that have already been granted to sites.

5 Synchronizing Configuration Files Across All Sites and Arbitrators

To replicate important configuration files across all nodes in the cluster and across Geo clusters, use Csync2. Csync2 can handle any number of hosts, sorted into synchronization groups. Each synchronization group has its own list of member hosts and its include/exclude patterns that define which files should be synchronized in the synchronization group. The groups, the host names belonging to each group, and the include/exclude rules for each group are specified in the Csync2 configuration file, /etc/csync2/csync2.cfg.

For authentication, Csync2 uses the IP addresses and pre-shared keys within a synchronization group. You need to generate one key file for each synchronization group and copy it to all group members.

Csync2 contacts other servers via a TCP port (30865 by default) and starts remote Csync2 instances. For detailed information about Csync2, refer to http://oss.linbit.com/csync2/paper.pdf.

5.1 Csync2 Setup for Geo Clusters

How to set up Csync2 for individual clusters with YaST is explained in the Administration Guide for SUSE Linux Enterprise High Availability, chapter Using the YaST Cluster Module, section Transferring the Configuration to All Nodes. However, YaST cannot handle more complex Csync2 setups, like those that are needed for Geo clusters. For the following setup, as shown in Figure 5.1, “Example Csync2 Setup for Geo Clusters”, configure Csync2 manually by editing the configuration files.

To adjust Csync2 for synchronizing files not only within local clusters but also across geographically dispersed sites, you need to define two synchronization groups in the Csync2 configuration:

  • A global group ha_global (for the files that need to be synchronized globally, across all sites and arbitrators belonging to a Geo cluster).

  • A group for the local cluster site ha_local (for the files that need to be synchronized within the local cluster).

For an overview of the multiple Csync2 configuration files for the two synchronization groups, see Figure 5.1, “Example Csync2 Setup for Geo Clusters”.

Figure 5.1: Example Csync2 Setup for Geo Clusters

Authentication key files and their references are displayed in red. The names of Csync2 configuration files are displayed in blue, and their references are displayed in green. For details, refer to Example Csync2 Setup: Configuration Files.

Example Csync2 Setup: Configuration Files
/etc/csync2/csync2.cfg

The main Csync2 configuration file. It is kept short and simple on purpose and only contains the following:

  • The definition of the synchronization group ha_local. The group consists of two nodes (this-site-host-1 and this-site-host-2) and uses /etc/csync2/ha_local.key for authentication. A list of files to be synchronized for this group only is defined in another Csync2 configuration file, /etc/csync2/ha_local.cfg. It is included with the config statement.

  • A reference to another Csync2 configuration file, /etc/csync2/ha_global.cfg, included with the config statement.

/etc/csync2/ha_local.cfg

This file concerns only the local cluster. It specifies a list of files to be synchronized only within the ha_local synchronization group, as these files are specific per cluster. The most important ones are the following:

  • /etc/csync2/csync2.cfg, as this file contains the list of the local cluster nodes.

  • /etc/csync2/ha_local.key, the authentication key to be used for Csync2 synchronization within the local cluster.

  • /etc/corosync/corosync.conf, as this file defines the communication channels between the local cluster nodes.

  • /etc/corosync/authkey, the Corosync authentication key.

The rest of the file list depends on your specific cluster setup. The files listed in Figure 5.1, “Example Csync2 Setup for Geo Clusters” are only examples. If you also want to synchronize files for any site-specific applications, include them in ha_local.cfg, too. Even though ha_local.cfg is targeted at the nodes belonging to one site of your Geo cluster, the content may be identical on all sites. If you need different sets of hosts or different keys, adding extra groups may be necessary.

/etc/csync2/ha_global.cfg

This file defines the Csync2 synchronization group ha_global. The group spans all cluster nodes across multiple sites, including the arbitrator. As it is recommended to use a separate key for each Csync2 synchronization group, this group uses /etc/csync2/ha_global.key for authentication. The include statements define the list of files to be synchronized within the ha_global synchronization group. The most important ones are the following:

  • /etc/csync2/ha_global.cfg and /etc/csync2/ha_global.key (the configuration file for the ha_global synchronization group and the authentication key used for synchronization within the group).

  • /etc/booth/, the default directory holding the booth configuration. In case you are using a booth setup for multiple tenants, it contains more than one booth configuration file. If you use authentication for booth, it is useful to place the key file in this directory, too.

  • /etc/drbd.conf and /etc/drbd.d (if you are using DRBD within your cluster setup). The DRBD configuration can be globally synchronized, as it derives the configuration from the host names contained in the resource configuration file.

  • /etc/zypp/repos.d. The package repositories are likely to be the same on all cluster nodes.

The other files shown (/etc/root/*) are examples that may be included for reasons of convenience (to make a cluster administrator's life easier).

Note

The files csync2.cfg and ha_local.key are site-specific, which means you need to create different ones for each cluster site. The files are identical on the nodes belonging to the same cluster but different on another cluster. Each csync2.cfg file needs to contain a list of hosts (cluster nodes) belonging to the site, plus a site-specific authentication key.

The arbitrator needs a csync2.cfg file, too. It only needs to reference ha_global.cfg though.
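To make the layout described above concrete, the following shell sketch writes the three files into a scratch directory for review before you copy them to /etc/csync2. It is an illustration under assumptions: the host names arbitrator and site-N-host-M are hypothetical placeholders, the file lists are shortened to the most important entries named above, and you should check the syntax (in particular the config statement) against the Csync2 version you run.

```shell
# Hypothetical helper: write csync2.cfg, ha_local.cfg and ha_global.cfg
# into directory $1. $2 is "site" (default) or "arbitrator".
write_csync2_configs() {
    out=$1
    role=${2:-site}
    mkdir -p "$out" || return 1

    if [ "$role" = arbitrator ]; then
        # The arbitrator only references the global group.
        echo 'config /etc/csync2/ha_global.cfg;' > "$out/csync2.cfg"
    else
        # Site-specific: local group (placeholder hosts) plus global reference.
        cat > "$out/csync2.cfg" <<'EOF'
group ha_local
{
    key /etc/csync2/ha_local.key;
    host this-site-host-1;
    host this-site-host-2;
    config /etc/csync2/ha_local.cfg;
}
config /etc/csync2/ha_global.cfg;
EOF
        # Files synchronized only within the local cluster.
        cat > "$out/ha_local.cfg" <<'EOF'
include /etc/csync2/csync2.cfg;
include /etc/csync2/ha_local.key;
include /etc/corosync/corosync.conf;
include /etc/corosync/authkey;
EOF
    fi

    # Global group: all nodes on all sites plus the arbitrator (placeholders).
    cat > "$out/ha_global.cfg" <<'EOF'
group ha_global
{
    key /etc/csync2/ha_global.key;
    host arbitrator;
    host site-1-host-1;
    host site-1-host-2;
    host site-2-host-1;
    host site-2-host-2;
    include /etc/csync2/ha_global.cfg;
    include /etc/csync2/ha_global.key;
    include /etc/booth/;
    include /etc/drbd.conf;
    include /etc/drbd.d;
    include /etc/zypp/repos.d;
}
EOF
}
```

Adjust the host lists for each site before copying the generated files to the respective machines as described in Section 5.2.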

5.2 Synchronizing Changes with Csync2

To successfully synchronize the files with Csync2, the following prerequisites must be met:

  • The same Csync2 configuration is available on all machines that belong to the same synchronization group.

  • The Csync2 authentication key for each synchronization group must be available on all members of that group.

  • Csync2 must be running on all nodes and the arbitrator.

Before the first Csync2 run, you therefore need to make the following preparations:

  1. Log in to one machine per synchronization group and generate an authentication key for the respective group:

    # csync2 -k NAME_OF_KEYFILE

    Do not regenerate the key file on any other member of the same group.

    With regard to Figure 5.1, “Example Csync2 Setup for Geo Clusters”, this would result in the following key files: /etc/csync2/ha_global.key and one local key (/etc/csync2/ha_local.key) per site.

  2. Copy each key file to all members of the respective synchronization group. With regard to Figure 5.1, “Example Csync2 Setup for Geo Clusters”:

    1. Copy /etc/csync2/ha_global.key to all parties (the arbitrator and all cluster nodes on all sites of your Geo cluster). The key file needs to be available on all hosts listed within the ha_global group that is defined in ha_global.cfg.

    2. Copy the local key file for each site (/etc/csync2/ha_local.key) to all cluster nodes belonging to the respective site of your Geo cluster.

  3. Copy the site-specific /etc/csync2/csync2.cfg configuration file to all cluster nodes belonging to the respective site of your Geo cluster and to the arbitrator.

  4. Execute the following command on all nodes and the arbitrator to make the csync2 service start automatically at boot time:

    # systemctl enable csync2.socket
  5. Execute the following command on all nodes and the arbitrator to start the service now:

    # systemctl start csync2.socket
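Step 2 requires that each key file reaches exactly the hosts listed in the respective group. As a sketch, the host names can be read from the host statements of a Csync2 configuration file and turned into copy commands for review. The helper names are hypothetical, copying with scp as root is an assumption about your environment, and only plain host statements (one or more names followed by a semicolon) are handled.

```shell
# Print the host names declared with "host NAME [NAME ...];" in file $1.
csync2_hosts() {
    awk '$1 == "host" {
        sub(/;.*/, "")                 # drop the terminating semicolon
        for (i = 2; i <= NF; i++) print $i
    }' "$1"
}

# Print one scp command per group member for key file $2,
# skipping host $3 (the machine where the key was generated).
print_key_copy_cmds() {
    csync2_hosts "$1" | while IFS= read -r h; do
        [ "$h" = "$3" ] && continue
        echo "scp -p $2 root@$h:$2"
    done
}
```

For example, print_key_copy_cmds /etc/csync2/ha_global.cfg /etc/csync2/ha_global.key "$(hostname)" lists the copy commands for the global key, provided the local host name matches the name used in the host statement.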
Procedure 5.1: Synchronizing Files with Csync2
  1. To initially synchronize all files once, execute the following command on the machine that you want to copy the configuration from:

    # csync2 -xv

    This will synchronize all the files once by pushing them to the other members of the synchronization groups. If all files are synchronized successfully, Csync2 will finish with no errors.

    If one or several files that are to be synchronized have been modified on other machines (not only on the current one), Csync2 will report a conflict. You will get an output similar to the one below:

    While syncing file /etc/corosync/corosync.conf:
    ERROR from peer site-2-host-1: File is also marked dirty here!
    Finished with 1 errors.
  2. If you are sure that the file version on the current machine is the best one, you can resolve the conflict by forcing this file and re-synchronizing:

    # csync2 -f /etc/corosync/corosync.conf
    # csync2 -x

For more information on the Csync2 options, run csync2 -help.
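When a push reports several conflicts, it can help to collect the affected file names before deciding which ones to force. The following sketch filters Csync2 output on standard input and prints each file reported with the "File is also marked dirty here!" message shown above. The function name is hypothetical, and it relies on the message format of the example output, which may differ between Csync2 versions.

```shell
# Print files for which a peer reported "File is also marked dirty here!".
csync2_conflicts() {
    awk '
        /^[[:space:]]*While syncing file / {
            f = $0
            sub(/^[[:space:]]*While syncing file /, "", f)
            sub(/:[[:space:]]*$/, "", f)   # strip the trailing colon
            next
        }
        /File is also marked dirty here!/ && f != "" { print f; f = "" }
    '
}
```

For example, csync2 -xv 2>&1 | csync2_conflicts lists the files; after verifying that the local version of each file is the one to keep, force it with csync2 -f FILE and push again with csync2 -x.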

Note: Pushing Synchronization After Any Changes

Csync2 only pushes changes. It does not continuously synchronize files between the machines.

Each time you update files that need to be synchronized, you need to push the changes to the other machines of the same synchronization group: run csync2 -xv on the machine where you made the changes. If you run the command on any of the other machines with unchanged files, nothing will happen.

6 Configuring Cluster Resources and Constraints

Apart from the resources and constraints that you need to define for your specific cluster setup, Geo clusters require additional resources and constraints as described below. You can either configure them with the crm shell (crmsh) as demonstrated in the examples below, or with the HA Web Console (Hawk2).

This chapter focuses on tasks specific to Geo clusters. For an introduction to your preferred cluster management tool and general instructions on how to configure resources and constraints with it, refer to one of the following chapters in the Administration Guide for SUSE Linux Enterprise High Availability:

  • Book “Administration Guide”, Chapter 6 “Configuring and Managing Cluster Resources with Hawk2”

  • Book “Administration Guide”, Chapter 7 “Configuring and Managing Cluster Resources (Command Line)”

If you have set up your Geo cluster with the bootstrap scripts, the cluster resources needed for booth have been configured already (including a resource group for boothd). In this case, you can skip Section 6.2 and only need to execute the remaining steps below to complete the cluster resource configuration.

If you are setting up your Geo cluster manually, you need to execute all of the following steps:

Important: No CIB Synchronization Across Sites

The CIB is not automatically synchronized across cluster sites of a Geo cluster. All resources that must be highly available across the Geo cluster need to be configured for each site accordingly or need to be transferred to the other site or sites.

To simplify transfer, any resources with site-specific parameters can be configured in such a way that the parameters' values depend on the name of the cluster site where the resource is running (see also Chapter 3, Requirements, Other Requirements and Recommendations).

After you have configured the resources on one site, you can tag the resources that are needed on all cluster sites, export them from the current CIB, and import them into the CIB of another cluster site. For details, see Section 6.4, “Transferring the Resource Configuration to Other Cluster Sites”.

6.1 Configuring Ticket Dependencies of Resources

For Geo clusters, you can specify which resources depend on a certain ticket. Together with this special type of constraint, you can set a loss-policy that defines what should happen to the respective resources if the ticket is revoked. The attribute loss-policy can have the following values:

  • fence: Fence the nodes that are running the relevant resources.

  • stop: Stop the relevant resources.

  • freeze: Do nothing to the relevant resources.

  • demote: Demote relevant resources that are running in master mode to slave mode.

Procedure 6.1: Configuring Ticket Dependencies of Resources with crmsh
  1. On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive crm shell.

  3. Configure constraints that define which resources depend on a certain ticket. For example, to make a primitive resource rsc1 depend on ticketA:

    crm(live)configure# rsc_ticket rsc1-req-ticketA ticketA: \
      rsc1 loss-policy="fence"

    In case ticketA is revoked, the node running the resource should be fenced.

  4. If you want other resources to depend on further tickets, create as many constraints as necessary with rsc_ticket.

  5. Review your changes with show.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

    The configuration is saved to the CIB.
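If many resources depend on tickets, the rsc_ticket lines can be generated and checked before they are loaded. This sketch follows the ID scheme of the example above (RESOURCE-req-TICKET) and rejects any loss-policy outside the four values listed earlier; the function name and the file name tickets.crm are hypothetical.

```shell
# Print one rsc_ticket constraint in crm shell syntax.
# $1: resource ID, $2: ticket name, $3: loss-policy
rsc_ticket_line() {
    case $3 in
        fence|stop|freeze|demote) ;;
        *) echo "invalid loss-policy: $3" >&2; return 1 ;;
    esac
    printf 'rsc_ticket %s-req-%s %s: %s loss-policy="%s"\n' \
        "$1" "$2" "$2" "$1" "$3"
}
```

For example, { rsc_ticket_line rsc1 ticketA fence; rsc_ticket_line rsc2 ticketB stop; } > tickets.crm followed by crm configure load update tickets.crm would add both constraints; review the result with crm configure show afterward.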

6.2 Configuring a Resource Group for boothd

If you have set up your Geo cluster with the ha-cluster-init bootstrap scripts, you can skip the following procedure as the resources and the resource group for boothd have already been configured in this case.

Each site needs to run one instance of boothd that communicates with the other booth daemons. The daemon can be started on any node, therefore it should be configured as a primitive resource. To make the boothd resource stay on the same node, if possible, add resource stickiness to the configuration. As each daemon needs a persistent IP address, configure another primitive with a virtual IP address. Group both primitives:

  1. On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive crm shell.

  3. Enter the following to create both primitive resources and to add them to one group, g-booth:

    crm(live)configure# primitive ip-booth ocf:heartbeat:IPaddr2 \
      params iflabel="ha" nic="eth1" cidr_netmask="24" \
      params rule #cluster-name eq amsterdam ip="192.168.201.100" \
      params rule #cluster-name eq berlin ip="192.168.202.100"
    crm(live)configure# primitive booth-site ocf:pacemaker:booth-site \
      meta resource-stickiness="INFINITY" \
      params config="nfs" op monitor interval="10s"
    crm(live)configure# group g-booth ip-booth booth-site

    With this configuration, each booth daemon will be available at its individual IP address, independent of the node the daemon is running on.

  4. Review your changes with show.

  5. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

    The configuration is saved to the CIB.

6.3 Adding an Ordering Constraint

If a ticket has been granted to a site but all nodes of that site fail to host the boothd resource group for any reason, a split-brain situation among the geographically dispersed sites may occur. In that case, no boothd instance would be available to safely manage failover of the ticket to another site. To avoid a potential concurrency violation of the ticket (the ticket is granted to multiple sites simultaneously), add an ordering constraint:

  1. On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive crm shell.

  3. Create an ordering constraint, for example:

    crm(live)configure# order o-booth-before-rsc1 inf: g-booth rsc1

    It defines that rsc1 (which depends on ticketA) can only be started after the g-booth resource group.

  4. For any other resources that depend on a certain ticket, define further ordering constraints.

  5. Review your changes with show.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

    The configuration is saved to the CIB.
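To make sure no ticket-dependent resource is missing an ordering constraint, the resources can be collected from the rsc_ticket constraints in the output of crm configure show and turned into order lines that follow the example above. This is a sketch under assumptions: the function names are hypothetical, g-booth is the group name used in this guide, role suffixes such as :Master are dropped, and only the rsc_ticket line layout shown in Section 6.1 (including backslash line continuations) is parsed.

```shell
# Read crm configure syntax on stdin; print each resource referenced by an
# rsc_ticket constraint once (sorted, role suffixes such as :Master removed).
ticket_resources() {
    awk '
        /\\$/ { buf = buf substr($0, 1, length($0) - 1) " "; next }
        {
            $0 = buf $0; buf = ""          # join continuation lines, re-split
            if ($1 != "rsc_ticket") next
            for (i = 4; i <= NF; i++) {    # $2 = ID, $3 = "TICKET:"
                if ($i ~ /=/) break        # options such as loss-policy=...
                r = $i; sub(/:.*/, "", r); print r
            }
        }
    ' | sort -u
}

# Print "order o-booth-before-RSC inf: GROUP RSC" for each resource on stdin.
order_after_booth() {
    while IFS= read -r rsc; do
        printf 'order o-booth-before-%s inf: %s %s\n' "$rsc" "$1" "$rsc"
    done
}
```

For example, crm configure show | ticket_resources | order_after_booth g-booth prints candidate constraints; compare them with the existing order constraints before adding any that are missing.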

6.4 Transferring the Resource Configuration to Other Cluster Sites

After having completed or changed your resource configuration for one cluster site, transfer it to the other sites of your Geo cluster.

To simplify the transfer, you can tag any resources that are needed on all cluster sites, export them from the current CIB, and import them into the CIB of another cluster site. Tagging does not create any colocation or ordering relationship between the resources.

Procedure 6.2, “Tagging and Exporting a Resource Configuration” and Procedure 6.3, “Importing a Tagged Resource Configuration” give an example of how to do so. They are based on the following prerequisites:

Prerequisites
  • You have a Geo cluster with two sites: cluster amsterdam and cluster berlin.

  • The cluster names for each site are defined in the respective /etc/corosync/corosync.conf files:

    totem {
         [...]
         cluster_name: amsterdam
         }

    This can either be done manually (by editing /etc/corosync/corosync.conf) or with the YaST cluster module (by switching to the Communication Channels category and defining a Cluster Name). Afterward, stop and start the pacemaker service for the changes to take effect:

    # systemctl stop pacemaker
    # systemctl start pacemaker
  • The necessary resources for booth and for all services that should be highly available across your Geo cluster have been configured in the CIB on site amsterdam. They will be imported to the CIB on site berlin.
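The rule-based parameters of ip-booth in Section 6.2 only match if the cluster names are set as described above. To double-check the value on a node, the following sketch prints cluster_name from the totem section of corosync.conf. The function name is hypothetical, and the parser assumes the simple "key: value" layout shown in the example.

```shell
# Print cluster_name from the totem section of a corosync.conf file
# (default: /etc/corosync/corosync.conf). Brace depth is tracked so that
# keys outside the totem section are ignored.
corosync_cluster_name() {
    awk '
        depth == 0 && /^[[:space:]]*totem[[:space:]]*\{/ { depth = 1; next }
        depth > 0 {
            if ($1 == "cluster_name:") { print $2; exit }
            if ($0 ~ /\{/) depth++
            if ($0 ~ /\}/) depth--
        }
    ' "${1:-/etc/corosync/corosync.conf}"
}
```

Running corosync_cluster_name on one node per site shows whether the names match the values used in the #cluster-name rules.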

Procedure 6.2: Tagging and Exporting a Resource Configuration
  1. Log in to one of the nodes of cluster amsterdam.

  2. Start the cluster with:

    # systemctl start pacemaker
  3. Enter crm configure to switch to the interactive crm shell.

  4. Review the current CIB configuration:

    crm(live)configure# show
  5. Mark the resources and constraints that are needed across the Geo cluster with the tag geo_resources:

    crm(live)configure# tag geo_resources: \
      LIST_OF_RESOURCES_and_CONSTRAINTS_FOR_REQUIRED_SERVICES \
      rsc1-req-ticketA ip-booth booth-site g-booth o-booth-before-rsc1

    LIST_OF_RESOURCES_and_CONSTRAINTS_FOR_REQUIRED_SERVICES stands for any resources and constraints of your specific setup that you need on all sites of the Geo cluster (for example, resources for DRBD as described at https://documentation.suse.com/sbp/all/html/SBP-DRBD/index.html).

    The remaining IDs are the resources and constraints for boothd (primitives, booth resource group, ticket dependency, additional ordering constraint), see Section 6.1 to Section 6.3.

  6. Review your changes with show.

  7. If the configuration is according to your wishes, submit your changes with commit and leave the crm live shell with exit.

  8. Export the tagged resources and constraints to a file named exported.cib:

    # crm configure show tag:geo_resources geo_resources > exported.cib

    The command crm configure show tag:TAGNAME shows all resources that belong to the tag TAGNAME.
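Before copying exported.cib to the other site, it can be worth checking that every object you tagged actually made it into the export. The following sketch, with a hypothetical function name, looks for each ID as the second word of a definition line (for example primitive ip-booth ...), which is the layout crm configure show uses for primitives, groups, and constraints.

```shell
# Check that every ID given after the file name is defined in the export.
# Prints "missing: ID" for each absent object; returns 1 if any is missing.
check_export() {
    file=$1; shift
    rc=0
    for id in "$@"; do
        if ! awk -v id="$id" '$2 == id { found = 1 } END { exit (found ? 0 : 1) }' "$file"; then
            echo "missing: $id"
            rc=1
        fi
    done
    return $rc
}
```

For example: check_export exported.cib rsc1-req-ticketA ip-booth booth-site g-booth o-booth-before-rsc1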

Procedure 6.3: Importing a Tagged Resource Configuration

To import the saved configuration file into the CIB of the second cluster site, proceed as follows:

  1. Log in to one of the nodes of cluster berlin.

  2. Start the cluster with:

    # systemctl start pacemaker
  3. Copy the file exported.cib from cluster amsterdam to this node.

  4. Import the tagged resources and constraints from the file exported.cib into the CIB of cluster berlin:

    # crm configure load update PATH_TO_FILE/exported.cib

    When using the update parameter for the crm configure load command, crmsh tries to integrate the contents of the file into the current CIB configuration (instead of replacing the current CIB with the file contents).

  5. View the updated CIB configuration with the following command:

    # crm configure show

    The imported resources and constraints will appear in the CIB.

7 Setting Up IP Relocation via DNS Update

If one site of your Geo cluster is down and a ticket failover occurs, you usually need to adjust the network routing accordingly (or you need to have configured a network failover for each ticket). Depending on the kind of service that is bound to a ticket, there is an alternative to reconfiguring the routing: you can use dynamic DNS update to change the IP address of the service instead.

The following prerequisites must be fulfilled for this scenario:

Example 7.1, “Resource Configuration for Dynamic DNS Update” illustrates how to use the ocf:heartbeat:dnsupdate resource agent to manage the nsupdate command. The resource agent supports both IPv4 and IPv6.

Example 7.1: Resource Configuration for Dynamic DNS Update
crm(live)configure# primitive dns-update-ip ocf:heartbeat:dnsupdate params \
  hostname="www.domain.com" ip="192.168.3.4" \
  keyfile="/etc/whereever/Kgeo-update*.key" \
  server="192.168.1.1" serverport="53"

The parameters have the following meaning:

  • hostname: Host name bound to the service that needs to fail over together with the ticket. The IP address of this host name needs to be updated via dynamic DNS.

  • ip: IP address of the server hosting the service to be migrated. The IP address specified here can be under cluster control, too. This does not handle local failover, but it ensures that outside parties will be directed to the right site after a ticket failover.

  • keyfile: Path to the public key file generated with dnssec-keygen.

  • server: IP address of the DNS server to send the updates to. If no server is provided, this defaults to the master server for the correct zone.

  • serverport: Port to use for communication with the DNS server. This option only takes effect if a DNS server is specified.

With the resource configuration above, the resource agent takes care of removing the failed Geo cluster site from the DNS record and changing the IP for a service via dynamic DNS update.
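To illustrate what such an update amounts to, the following sketch prints an nsupdate input script that replaces the address record of the host name with the new IP. It is not the resource agent's implementation: the function name and the 60-second TTL are assumptions, and the record type is chosen as AAAA for IPv6 addresses and A otherwise. As with the agent's server and serverport parameters, the server line is only emitted when a server is given.

```shell
# Print nsupdate commands that replace the A/AAAA record of host $1 with IP $2.
# $3: DNS server (optional), $4: port (default 53), $5: TTL (assumed default 60).
nsupdate_script() {
    case $2 in
        *:*) type=AAAA ;;    # IPv6 address
        *)   type=A ;;
    esac
    if [ -n "$3" ]; then
        printf 'server %s %s\n' "$3" "${4:-53}"
    fi
    printf 'update delete %s %s\n' "$1" "$type"
    printf 'update add %s %s %s %s\n' "$1" "${5:-60}" "$type" "$2"
    printf 'send\n'
}
```

Piping the output to nsupdate -k KEYFILE would submit the same kind of update manually, for example to test the key and server settings before configuring the resource.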

8 Managing Geo Clusters

Before booth can manage a certain ticket within the Geo cluster, you initially need to grant it to a site manually, either with the booth command line client or with Hawk2.

8.1 Managing Tickets From Command Line

Use the booth client command line tool to grant, list, or revoke tickets as described in Overview of booth client Commands. The booth client commands can be run on any machine in the cluster, not only on the ones running boothd. The booth client commands try to find the local cluster by looking at the booth configuration file and the locally defined IP addresses. If you do not specify a site for the booth client to connect to (using the -s option), it always connects to the local site.

Note: Syntax Changes

The syntax of booth client commands has been simplified since SUSE Linux Enterprise High Availability 11. The former syntax is still supported. For detailed information, see the Synopsis section in the booth man page.

The examples in this manual use the simplified syntax.

Overview of booth client Commands
Listing All Tickets
# booth list
ticket: ticketA, leader: none
ticket: ticketB, leader: 10.2.12.101, expires: 2014-08-13 10:28:57

If you do not specify a certain site with -s, the information about the tickets will be requested from the local booth instance.

Granting a Ticket to a Site
# booth grant -s 192.168.201.100 ticketA
booth[27891]: 2014/08/13_10:21:23 info: grant request sent, waiting for the result ...
booth[27891]: 2014/08/13_10:21:23 info: grant succeeded!

In this case, ticketA will be granted to the site 192.168.201.100. Without the -s option, booth would automatically connect to the current site (the site you are running the booth client on) and would request the grant operation.

Before granting a ticket, the command executes a sanity check. If the same ticket is already granted to another site, you are warned about that and are prompted to revoke the ticket from the current site first.

Revoking a Ticket From a Site
# booth revoke ticketA
booth[27900]: 2014/08/13_10:21:23 info: revoke succeeded!

Booth checks to which site the ticket is currently granted and requests the revoke operation for ticketA. The revoke operation will be executed immediately.

The grant operation and, under certain circumstances, the revoke operation may take a while to return a definite outcome. The client waits for the result up to the ticket's timeout value before it gives up waiting. If the -w option was used, the client waits indefinitely instead. Find the exact status in the log files or with the crm_ticket -L command.
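Because a ticket must be granted manually once before booth manages it, it can be useful to list the tickets that currently have no leader. The sketch below parses booth list output in the format shown above; the function name is hypothetical, and the field layout may differ between booth versions.

```shell
# Read "booth list" output on stdin and print tickets whose leader is "none".
tickets_without_leader() {
    awk -F', *' '{
        t = ""; l = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ticket: /) { t = $i; sub(/^ticket: /, "", t) }
            if ($i ~ /^leader: /) { l = $i; sub(/^leader: /, "", l) }
        }
        if (t != "" && l == "none") print t
    }'
}
```

For example, booth list | tickets_without_leader prints the candidates; grant each one to the intended site with booth grant -s SITE_IP TICKET.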

Warning: crm_ticket and crm site ticket

If the booth service is not running for any reason, you may also manage tickets manually with crm_ticket or crm site ticket. Both commands are only available on cluster nodes. Use them with great care as they cannot verify if the same ticket is already granted elsewhere. For more information, read the man pages.

As long as booth is up and running, only use the booth client for manual intervention.

After you have initially granted a ticket to a site, the booth mechanism takes over and manages the ticket automatically. If the site holding a ticket should be out of service, the ticket is automatically revoked after the expiry time and granted to another site. The resources that depend on that ticket fail over to the new site that holds the ticket. The loss-policy set within the constraint specifies what happens to the nodes that have run the resources before.

Procedure 8.1: Managing Tickets Manually

Assuming that you want to manually move ticketA from site amsterdam (with the virtual IP 192.168.201.100) to site berlin (with the virtual IP 192.168.202.100), proceed as follows:

  1. Set ticketA to standby with the following command:

    # crm_ticket -t ticketA -s
  2. Wait for any resources that depend on ticketA to be stopped or demoted cleanly.

  3. Revoke ticketA from site amsterdam with:

    # booth revoke -s 192.168.201.100 ticketA
  4. After the ticket has been revoked from its original site, grant it to the site berlin with:

    # booth grant -s 192.168.202.100 ticketA
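The procedure can be wrapped in a small script so that the steps always run in the same order. In this sketch, the function name and the RUN variable are hypothetical: with RUN unset, the commands are only printed; with RUN set to an empty string, they are executed. It is meant to run on a cluster node of the site that currently holds the ticket (an assumption, since crm_ticket changes the ticket state of the local cluster). Step 2 is not automated; the script pauses until you confirm that the dependent resources have stopped or been demoted.

```shell
# Move ticket $1 from the site with booth IP $2 to the site with booth IP $3.
move_ticket() {
    run=${RUN-echo}    # unset: print only; RUN="": execute the commands
    $run crm_ticket -t "$1" -s
    if [ -z "$run" ]; then
        # Step 2: check (for example with crm_mon) that dependent resources stopped.
        printf 'Press Enter once resources depending on %s are stopped or demoted: ' "$1"
        read -r _
    fi
    $run booth revoke -s "$2" "$1"
    $run booth grant -s "$3" "$1"
}
```

RUN= move_ticket ticketA 192.168.201.100 192.168.202.100 would perform the steps of Procedure 8.1; without RUN=, it only prints them for review.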

8.2 Managing Tickets With Hawk2

Tickets can be viewed in both the Dashboard and the Status view. Hawk2 displays the following ticket statuses:

  • Granted: Tickets that are granted to the current site.

  • Elsewhere: Tickets that are granted to another site.

  • Revoked: Tickets that have been revoked. Additionally, Hawk2 also displays tickets as revoked if they are referenced in a ticket dependency, but have not been granted to any site yet.

Note: Granting Tickets to Current Site and Revoking Tickets

Though you can view tickets for all sites with Hawk2, any grant or revoke operations triggered by Hawk2 only apply to the current site (that you are currently connected to with Hawk2). To grant a ticket to another site of your Geo cluster, start Hawk2 on one of the cluster nodes belonging to the respective site.

You can only grant tickets that are not already given to any site.

Procedure 8.2: Viewing, Granting and Revoking Tickets with Hawk2
  1. Start a Web browser and log in to Hawk2.

  2. In the left navigation bar, select Status.

    Along with information about cluster nodes and resources, Hawk2 also displays a Tickets category. It lists the ticket status, the ticket name and when the ticket was last granted. From the Granted column you can manage the tickets.

  3. To show further information about the ticket, along with information about the cluster sites and arbitrators, click the Details icon next to the ticket.

    Figure 8.1: Hawk2—Ticket Details
  4. To revoke a granted ticket from the current site or to grant a ticket to the current site, click the switch in the Granted column next to the ticket. On clicking, it shows the available action. Confirm your choice when Hawk2 prompts for a confirmation.

    If the ticket cannot be granted or revoked for any reason, Hawk2 shows an error message. If the ticket has been successfully granted or revoked, Hawk2 will update the ticket Status.

Procedure 8.3: Simulating Granting and Revoking Tickets

Hawk2's Batch Mode allows you to explore failure scenarios before they happen. To explore whether your resources that depend on a certain ticket behave as expected, you can also test the impact of granting or revoking tickets.

  1. Start a Web browser and log in to Hawk2.

  2. From the top-level row, select Batch Mode.

  3. In the batch mode bar, click Show to open the Batch Mode window.

  4. To simulate a status change of a ticket:

    1. Click Inject › Ticket Event.

    2. Select the Ticket you want to manipulate and select the Action you want to simulate.

    3. Confirm your changes. Your event is added to the queue of events listed in the Batch Mode dialog. Any event listed here is simulated immediately and is reflected on the Status screen.

    4. Close the Batch Mode dialog and review the simulated changes.

  5. To leave the batch mode, either Apply or Discard the simulated changes.

Figure 8.2: Hawk2 Simulator—Tickets

For more information about Hawk2's Batch Mode (and which other scenarios can be explored with it), refer to Book “Administration Guide”, Chapter 6 “Configuring and Managing Cluster Resources with Hawk2”, Section 6.9 “Using the Batch Mode”.

9 Troubleshooting

Booth uses the same logging mechanism as the CRM. Thus, changing the log level will also take effect on booth logging. The booth log messages also contain information about any tickets.

Both the booth log messages and the booth configuration file are included in the crm report.

In case of unexpected booth behavior or any problems, check the logging data with sudo journalctl -n or create a detailed cluster report with crm report.

In case you can access the cluster nodes on all sites (plus the arbitrators) from one single host via SSH, it is possible to collect log files from all of them within the same crm report. When calling crm report with the -n option, it gets the log files from all hosts that you specify with -n. (Without -n, it would try to obtain the list of nodes from the respective cluster). For example, to create a single crm report that includes the log files from two two-node clusters (192.168.201.111|192.168.201.112 and 192.168.202.111|192.168.202.112) plus an arbitrator (147.2.207.14), use the following command:

#  crm report -n "147.2.207.14 192.168.201.111 192.168.201.112 \
 192.168.202.111 192.168.202.112"  -f 10:00 -t 11:00 db-incident

If the issue concerns booth only and you know on which cluster node of each site booth is running, specify only those two nodes plus the arbitrator.
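
A booth-only report is the multi-host command above with a shorter host list. The sketch below is a minimal illustration. The function name booth_only_report and the variable BOOTH_REPORT_HOSTS are hypothetical. The host list assumes that booth currently runs on 192.168.201.111 at the first site and on 192.168.202.111 at the second site, with 147.2.207.14 as the arbitrator. Replace these with the nodes where booth actually runs.

```shell
# Hypothetical host list: booth is assumed to run on 192.168.201.111
# (first site) and 192.168.202.111 (second site); 147.2.207.14 is the
# arbitrator from the example above.
BOOTH_REPORT_HOSTS="147.2.207.14 192.168.201.111 192.168.202.111"

# Hypothetical helper: one report limited to the booth nodes plus the arbitrator.
booth_only_report() {
    # $1: start time, $2: end time, $3: report name
    crm report -n "$BOOTH_REPORT_HOSTS" -f "$1" -t "$2" "$3"
}

# Example usage:
#   booth_only_report 10:00 11:00 booth-incident
```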

If you cannot access all sites from one host, run crm report separately on the arbitrator and on the cluster nodes of each site, specifying the same time period. To collect the log files on an arbitrator, you must use the -S option for single node operation:

amsterdam # crm report -f 10:00 -t 11:00 db-incident-amsterdam
berlin # crm report -f 10:00 -t 11:00 db-incident-berlin
arbitrator # crm report -S -f 10:00 -t 11:00 db-incident-arbitrator
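
When you run the reports separately, two details are easy to get wrong: every host must use the same time window, and only the arbitrator needs -S. The sketch below keeps both in one place so that you can run the same helper on each host. The names local_geo_report, REPORT_FROM, and REPORT_TO are hypothetical. The db-incident-<label> naming follows the example above.

```shell
# Hypothetical shared time window; use the same values on every host.
REPORT_FROM="10:00"
REPORT_TO="11:00"

# Hypothetical helper: run once on each host, passing a label for that host.
local_geo_report() {
    # $1: label for this host, e.g. amsterdam, berlin, or arbitrator
    local label="$1"
    if [ "$label" = "arbitrator" ]; then
        # On an arbitrator, crm report needs -S (single node operation).
        crm report -S -f "$REPORT_FROM" -t "$REPORT_TO" "db-incident-$label"
    else
        crm report -f "$REPORT_FROM" -t "$REPORT_TO" "db-incident-$label"
    fi
}

# Example usage on the arbitrator:
#   local_geo_report arbitrator
```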

However, a single crm report that covers all machines you need log files from is preferable.

10 Upgrading to the Latest Product Version

For general instructions on how to upgrade a cluster, see Book “Administration Guide”, Chapter 24 “Upgrading Your Cluster and Updating Software Packages”. That chapter also describes the preparations to make before starting the upgrade.

Table 10.1: Supported Upgrade Paths for SLE HA and SLE HA Geo

SLE HA 11 SP3 to SLE HA (Geo) 12

  Upgrade path: Cluster Offline Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12, chapter Upgrading Your Cluster and Updating Software Packages, section Cluster Offline Upgrade

  • SLE HA Geo: Cluster Offline Upgrade (New Booth Mechanism)

SLE HA (Geo) 11 SP4 to SLE HA (Geo) 12 SP1

  Upgrade path: Cluster Offline Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 SP1 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12 SP1, chapter Upgrading Your Cluster and Updating Software Packages, section Offline Migration

  • SLE HA Geo: Cluster Node and Arbitrator Upgrade

SLE HA (Geo) 12 to SLE HA (Geo) 12 SP1

  Upgrade path: Cluster Rolling Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 SP1 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12 SP1, chapter Upgrading Your Cluster and Updating Software Packages, section Rolling Upgrade

  • SLE HA Geo: Cluster Node and Arbitrator Upgrade

SLE HA (Geo) 12 SP1 to SLE HA (Geo) 12 SP2

  Upgrade path: Cluster Rolling Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 SP2 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12 SP2, chapter Upgrading Your Cluster and Updating Software Packages, section Rolling Upgrade

  • SLE HA Geo: Cluster Node and Arbitrator Upgrade

  • DRBD 8 to DRBD 9: Administration Guide for SUSE Linux Enterprise High Availability 12 SP2, chapter DRBD, section Migrating from DRBD 8 to DRBD 9

SLE HA (Geo) 12 SP2 to SLE HA (Geo) 12 SP3

  Upgrade path: Cluster Rolling Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 SP3 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12 SP3, chapter Upgrading Your Cluster and Updating Software Packages, section Rolling Upgrade

  • SLE HA Geo: Cluster Node and Arbitrator Upgrade

SLE HA (Geo) 12 SP3 to SLE HA (Geo) 12 SP4

  Upgrade path: Cluster Rolling Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 SP4 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12 SP4, chapter Upgrading Your Cluster and Updating Software Packages, section Rolling Upgrade

  • SLE HA Geo: Cluster Node and Arbitrator Upgrade

SLE HA (Geo) 12 SP4 to SLE HA (Geo) 12 SP5

  Upgrade path: Cluster Rolling Upgrade

  For details, see:

  • Base System: SUSE Linux Enterprise Server 12 SP5 Deployment Guide, part Updating and Upgrading SUSE Linux Enterprise

  • SLE HA: Administration Guide for SUSE Linux Enterprise High Availability 12 SP5, chapter Upgrading Your Cluster and Updating Software Packages, section Rolling Upgrade

  • SLE HA Geo: Cluster Node and Arbitrator Upgrade

10.1 Cluster Offline Upgrade (New Booth Mechanism)

The booth version (v0.1) in SUSE Linux Enterprise High Availability 11 to 11 SP3 was based on the Paxos algorithm. The booth version (v0.2) in SUSE Linux Enterprise High Availability 11 SP4 and in SUSE Linux Enterprise High Availability 12 and 12 SPx is loosely based on Raft and is incompatible with v0.1. Therefore, a cluster rolling upgrade from any system running the old booth version to one running the new booth version is not possible. Instead, all cluster nodes must be offline, and the cluster must be migrated as a whole as described in Procedure 10.1, “Performing a Cluster Offline Upgrade”.

Because of the new multi-tenancy feature, the new arbitrator init script cannot stop or test the status of the Paxos v0.1 arbitrator. On upgrade to v0.2, the arbitrator is stopped if it is running. The OCF resource agent ocf:pacemaker:booth-site can stop and monitor the booth v0.1 site daemon.

Procedure 10.1: Performing a Cluster Offline Upgrade
  1. To upgrade the cluster nodes, follow the instructions in the Administration Guide, chapter Upgrading Your Cluster and Updating Software Packages, section Cluster Offline Upgrade.

  2. If you use arbitrators outside of the cluster sites:

    1. Upgrade each arbitrator to the desired SUSE Linux Enterprise Server version. To find the details for the individual upgrade processes, see Table 10.1, “Supported Upgrade Paths for SLE HA and SLE HA Geo”.

    2. Add the Geo clustering extension and install the packages as described in Article “Geo Clustering Quick Start”.

  3. Because the syntax and the consensus algorithm for booth have changed, you need to update the booth configuration files to match the latest requirements. Previously, you could optionally specify an expiry time and weights by appending them to the ticket name, using a semicolon (;) as separator. The new syntax has separate tokens for all ticket options. See Chapter 4, “Setting Up the Booth Services” for details. If you did not specify an expiry time or weights that differ from the defaults, and you do not want to use the multi-tenancy feature, you can keep using the old /etc/booth/booth.conf.

  4. Synchronize the updated booth configuration files across all cluster sites and arbitrators.

  5. Start the booth service on the cluster sites and the arbitrators as described in Section 4.5, “Enabling and Starting the Booth Services”.
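
Steps 3 to 5 of the procedure above can be sketched as a few shell helpers. This is a hedged illustration, not a product tool. The function names and the BOOTH_HOSTS list are hypothetical. The sketch also assumes the following:

  • The configuration file is /etc/booth/booth.conf, and ticket names are quoted.

  • root can reach every listed host via SSH.

  • An arbitrator that uses that file runs the systemd instance booth@booth.

Csync2 (csync2 -xv) appears only as one possible way to distribute the file if you have already set it up.

```shell
# Hypothetical list of booth hosts (site nodes and arbitrator); example only.
BOOTH_HOSTS="192.168.201.111 192.168.202.111 147.2.207.14"

# Step 3: print tickets still written in the old v0.1 style, where options
# were appended to the quoted ticket name with ";" (e.g. "ticketA;1000").
# Returns non-zero (grep semantics) if no old-style ticket is found.
find_old_ticket_syntax() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/booth/booth.conf
    fi
    # -n: print line numbers, -H: always print the file name
    grep -nH -E '^[[:space:]]*ticket[[:space:]]*=[[:space:]]*"[^"]*;' "$@"
}

# Step 4: after distributing the file (for example with csync2 -xv if
# Csync2 is set up), check that each host has the same content as the
# local copy by comparing SHA-256 checksums.
booth_conf_in_sync() {
    local file="${1:-/etc/booth/booth.conf}" ref sum host rc=0
    ref=$(sha256sum "$file" | cut -d' ' -f1)
    for host in $BOOTH_HOSTS; do
        # Compute the checksum on the remote host; keep only the hash.
        sum=$(ssh "root@$host" "sha256sum '$file'" | cut -d' ' -f1)
        if [ "$sum" != "$ref" ]; then
            echo "booth configuration differs on $host" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Step 5, arbitrator only: start booth and list the tickets it sees.
# On the cluster sites, booth runs under cluster control instead
# (ocf:pacemaker:booth-site), so this helper does not apply there.
start_arbitrator_booth() {
    local name="${1:-booth}"   # config name, e.g. booth for booth.conf
    systemctl enable "booth@${name}.service" &&
        systemctl start "booth@${name}.service" &&
        booth list
}
```

find_old_ticket_syntax prints nothing for files that already use the new format. In that case, you only need to synchronize the files and start the services.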

10.2 Cluster Node and Arbitrator Upgrade

  1. To upgrade the cluster nodes, follow the instructions in the Administration Guide, chapter Upgrading Your Cluster and Updating Software Packages.

  2. If you use arbitrators outside of the cluster sites, proceed as follows for each arbitrator:

    1. Upgrade to the desired target version of SUSE Linux Enterprise Server. To find the details for the individual upgrade processes, see Table 10.1, “Supported Upgrade Paths for SLE HA and SLE HA Geo”.

    2. Add the Geo clustering extension and install the packages as described in Article “Geo Clustering Quick Start”.

11 For More Information