Applies to SUSE Linux Enterprise High Availability Extension 15 SP2

12 QDevice and QNetd

QDevice and QNetd participate in quorum decisions. With assistance from the arbitrator corosync-qnetd, corosync-qdevice provides a configurable number of votes, allowing a cluster to sustain more node failures than the standard quorum rules allow. We strongly recommend deploying corosync-qnetd and corosync-qdevice for two-node clusters, and we also recommend them in general for clusters with an even number of nodes.

12.1 Conceptual Overview

Compared to calculating quorum among the cluster nodes alone, the QDevice-and-QNetd approach has the following benefits:

  • It provides better sustainability in case of node failures.

  • You can write your own heuristics scripts to affect votes. This is especially useful for complex setups, such as SAP applications.

  • It enables you to configure a QNetd server to provide votes for multiple clusters.

  • It allows using diskless SBD for two-node clusters.

  • It helps with quorum decisions in split-brain situations for clusters with an even number of nodes, especially for two-node clusters.
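
To illustrate why an additional vote helps clusters with an even number of nodes, the following minimal shell sketch applies the usual majority rule (more than half of the expected votes). It assumes one vote per node and one vote for the QDevice, and it ignores special votequorum options such as two_node. The function names quorum_threshold and partition_state are hypothetical and not part of any cluster tool.

```shell
#!/bin/sh
# Majority rule: a partition needs more than half of the expected votes.
# $1 = expected votes; integer division rounds down.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# $1 = votes held by a partition, $2 = expected votes in the cluster.
# Prints "quorate" if the partition reaches the threshold.
partition_state() {
    if [ "$1" -ge "$(quorum_threshold "$2")" ]; then
        echo quorate
    else
        echo inquorate
    fi
}

# Two nodes, no QDevice (2 expected votes): an isolated node holds 1 vote.
partition_state 1 2   # prints: inquorate

# Two nodes plus one QDevice vote (3 expected votes): the node that
# receives the QDevice vote holds 2 votes.
partition_state 2 3   # prints: quorate
```

With two expected votes, an isolated node stays below the threshold of two. With the QDevice vote, the expected votes rise to three while the threshold stays at two, so the partition that receives the QDevice vote remains quorate.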

A setup with QDevice/QNetd consists of the following components and mechanisms:

QDevice/QNetd Components and Mechanisms
QNetd (corosync-qnetd)

A systemd service (a daemon, the QNetd server) which is not part of the cluster. The systemd service provides a vote to the corosync-qdevice daemon.

To improve security, corosync-qnetd can work with TLS for client certificate checking.

QDevice (corosync-qdevice)

A systemd service (a daemon) on each cluster node running together with Corosync. This is the client of corosync-qnetd. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow.

QDevice is designed to work with different arbitrators. However, currently, only QNetd is supported.

Algorithms

QDevice supports different algorithms which determine how votes are assigned. Currently, the following exist:

  • FFSplit (fifty-fifty split) is the default. It is used for clusters with an even number of nodes. If the cluster splits into two similar partitions, this algorithm provides one vote to one of the partitions, based on the results of heuristics checks and other factors.

  • LMS (last man standing) allows the only remaining node that can see the QNetd server to get the votes. This algorithm is therefore useful when a cluster with only one active node should remain quorate.

Heuristics

QDevice supports a set of commands (heuristics). The commands are executed locally on startup of cluster services, on cluster membership change, on successful connection to corosync-qnetd, or optionally, at regular intervals. The heuristics can be set with the quorum.device.heuristics key (in the corosync.conf file) or with the --qdevice-heuristics-mode option. Both accept the values off (default), sync, and on. The difference between sync and on is that on additionally executes the above commands at regular intervals.

The heuristics are considered to have passed only if all commands execute successfully; otherwise, they have failed. The result of the heuristics is sent to corosync-qnetd, where it is used in calculations to determine which partition should be quorate.

Tiebreaker

This is used as a fallback if the cluster partitions are completely equal even with the same heuristics results. It can be configured to be the lowest, the highest, or a specific node ID.
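
As a rough illustration of the tiebreaker setting only (not of the complete FFSplit or LMS logic, which also takes heuristics results and other factors into account), the following shell sketch picks the partition that contains the lowest, the highest, or a specific node ID. The function pick_partition and its arguments are hypothetical; corosync-qnetd makes this decision internally.

```shell
#!/bin/sh
# pick_partition MODE "IDS_A" "IDS_B"
# MODE is lowest, highest, or a specific node ID. IDS_A and IDS_B are
# space-separated node IDs of two otherwise equal partitions.
# Prints A or B for the partition that the tiebreaker favors.
pick_partition() {
    mode=$1
    case $mode in
        lowest)  wanted=$(printf '%s\n' $2 $3 | sort -n | head -n 1) ;;
        highest) wanted=$(printf '%s\n' $2 $3 | sort -n | tail -n 1) ;;
        *)       wanted=$mode ;;
    esac
    for id in $2; do
        if [ "$id" = "$wanted" ]; then
            echo A
            return 0
        fi
    done
    for id in $3; do
        if [ "$id" = "$wanted" ]; then
            echo B
            return 0
        fi
    done
    # The requested node ID is in neither partition.
    return 1
}

# Two single-node partitions of a split two-node cluster:
pick_partition lowest  "3232235777" "3232235778"   # prints: A
pick_partition highest "3232235777" "3232235778"   # prints: B
```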

12.2 Requirements and Prerequisites

Before setting up QDevice and QNetd, prepare the environment as follows:

  • In addition to the cluster nodes, you need a separate machine which will become the QNetd server. See Section 12.3, “Setting Up the QNetd Server”.

  • We recommend that QDevice reaches the QNetd server over a different physical network than the one that Corosync uses. Ideally, the QNetd server should be in a separate rack from the main cluster, or at least on a separate PSU and not in the same network segment as the Corosync ring or rings.

12.3 Setting Up the QNetd Server

The QNetd server is not part of the cluster stack, and it is not a real member of your cluster. As such, you cannot move resources to this server.

The QNetd server is almost stateless. Usually, you do not need to change anything in the configuration file /etc/sysconfig/corosync-qnetd. By default, the corosync-qnetd service runs the daemon as user coroqnetd in the group coroqnetd. This avoids running the daemon as root.

To create a QNetd server, proceed as follows:

  1. On the machine that will become the QNetd server, install SUSE Linux Enterprise Server 15 SP2.

  2. Log in to the QNetd server and install the following package:

    root # zypper install corosync-qnetd

    You do not need to manually start the corosync-qnetd service. The bootstrap scripts will take care of the startup process during the qdevice stage.

Your QNetd server is ready to accept connections from a QDevice client (corosync-qdevice). Further configuration is not needed.

12.4 Connecting QDevice Clients to the QNetd Server

After you have set up your QNetd server, you can set up and run the clients. You can connect the clients to the QNetd server during the installation of your cluster, or you can add them later. The following procedure uses the latter approach. We assume a cluster with two cluster nodes (alice and bob) and the QNetd server (charlie).

  1. On alice, initialize your cluster:

    root # crm cluster init -y
  2. On bob, join the cluster:

    root # crm cluster join -c alice -y
  3. On alice and bob, bootstrap the qdevice stage. In most cases, the default settings are fine. Provide at least the --qnetd-hostname option with the host name or IP address of the QNetd server (charlie in our case):

    root # crm cluster init qdevice --qnetd-hostname=charlie

    If you want to change the default settings, get a list of all possible options with the command crm cluster init qdevice --help. All options related to QDevice start with --qdevice-NAME.

If you have used the default settings, the command above creates a QDevice that has TLS enabled and uses the FFSplit algorithm.

12.5 Setting Up a QDevice with Heuristics

If you need additional control over how votes are determined, use heuristics. Heuristics are a set of commands which are executed in parallel.

For this purpose, the command crm cluster init qdevice provides the option --qdevice-heuristics. You can pass one or more commands (separated by semicolons) with absolute paths.

For example, if your own command for heuristic checks is located at /usr/sbin/my-script.sh, you can run it on one of your cluster nodes as follows:

root # crm cluster init qdevice --qnetd-hostname=charlie \
     --qdevice-heuristics=/usr/sbin/my-script.sh  \
     --qdevice-heuristics-mode=on

The command or commands can be written in any language, such as Shell, Python, or Ruby. If they succeed, they return 0 (zero); otherwise, they return an error code.

You can also pass a set of commands. The heuristics pass only when all commands finish successfully (return code is zero).

The --qdevice-heuristics-mode=on option lets the heuristics commands run regularly.
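
A heuristics command such as the /usr/sbin/my-script.sh used above could look like the following minimal sketch. It is only an illustration: the check itself (a single ping to a reference host), the function name check_gateway, and the address in GATEWAY are assumptions and need to be adapted to your environment.

```shell
#!/bin/sh
# Sketch of a heuristics command: pass only if a reference host answers
# a ping. GATEWAY is an assumed address; use a host outside the cluster
# that both nodes should normally be able to reach.
GATEWAY="${GATEWAY:-192.168.1.254}"

# Returns 0 if the host in $1 replies to a single ping within two
# seconds, and a non-zero code otherwise.
check_gateway() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# In the installed script, run the check as the last command so that
# its return code becomes the exit code of the script:
#   check_gateway "$GATEWAY"
```

Because the return code of the last command is the exit code of the script, a reachable host leads to 0 (heuristics passed) and an unreachable host to a non-zero code (heuristics failed).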

12.6 Checking and Showing Quorum Status

You can query the quorum status on one of your cluster nodes as shown in Example 12.1, “Status of QDevice”. It shows the status of your QDevice nodes.

Example 12.1: Status of QDevice
root # corosync-quorumtool 1
 Quorum information
------------------
Date:             ...
Quorum provider:  corosync_votequorum
Nodes:            2 2
Node ID:          3232235777 3
Ring ID:          3232235777/8
Quorate:          Yes 4

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
 3232235777         1    A,V,NMW 192.168.1.1 (local) 5
 3232235778         1    A,V,NMW 192.168.1.2 5
         0          1            Qdevice

1

As an alternative with an identical result, you can also use the crm corosync status quorum command.

2

The number of nodes we are expecting. In this example, it is a two-node cluster.

3

As the node ID is not explicitly specified in corosync.conf, this ID is a 32-bit integer representation of the IP address. In this example, the value 3232235777 stands for the IP address 192.168.1.1.

4

The quorum status. In this case, the cluster has quorum.

5

The status for each cluster node means:

A (Alive) or NA (not alive)

Shows the connectivity status between QDevice and Corosync. If there is a heartbeat between QDevice and Corosync, it is shown as alive (A).

V (Vote) or NV (non vote)

Shows if the quorum device has given a vote (letter V) to the node. A letter V means that both nodes can communicate with each other. In a split-brain situation, one node would be set to V and the other node would be set to NV.

MW (Master wins) or NMW (not master wins)

Shows if the quorum device master_wins flag is set. By default, the flag is not set, so you see NMW (not master wins). See the man page votequorum_qdevice_master_wins(3) for more information.

NR (not registered)

Shows that the cluster is not using a quorum device.
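
The conversion between a node ID and an IP address described in callout 3 can be reproduced with shell arithmetic. This is a minimal sketch; nodeid_to_ip is a hypothetical helper name, and it only applies when node IDs are derived from IPv4 addresses as in this example.

```shell
#!/bin/sh
# Convert a 32-bit node ID (as shown by corosync-quorumtool) to
# dotted-quad IPv4 notation by extracting one byte at a time.
nodeid_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

nodeid_to_ip 3232235777   # prints: 192.168.1.1
nodeid_to_ip 3232235778   # prints: 192.168.1.2
```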

If you query the status of the QNetd server, you get output similar to that shown in Example 12.2, “Status of QNetd Server”:

Example 12.2: Status of QNetd Server
root # corosync-qnetd-tool 1
Cluster "hacluster": 2
    Algorithm:          Fifty-Fifty split 3
    Tie-breaker:        Node with lowest node ID
    Node ID 3232235777: 4
        Client address:         ::ffff:192.168.1.1:54732
        HB interval:            8000ms
        Configured node list:   3232235777, 3232235778
        Ring ID:                aa10ab0.8
        Membership node list:   3232235777, 3232235778
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 3232235778:
        Client address:         ::ffff:192.168.1.2:43016
        HB interval:            8000ms
        Configured node list:   3232235777, 3232235778
        Ring ID:                aa10ab0.8
        Membership node list:   3232235777, 3232235778
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   No change (ACK)

1

As an alternative with an identical result, you can also use the crm corosync status qnetd command.

2

The name of your cluster as set by totem.cluster_name in the configuration file /etc/corosync/corosync.conf.

3

The algorithm currently used. In this example, it is FFSplit.

4

This is the entry for the node with the IP address 192.168.1.1. TLS is active.

12.7 For More Information

For additional information about QDevice and QNetd, see the man pages corosync-qdevice(8) and corosync-qnetd(8).
