Applies to SUSE Linux Enterprise Server 15 SP4

18 Managing multipath I/O for devices

This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices by using Multipath I/O (MPIO).

18.1 Understanding multipath I/O

Multipathing is the ability of a server to communicate with the same physical or logical block storage device across multiple physical paths between the host bus adapters in the server and the storage controllers for the device, typically in Fibre Channel (FC) or iSCSI SAN environments.

Linux multipathing provides connection fault tolerance and can provide load balancing across the active connections. When multipathing is configured and running, it automatically isolates and identifies device connection failures, and reroutes I/O to alternate connections.

Multipathing provides fault tolerance against connection failures, but not against failures of the storage device itself. The latter is achieved with complementary techniques like mirroring.

18.1.1 Multipath terminology

Storage array

A hardware device with many disks and multiple fabric connections (controllers) that provides SAN or NAS storage to clients. Storage arrays typically have RAID and failover features and support multipathing. Historically, active/passive (failover) and active/active (load-balancing) storage array configurations were distinguished. These concepts still exist, but they are merely special cases of the path groups and access states supported by modern hardware.

Host, host system

The computer running SUSE Linux Enterprise Server which acts as a client system for a storage array.

Multipath map, multipath device

A set of path devices. It represents a storage volume on a storage array and is seen as a single block device by the host system.

Path device, low-level device

A member of a multipath map, typically a SCSI device. Each path device represents a unique connection between the host computer and the actual storage volume, for example, a logical unit from an iSCSI session. Under Linux device mapper multipath, path devices remain visible and accessible in the host system.

WWID, UID, UUID

“World Wide Identifier”, “Unique Identifier”, “Universally Unique Identifier”. The WWID is a property of the storage volume and as such, it is identical between all path devices of a multipath map. multipath-tools uses the WWID to determine which low-level devices should be assembled into a multipath map. multipath relies on udev to determine the WWID of path devices. The WWID of a multipath map never changes. multipath devices can be reliably accessed through /dev/disk/by-id/dm-uuid-mpath-WWID.

The WWID should be distinguished from the map name, which is configurable (see Section 18.9, “Configuring user-friendly names or alias names”).
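
For example, assuming the WWID 3600508e0000000009e6baa6f609e7908 (an illustrative value), the map device can be referenced like this:

> ls -l /dev/disk/by-id/dm-uuid-mpath-3600508e0000000009e6baa6f609e7908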

uevent, udev event

An event sent by the kernel to user space and processed by the udev subsystem. Uevents are generated when devices are added, removed, or change their properties.

Device mapper

A framework in the Linux kernel for creating virtual block devices. I/O operations to mapped devices are redirected to the underlying block devices. Device mappings may be stacked. The device mapper implements its own event signaling, also known as “device mapper events” or “dm events”.

18.2 Hardware support

The multipathing drivers and tools are available on all architectures supported by SUSE Linux Enterprise Server. The generic, protocol-agnostic driver works with most multipath-capable storage hardware on the market. Some storage array vendors provide their own multipathing management tools. Consult the vendor’s hardware documentation to determine what settings are required.

18.2.1 Multipath implementations: device mapper and NVMe

The traditional, generic implementation of multipathing under Linux uses the device mapper framework. For most device types like SCSI devices, device mapper multipathing is the only available implementation. Device mapper multipathing is highly configurable and flexible.

The Linux NVM Express (NVMe) kernel subsystem implements multipathing natively in the kernel. This implementation creates less computational overhead for NVMe devices, which are typically fast devices with very low latencies. Native NVMe multipathing requires no user space component. Since SLE 15, native multipathing has been the default for NVMe multipath devices.
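
To check whether native NVMe multipathing is active on a running system, you can inspect the corresponding kernel module parameter (it prints Y when enabled):

> cat /sys/module/nvme_core/parameters/multipath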

The rest of this chapter is about device mapper multipath.

18.2.2 Storage array autodetection for multipathing

Device mapper multipath is a generic technology. Multipath device detection requires only that the low-level (for example, SCSI) devices are detected by the kernel, and that device properties reliably identify multiple low-level devices as being different “paths” to the same volume rather than actually different devices.

The multipath-tools package detects storage arrays by their vendor and product names. It has validated built-in configuration defaults for a large variety of storage products. Consult the hardware documentation of your storage array; some vendors provide specific recommendations for Linux multipathing configuration. To see the built-in settings for storage that has been detected on your system, run the command multipath -T (see Section 18.4.5, “The multipath command”).

If you need to apply changes to the built-in configuration for your storage array, create and configure the /etc/multipath.conf file. See Section 18.6, “Multipath configuration”.

Note

multipath-tools has built-in presets for many storage arrays. The existence of such presets for a given storage product does not imply that the vendor of the storage product has tested the product with dm-multipath, nor that the vendor endorses or supports use of dm-multipath with the product. Always consult the original vendor documentation for support-related questions.

18.2.3 Storage arrays that require specific hardware handlers

Some storage arrays require special commands for failover from one path to the other, or non-standard error handling methods. These special commands and methods are implemented by hardware handlers in the Linux kernel. Modern SCSI storage arrays support the “Asymmetric Logical Unit Access” (ALUA) hardware handler defined in the SCSI standard. Besides ALUA, the SLE kernel contains hardware handlers for Netapp E-Series (RDAC), the Dell/EMC CLARiiON CX family of arrays, and legacy arrays from HP. Since Linux kernel 4.4, the kernel has automatically detected hardware handlers for most arrays, including all arrays supporting ALUA.

18.3 Planning for multipathing

Use the guidelines in this section when planning your multipath I/O solution.

18.3.1 Prerequisites

  • The storage array you use for the multipathed device must support multipathing. For more information, see Section 18.2, “Hardware support”.

  • You need to configure multipathing only if multiple physical paths exist between host bus adapters in the server and host bus controllers for the block storage device.

  • For some storage arrays, the vendor provides its own multipathing software to manage multipathing for the array’s physical and logical devices. In this case, you should follow the vendor’s instructions for configuring multipathing for those devices.

  • When using multipathing in a virtualization environment, the multipathing is controlled in the host server environment. Configure multipathing for the device before you assign it to a virtual guest machine.

18.3.2 Multipath installation types and the initramfs

18.3.2.1 Root file system on multipath (SAN-boot)

The root file system is on a multipath device (usually, all other file systems are on multipath storage as well). This is typically the case for diskless servers that use SAN or NAS storage exclusively. On such systems, multipath support is required for booting, and multipathing must be enabled in the initramfs (initrd). See Section 18.3.2.3, “Keeping the initial RAM disk synchronized”.

18.3.2.2 Root file system on a local disk

The root file system (and possibly some other file systems) is in local storage, for example, on a directly attached SATA disk or local RAID, but the system additionally uses file systems in the multipath SAN or NAS storage. This system type can be configured in three different ways:

Using a root-on-multipath setup

All block devices are part of multipath maps, including the local disk. Such a setup then appears as a degraded multipath map with just one path. This configuration is created if multipathing was enabled during the initial system installation with YaST. It is the simplest configuration but has a performance overhead.

Ignoring the local disk by multipath-tools

In this configuration, multipathing is enabled in the initramfs. This configuration can be achieved after the installation by blacklisting, or with the find_multipaths configuration parameter.

Disabling multipathing in the initramfs

This setup is created if multipathing was not enabled during the initial system installation with YaST, either because YaST did not detect multipath devices or because the user opted against enabling multipath during installation. This is the only situation in which Section 18.3.2.3, “Keeping the initial RAM disk synchronized” does not apply.
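
For the second configuration above (ignoring the local disk), a minimal blacklist sketch might look like this, assuming the local disk’s WWID is 26353900f02796769 (a placeholder value; see also Section 18.8, “Blacklisting non-multipath devices”):

blacklist {
    wwid "26353900f02796769"
}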

18.3.2.3 Keeping the initial RAM disk synchronized

Important

Make sure that the initial RAM disk and the booted system behave consistently regarding the use of multipathing for all block devices. Rebuild the initramfs after applying multipath configuration changes.

If multipathing is enabled in the system, it also needs to be enabled in the initramfs and vice versa. The only exception to this rule is the option Disabling multipathing in the initramfs in Section 18.3.2.2, “Root file system on a local disk”.

The multipath configuration must be synchronized between the booted system and the initrd. Therefore, if /etc/multipath.conf, /etc/multipath/wwids, /etc/multipath/bindings, or other configuration files or udev rules related to device identification are changed, the initial RAM FS needs to be rebuilt using the command:

> sudo dracut -f

If the initrd and the system are not synchronized, the system may not boot properly, and the start-up procedure may result in an emergency shell. See Section 18.15.2, “The system exits to emergency shell at boot when multipath is enabled” for instructions on how to avoid or repair such a scenario.

Special care must be taken if the initial RAM disk is rebuilt in non-standard situations, for example, from a rescue system or after booting with the kernel parameter multipath=off. dracut will automatically include multipathing support in the initial RAM disk if and only if it detects that the root file system is on a multipath device while the initrd is being built. In such cases, it is necessary to enable or disable multipathing explicitly.

To enable multipath support in the initrd, run the command:

> sudo dracut --force --add multipath

To disable multipath support in initrd, run the command:

> sudo dracut --force --omit multipath

18.3.3 Disk management tasks

Use third-party SAN array management tools or the user interface of your storage array to create logical devices and assign them to hosts. Make sure to configure the host credentials correctly on both sides.

You can add volumes to or remove volumes from a running host, but detecting the changes may require rescanning SCSI targets and reconfiguring multipathing on the host, for example as shown below.
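
A typical sketch of this procedure, assuming the sg3_utils package is installed (it provides the rescan-scsi-bus.sh script):

> sudo rescan-scsi-bus.sh
> sudo multipathd reconfigure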

18.3.4 Software RAID and complex storage stacks

Multipathing is set up on top of basic storage devices such as SCSI disks. In a multi-layered storage stack, multipathing is always the bottom layer. Other layers such as software RAID, Logical Volume Management, block device encryption, etc. are layered on top of it. Therefore, for each device that has multiple I/O paths and that you plan to use in a software RAID, you must configure the device for multipathing before you attempt to create the software RAID device.

For information about setting up multipathing for existing software RAIDs, see Section 18.12, “Configuring multipath I/O for an existing software RAID”.

18.3.5 High-availability solutions

High-availability solutions for clustering storage resources run on top of the multipathing service on each node. Make sure that the configuration settings in the /etc/multipath.conf file on each node are consistent across the cluster.

Make sure that multipath devices have the same names across all nodes. Refer to Section 18.9.1, “Multipath device names in HA clusters” for details.

The Distributed Replicated Block Device (DRBD) high-availability solution for mirroring devices across a LAN runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a DRBD solution, you must configure the device for multipathing before you configure DRBD.

Special care must be taken when using multipathing together with clustering software that relies on shared storage for fencing, such as pacemaker with sbd. See Section 18.7, “Configuring policies for polling, queuing, and failback” for details.

18.4 Multipath management tools

The multipathing support in SUSE Linux Enterprise Server is based on the Device Mapper Multipath module of the Linux kernel and the multipath-tools user space package. You can use the Multiple Devices Administration utility (multipath) to view the status of multipathed devices.

18.4.1 Device mapper multipath module

The Device Mapper Multipath (DM-MP) module provides the generic multipathing capability for Linux. DM-MPIO is the preferred solution for multipathing on SUSE Linux Enterprise Server for SCSI and DASD devices, and can be used for NVMe devices as well.

Note: Using DM-MP for NVMe devices

Since SUSE Linux Enterprise Server 15, native NVMe multipathing (see Section 18.2.1, “Multipath implementations: device mapper and NVMe”) has been recommended for NVMe and used by default. To disable native NVMe multipathing and use device mapper multipath instead, boot with the kernel parameter nvme-core.multipath=0.

DM-MPIO features automatic configuration of the multipathing subsystem for a large variety of setups.

The multipath daemon multipathd takes care of automatic path discovery and grouping, and automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.

DM-MPIO protects against failures in the paths to the device, and not failures in the device itself. If one of the active paths is lost (for example, a network adapter breaks or a fiber-optic cable is removed), I/O is redirected to the remaining paths. If all active paths fail, inactive secondary paths must be woken up, so failover occurs with a delay of up to 30 seconds, depending on the properties of the storage array.

If every path to a given device has failed, I/O to this device can be queued in the kernel, either for a given amount of time or even indefinitely (in which case the total amount of queued I/O is limited by system memory).

If a disk array has more than one storage processor, ensure that the SAN switch has a connection to the storage processor that owns the LUNs you want to access. On most disk arrays, all LUNs belong to both storage processors, so both connections are active.

Note: Storage processors

On some disk arrays, the storage array manages the traffic through storage processors so that it presents only one storage processor at a time. One processor is active and the other one is passive until there is a failure. If you are connected to the wrong storage processor (the one with the passive path) you might not see the expected LUNs, or you might see the LUNs but get errors when you try to access them.

18.4.2 Multipath I/O management tools

The packages multipath-tools and kpartx provide tools that take care of automatic path discovery and grouping.

multipathd

The daemon to set up and monitor multipath maps, and a command line client to communicate with the daemon process. See Section 18.4.4, “The multipathd daemon and the multipath command”.

multipath

The command line tool for multipath operations. See Section 18.4.5, “The multipath command”.

kpartx

The command line tool for managing “partitions” on multipath devices. See Section 18.5.3, “Partitions on multipath devices”.

mpathpersist

The command line tool for managing SCSI persistent reservations. See Section 18.4.6, “The mpathpersist utility”.

18.4.3 MD RAID on multipath devices

MD RAID arrays on top of multipathing are set up automatically by the system's udev rules. No special configuration in /etc/mdadm.conf is necessary.

18.4.4 The multipathd daemon and the multipath command

multipathd is the most important part of a modern Linux device mapper multipath setup. It is normally started through the systemd service multipathd.service. Socket activation via multipathd.socket is supported, but it is strongly recommended to enable multipathd.service on systems with multipath hardware.

multipathd serves the following tasks (some of them depend on the configuration):

  • On startup, detects path devices and sets up multipath maps from detected devices.

  • Monitors uevents and device mapper events, adding paths to or removing them from multipath maps as necessary, and initiating failover or failback operations.

  • Sets up new maps on the fly when new path devices are discovered.

  • Checks path devices at regular intervals to detect failure, and tests failed paths to reinstate them if they become operational again.

  • When all paths fail, multipathd either fails the map, or switches the map device to queuing mode for a given time interval.

  • Handles path state changes and switches path groups or regroups paths, as necessary.

  • Tests paths for “marginal” state, that is, shaky fabric conditions that cause the path state to flip between operational and non-operational.

  • Handles SCSI persistent reservation keys for path devices if configured. See Section 18.4.6, “The mpathpersist utility”.

multipathd also serves as a command line client to process interactive commands by sending them to the running daemon. The general syntax to send commands to the daemon is as follows:

multipathd COMMAND

or

multipathd -k"COMMAND"

To enter the interactive mode with the daemon, run:

multipathd -k
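
A short interactive session might look like this (show wildcards is a convenient way to discover the available format specifiers):

> sudo multipathd -k
multipathd> show topology
multipathd> show wildcards
multipathd> exit
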
Note: How multipath and multipathd work together

Many multipathd commands have multipath equivalents. For example, multipathd show topology does the same thing as multipath -ll. The notable difference is that the multipathd command inquires the internal state of the running multipathd daemon, whereas multipath obtains its information directly from the kernel and by issuing I/O operations.

If the multipath daemon is running, it is recommended to make modifications to the system by using the multipathd commands. Otherwise, the daemon may notice the changes and react to them. In some situations, the daemon might even try to undo the applied changes. Therefore, multipath automatically delegates certain possibly dangerous commands, like destroying and flushing maps, to multipathd if a running daemon is detected.

The list below describes frequently used multipathd commands:

show topology

Shows the current map topology and properties.

show paths

Shows the currently known path devices.

show paths format "FORMAT STRING"

Shows the currently known path devices using a format string. Use show wildcards to see a list of supported format specifiers.

show maps

Shows the currently configured map devices.

show maps format "FORMAT STRING"

Shows the currently configured map devices using a format string. Use show wildcards to see a list of supported format specifiers.

show config local

Shows the current configuration that multipathd is using.

reconfigure

Reread configuration files, rescan devices, and set up maps again. This is basically equivalent to a restart of multipathd. A few options cannot be modified without a restart. They are mentioned in the man page multipath.conf(5). The reconfigure command reloads only map devices that have changed in some way. To force reloading every map device, use reconfigure all.

del map MAP DEVICE NAME

Unconfigure and delete the given map device and its partitions. MAP DEVICE NAME can be a device node name like dm-0, a WWID, or a map name. The command fails if the device is in use.

Additional commands are available to modify path states, enable or disable queueing, and more. See multipathd(8) for details.

18.4.5 The multipath command

Even though multipath setup is mostly automatic and handled by multipathd, multipath is still useful for some administration tasks. Several examples of the command usage follow:

multipath

Detects path devices and configures all multipath maps that it finds.

multipath -d

Similar to multipath, but does not set up any maps (“dry run”).

multipath DEVICENAME

Configures a specific multipath device. DEVICENAME can denote a member path device by its device node name (/dev/sdb) or device number in major:minor format. Alternatively, it can be the WWID or name of a multipath map.

multipath -f DEVICENAME

Unconfigures ("flushes") a multipath map and its partition mappings. The command will fail if the map or one of its partitions is in use. See above for possible values of DEVICENAME.

multipath -F

Unconfigures ("flushes") all multipath maps and their partition mappings. The command will fail for maps in use.

multipath -ll

Displays the status and topology of all currently configured multipath devices.

multipath -ll DEVICENAME

Displays the status of a specified multipath device. See above for possible values of DEVICENAME.

multipath -t

Shows the internal hardware table and the active configuration of multipath. Refer to multipath.conf(5) for details about the configuration parameters.

multipath -T

Similar to multipath -t, but shows only the hardware entries for the hardware detected on the host.

The option -v controls the verbosity of the output. You can use values between 0 (only fatal errors) and 4 (verbose logging). The default is -v2. The verbosity option in /etc/multipath.conf can be used to change the default verbosity for both multipath and multipathd.

18.4.6 The mpathpersist utility

The mpathpersist utility is used to manage SCSI persistent reservations on Device Mapper Multipath devices. Persistent reservations serve to restrict access to SCSI Logical Units to certain SCSI initiators. In multipath configurations, it is important to use the same reservation keys for all I_T nexuses (paths) for a given volume; otherwise, creating a reservation on one path may cause other paths to fail.

Use this utility with the reservation_key attribute in the /etc/multipath.conf file to set persistent reservations for SCSI devices. If (and only if) this option is set, the multipathd daemon checks persistent reservations for newly discovered paths or reinstated paths.

You can add the attribute to the defaults section or the multipaths section of multipath.conf. For example:

multipaths {
    multipath {
        wwid             3600140508dbcf02acb448188d73ec97d
        alias            yellow
        reservation_key  0x123abc
    }
}

After setting the reservation_key parameter for all mpath devices applicable for persistent management, reload the configuration using multipathd reconfigure.

Note: Using “reservation_key file”

If the special value reservation_key file is used in the defaults section of multipath.conf, reservation keys can be managed dynamically in the file /etc/multipath/prkeys using mpathpersist.

This is the recommended way to handle persistent reservations with multipath maps. It is available from SUSE Linux Enterprise Server 12 SP4.
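
A minimal sketch of this setup:

defaults {
    reservation_key file
}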

Use the command mpathpersist to query and set persistent reservations for multipath maps consisting of SCSI devices. Refer to the manual page mpathpersist(8) for details. The command line options are the same as those of the sg_persist utility from the sg3_utils package. The sg_persist(8) manual page explains the semantics of the options in detail.

In the following examples, DEVICE denotes a device mapper multipath device like /dev/mapper/mpatha. The commands below are listed with long options for better readability. All options have single-letter replacements, like in mpathpersist -oGS 123abc DEVICE.

mpathpersist --in --read-keys DEVICE

Read the registered reservation keys for the device.

mpathpersist --in --read-reservation DEVICE

Show existing reservations for the device.

mpathpersist --out --register --param-sark=123abc DEVICE

Register a reservation key for the device. This will add the reservation key for all I_T nexuses (path devices) on the host.

mpathpersist --out --reserve --param-rk=123abc --prout-type=5 DEVICE

Create a reservation of type 5 (“write exclusive - registrants only”) for the device, using the previously registered key.

mpathpersist --out --release --param-rk=123abc --prout-type=5 DEVICE

Release a reservation of type 5 for the device.

mpathpersist --out --register-ignore --param-sark=0 DEVICE

Delete a previously existing reservation key from the device.

18.5 Configuring the system for multipathing

18.5.1 Enabling, starting, and stopping multipath services

To enable multipath services to start at boot time, run the following command:

> sudo systemctl enable multipathd

To manually start the service in the running system, enter:

> sudo systemctl start multipathd

To restart the service, enter:

> sudo systemctl restart multipathd

In most situations, restarting the service is not necessary. To simply have multipathd reload its configuration, run:

> sudo systemctl reload multipathd

To check the status of the service, enter:

> sudo systemctl status multipathd

To stop the multipath services in the current session, run:

> sudo systemctl stop multipathd
> sudo systemctl stop multipathd.socket
Warning: Disabling multipathd

It is strongly recommended to have multipathd.service always enabled and running on every host that has access to multipath hardware. However, sometimes it may be necessary to disable the service because multipath hardware has been removed, because some other multipathing software is going to be deployed, or for troubleshooting purposes.

To disable multipathing just for a single system boot, use the kernel parameter multipath=off. This affects both the booted system and the initial ramfs, which does not need to be rebuilt in this case.

To disable multipathd services permanently, so that they will not be started on future system boots, run the following commands:

> sudo systemctl disable multipathd
> sudo systemctl disable multipathd.socket
> sudo dracut --force --omit multipath

(Whenever you disable or enable the multipath services, rebuild the initrd. See Section 18.3.2.3, “Keeping the initial RAM disk synchronized”.)

Optionally, if you also want to make sure that multipath devices do not get set up even when multipath is run manually, add the following lines at the end of /etc/multipath.conf before rebuilding the initrd:

blacklist {
    wwid .*
}

18.5.2 Preparing SAN devices for multipathing

Before configuring multipath I/O for your SAN devices, prepare the SAN devices, as necessary, by doing the following:

  • Configure and zone the SAN with the vendor’s tools.

  • Configure permissions for host LUNs on the storage arrays with the vendor’s tools.

  • If SUSE Linux Enterprise Server does not ship a driver for the host bus adapter (HBA), install a Linux driver from the HBA vendor. See the vendor’s specific instructions for more details.

If multipath devices are detected and multipathd.service is enabled, multipath maps should be created automatically. If this does not happen, use commands like lsscsi to check the probing of the low-level devices. Also, inspect the system logs with journalctl -b. When the LUNs are not seen by the HBA driver, check the zoning setup in the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
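
For example, the following inspection commands can serve as a starting point (their output depends on your hardware):

> lsscsi
> sudo journalctl -b | grep -i multipath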

If the HBA driver can see LUNs, but no corresponding block devices are created, additional kernel parameters may be needed. See TID 3955167: Troubleshooting SCSI (LUN) Scanning Issues in the SUSE Knowledgebase at https://www.suse.com/support/kb/doc.php?id=3955167.

18.5.3 Partitions on multipath devices

Multipath maps can have partitions like their path devices. Partition table scanning and device node creation for partitions is done in user space by the kpartx tool. kpartx is automatically invoked by udev rules; there is usually no need to run it manually. Technically, “partition” devices created by kpartx are also device mapper devices that simply map a linear range of blocks from the parent device. The Nth partition of a multipath device with known WWID can be accessed reliably via /dev/disk/by-id/dm-uuid-partN-mpath-WWID.
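
For example, the first partition of a map with the hypothetical WWID 3600508e0000000009e6baa6f609e7908 would be available as:

> ls -l /dev/disk/by-id/dm-uuid-part1-mpath-3600508e0000000009e6baa6f609e7908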

Note: Disabling invocation of kpartx

The skip_kpartx option in /etc/multipath.conf can be used to disable invocation of kpartx on selected multipath maps. This may be useful on virtualization hosts, for example.

Partition tables and partitions on multipath devices can be manipulated as usual, using YaST or tools like fdisk or parted. Changes applied to the partition table will be noted by the system when the partitioning tool exits. If this does not work (usually because a device is busy), try multipathd reconfigure, or reboot the system.

A partitioned multipath device cannot be used as a whole device. For example, you cannot create an LVM physical volume from a partitioned device. You need to wipe the partition table before doing this.
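
One way to wipe the partition table, assuming a map named mpatha (this destroys the partitioning, so verify the device first with multipath -ll):

> sudo wipefs --all /dev/mapper/mpatha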

18.6 Multipath configuration

The built-in multipath-tools defaults work well for most setups. If customizations are needed, a configuration file needs to be created. The main configuration file is /etc/multipath.conf. In addition, files matching the pattern /etc/multipath/conf.d/*.conf are read in alphabetical order. See Section 18.6.2, “multipath.conf syntax” for precedence rules.

Note: Generated configuration files

The files /etc/multipath/wwids, /etc/multipath/bindings, and /etc/multipath/prkeys are maintained by multipath-tools to store persistent information about previously created multipath maps, map names, and reservation keys for SCSI persistent reservations, respectively. Do not edit these generated configuration files.

Note: Configurable paths

Except for /etc/multipath.conf, the paths of the configuration directories and files are configurable, but changing these paths is strongly discouraged.

18.6.1 Creating the /etc/multipath.conf file

You can generate a multipath.conf template from the built-in defaults. This makes all default settings explicit. The behavior of multipath-tools will not change unless the generated file is modified. To generate the configuration template, run:

multipath -T >/etc/multipath.conf

Alternatively, you can create a minimal /etc/multipath.conf that just contains those settings you want to change. The behavior will be identical to modifying only the respective lines in the generated template.
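
For example, a minimal /etc/multipath.conf that only changes the queueing behavior might consist of nothing but the following (the value is illustrative; see Section 18.7, “Configuring policies for polling, queuing, and failback”):

defaults {
    no_path_retry 6
}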

18.6.2 multipath.conf syntax

The /etc/multipath.conf file uses a hierarchy of sections, subsections, and attribute/value pairs.

  • Whitespace separates tokens. Consecutive whitespace characters are collapsed into a single space, unless quoted (see below).

  • The hash (#) and exclamation mark (!) characters cause the rest of the line to be discarded as a comment.

  • Sections and subsections are started with a section name and an opening brace ({) on the same line, and end with a closing brace (}) on a line on its own.

  • Attributes and values are written on one line. Line continuations are unsupported.

  • Attributes and section names must be keywords. The allowed keywords are documented in multipath.conf(5).

  • Values may be enclosed in double quotes ("). They must be enclosed in quotes if they contain whitespace or comment characters. A double quote character inside a value is represented by a pair of double quotes ("").

  • The values of some attributes are POSIX regular expressions (see regex(7)). They are case sensitive and not anchored, so “bar” matches “rhabarber”.

Syntax Example

section {
    subsection {
        attr1 value
        attr2 "complex value!"
        attr3 "value with ""quoted"" word"
    } ! subsection end
} # section end

Precedence Rules

As noted at the beginning of Section 18.6, “Multipath configuration”, it is possible to have multiple configuration files. The additional files follow the same syntax rules as /etc/multipath.conf. Sections and attributes can occur multiple times. If the same attribute is set in multiple files, or on multiple lines in the same file, the last value read takes precedence.

18.6.3 /etc/multipath.conf sections

The /etc/multipath.conf file is organized into the following sections. Some attributes can occur in more than one section. See multipath.conf(5) for details.

defaults

General default settings.

blacklist

Lists devices to ignore. See Section 18.8, “Blacklisting non-multipath devices”.

blacklist_exceptions

Lists devices to be multipathed even though they are matched by the blacklist. See Section 18.8, “Blacklisting non-multipath devices”.

devices

Settings specific to the storage controller. This section is a collection of device subsections. Values in this section override values for the same attributes in the defaults section.

multipaths

Settings for individual multipath devices. This section is a list of multipath subsections. Values override the defaults and devices sections.

overrides

Settings that override values from all other sections.

18.6.4 Applying /etc/multipath.conf modifications

To apply the configuration changes, run

> sudo multipathd reconfigure

Do not forget to synchronize with the configuration in the initrd. See Section 18.3.2.3, “Keeping the initial RAM disk synchronized”.

Warning: Do not apply settings using multipath

Do not apply new settings with the multipath command while multipathd is running. This may result in an inconsistent and possibly broken setup.

Note: Verifying a modified setup

It is possible to test modified settings first before they are applied, by running:

multipath -d -v2

This command shows new maps to be created with the proposed topology. However, the command does not show whether maps will be removed/flushed. To obtain even more information, run this command:

multipath -d -v3 2>&1 | less

18.6.5 Generating a WWID

To identify a device over different paths, multipath uses a World Wide Identification (WWID) for each device. If the WWID is the same for two device paths, they are assumed to represent the same device. We recommend not changing the method of WWID generation, unless there is a compelling reason to do so. For more details, see man multipath.conf.
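
For example, to inspect the udev property that is used as the WWID of a SCSI path device by default (ID_SERIAL, see the uid_attribute setting), you can run the following, assuming the path device /dev/sda:

> sudo udevadm info --query=property --name=/dev/sda | grep ID_SERIAL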

18.7 Configuring policies for polling, queuing, and failback

This section discusses the most important multipath-tools configuration parameters for achieving fault tolerance.

polling_interval

The time interval (in seconds) between health checks for path devices. The default is 5 seconds. Failed devices are checked at this time interval. For healthy devices, the time interval may be increased up to max_polling_interval seconds.

no_path_retry

Determines what happens if all paths of a given multipath map have failed or disappeared. The possible values are:

fail

Fail I/O on the multipath map. This will cause I/O errors in upper layers such as mounted file systems. The affected file systems, and possibly the entire host, will enter degraded mode.

queue

I/O on the multipath map is queued in the device-mapper layer and sent to the device when path devices become available again. This is the safest option to avoid losing data, but it can have negative effects if the path devices do not get reinstated for a long time. Processes reading from the device will hang in uninterruptible sleep (D) state. Queued data occupies memory, which becomes unavailable for processes. Eventually, memory will be exhausted.

N

N is a positive integer. Keep the map device in queuing mode for N polling intervals. When the time elapses, multipathd fails the map device. If polling_interval is 5 seconds and no_path_retry is 6, multipathd will queue I/O for approximately 6 * 5s = 30s before failing I/O on the map device. A carefully chosen timeout value is often a good compromise between fail and queue.
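
For example, a sketch of such a compromise that queues I/O for roughly 30 seconds before failing:

defaults {
    polling_interval 5
    no_path_retry 6
}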

The goal of multipath I/O is to provide connectivity fault tolerance between the storage system and the server. The desired default behavior depends on whether the server is a stand-alone server or a node in a high-availability cluster.

When you configure multipath I/O for a stand-alone server, the no_path_retry setting protects the server operating system from receiving I/O errors as long as possible. It queues I/O until a multipath failover occurs and provides a healthy connection.

When you configure multipath I/O for a node in a high-availability cluster, you want multipath to report the I/O failure to trigger the resource failover instead of waiting for a multipath failover to be resolved. In cluster environments, you must modify the no_path_retry setting so that the cluster node receives an I/O error in relation to the cluster verification process (recommended to be 50% of the heartbeat tolerance) if the connection is lost to the storage system. In addition, you want the multipath I/O failback to be set to manual to avoid a ping-pong of resources because of path failures.
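
A sketch of these recommendations for a cluster node (the values are illustrative and must be tuned to your cluster’s timeouts):

defaults {
    polling_interval 5
    no_path_retry 6      # fail I/O after about 30 seconds; aim for roughly 50% of the heartbeat tolerance
    failback manual
}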

The /etc/multipath.conf file should contain a defaults section where you can specify default behaviors for polling, queuing, and failback. If the field is not otherwise specified in a device section, the default setting is applied for that SAN configuration.

The following are the compiled default settings. They will be used unless you overwrite these values by creating and configuring a personalized /etc/multipath.conf file.

defaults {
  verbosity 2
#  udev_dir is deprecated in SLES 11 SP3
#  udev_dir              /dev
  polling_interval      5
#  path_selector default value is service-time in SLES 11 SP3
#  path_selector         "round-robin 0"
  path_selector         "service-time 0"
  path_grouping_policy  failover
#  getuid_callout is deprecated in SLES 11 SP3 and replaced with uid_attribute
#  getuid_callout        "/usr/lib/udev/scsi_id --whitelisted --device=/dev/%n"
#  uid_attribute is new in SLES 11 SP3
  uid_attribute         "ID_SERIAL"
  prio                  "const"
  prio_args             ""
  features              "0"
  path_checker          "tur"
  alias_prefix          "mpath"
  rr_min_io_rq          1
  max_fds               "max"
  rr_weight             "uniform"
  queue_without_daemon  "yes"
  flush_on_last_del     "no"
  user_friendly_names   "no"
  fast_io_fail_tmo      5
  bindings_file         "/etc/multipath/bindings"
  wwids_file            "/etc/multipath/wwids"
  log_checker_err       "always"

  retain_attached_hw_handler  "no"
  detect_prio           "no"
  failback              "manual"
  no_path_retry         "fail"
}

For information about setting the polling, queuing, and failback policies, see Section 18.10, “Configuring path failover policies and priorities”.

After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.

18.8 Blacklisting non-multipath devices

The /etc/multipath.conf file can contain a blacklist section where all non-multipath devices are listed. You can blacklist devices by WWID (wwid keyword), device name (devnode keyword), or device type (device section). You can also use the blacklist_exceptions section to enable multipath for some devices that are blacklisted by the regular expressions used in the blacklist section.

Note: Preferred blacklisting methods

The preferred method for blacklisting devices is by WWID or by vendor and product. Blacklisting by devnode is not recommended, because device nodes can change and thus are not useful for persistent device identification.

Warning: Regular expressions in multipath.conf

Regular expressions in the /etc/multipath.conf file do not work in general. They only work if they are matched against common strings. However, the standard configuration of multipath already contains regular expressions for many devices and vendors. Matching regular expressions with other regular expressions does not work. Make sure that you are only matching against strings shown with multipath -t.

You can typically ignore non-multipathed devices, such as hpsa, fd, hd, md, dm, sr, scd, st, ram, raw, and loop. For example, local SATA hard disks and flash disks do not have multiple paths. If you want multipath to ignore single-path devices, put them in the blacklist section.

Note: Compatibility

The keyword devnode_blacklist has been deprecated and replaced with the keyword blacklist.

With SUSE Linux Enterprise Server 12, the glibc-provided regular expressions are used. To match an arbitrary string, you must now use ".*" rather than "*".

For example, to blacklist local devices and all arrays from the hpsa driver from being managed by multipath, the blacklist section looks like this:

blacklist {
      wwid "26353900f02796769"
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
      devnode "^sd[a-z][0-9]*"
}

You can also blacklist only the partitions from a driver instead of the entire array. For example, you can use the following regular expression to blacklist only partitions from the cciss driver and not the entire array:

blacklist {
      devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}

You can blacklist by specific device types by adding a device section in the blacklist, and using the vendor and product keywords.

blacklist {
      device {
           vendor  "DELL"
           product ".*"
       }
}

You can use a blacklist_exceptions section to enable multipath for some devices that were blacklisted by the regular expressions used in the blacklist section. You add exceptions by WWID (wwid keyword), device name (devnode keyword), or device type (device section). You must specify the exceptions in the same way that you blacklisted the corresponding devices. That is, wwid exceptions apply to a wwid blacklist, devnode exceptions apply to a devnode blacklist, and device type exceptions apply to a device type blacklist.

For example, you can enable multipath for a desired device type when you have different device types from the same vendor. Blacklist all of the vendor’s device types in the blacklist section, and then enable multipath for the desired device type by adding a device section in a blacklist_exceptions section.

blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
      device {
           vendor  "DELL"
           product ".*"
       }
}

blacklist_exceptions {
      device {
           vendor  "DELL"
           product "MD3220i"
       }
}

You can also use the blacklist_exceptions to enable multipath only for specific devices. For example:

blacklist {
      wwid ".*"
}

blacklist_exceptions {
        wwid "3600d0230000000000e13955cc3751234"
        wwid "3600d0230000000000e13955cc3751235"
}

After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.

Following the reboot, the local devices should no longer be listed in the multipath maps when you issue the multipath -ll command.

Note: Using the find_multipaths option

Starting with SUSE Linux Enterprise Server 12 SP2, the multipath tools support the option find_multipaths in the defaults section of /etc/multipath.conf. This option prevents multipath and multipathd from setting up multipath maps for devices with only a single path (see man 5 multipath.conf for details). In certain configurations, this may save the administrator from needing to create blacklist entries, for example for local SATA disks.

Convenient as it seems at first, using the find_multipaths option also has disadvantages. It complicates and slows down the system boot, because for every device found, the boot logic needs to wait until all devices have been discovered to see whether a second path exists for the device. Additionally, problems can arise when some paths are down or otherwise invisible at boot time—a device can be falsely detected as a single-path device and activated, causing later addition of more paths to fail.

find_multipaths considers all devices that are listed in /etc/multipath/wwids with matching WWIDs as being multipath devices. This is important when find_multipaths is first activated: Unless /etc/multipath/wwids is deleted or edited, activating this option has no effect, because all previously existing multipath maps (including single-path ones) are listed in the wwids file. On SAN-boot systems with a multipathed root file system, make sure to keep /etc/multipath/wwids synchronized between the initial RAM disk and the file system.

In summary, using find_multipaths may be convenient in certain use cases, but SUSE still recommends the default configuration with a properly configured blacklist and blacklist exceptions.
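
If you decide to use the option anyway, a minimal sketch:

defaults {
    find_multipaths yes
}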

18.9 Configuring user-friendly names or alias names

A multipath device can be identified by its WWID, by a user-friendly name, or by an alias that you assign for it. Device node names in the form of /dev/sdn and /dev/dm-n can change on reboot and might be assigned to different devices each time. A device’s WWID, user-friendly name, and alias name persist across reboots, and are the preferred way to identify the device.

Important: Using persistent names is recommended

Because device node names in the form of /dev/sdn and /dev/dm-n can change on reboot, referring to multipath devices by their WWID is preferred. You can also use a user-friendly name or alias that is mapped to the WWID to identify the device uniquely across reboots.

The following table describes the types of device names that can be used for a device in the /etc/multipath.conf file. For an example of multipath.conf settings, see the /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file.

Table 18.1: Comparison of multipath device name types

Name Types

Description

WWID (default)

The serial WWID (Worldwide Identifier) is an identifier for the multipath device that is guaranteed to be globally unique and unchanging. The default name used in multipathing is the ID of the logical unit as found in the /dev/disk/by-id directory. For example, a device with the WWID of 3600508e0000000009e6baa6f609e7908 is listed as /dev/disk/by-id/scsi-3600508e0000000009e6baa6f609e7908.

User-friendly

The Device Mapper Multipath device names in the /dev/mapper directory also reference the ID of the logical unit. These multipath device names are user-friendly names in the form of /dev/mapper/mpathN, such as /dev/mapper/mpath0. The names are unique and persistent because they use the /var/lib/multipath/bindings file to track the association between the UUID and user-friendly names.

Alias

An alias name is a globally unique name that the administrator provides for a multipath device. Alias names override the WWID and the user-friendly /dev/mapper/mpathN names.

If you are using user_friendly_names, do not set the alias to mpathN format. This may conflict with an automatically assigned user-friendly name, and give you incorrect device node names.

The global multipath user_friendly_names option in the /etc/multipath.conf file is used to enable or disable the use of user-friendly names for multipath devices. If it is set to no (the default), multipath uses the WWID as the name of the device. If it is set to yes, multipath uses the /var/lib/multipath/bindings file to assign a persistent and unique name to the device in the form of mpath<N> in the /dev/mapper directory. The bindings file option in the /etc/multipath.conf file can be used to specify an alternate location for the bindings file.

The global multipath alias option in the /etc/multipath.conf file is used to explicitly assign a name to the device. If an alias name is set up for a multipath device, the alias is used instead of the WWID or the user-friendly name.

Using the user_friendly_names option can be problematic in the following situations:

Root device is using multipath:

If the system root device is using multipath and you use the user_friendly_names option, the user-friendly settings in the /var/lib/multipath/bindings file are included in the initrd. If you later change the storage setup, such as by adding or removing devices, there is a mismatch between the bindings setting inside the initrd and the bindings settings in /var/lib/multipath/bindings.

Warning: Binding mismatches

A bindings mismatch between initrd and /var/lib/multipath/bindings can lead to a wrong assignment of mount points to devices, which can result in file system corruption and data loss.

To avoid this problem, we recommend that you use the default WWID settings for the system root device. You should not use aliases for the system root device. Because the device name would differ, using an alias causes you to lose the ability to seamlessly switch off multipathing via the kernel command line.

Mounting /var from another partition:

The default location of the user_friendly_names configuration file is /var/lib/multipath/bindings. If the /var data is not located on the system root device but mounted from another partition, the bindings file is not available when setting up multipathing.

Make sure that the /var/lib/multipath/bindings file is available on the system root device and multipath can find it. For example, this can be done as follows:

  1. Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.

  2. Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:

    defaults {
                   user_friendly_names yes
                   bindings_file "/etc/multipath/bindings"
    }
Multipath is in the initrd:

Even if the system root device is not on multipath, it is possible for multipath to be included in the initrd. For example, this can happen if the system root device is on LVM. If you use the user_friendly_names option and multipath is in the initrd, you should boot with the parameter multipath=off to avoid problems.

This disables multipath only in the initrd during system boots. After the system boots, the multipathd service can activate multipathing.

Multipathing in HA clusters:

See Section 18.9.1, “Multipath device names in HA clusters” for details.

To enable user-friendly names or to specify aliases:

  1. Open the /etc/multipath.conf file in a text editor with root privileges.

  2. (Optional) Modify the location of the /var/lib/multipath/bindings file.

    The alternate path must be available on the system root device where multipath can find it.

    1. Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.

    2. Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:

      defaults {
                user_friendly_names yes
                bindings_file "/etc/multipath/bindings"
      }
  3. (Optional, not recommended) Enable user-friendly names:

    1. Uncomment the defaults section and its ending bracket.

    2. Uncomment the user_friendly_names option, then change its value from no to yes.

      For example:

      ## Use user-friendly names, instead of using WWIDs as names.
      defaults {
        user_friendly_names yes
      }
  4. (Optional) Specify your own names for devices by using the alias option in the multipath section.

    For example:

    ## Use alias names, instead of using WWIDs as names.
    multipaths {
           multipath {
                   wwid           36006048000028350131253594d303030
                   alias             blue1
           }
           multipath {
                   wwid           36006048000028350131253594d303041
                   alias             blue2
           }
           multipath {
                   wwid           36006048000028350131253594d303145
                   alias             yellow1
           }
           multipath {
                   wwid           36006048000028350131253594d303334
                   alias             yellow2
           }
    }
    Important: WWID compared to WWN

    When you define device aliases in the /etc/multipath.conf file, ensure that you use each device’s WWID (such as 3600508e0000000009e6baa6f609e7908) and not its WWN, which replaces the first character of a device ID with 0x, such as 0x600508e0000000009e6baa6f609e7908.

  5. Save your changes, then close the file.

  6. After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.

To use the entire LUN directly (for example, if you are using the SAN features to partition your storage), you can use the /dev/disk/by-id/xxx names for mkfs, /etc/fstab, your application, and so on. Partitioned devices have _part<n> appended to the device name, such as /dev/disk/by-id/xxx_part1.
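
For example, to create a file system on the entire LUN and mount it via the stable dm-uuid name (the WWID and the mount point /data are placeholders):

> sudo mkfs.ext4 /dev/disk/by-id/dm-uuid-mpath-3600508e0000000009e6baa6f609e7908

A matching /etc/fstab entry could look like this:

/dev/disk/by-id/dm-uuid-mpath-3600508e0000000009e6baa6f609e7908  /data  ext4  defaults  0  2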

In the /dev/disk/by-id directory, the multipath-mapped devices are represented by the device’s dm-uuid* name or alias name (if you assign an alias for it in the /etc/multipath.conf file). The scsi- and wwn- device names represent physical paths to the devices.

18.9.1 Multipath device names in HA clusters

Make sure that multipath devices have the same names across all nodes by doing the following:

  • Use UUID and alias names to ensure that multipath device names are consistent across all nodes in the cluster. Alias names must be unique across all nodes. Copy the /etc/multipath.conf file from the node to the /etc/ directory for all of the other nodes in the cluster.

  • When using links to multipath-mapped devices, ensure that you specify the dm-uuid* name or alias name in the /dev/disk/by-id directory, and not a fixed path instance of the device. For information, see Section 18.9, “Configuring user-friendly names or alias names”.

  • Set the user_friendly_names configuration option to no to disable it. A user-friendly name is unique to a node, but a device might not be assigned the same user-friendly name on every node in the cluster.

Note: User-friendly names

If you really need to use user-friendly names, you can force the system-defined user-friendly names to be consistent across all nodes in the cluster by doing the following:

  1. In the /etc/multipath.conf file on one node:

    1. Set the user_friendly_names configuration option to yes to enable it.

      Multipath uses the /var/lib/multipath/bindings file to assign a persistent and unique name to the device in the form of mpath<N> in the /dev/mapper directory.

    2. (Optional) Set the bindings_file option in the defaults section of the /etc/multipath.conf file to specify an alternate location for the bindings file.

      The default location is /var/lib/multipath/bindings.

  2. Set up all of the multipath devices on the node.

  3. Copy the /etc/multipath.conf file from the node to the /etc/ directory of all the other nodes in the cluster.

  4. Copy the bindings file from the node to the bindings_file path on all of the other nodes in the cluster.

  5. After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the node for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details. This applies to all affected nodes.

18.10 Configuring path failover policies and priorities

In a Linux host, when there are multiple paths to a storage controller, each path appears as a separate block device, resulting in multiple block devices for a single LUN. The Device Mapper Multipath service detects multiple paths with the same LUN ID, and creates a new multipath device with that ID. For example, a host with two HBAs attached to a storage controller with two ports via a single unzoned Fibre Channel switch sees four block devices: /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. The Device Mapper Multipath service creates a single block device, /dev/mapper/mpath1, that reroutes I/O through those four underlying block devices.

This section describes how to specify policies for failover and configure priorities for the paths. Note that after you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.

18.10.1 Configuring the path failover policies

Use the multipath command with the -p option to set the path failover policy:

> sudo multipath DEVICENAME -p POLICY

Replace POLICY with one of the following policy options:

Table 18.2: Group policy options for the multipath -p command

  • failover:  (Default) One path per priority group.

  • multibus:  All paths in one priority group.

  • group_by_serial:  One priority group per detected serial number.

  • group_by_prio:  One priority group per path priority value. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.

  • group_by_node_name:  One priority group per target node name. Target node names are fetched from /sys/class/fc_transport/target*/node_name.
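For example, to put all paths of a map into a single priority group (the WWID is a placeholder for your map name or device):

> sudo multipath 3600601607cf30e00184589a37a31d911 -p multibus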

18.10.2 Configuring failover priorities

You must manually enter the failover priorities for the device in the /etc/multipath.conf file. Examples for all settings and options can be found in the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

18.10.2.1 Understanding priority groups and attributes

A priority group is a collection of paths that go to the same physical LUN. By default, I/O is distributed in a round-robin fashion across all paths in the group. The multipath command automatically creates priority groups for each LUN in the SAN based on the path_grouping_policy setting for that SAN. The multipath command multiplies the number of paths in a group by the group’s priority to determine which group is the primary. The group with the highest calculated value is the primary. When all paths in the primary group have failed, the priority group with the next highest value becomes active.

A path priority is an integer value assigned to a path. The higher the value, the higher the priority. An external program is used to assign priorities for each path. For a given device, the paths with the same priorities belong to the same priority group.

The prio setting is used in the defaults{} or devices{} section of the /etc/multipath.conf file. It is silently ignored when it is specified for an individual multipath definition in the multipaths{} section. The prio line specifies the prioritizer. If the prioritizer requires an argument, you specify the argument by using the prio_args keyword on a second line.

PRIO Settings for the Defaults or Devices Sections

prio

Specifies the prioritizer program to call to obtain a path priority value. Weights are summed for each path group to determine the next path group to use in case of failure.

Use the prio_args keyword to specify arguments if the specified prioritizer requires arguments.

If no prio keyword is specified, all paths are equal. The default setting is const with a prio_args setting with no value.

prio "const"
prio_args ""

Example prioritizer programs include:

  • alua:  Generates path priorities based on the SCSI-3 ALUA settings.

  • const:  Generates the same priority for all paths.

  • emc:  Generates the path priority for EMC arrays.

  • hds:  Generates the path priority for Hitachi HDS Modular storage arrays.

  • hp_sw:  Generates the path priority for Compaq/HP controllers in active/standby mode.

  • ontap:  Generates the path priority for NetApp arrays.

  • random:  Generates a random priority for each path.

  • rdac:  Generates the path priority for the LSI/Engenio RDAC controller.

  • weightedpath:  Generates the path priority based on the weighted values you specify in the arguments for prio_args.

  • path_latency:  Generates the path priority based on a latency algorithm, which is configured with the prio_args keyword.

prio_args arguments

These are the arguments for the prioritizer programs that require arguments. Most prio programs do not need arguments. There is no default. The values depend on the prio setting and whether the prioritizer requires any of the following arguments:

weighted

Requires a value of the form [hbtl|devname|serial|wwn] REGEX1 PRIO1 REGEX2 PRIO2...

Regex must be of SCSI H:B:T:L format, for example 1:0:.:. and *:0:0:., with a weight value, where H, B, T, L are the host, bus, target, and LUN IDs for a device. For example:

prio "weightedpath"
prio_args "hbtl 1:.:.:. 2 4:.:.:. 4"

devname

Regex is in device name format. For example: sda, sd.e

serial

Regex is in serial number format. For example: .*J1FR.*324. Look up your serial number with the multipathd show paths format %z command. (multipathd show wildcards displays all format wildcards.)

alua

If exclusive_pref_bit is set for a device (alua exclusive_pref_bit), paths with the preferred path bit set will always be in their own path group.

path_latency

path_latency adjusts latencies between remote and local storage arrays if both arrays use the same type of hardware. Usually the latency on the remote array will be higher, so you can tune the latency to bring them closer together. This requires a value pair of the form io_num=20 base_num=10.

io_num is the number of read IOs sent to the current path continuously, which are used to calculate the average path latency. Valid values are integers from 2 to 200.

base_num is the logarithmic base number, used to partition different priority ranks. Valid values are integers from 2 to 10. The maximum average latency value is 100 s, the minimum is 1 μs. For example, if base_num=10, the paths are grouped in priority groups with path latency ≤1 μs, (1 μs, 10 μs], (10 μs, 100 μs], (100 μs, 1 ms], (1 ms, 10 ms], (10 ms, 100 ms], (100 ms, 1 s], (1 s, 10 s], (10 s, 100 s], and >100 s.
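For example, the two keywords combine as follows:

prio "path_latency"
prio_args "io_num=20 base_num=10"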

Multipath Attributes

Multipath attributes are used to control the behavior of multipath I/O for devices. You can specify attributes as defaults for all multipath devices. You can also specify attributes that apply only to a given multipath device by creating an entry for that device in the multipaths section of the multipath configuration file.

user_friendly_names

Specifies whether to use world-wide IDs (WWIDs) or to use the /var/lib/multipath/bindings file to assign a persistent and unique alias to the multipath devices in the form of /dev/mapper/mpathN.

This option can be used in the devices section and the multipaths section.

  • no:  (Default) Use the WWIDs shown in the /dev/disk/by-id/ location.

  • yes:  Autogenerate user-friendly names as aliases for the multipath devices instead of the actual ID.

failback

Specifies whether to monitor failed path recovery, and indicates the timing for group failback after failed paths return to service.

When the failed path recovers, the path is added back into the multipath-enabled path list based on this setting. Multipath re-evaluates the priority groups, and changes the active priority group when the priority of the primary group again exceeds that of the currently active group.

  • manual:  (Default) The failed path is not monitored for recovery. The administrator runs the multipath command to update enabled paths and priority groups.

  • followover:  Only performs automatic failback when the first path of a path group becomes active. This keeps a node from automatically failing back when another node requested the failover.

  • immediate:  When a path recovers, enables the path immediately.

  • N:  When the path recovers, waits N seconds before enabling the path. Specify an integer value greater than 0.

We recommend a failback setting of manual for multipath in cluster environments to prevent multipath failover ping-pong.

failback "manual"
Important: Verification

Make sure that you verify the failback setting with your storage system vendor. Different storage systems can require different settings.

no_path_retry

Specifies the behaviors to use on path failure.

  • N:  Specifies the number of retries until multipath stops the queuing and fails the path. Specify an integer value greater than 0. In a cluster, you can specify a value of “0” to prevent queuing and allow resources to fail over.

  • fail:  Specifies immediate failure (no queuing).

  • queue:  Never stop queuing (queue forever until the path comes alive).

We recommend a retry setting of fail or 0 in the /etc/multipath.conf file when working in a cluster. This causes the resources to fail over when the connection is lost to storage. Otherwise, the messages queue and the resource failover cannot occur.

no_path_retry "fail"
no_path_retry "0"
Important: Verification

Make sure that you verify the retry settings with your storage system vendor. Different storage systems can require different settings.

path_checker

Determines the state of the path.

  • directio:  Reads the first sector that has direct I/O. This is useful for DASD devices. Logs failure messages in the systemd journal (see Chapter 21, journalctl: Query the systemd journal).

  • tur:  Issues a SCSI Test Unit Ready command to the device. This is the preferred setting if the LUN supports it. On failure, the command does not fill up the systemd journal with messages.

  • CUSTOM_VENDOR_VALUE:  Some SAN vendors provide custom path_checker options:

  • cciss_tur Checks the path state for HP Smart Storage Arrays.

  • emc_clariion Queries the EMC Clariion EVPD page 0xC0 to determine the path state.

  • hp_sw Checks the path state (Up, Down, or Ghost) for HP storage arrays with Active/Standby firmware.

  • rdac Checks the path state for the LSI/Engenio RDAC storage controller.

path_grouping_policy

Specifies the path grouping policy for a multipath device hosted by a given controller.

  • failover:  (Default) One path is assigned per priority group so that only one path at a time is used.

  • multibus:  All valid paths are in one priority group. Traffic is load-balanced across all active paths in the group.

  • group_by_prio:  One priority group exists for each path priority value. Paths with the same priority are in the same priority group. Priorities are assigned by an external program.

  • group_by_serial:  Paths are grouped by the SCSI target serial number (controller node WWN).

  • group_by_node_name:  One priority group is assigned per target node name. Target node names are fetched from /sys/class/fc_transport/target*/node_name.

path_selector

Specifies the path-selector algorithm to use for load balancing.

  • round-robin 0:  The load-balancing algorithm used to balance traffic across all active paths in a priority group.

  • queue-length 0:  A dynamic load balancer that balances the number of in-flight I/O on paths, similar to the least-pending option.

  • service-time 0:  (Default) A service-time oriented load balancer that balances I/O on paths according to the latency.

pg_timeout

Specifies path group timeout handling. No value can be specified; an internal default is set.

polling_interval

Specifies the time in seconds between the end of one path checking cycle and the beginning of the next path checking cycle.

Specify an integer value greater than 0. The default value is 5. Make sure that you verify the polling_interval setting with your storage system vendor. Different storage systems can require different settings.

rr_min_io_rq

Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath.

Specify an integer value greater than 0. The default value is 1.

rr_min_io_rq "1"
rr_weight

Specifies the weighting method to use for paths.

  • uniform:  (Default) All paths have the same round-robin weights.

  • priorities:  Each path’s weight is determined by the path’s priority times the rr_min_io_rq setting.

uid_attribute

A udev attribute that provides a unique path identifier. The default value is ID_SERIAL.
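As an illustration, a hedged sketch of a per-device entry in the multipaths section combining several of the attributes above; the WWID, alias, and values are placeholders rather than recommendations:

multipaths {
     multipath {
          wwid                  3600601607cf30e00184589a37a31d911
          alias                 backup_vol
          path_grouping_policy  multibus
          path_selector         "round-robin 0"
          no_path_retry         fail
          failback              manual
     }
}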

18.10.2.2 Configuring for round-robin load balancing

All paths are active. I/O is sent to a path for a configured number of I/O requests before moving to the next open path in the sequence (see the rr_min_io_rq attribute).
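A minimal sketch of such a setup in the defaults section (illustrative values, not tuned recommendations):

defaults {
     path_grouping_policy  "multibus"
     path_selector         "round-robin 0"
     rr_min_io_rq          "1"
}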

18.10.2.3 Configuring for single path failover

A single path with the highest priority is active for traffic. Other paths are available for failover, but are not used unless failover occurs.

18.10.2.4 Grouping I/O paths for round-robin load balancing

Multiple paths with the same priority fall into the active group. When all paths in that group fail, the device fails over to the next highest priority group. All paths in the group share the traffic load in a round-robin load balancing fashion.
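A hedged sketch using the alua prioritizer as an example; whether alua applies depends on your array, and manual failback is preferable in clusters as noted above:

defaults {
     path_grouping_policy  "group_by_prio"
     prio                  "alua"
     failback              "immediate"
}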

18.10.3 Reporting target path groups

Use the SCSI Report Target Port Groups (sg_rtpg(8)) command. For information, see the man page for sg_rtpg(8).
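For example, to decode the target port groups of a path device, assuming the path is visible as /dev/sdc and your sg3_utils version supports the --decode option:

> sudo sg_rtpg --decode /dev/sdc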

18.11 Configuring multipath I/O for the root device

Device Mapper Multipath I/O (DM-MPIO) is available and supported for the root (/) and /boot partitions in SUSE Linux Enterprise Server. In addition, the YaST partitioner in the YaST installer supports enabling multipath during the installation.

18.11.1 Enabling multipath I/O at install time

To install the operating system on a multipath device, the multipath software must be running at install time. The multipathd daemon is not automatically active during the system installation. You can start it by using the Configure Multipath option in the YaST Partitioner.

18.11.1.1 Enabling multipath I/O at install time on an active/active multipath storage LUN

  1. Choose Expert Partitioner on the Suggested Partitioning screen during the installation.

  2. Select the Hard Disks main icon, click the Configure button, then select Configure Multipath.

  3. Start multipath.

    YaST starts to rescan the disks and shows available multipath devices (such as /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65). This is the device that should be used for all further processing.

  4. Click Next to continue with the installation.

18.11.1.2 Enabling multipath I/O at install time on an active/passive multipath storage LUN

The multipathd daemon is not automatically active during the system installation. You can start it by using the Configure Multipath option in the YaST Partitioner.

To enable multipath I/O at install time for an active/passive multipath storage LUN:

  1. Choose Expert Partitioner on the Suggested Partitioning screen during the installation.

  2. Select the Hard Disks main icon, click the Configure button, then select Configure Multipath.

  3. Start multipath.

    YaST starts to rescan the disks and shows available multipath devices (such as /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65). This is the device that should be used for all further processing. Write down the device path and UUID; you will need it later.

  4. Click Next to continue with the installation.

  5. After all settings are done and the installation is finished, YaST starts to write the boot loader information, and displays a countdown to restart the system. Stop the counter by clicking the Stop button and press Ctrl-Alt-F5 to access a console.

  6. Use the console to determine if a passive path was entered in the /boot/grub2/device.map file for the hd0 entry.

    This is necessary because the installation does not distinguish between active and passive paths.

    1. Mount the root device to /mnt by entering

      > sudo mount /dev/disk/by-id/UUID_part2 /mnt

      For example, enter

      > sudo mount /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part2 /mnt
    2. Mount the boot device to /mnt/boot by entering

      > sudo mount /dev/disk/by-id/UUID_part1 /mnt/boot

      For example, enter

      > sudo mount /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part1 /mnt/boot
    3. In the /mnt/boot/grub2/device.map file, determine if the hd0 entry points to a passive path, then do one of the following:

      • Active path:  No action is needed. Skip all remaining steps and return to the YaST graphical environment by pressing Ctrl-Alt-F7 and continue with the installation.

      • Passive path:  The configuration must be changed and the boot loader must be reinstalled.

  7. If the hd0 entry points to a passive path, change the configuration and reinstall the boot loader:

    1. Enter the following commands at the console prompt:

                mount -o bind /dev /mnt/dev
                mount -o bind /sys /mnt/sys
                mount -o bind /proc /mnt/proc
                chroot /mnt
    2. At the console, run multipath -ll, then check the output to find the active path.

      Passive paths are flagged as ghost.

    3. In the /boot/grub2/device.map file, change the hd0 entry to an active path, save the changes, and close the file.

    4. Reinstall the boot loader by entering

      grub-install /dev/disk/by-id/UUID_part1 /mnt/boot

      For example, enter

      grub-install /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part1 /mnt/boot
    5. Enter the following commands:

      exit
      umount /mnt/*
      umount /mnt
  8. Return to the YaST graphical environment by pressing Ctrl-Alt-F7.

  9. Click OK to continue with the installation reboot.

18.11.2 Enabling multipath I/O for an existing root device

  1. Install Linux with only a single path active, preferably one where the by-id symbolic links are listed in the partitioner.

  2. Mount the devices by using the /dev/disk/by-id path used during the install.

  3. Open or create /etc/dracut.conf.d/10-mp.conf and add the following line (note the leading space inside the quotes):

    force_drivers+=" dm-multipath"
  4. For IBM Z, before running dracut, edit the /etc/zipl.conf file to change the by-path information in zipl.conf with the same by-id information that was used in /etc/fstab.

  5. Run dracut -f to update the initrd image.

  6. For IBM Z, after running dracut, run zipl.

  7. Reboot the server.

18.11.3 Disabling multipath I/O on the root device

Add multipath=off to the kernel command line. This can be done with the YaST Boot Loader module. Open Boot Loader Installation › Kernel Parameters and add the parameter to both command lines.

This affects only the root device. All other devices are not affected.
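If you prefer editing the boot loader configuration directly instead of using YaST, the following sketch assumes a standard SLES GRUB2 setup: append multipath=off to both GRUB_CMDLINE_LINUX and GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then regenerate the configuration:

> sudo grub2-mkconfig -o /boot/grub2/grub.cfg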

18.12 Configuring multipath I/O for an existing software RAID

Ideally, you should configure multipathing for devices before you use them as components of a software RAID device. If you add multipathing after creating any software RAID devices, the software RAID service might start before the multipath service on reboot, which makes multipathing appear to be unavailable for RAIDs. You can use the procedure in this section to get multipathing running for a previously existing software RAID.

For example, you might need to configure multipathing for devices in a software RAID under the following circumstances:

  • If you create a new software RAID as part of the Partitioning settings during a new install or upgrade.

  • If you did not configure the devices for multipathing before using them in the software RAID as a member device or spare.

  • If you grow your system by adding new HBA adapters to the server or expanding the storage subsystem in your SAN.

Note: Assumptions

The following instructions assume the software RAID device is /dev/mapper/mpath0, which is its device name as recognized by the kernel. It assumes you have enabled user-friendly names in the /etc/multipath.conf file as described in Section 18.9, “Configuring user-friendly names or alias names”.

Make sure to modify the instructions for the device name of your software RAID.

  1. Open a terminal console.

    Except where otherwise directed, use this console to enter the commands in the following steps.

  2. If any software RAID devices are currently mounted or running, enter the following commands for each device to unmount the device and stop it.

    > sudo umount /dev/mapper/mpath0
    > sudo mdadm --misc --stop /dev/mapper/mpath0
  3. Stop the md service by entering

    > sudo systemctl stop mdmonitor
  4. Start the multipathd daemon by entering the following command:

    > sudo systemctl start multipathd
  5. After the multipathing service has been started, verify that the software RAID’s component devices are listed in the /dev/disk/by-id directory. Do one of the following:

    • Devices are listed:  The device names should now have symbolic links to their Device Mapper Multipath device names, such as /dev/dm-1.

    • Devices are not listed:  Force the multipath service to recognize them by flushing and rediscovering the devices by entering

      > sudo multipath -F
      > sudo multipath -v0

      The devices should now be listed in /dev/disk/by-id, and have symbolic links to their Device Mapper Multipath device names. For example:

      lrwxrwxrwx 1 root root 10 2011-01-06 11:42 dm-uuid-mpath-36006016088d014007e0d0d2213ecdf11 -> ../../dm-1
  6. Restart the mdmonitor service and the RAID device by entering

    > sudo systemctl start mdmonitor
  7. Check the status of the software RAID by entering

    > sudo mdadm --detail /dev/mapper/mpath0

    The RAID’s component devices should match their Device Mapper Multipath device names that are listed as the symbolic links of devices in the /dev/disk/by-id directory.

  8. In case the root (/) device or any parts of it (such as /var, /etc, /log) are on the SAN and multipath is needed to boot, rebuild the initrd:

    > sudo dracut -f --add multipath
  9. Reboot the server to apply the changes.

  10. Verify that the software RAID array comes up properly on top of the multipathed devices by checking the RAID status. Enter

    > sudo mdadm --detail /dev/mapper/mpath0

    For example:

    Number Major Minor RaidDevice State
    0 253 0 0 active sync /dev/dm-0
    1 253 1 1 active sync /dev/dm-1
    2 253 2 2 active sync /dev/dm-2
Note: Using mdadm with multipath devices

The mdadm tool requires that the devices be accessed by the ID rather than by the device node path. Refer to Section 18.4.3, “MD RAID on multipath devices” for details.

18.13 Using LVM2 on multipath devices

When using multipath, all paths to a resource are present as devices in the device tree. By default, LVM checks whether there is a multipath device on top of any device in the device tree. If LVM finds a multipath device on top, it assumes that the device is a multipath component and ignores the (underlying) device. This is most likely the desired behavior, but it can be changed in /etc/lvm/lvm.conf. When multipath_component_detection is set to 0, LVM scans multipath component devices. The default entry in lvm.conf is:

    # By default, LVM2 will ignore devices used as component paths
    # of device-mapper multipath devices.
    # 1 enables; 0 disables.
    multipath_component_detection = 1

18.14 Best practice

18.14.1 Scanning for new devices without rebooting

If your system has already been configured for multipathing and you later need to add storage to the SAN, you can use the rescan-scsi-bus.sh script to scan for the new devices. By default, this script scans all HBAs with typical LUN ranges. The general syntax for the command looks like the following:

> sudo rescan-scsi-bus.sh [options] [host [host ...]]

For most storage subsystems, the script can be run successfully without options. However, some special cases might need to use one or more options. Run rescan-scsi-bus.sh --help for details.
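For example, to limit the scan to SCSI hosts 0 and 1, pass the host numbers as positional arguments:

> sudo rescan-scsi-bus.sh 0 1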

Warning: EMC PowerPath environments

In EMC PowerPath environments, do not use the rescan-scsi-bus.sh utility provided with the operating system or the HBA vendor scripts for scanning the SCSI buses. To avoid potential file system corruption, EMC requires that you follow the procedure provided in the vendor documentation for EMC PowerPath for Linux.

Use the following procedure to scan the devices and make them available to multipathing without rebooting the system.

  1. On the storage subsystem, use the vendor’s tools to allocate the device and update its access control settings to allow the Linux system access to the new storage. Refer to the vendor’s documentation for details.

  2. Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter

    > sudo rescan-scsi-bus.sh

    Depending on your setup, you might need to run rescan-scsi-bus.sh with optional parameters. Refer to rescan-scsi-bus.sh --help for details.

  3. Check for scanning progress in the systemd journal (see Chapter 21, journalctl: Query the systemd journal for details). At a terminal console prompt, enter

    > sudo journalctl -r

    This command displays the last lines of the log. For example:

    > sudo journalctl -r
    Feb 14 01:03 kernel: SCSI device sde: 81920000
    Feb 14 01:03 kernel: SCSI device sdf: 81920000
    Feb 14 01:03 multipathd: sde: path checker registered
    Feb 14 01:03 multipathd: sdf: path checker registered
    Feb 14 01:03 multipathd: mpath4: event checker started
    Feb 14 01:03 multipathd: mpath5: event checker started
    Feb 14 01:03 multipathd: mpath4: remaining active paths: 1
    Feb 14 01:03 multipathd: mpath5: remaining active paths: 1
    [...]
  4. Repeat the previous steps to add paths through other HBA adapters on the Linux system that are connected to the new device.

  5. Run the multipath command to recognize the devices for DM-MPIO configuration. At a terminal console prompt, enter

    > sudo multipath

    You can now configure the new device for multipathing.

18.14.2 Scanning for new partitioned devices without rebooting

Use the example in this section to detect a newly added multipathed LUN without rebooting.

Warning: EMC PowerPath environments

In EMC PowerPath environments, do not use the rescan-scsi-bus.sh utility provided with the operating system or the HBA vendor scripts for scanning the SCSI buses. To avoid potential file system corruption, EMC requires that you follow the procedure provided in the vendor documentation for EMC PowerPath for Linux.

  1. Open a terminal console.

  2. Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter

    > sudo rescan-scsi-bus.sh

    Depending on your setup, you might need to run rescan-scsi-bus.sh with optional parameters. Refer to rescan-scsi-bus.sh --help for details.

  3. Verify that the device is seen (such as if the link has a new time stamp) by entering

    > ls -lrt /dev/dm-*

    You can also verify the devices in /dev/disk/by-id by entering

    > ls -l /dev/disk/by-id/
  4. Verify the new device appears in the log by entering

    > sudo journalctl -r
  5. Use a text editor to add a new alias definition for the device in the /etc/multipath.conf file, such as data_vol3.

    For example, if the UUID is 36006016088d014006e98a7a94a85db11, make the following changes:

    defaults {
         user_friendly_names  yes
    }
    multipaths {
         multipath {
              wwid   36006016088d014006e98a7a94a85db11
              alias  data_vol3
         }
    }
  6. Create a partition table for the device by entering

    > sudo fdisk /dev/disk/by-id/dm-uuid-mpath-<UUID>

    Replace UUID with the device WWID, such as 36006016088d014006e98a7a94a85db11.

  7. Trigger udev by entering

    > echo 'add' | sudo tee /sys/block/DM_DEVICE/uevent

    For example, to generate the device-mapper devices for the partitions on dm-8, enter

    > echo 'add' | sudo tee /sys/block/dm-8/uevent
  8. Create a file system on the device /dev/disk/by-id/dm-uuid-mpath-UUID_partN. Depending on your choice for the file system, you may use one of the following commands for this purpose: mkfs.btrfs, mkfs.ext3, mkfs.ext4, or mkfs.xfs. Refer to the respective man pages for details. Replace UUID_partN with the actual UUID and partition number, such as 36006016088d014006e98a7a94a85db11_part1.

  9. Create a label for the new partition by entering the following command:

    > sudo tune2fs -L LABELNAME /dev/disk/by-id/dm-uuid-mpath-UUID_partN

    Replace UUID_partN with the actual UUID and partition number, such as 36006016088d014006e98a7a94a85db11_part1. Replace LABELNAME with a label of your choice.

  10. Reconfigure DM-MPIO to let it read the aliases by entering

    > sudo multipathd -k'reconfigure'
  11. Verify that the device is recognized by multipathd by entering

    > sudo multipath -ll
  12. Use a text editor to add a mount entry in the /etc/fstab file.

    At this point, the alias you created in a previous step is not yet in the /dev/disk/by-label directory. Add a mount entry for the /dev/dm-9 path now, then change the first field of the entry to LABEL=LABELNAME before the next time you reboot.
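    A hedged sketch of the two stages of this entry; the mount point and file system type are placeholder assumptions:

    /dev/dm-9        /data/vol3  ext4  defaults  0 2
    # before the next reboot, change the first field to the label:
    LABEL=LABELNAME  /data/vol3  ext4  defaults  0 2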
  13. Create a directory to use as the mount point, then mount the device.

18.14.3 Viewing multipath I/O status

Querying the multipath I/O status outputs the current status of the multipath maps.

The multipath -l option displays the current path status as of the last time that the path checker was run. It does not run the path checker.

The multipath -ll option runs the path checker, updates the path information, then displays the current status information. This command always displays the latest information about the path status.

> sudo multipath -ll
3600601607cf30e00184589a37a31d911
[size=127 GB][features="0"][hwhandler="1 emc"]

\_ round-robin 0 [active][first]
  \_ 1:0:1:2 sdav 66:240  [ready ][active]
  \_ 0:0:1:2 sdr  65:16   [ready ][active]

\_ round-robin 0 [enabled]
  \_ 1:0:0:2 sdag 66:0    [ready ][active]
  \_ 0:0:0:2 sdc  8:32    [ready ][active]

For each device, it shows the device’s ID, size, features, and hardware handlers.

Paths to the device are automatically grouped into priority groups on device discovery. Only one priority group is active at a time. For an active/active configuration, all paths are in the same group. For an active/passive configuration, the passive paths are placed in separate priority groups.

The following information is displayed for each group:

  • Scheduling policy used to balance I/O within the group, such as round-robin

  • Whether the group is active, disabled, or enabled

  • Whether the group is the first (highest priority) group

  • Paths contained within the group

The following information is displayed for each path:

  • The physical address as HOST:BUS:TARGET:LUN, such as 1:0:1:2

  • Device node name, such as sda

  • Major:minor numbers

  • Status of the device

Note: Using iostat in multipath setups

In multipath environments, the iostat command might lead to unexpected results. By default, iostat filters out all block devices with no I/O. To make iostat show all devices, use:

iostat -p ALL

18.14.4 Managing I/O in error situations

You might need to configure multipathing to queue I/O if all paths fail concurrently by enabling queue_if_no_path. Otherwise, I/O fails immediately if all paths are gone. In certain scenarios, where the driver, the HBA, or the fabric experiences spurious errors, DM-MPIO should be configured to queue all I/O where those errors lead to a loss of all paths, and never propagate errors upward.

When you use multipathed devices in a cluster, you might choose to disable queue_if_no_path. This automatically fails the path instead of queuing the I/O, and escalates the I/O error to cause a failover of the cluster resources.

Because enabling queue_if_no_path leads to I/O being queued indefinitely unless a path is reinstated, ensure that multipathd is running and works for your scenario. Otherwise, I/O might be stalled indefinitely on the affected multipathed device until reboot or until you manually return to failover instead of queuing.

To test the scenario:

  1. Open a terminal console.

  2. Activate queuing instead of failover for the device I/O by entering

    > sudo dmsetup message DEVICE_ID 0 queue_if_no_path

    Replace the DEVICE_ID with the ID for your device. The 0 value represents the sector and is used when sector information is not needed.

    For example, enter:

    > sudo dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
  3. Return to failover for the device I/O by entering

    > sudo dmsetup message DEVICE_ID 0 fail_if_no_path

    This command immediately causes all queued I/O to fail.

    Replace the DEVICE_ID with the ID for your device. For example, enter

    > sudo dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path

To set up queuing I/O for scenarios where all paths fail:

  1. Open a terminal console.

  2. Open the /etc/multipath.conf file in a text editor.

  3. Uncomment the defaults section and its ending bracket, then add the default_features setting, as follows:

    defaults {
      default_features "1 queue_if_no_path"
    }
  4. After you modify the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then reboot for the changes to take effect.

  5. When you are ready to return to failover for the device I/O, enter

    > sudo dmsetup message MAPNAME 0 fail_if_no_path

    Replace the MAPNAME with the mapped alias name or the device ID for the device. The 0 value represents the sector and is used when sector information is not needed.

    This command immediately causes all queued I/O to fail and propagates the error to the calling application.

18.14.5 Resolving stalled I/O

If all paths fail concurrently and I/O is queued and stalled, do the following:

  1. Enter the following command at a terminal console prompt:

    > sudo dmsetup message MAPNAME 0 fail_if_no_path

    Replace MAPNAME with the correct device ID or mapped alias name for the device. The 0 value represents the sector and is used when sector information is not needed.

    This command immediately causes all queued I/O to fail and propagates the error to the calling application.

  2. Reactivate queuing by entering the following command:

    > sudo dmsetup message MAPNAME 0 queue_if_no_path

18.14.6 Configuring default settings for IBM Z devices

Testing of the IBM Z device with multipathing has shown that the dev_loss_tmo parameter should be set to infinity (2147483647), and the fast_io_fail_tmo parameter should be set to 5 seconds. If you are using IBM Z devices, modify the /etc/multipath.conf file to specify the values as follows:

defaults {
       dev_loss_tmo 2147483647
       fast_io_fail_tmo 5
}

The dev_loss_tmo parameter sets the number of seconds to wait before marking a multipath link as bad. When the path fails, any current I/O on that failed path fails. The default value varies according to the device driver being used. To use the driver’s internal timeouts, set the value to zero (0). It can also be set to "infinity" or 2147483647, which sets it to the max value of 2147483647 seconds (68 years).

The fast_io_fail_tmo parameter sets the length of time to wait before failing I/O when a link problem is detected. I/O that reaches the driver fails. If I/O is in a blocked queue, the I/O does not fail until the dev_loss_tmo time elapses and the queue is unblocked.

If you modify the /etc/multipath.conf file, the changes are not applied until you update the multipath maps, or until the multipathd daemon is restarted (systemctl restart multipathd).

18.14.7 Using multipath with NetApp devices

When using multipath for NetApp devices, we recommend the following settings in the /etc/multipath.conf file:

  • Set the default values for the following parameters globally for NetApp devices:

    max_fds max
    queue_without_daemon no
  • Set the default values for the following parameters for NetApp devices in the hardware table:

    dev_loss_tmo infinity
    fast_io_fail_tmo 5
    features "3 queue_if_no_path pg_init_retries 50"
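As an illustration only, these settings might be combined into a device entry like the following sketch. The vendor and product strings are assumptions; verify them against the built-in configuration shown by multipath -t for your array:

devices {
     device {
          vendor            "NETAPP"
          product           "LUN.*"
          dev_loss_tmo      infinity
          fast_io_fail_tmo  5
          features          "3 queue_if_no_path pg_init_retries 50"
     }
}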

18.14.8 Using --noflush with multipath devices

The --noflush option should always be used when running on multipath devices.

For example, in scripts where you perform a table reload, you use the --noflush option on resume to ensure that any outstanding I/O is not flushed, because you need the multipath topology information.

load
resume --noflush

18.14.9 SAN timeout settings when the root device is multipathed

A system with root (/) on a multipath device might stall when all paths have failed and are removed from the system because a dev_loss_tmo timeout is received from the storage subsystem (such as Fibre Channel storage arrays).

If the system device is configured with multiple paths and the multipath no_path_retry setting is active, you should modify the storage subsystem’s dev_loss_tmo setting accordingly to ensure that no devices are removed during an all-paths-down scenario. We strongly recommend that you set the dev_loss_tmo value to be equal to or higher than the no_path_retry setting from multipath.

The recommended setting for the storage subsystem’s dev_loss_tmo is

<dev_loss_tmo> = <no_path_retry> * <polling_interval>

where the following definitions apply for the multipath values:

  • no_path_retry is the number of retries for multipath I/O until the path is considered to be lost, and queuing of I/O is stopped.

  • polling_interval is the time in seconds between path checks.

Each of these multipath values should be set from the /etc/multipath.conf configuration file. For information, see Section 18.6, “Multipath configuration”.
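For example, if no_path_retry is set to 5 and polling_interval is set to 5 seconds, the storage subsystem’s dev_loss_tmo should be set to at least 5 * 5 = 25 seconds.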

18.15 Troubleshooting MPIO

This section describes some known issues and possible solutions for MPIO.

18.15.1 Installing GRUB2 on multipath devices

On legacy BIOS systems with Btrfs, grub2-install can fail with a permission denied error. To fix this, make sure that the /boot/grub2/SUBDIR/ subvolume is mounted in read-write (rw) mode. SUBDIR can be x86_64-efi or i386-pc.
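One way to check and remount the subvolume, assuming the default SLES Btrfs layout where it is mounted separately (adjust SUBDIR to match your system):

> findmnt /boot/grub2/i386-pc
> sudo mount -o remount,rw /boot/grub2/i386-pc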

18.15.2 The system exits to emergency shell at boot when multipath is enabled

During boot the system exits into the emergency shell with messages similar to the following:

[  OK  ] Listening on multipathd control socket.
         Starting Device-Mapper Multipath Device Controller...
[  OK  ] Listening on Device-mapper event daemon FIFOs.
         Starting Device-mapper event daemon...
         Expecting device dev-disk-by\x2duuid-34be48b2\x2dc21...32dd9.device...
         Expecting device dev-sda2.device...
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on udev Control Socket.
         Starting udev Coldplug all Devices...
         Expecting device dev-disk-by\x2duuid-1172afe0\x2d63c...5d0a7.device...
         Expecting device dev-disk-by\x2duuid-c4a3d1de\x2d4dc...ef77d.device...
[  OK  ] Started Create list of required static device nodes ...current kernel.
         Starting Create static device nodes in /dev...
[  OK  ] Started Collect Read-Ahead Data.
[  OK  ] Started Device-mapper event daemon.
[  OK  ] Started udev Coldplug all Devices.
         Starting udev Wait for Complete Device Initialization...
[  OK  ] Started Replay Read-Ahead Data.
         Starting Load Kernel Modules...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Started Create static devices
[*     ] (1 of 4) A start job is running for dev-disk-by\x2du...(7s / 1min 30s)
[*     ] (1 of 4) A start job is running for dev-disk-by\x2du...(7s / 1min 30s)

...

Timed out waiting for device dev-disk-by\x2duuid-c4a...cfef77d.device.
[DEPEND] Dependency failed for /opt.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Postfix Mail Transport Agent.
Welcome to emergency shell
Give root password for maintenance
(or press Control-D to continue):

At this stage, you are working in a temporary dracut emergency shell from the initrd environment. To make the configuration changes described below persistent, you need to perform them in the environment of the installed system.

  1. Identify what the system root (/) file system is. Inspect the content of /proc/cmdline and look for the root= parameter.

  2. Verify whether the root file system is mounted:

    > sudo systemctl status sysroot.mount
    Tip

    dracut mounts the root file system under /sysroot by default.

    From now on, let us assume that the root file system is mounted under /sysroot.

  3. Mount system-required file systems under /sysroot, chroot into it, then mount all file systems. For example:

    > for x in proc sys dev run; do sudo mount --bind /$x /sysroot/$x; done
    > sudo chroot /sysroot /bin/bash
    > sudo mount -a

    Refer to Section 48.5.2.3, “Accessing the installed system” for more details.

  4. Make changes to the multipath or dracut configuration as suggested in the procedures below. Remember to rebuild initrd to include the modifications.

  5. Exit the chroot environment by entering the exit command, then exit the emergency shell and reboot the server by pressing Ctrl-D.

Procedure 18.1: Emergency shell: blacklist file systems

This fix is required if the root file system is not on multipath but multipath is enabled. In such a setup, multipath tries to set its paths for all devices that are not blacklisted. Since the device with the root file system is already mounted, it is inaccessible for multipath and causes it to fail. Fix this issue by configuring multipath correctly by blacklisting the root device in /etc/multipath.conf:

  1. Run multipath -v2 in the emergency shell and identify the device for the root file system. It will result in an output similar to:

    # multipath -v2
    Dec 18 10:10:03 | 3600508b1001030343841423043300400: ignoring map

    The string between | and : is the WWID needed for blacklisting.

  2. Open /etc/multipath.conf and add the following:

    blacklist {
      wwid "WWID"
    }

    Replace WWID with the ID you retrieved in the previous step. For more information see Section 18.8, “Blacklisting non-multipath devices”.

  3. Rebuild the initrd using the following command:

    > dracut -f --add multipath
Procedure 18.2: Emergency shell: rebuild the initrd

This fix is required if the multipath status (enabled or disabled) differs between initrd and system. To fix this, rebuild the initrd:

  • If multipath has been enabled in the system, rebuild the initrd with multipath support with this command:

    > dracut --force --add multipath

  • If multipath has been disabled in the system, rebuild the initrd without multipath support with this command:

    > dracut --force -o multipath
Procedure 18.3: Emergency shell: add storage drivers to the initrd

This fix is required if the initrd does not contain drivers to access network attached storage. This may, for example, be the case when the system was installed without multipath or when the respective hardware was added or replaced.

  1. Add the required driver(s) to the variable force_drivers in the file /etc/dracut.conf.d/01-dist.conf. For example, if your system contains a RAID controller accessed by the hpsa driver and multipathed devices connected to a QLogic controller accessed by the driver qla23xx, this entry would look like:

    force_drivers+="hpsa qla23xx"
  2. Rebuild the initrd using the following command:

    > dracut -f --add multipath
  3. To prevent the system from booting into emergency mode if attaching the network storage fails, it is recommended to add the mount option _netdev to the respective entries in /etc/fstab.
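    A hedged example of such an /etc/fstab entry; the device name, mount point, and file system type are placeholders:

    /dev/disk/by-id/dm-uuid-mpath-WWID_part1  /data  xfs  defaults,_netdev  0 0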

18.15.3 PRIO settings for individual devices fail after upgrading to multipath 0.4.9 or later

Multipath Tools from version 0.4.9 onward uses the prio setting in the defaults{} or devices{} section of the /etc/multipath.conf file. It silently ignores the keyword prio when it is specified for an individual multipath definition in the multipaths{} section.

Multipath Tools 0.4.8 allowed the prio setting in the individual multipath definition in the multipaths{} section to override the prio settings in the defaults{} or devices{} section.

18.15.4 PRIO settings with arguments fail after upgrading to multipath-tools-0.4.9 or later

When you upgrade from multipath-tools-0.4.8 to multipath-tools-0.4.9, the prio settings in the /etc/multipath.conf file are broken for prioritizers that require an argument. In multipath-tools-0.4.9, the prio keyword is used to specify the prioritizer, and the prio_args keyword is used to specify the argument for prioritizers that require an argument. Previously, both the prioritizer and its argument were specified on the same prio line.

For example, in multipath-tools-0.4.8, the following line was used to specify a prioritizer and its arguments on the same line.

prio "weightedpath hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"

After upgrading to multipath-tools-0.4.9 or later, this setting causes an error. The message is similar to the following:

<Month day hh:mm:ss> | Prioritizer 'weightedpath hbtl [1,3]:.:.+:.+ 260
[0,2]:.:.+:.+ 20' not found in /lib64/multipath

To resolve this problem, use a text editor to modify the prio line in the /etc/multipath.conf file. Create two lines with the prioritizer specified on the prio line, and the prioritizer argument specified on the prio_args line below it:

prio "weightedpath"
prio_args "hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"

Restart the multipathd daemon for the changes to become active by running sudo systemctl restart multipathd.

18.15.5 Technical information documents

For information about troubleshooting multipath I/O issues on SUSE Linux Enterprise Server, see the following Technical Information Documents (TIDs) in the SUSE Knowledgebase: