18 Managing multipath I/O for devices #
This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices by using Multipath I/O (MPIO).
18.1 Understanding multipath I/O #
Multipathing is the ability of a server to communicate with the same physical or logical block storage device across multiple physical paths between the host bus adapters in the server and the storage controllers for the device, typically in Fibre Channel (FC) or iSCSI SAN environments.
Linux multipathing provides connection fault tolerance and can provide load balancing across the active connections. When multipathing is configured and running, it automatically isolates and identifies device connection failures, and reroutes I/O to alternate connections.
Multipathing provides fault tolerance against connection failures, but not against failures of the storage device itself. The latter is achieved with complementary techniques like mirroring.
18.1.1 Multipath terminology #
- Storage array
A hardware device with many disks and multiple fabrics connections (controllers) that provides SAN or NAS storage to clients. Storage arrays typically have RAID and failover features and support multipathing. Historically, active/passive (failover) and active/active (load-balancing) storage array configurations were distinguished. These concepts still exist but they are merely special cases of the concepts of path groups and access states supported by modern hardware.
- Host, host system
The computer running SUSE Linux Enterprise Server which acts as a client system for a storage array.
- Multipath map, multipath device
A set of path devices. It represents a storage volume on a storage array and is seen as a single block device by the host system.
- Path device, low-level device
A member of a multipath map, typically a SCSI device. Each path device represents a unique connection between the host computer and the actual storage volume, for example, a logical unit from an iSCSI session. Under Linux device mapper multipath, path devices remain visible and accessible in the host system.
- WWID, UID, UUID
“World Wide Identifier”, “Unique Identifier”, “Universally Unique Identifier”. The WWID is a property of the storage volume and as such, it is identical between all path devices of a multipath map. multipath-tools uses the WWID to determine which low-level devices should be assembled into a multipath map. multipath relies on udev to determine the WWID of path devices. The WWID of a multipath map never changes. Multipath devices can be reliably accessed through /dev/disk/by-id/dm-uuid-mpath-WWID. The WWID should be distinguished from the map name, which is configurable (see Section 18.9, “Configuring user-friendly names or alias names”).
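For scripting, the stable device path can be derived from the WWID. A minimal sketch; the WWID below is the example value used later in this chapter, not a device on your system:

```shell
# Build the persistent device path of a multipath map from its WWID.
# The WWID is a hypothetical example value.
WWID=3600140508dbcf02acb448188d73ec97d
DEV="/dev/disk/by-id/dm-uuid-mpath-${WWID}"
echo "$DEV"
```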
- uevent, udev event
An event sent by the kernel to user space and processed by the udev subsystem. Uevents are generated when devices are added, removed, or change their properties.
- Device mapper
A framework in the Linux kernel for creating virtual block devices. I/O operations to mapped devices are redirected to the underlying block devices. Device mappings may be stacked. The device mapper implements its own event signaling, also known as “device mapper events” or “dm events”.
18.2 Hardware support #
The multipathing drivers and tools are available on all architectures supported by SUSE Linux Enterprise Server. The generic, protocol-agnostic driver works with most multipath-capable storage hardware on the market. Some storage array vendors provide their own multipathing management tools. Consult the vendor’s hardware documentation to determine what settings are required.
18.2.1 Multipath implementations: device mapper and NVMe #
The traditional, generic implementation of multipathing under Linux uses the device mapper framework. For most device types like SCSI devices, device mapper multipathing is the only available implementation. Device mapper multipathing is highly configurable and flexible.
The Linux NVM Express (NVMe) kernel subsystem implements multipathing natively in the kernel. This implementation creates less computational overhead for NVMe devices, which are typically fast devices with very low latencies. Native NVMe multipathing requires no user space component. Since SLE 15, native multipathing has been the default for NVMe multipath devices.
The rest of this chapter is about device mapper multipath.
18.2.2 Storage array autodetection for multipathing #
Device mapper multipath is a generic technology. Multipath device detection requires only that the low-level (for example, SCSI) devices are detected by the kernel, and that device properties reliably identify multiple low-level devices as being different “paths” to the same volume rather than actually different devices.
The multipath-tools package detects storage arrays by their vendor and product names. It has validated built-in configuration defaults for a large variety of storage products. Consult the hardware documentation of your storage array; some vendors provide specific recommendations for Linux multipathing configuration.
To see the built-in settings for storage that has been detected on your system, run the command multipath -T; see Section 18.4.5, “The multipath command”.
If you need to apply changes to the built-in configuration for your storage array, create and configure the /etc/multipath.conf file. See Section 18.6, “Multipath configuration”.
Note
multipath-tools has built-in presets for many storage arrays. The existence of such presets for a given storage product does not imply that the vendor of the storage product has tested the product with dm-multipath, nor that the vendor endorses or supports the use of dm-multipath with the product. Always consult the original vendor documentation for support-related questions.
18.2.3 Storage arrays that require specific hardware handlers #
Some storage arrays require special commands for failover from one path to the other, or non-standard error handling methods. These special commands and methods are implemented by hardware handlers in the Linux kernel. Modern SCSI storage arrays support the “Asymmetric Logical Unit Access” (ALUA) hardware handler defined in the SCSI standard. Besides ALUA, the SLE kernel contains hardware handlers for Netapp E-Series (RDAC), the Dell/EMC CLARiiON CX family of arrays, and legacy arrays from HP. Since kernel version 4.4, the Linux kernel automatically detects hardware handlers for most arrays, including all arrays supporting ALUA.
18.3 Planning for multipathing #
Use the guidelines in this section when planning your multipath I/O solution.
18.3.1 Prerequisites #
The storage array you use for the multipathed device must support multipathing. For more information, see Section 18.2, “Hardware support”.
You need to configure multipathing only if multiple physical paths exist between host bus adapters in the server and host bus controllers for the block storage device.
For some storage arrays, the vendor provides its own multipathing software to manage multipathing for the array’s physical and logical devices. In this case, you should follow the vendor’s instructions for configuring multipathing for those devices.
When using multipathing in a virtualization environment, the multipathing is controlled in the host server environment. Configure multipathing for the device before you assign it to a virtual guest machine.
18.3.2 Multipath installation types and the initramfs #
18.3.2.1 Root file system on multipath (SAN-boot) #
The root file system is on a multipath device (usually, all other file systems are on multipath storage as well). This is typically the case for diskless servers that use SAN or NAS storage exclusively. On such systems, multipath support is required for booting, and multipathing must be enabled in the initramfs (initrd). See Section 18.3.2.3, “Keeping the initial RAM disk synchronized”.
18.3.2.2 Root file system on a local disk #
The root file system (and possibly some other file systems) is in local storage, for example, on a directly attached SATA disk or local RAID, but the system additionally uses file systems in the multipath SAN or NAS storage. This system type can be configured in three different ways:
- Using a root-on-multipath setup
All block devices are part of multipath maps, including the local disk. The local disk then appears as a degraded multipath map with just one path. This configuration is created if multipathing was enabled during the initial system installation with YaST. It is the simplest configuration but has a performance overhead.
- Ignoring the local disk by multipath-tools
In this configuration, multipathing is enabled in the initramfs. This configuration can be achieved after the installation by blacklisting, or with the find_multipaths configuration parameter.
- Disabling multipathing in the initramfs
This setup is created if multipathing was not enabled during the initial system installation with YaST, either because YaST did not detect multipath devices or because the user opted against enabling multipath during installation. This is the only situation in which Section 18.3.2.3, “Keeping the initial RAM disk synchronized” does not apply.
18.3.2.3 Keeping the initial RAM disk synchronized #
Important
Make sure that the initial RAM disk and the booted system behave consistently regarding the use of multipathing for all block devices. Rebuild the initramfs after applying multipath configuration changes.
If multipathing is enabled in the system, it also needs to be enabled in the initramfs, and vice versa. The only exception to this rule is the option Disabling multipathing in the initramfs in Section 18.3.2.2, “Root file system on a local disk”.
The multipath configuration must be synchronized between the booted system and the initrd. Therefore, if /etc/multipath.conf, /etc/multipath/wwids, /etc/multipath/bindings, or any other configuration file or udev rule related to device identification is changed, the initial RAM disk needs to be rebuilt using the command:
> sudo dracut -f
If the initrd and the system are not synchronized, the system will not boot properly, and the start-up procedure may result in an emergency shell. See Section 18.15.2, “The system exits to emergency shell at boot when multipath is enabled” for instructions on how to avoid or repair such a scenario.
Special care must be taken if the initial RAM disk is rebuilt in non-standard situations, for example, from a rescue system or after booting with the kernel parameter multipath=off.
dracut automatically includes multipathing support in the initial RAM disk if and only if it detects that the root file system is on a multipath device while the initrd is being built. In the non-standard situations mentioned above, this detection may fail, so it is necessary to enable or disable multipathing explicitly.
To enable multipath support in the initrd, run the command:
> sudo dracut --force --add multipath
To disable multipath support in the initrd, run the command:
> sudo dracut --force --omit multipath
18.3.3 Disk management tasks #
Use third-party SAN array management tools or the user interface of your storage array to create logical devices and assign them to hosts. Make sure to configure the host credentials correctly on both sides.
You can add volumes to or remove volumes from a running host, but detecting the changes may require rescanning SCSI targets and reconfiguring multipathing on the host.
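A sketch of triggering such a rescan through the kernel's sysfs interface; host0 is an example adapter name, and the rescan-scsi-bus.sh script from the sg3_utils package automates this for all adapters:

```shell
# Sketch: rescan a SCSI host adapter through sysfs ("host0" is an example).
# The three dashes are wildcards for channel, target, and LUN.
host=host0
scan=/sys/class/scsi_host/${host}/scan
# On a live system, run as root:
#   echo "- - -" > "$scan"
echo "$scan"
```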
18.3.4 Software RAID and complex storage stacks #
Multipathing is set up on top of basic storage devices such as SCSI disks. In a multi-layered storage stack, multipathing is always the bottom layer. Other layers such as software RAID, Logical Volume Management, block device encryption, etc. are layered on top of it. Therefore, for each device that has multiple I/O paths and that you plan to use in a software RAID, you must configure the device for multipathing before you attempt to create the software RAID device.
For information about setting up multipathing for existing software RAIDs, see Section 18.12, “Configuring multipath I/O for an existing software RAID”.
18.3.5 High-availability solutions #
High-availability solutions for clustering storage resources run on top of the multipathing service on each node. Make sure that the configuration settings in the /etc/multipath.conf file on each node are consistent across the cluster.
Make sure that multipath devices have the same name across all nodes. Refer to Section 18.9.1, “Multipath device names in HA clusters” for details.
The Distributed Replicated Block Device (DRBD) high-availability solution for mirroring devices across a LAN runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a DRBD solution, you must configure the device for multipathing before you configure DRBD.
Special care must be taken when using multipathing together with clustering software that relies on shared storage for fencing, such as pacemaker with sbd. See Section 18.7, “Configuring policies for polling, queuing, and failback” for details.
18.4 Multipath management tools #
The multipathing support in SUSE Linux Enterprise Server is based on the Device Mapper Multipath module of the Linux kernel and the multipath-tools user space package. You can use the Multiple Devices Administration utility (multipath) to view the status of multipathed devices.
18.4.1 Device mapper multipath module #
The Device Mapper Multipath (DM-MP) module provides the generic multipathing capability for Linux. DM-MPIO is the preferred solution for multipathing on SUSE Linux Enterprise Server for SCSI and DASD devices, and can be used for NVMe devices as well.
Note: Using DM-MP for NVMe devices
Since SUSE Linux Enterprise Server 15, native NVMe multipathing (see Section 18.2.1, “Multipath implementations: device mapper and NVMe”) has been recommended for NVMe and used by default. To disable native NVMe multipathing and use device mapper multipath instead, boot with the kernel parameter nvme-core.multipath=0.
DM-MPIO features automatic configuration of the multipathing subsystem for a large variety of setups.
The multipath daemon multipathd takes care of automatic path discovery and grouping, and automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.
DM-MPIO protects against failures in the paths to the device, and not failures in the device itself. If one of the active paths is lost (for example, a network adapter breaks or a fiber-optic cable is removed), I/O is redirected to the remaining paths. If all active paths fail, inactive secondary paths must be woken up, so failover occurs with a delay of up to 30 seconds, depending on the properties of the storage array.
If every path to a given device has failed, I/O to this device can be queued in the kernel, either for a given amount of time or even indefinitely (in which case the total amount of queued I/O is limited by system memory).
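This queueing behavior is controlled by the no_path_retry option in /etc/multipath.conf. A sketch of a corresponding defaults entry; the value 10 is illustrative only:

```
defaults {
    # Queue I/O for 10 path-checker intervals after all paths have
    # failed, then fail outstanding I/O. "queue" would queue forever,
    # "fail" would fail I/O immediately.
    no_path_retry 10
}
```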
If a disk array has more than one storage processor, ensure that the SAN switch has a connection to the storage processor that owns the LUNs you want to access. On most disk arrays, all LUNs belong to both storage processors, so both connections are active.
Note: Storage processors
On some disk arrays, the storage array manages the traffic through storage processors so that it presents only one storage processor at a time. One processor is active and the other one is passive until there is a failure. If you are connected to the wrong storage processor (the one with the passive path) you might not see the expected LUNs, or you might see the LUNs but get errors when you try to access them.
18.4.2 Multipath I/O management tools #
The packages multipath-tools and kpartx provide tools that take care of automatic path discovery and grouping.
- multipathd
The daemon to set up and monitor multipath maps, and a command line client to communicate with the daemon process. See Section 18.4.4, “The multipathd daemon and the multipath command”.
- multipath
The command line tool for multipath operations. See Section 18.4.5, “The multipath command”.
- kpartx
The command line tool for managing “partitions” on multipath devices. See Section 18.5.3, “Partitions on multipath devices”.
- mpathpersist
The command line tool for managing SCSI persistent reservations. See Section 18.4.6, “The mpathpersist utility”.
18.4.3 MD RAID on multipath devices #
MD RAID arrays on top of multipathing are set up automatically by the system's udev rules. No special configuration in /etc/mdadm.conf is necessary.
18.4.4 The multipathd daemon and the multipath command #
multipathd is the most important part of a modern Linux device mapper multipath setup. It is normally started through the systemd service multipathd.service. Socket activation via multipathd.socket is supported, but it is strongly recommended to enable multipathd.service on systems with multipath hardware.
multipathd serves the following tasks (some of them depend on the configuration):
- On startup, detects path devices and sets up multipath maps from detected devices.
- Monitors uevents and device mapper events, adding or removing path mappings to multipath maps as necessary and initiating failover or failback operations.
- Sets up new maps on the fly when new path devices are discovered.
- Checks path devices at regular intervals to detect failure, and tests failed paths to reinstate them if they become operational again.
- When all paths fail, multipathd either fails the map, or switches the map device to queuing mode for a given time interval.
- Handles path state changes and switches path groups or regroups paths, as necessary.
- Tests paths for “marginal” state, that is, shaky fabrics conditions that cause path state flipping between operational and non-operational.
- Handles SCSI persistent reservation keys for path devices if configured. See Section 18.4.6, “The mpathpersist utility”.
multipathd also serves as a command line client to process interactive commands by sending them to the running daemon. The general syntax to send commands to the daemon is as follows:
multipathd COMMAND
or
multipathd -k"COMMAND"
To enter the interactive mode with the daemon, run:
multipathd -k
Note: How multipath and multipathd work together
Many multipathd commands have multipath equivalents. For example, multipathd show topology does the same thing as multipath -ll. The notable difference is that the multipathd command queries the internal state of the running multipathd daemon, whereas multipath obtains information directly from the kernel and I/O operations.
If the multipath daemon is running, it is recommended to make modifications to the system by using the multipathd commands. Otherwise, the daemon may notice configuration changes and react to them. In some situations, the daemon might even try to undo the applied changes. Therefore, multipath automatically delegates certain possibly dangerous commands, like destroying and flushing maps, to multipathd if a running daemon is detected.
The list below describes frequently used multipathd commands:
- show topology
Shows the current map topology and properties.
- show paths
Shows the currently known path devices.
- show paths format "FORMAT STRING"
Shows the currently known path devices using a format string. Use show wildcards to see a list of supported format specifiers.
- show maps
Shows the currently configured map devices.
- show maps format "FORMAT STRING"
Shows the currently configured map devices using a format string. Use show wildcards to see a list of supported format specifiers.
- show config local
Shows the current configuration that multipathd is using.
- reconfigure
Rereads configuration files, rescans devices, and sets up maps again. This is basically equivalent to a restart of multipathd. A few options cannot be modified without a restart. They are mentioned in the man page multipath.conf(5). The reconfigure command reloads only map devices that have changed in some way. To force reloading every map device, use reconfigure all.
- del map MAP DEVICE NAME
Unconfigures and deletes the given map device and its partitions. MAP DEVICE NAME can be a device node name like dm-0, a WWID, or a map name. The command fails if the device is in use.
Additional commands are available to modify path states, enable or disable queueing, and more. See multipathd(8) for details.
18.4.5 The multipath command #
Even though multipath setup is mostly automatic and handled by multipathd, multipath is still useful for some administration tasks. Several examples of the command usage follow:
- multipath
Detects path devices and configures all multipath maps that it finds.
- multipath -d
Similar to multipath, but does not set up any maps (“dry run”).
- multipath DEVICENAME
Configures a specific multipath device. DEVICENAME can denote a member path device by its device node name (/dev/sdb) or device number in major:minor format. Alternatively, it can be the WWID or name of a multipath map.
- multipath -f DEVICENAME
Unconfigures (“flushes”) a multipath map and its partition mappings. The command will fail if the map or one of its partitions is in use. See above for possible values of DEVICENAME.
- multipath -F
Unconfigures (“flushes”) all multipath maps and their partition mappings. The command will fail for maps in use.
- multipath -ll
Displays the status and topology of all currently configured multipath devices.
- multipath -ll DEVICENAME
Displays the status of a specified multipath device. See above for possible values of DEVICENAME.
- multipath -t
Shows the internal hardware table and the active configuration of multipath. Refer to multipath.conf(5) for details about the configuration parameters.
- multipath -T
Has a similar function as the multipath -t command but shows only hardware entries for the hardware detected on the host.
The option -v controls the verbosity of the output. You can use values between 0 (only fatal errors) and 4 (verbose logging). The default is -v2. The verbosity option in /etc/multipath.conf can be used to change the default verbosity for both multipath and multipathd.
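For example, a sketch of raising the default verbosity for troubleshooting; the value 3 is illustrative:

```
defaults {
    # Raise the default log verbosity from 2 to 3 for both
    # multipath and multipathd.
    verbosity 3
}
```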
18.4.6 The mpathpersist utility #
The mpathpersist utility is used to manage SCSI persistent reservations on Device Mapper Multipath devices. Persistent reservations serve to restrict access to SCSI Logical Units to certain SCSI initiators. In multipath configurations, it is important to use the same reservation keys for all I_T nexuses (paths) for a given volume; otherwise, creating a reservation on one path may cause other paths to fail.
Use this utility with the reservation_key attribute in the /etc/multipath.conf file to set persistent reservations for SCSI devices. If (and only if) this option is set, the multipathd daemon checks persistent reservations for newly discovered paths or reinstated paths. You can add the attribute to the defaults section or the multipaths section of multipath.conf.
For example:
multipaths {
    multipath {
        wwid 3600140508dbcf02acb448188d73ec97d
        alias yellow
        reservation_key 0x123abc
    }
}
After setting the reservation_key parameter for all mpath devices applicable for persistent management, reload the configuration using multipathd reconfigure.
Note: Using “reservation_key file”
If the special value reservation_key file is used in the defaults section of multipath.conf, reservation keys can be managed dynamically in the file /etc/multipath/prkeys using mpathpersist. This is the recommended way to handle persistent reservations with multipath maps. It is available from SUSE Linux Enterprise Server 12 SP4 onward.
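A sketch of the corresponding defaults entry:

```
defaults {
    # Manage reservation keys dynamically in /etc/multipath/prkeys
    # via mpathpersist, instead of a fixed key in multipath.conf.
    reservation_key file
}
```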
Use the command mpathpersist to query and set persistent reservations for multipath maps consisting of SCSI devices. Refer to the manual page mpathpersist(8) for details. The command line options are the same as those of sg_persist from the sg3_utils package. The sg_persist(8) manual page explains the semantics of the options in detail.
In the following examples, DEVICE denotes a device mapper multipath device like /dev/mapper/mpatha. The commands below are listed with long options for better readability. All options have single-letter replacements, as in mpathpersist -oGS 123abc DEVICE.
- mpathpersist --in --read-keys DEVICE
Read the registered reservation keys for the device.
- mpathpersist --in --read-reservation DEVICE
Show existing reservations for the device.
- mpathpersist --out --register --param-sark=123abc DEVICE
Register a reservation key for the device. This will add the reservation key for all I_T nexuses (path devices) on the host.
- mpathpersist --out --reserve --param-rk=123abc --prout-type=5 DEVICE
Create a reservation of type 5 (“write exclusive - registrants only”) for the device, using the previously registered key.
- mpathpersist --out --release --param-rk=123abc --prout-type=5 DEVICE
Release a reservation of type 5 for the device.
- mpathpersist --out --register-ignore --param-sark=0 DEVICE
Delete a previously existing reservation key from the device.
18.5 Configuring the system for multipathing #
18.5.1 Enabling, starting, and stopping multipath services #
To enable the multipath services to start at boot time, run the following command:
> sudo systemctl enable multipathd
To manually start the service in the running system, enter:
> sudo systemctl start multipathd
To restart the service, enter:
> sudo systemctl restart multipathd
In most situations, restarting the service is not necessary. To simply have multipathd reload its configuration, run:
> sudo systemctl reload multipathd
To check the status of the service, enter:
> sudo systemctl status multipathd
To stop the multipath services in the current session, run:
> sudo systemctl stop multipathd
> sudo systemctl stop multipathd.socket
Warning: Disabling multipathd
It is strongly recommended to have multipathd.service always enabled and running on every host that has access to multipath hardware. However, sometimes it may be necessary to disable the service because multipath hardware has been removed, because some other multipathing software is going to be deployed, or for troubleshooting purposes.
To disable multipathing just for a single system boot, use the kernel parameter multipath=off. This affects both the booted system and the initial RAM disk, which does not need to be rebuilt in this case.
To disable the multipathd services permanently, so that they will not be started on future system boots, run the following commands:
> sudo systemctl disable multipathd
> sudo systemctl disable multipathd.socket
> sudo dracut --force --omit multipath
(Whenever you disable or enable the multipath services, rebuild the initrd. See Section 18.3.2.3, “Keeping the initial RAM disk synchronized”.)
Additionally, if you also want to make sure multipath devices do not get set up even when running multipath manually, add the following lines at the end of /etc/multipath.conf before rebuilding the initrd:
blacklist {
    wwid .*
}
18.5.2 Preparing SAN devices for multipathing #
Before configuring multipath I/O for your SAN devices, prepare the SAN devices, as necessary, by doing the following:
- Configure and zone the SAN with the vendor’s tools.
- Configure permissions for host LUNs on the storage arrays with the vendor’s tools.
- If SUSE Linux Enterprise Server ships no driver for the host bus adapter (HBA), install a Linux driver from the HBA vendor. See the vendor’s specific instructions for more details.
If multipath devices are detected and multipathd.service is enabled, multipath maps should be created automatically. If this does not happen, use commands like lsscsi to check the probing of the low-level devices. Also, inspect the system logs with journalctl -b.
When the LUNs are not seen by the HBA driver, check the zoning setup in the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
If the HBA driver can see LUNs, but no corresponding block devices are created, additional kernel parameters may be needed. See TID 3955167: Troubleshooting SCSI (LUN) Scanning Issues in the SUSE Knowledgebase at https://www.suse.com/support/kb/doc.php?id=3955167.
18.5.3 Partitions on multipath devices #
Multipath maps can have partitions like their path devices. Partition table scanning and device node creation for partitions is done in user space by the kpartx tool. kpartx is automatically invoked by udev rules; there is usually no need to run it manually. Technically, “partition” devices created by kpartx are also device mapper devices that simply map a linear range of blocks from the parent device. The Nth partition of a multipath device with known WWID can be accessed reliably via /dev/disk/by-id/dm-uuid-partN-mpath-WWID.
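As with the map device itself, scripts can derive the stable partition path from the WWID. A minimal sketch, using a hypothetical WWID and the first partition:

```shell
# Persistent path of partition N on a multipath map with a given WWID.
# The WWID is a made-up example; N=1 selects the first partition.
WWID=3600140508dbcf02acb448188d73ec97d
N=1
PART="/dev/disk/by-id/dm-uuid-part${N}-mpath-${WWID}"
echo "$PART"
```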
Note: Disabling invocation of kpartx
The skip_kpartx option in /etc/multipath.conf can be used to disable invocation of kpartx on selected multipath maps. This may be useful on virtualization hosts, for example.
Partition tables and partitions on multipath devices can be manipulated as usual, using YaST or tools like fdisk or parted. Changes applied to the partition table will be noted by the system when the partitioning tool exits. If this does not work (usually because a device is busy), try multipathd reconfigure, or reboot the system.
A partitioned multipath device cannot be used as a whole device for other purposes. For example, you cannot create an LVM physical volume from a partitioned device; you need to wipe the partition table before doing so.
18.6 Multipath configuration #
The built-in multipath-tools defaults work well for most setups. If customizations are needed, a configuration file needs to be created. The main configuration file is /etc/multipath.conf. In addition, files matching the pattern /etc/multipath/conf.d/*.conf are read in alphabetical order. See Section 18.6.2, “multipath.conf Syntax” for precedence rules.
Note: Generated configuration files
The files /etc/multipath/wwids, /etc/multipath/bindings, and /etc/multipath/prkeys are maintained by multipath-tools to store persistent information about previously created multipath maps, map names, and reservation keys for SCSI persistent reservations, respectively. Do not edit these generated configuration files.
Note: Configurable paths
Except for /etc/multipath.conf, the paths of the configuration directories and files are configurable, but changing these paths is strongly discouraged.
18.6.1 Creating the /etc/multipath.conf file #
You can generate a multipath.conf template from the built-in defaults. This makes all default settings explicit. The behavior of multipath-tools will not change unless the generated file is modified. To generate the configuration template, run:
multipath -T > /etc/multipath.conf
Alternatively, you can create a minimal /etc/multipath.conf that contains just those settings you want to change. The behavior will be identical to modifying only the respective lines in the generated template.
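For example, a minimal file that only enables user-friendly map names could look like this; user_friendly_names is just one commonly changed option, and everything else keeps its built-in default:

```
defaults {
    # Name maps mpatha, mpathb, ... instead of using the WWID.
    user_friendly_names yes
}
```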
18.6.2 multipath.conf Syntax #
The /etc/multipath.conf file uses a hierarchy of sections, subsections, and attribute/value pairs.
- Whitespace separates tokens. Consecutive whitespace characters are collapsed into a single space, unless quoted (see below).
- The hash (#) and exclamation mark (!) characters cause the rest of the line to be discarded as a comment.
- Sections and subsections are started with a section name and an opening brace ({) on the same line, and end with a closing brace (}) on a line of its own.
- Attributes and values are written on one line. Line continuations are unsupported.
- Attributes and section names must be keywords. The allowed keywords are documented in multipath.conf(5).
- Values may be enclosed in double quotes ("). They must be enclosed in quotes if they contain whitespace or comment characters. A double quote character inside a value is represented by a pair of double quotes ("").
- The values of some attributes are POSIX regular expressions (see regex(7)). They are case sensitive and not anchored, so “bar” matches “rhabarber”.
Syntax Example#
section {
    subsection {
        attr1 value
        attr2 "complex value!"
        attr3 "value with ""quoted"" word"
    } ! subsection end
} # section end
Precedence Rules#
As noted at the beginning of Section 18.6, “Multipath configuration”, it is possible to have multiple
configuration files. The additional files follow the same syntax rules as
/etc/multipath.conf
. Sections and attributes can
occur multiple times. If the same attribute is set in multiple files,
or on multiple lines in the same file, the last value read takes precedence.
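To illustrate the precedence rule with made-up values: if the main file and an additional configuration file both set the same attribute, the value read last wins.

```
# /etc/multipath.conf (read first)
defaults {
    no_path_retry fail
}

# additional configuration file (read later)
defaults {
    no_path_retry queue    # this value takes precedence
}
```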
18.6.3 /etc/multipath.conf
sections #
The /etc/multipath.conf
file is organized into the
following sections. Some attributes can occur in more than one section.
See multipath.conf(5)
for details.
- defaults
General default settings.
- blacklist
Lists devices to ignore. See Section 18.8, “Blacklisting non-multipath devices”.
- blacklist_exceptions
Lists devices to be multipathed even though they are matched by the blacklist. See Section 18.8, “Blacklisting non-multipath devices”.
- devices
Settings specific to the storage controller. This section is a collection of device subsections. Values in this section override values for the same attributes in the defaults section.
- multipaths
Settings for individual multipath devices. This section is a list of multipath subsections. Values override the defaults and devices sections.
- overrides
Settings that override values from all other sections.
18.6.4 Applying /etc/multipath.conf
modifications #
To apply the configuration changes, run
> sudo multipathd reconfigure
Do not forget to synchronize with the configuration in the initrd. See Section 18.3.2.3, “Keeping the initial RAM disk synchronized”.
Warning: Do not apply settings using multipath
Do not apply new settings with the multipath
command while multipathd
is running. This
may result in an inconsistent and possibly broken setup.
Note: Verifying a modified setup
You can test modified settings before applying them by running:
multipath -d -v2
This command shows new maps to be created with the proposed topology. However, the command does not show whether maps will be removed/flushed. To obtain even more information, run this command:
multipath -d -v3 2>&1 | less
18.6.5 Generating a WWID #
To identify a device over different paths, multipath uses a World Wide
Identification (WWID) for each device. If the WWID is the same for two
device paths, they are assumed to represent the same device. We recommend
not changing the method of WWID generation, unless there is a compelling
reason to do so. For more details, see man
multipath.conf
.
18.7 Configuring policies for polling, queuing, and failback #
This section discusses the most important
multipath-tools
configuration parameters for
achieving fault tolerance.
- polling_interval
The time interval (in seconds) between health checks for path devices. The default is 5 seconds. Failed devices are checked at this time interval. For healthy devices, the time interval may be increased up to max_polling_interval seconds.
- no_path_retry
Determines what happens if all paths of a given multipath map have failed or disappeared. The possible values are:
- fail
Fail I/O on the multipath map. This causes I/O errors in upper layers such as mounted file systems. The affected file systems, and possibly the entire host, enter degraded mode.
- queue
I/O on the multipath map is queued in the device-mapper layer and sent to the device when path devices become available again. This is the safest option to avoid losing data, but it can have negative effects if the path devices are not reinstated for a long time. Processes reading from the device hang in uninterruptible sleep (D) state. Queued data occupies memory, which becomes unavailable for processes. Eventually, memory is exhausted.
- N
N is a positive integer. Keep the map device in queuing mode for N polling intervals. When the time elapses, multipathd fails the map device. If polling_interval is 5 seconds and no_path_retry is 6, multipathd queues I/O for approximately 6 * 5 s = 30 s before failing I/O on the map device. A carefully chosen timeout value is often a good compromise between fail and queue.
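The timeout arithmetic above can be written directly as configuration. With these illustrative values, multipathd queues I/O for roughly 6 * 5 s = 30 s before failing the map:

```
defaults {
    polling_interval 5    # check paths every 5 seconds
    no_path_retry    6    # queue for 6 polling intervals, about 30 s
}
```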
The goal of multipath I/O is to provide connectivity fault tolerance between the storage system and the server. The desired default behavior depends on whether the server is a stand-alone server or a node in a high-availability cluster.
When you configure multipath I/O for a stand-alone server, the no_path_retry setting protects the server operating system from receiving I/O errors as long as possible. It queues I/O until a multipath failover occurs and provides a healthy connection.
When you configure multipath I/O for a node in a high-availability cluster, you want multipath to report the I/O failure to trigger the resource failover instead of waiting for a multipath failover to be resolved. In cluster environments, you must modify the no_path_retry setting so that the cluster node receives an I/O error in relation to the cluster verification process (recommended to be 50% of the heartbeat tolerance) if the connection to the storage system is lost. In addition, set the multipath failback policy to manual to avoid a ping-pong of resources because of path failures.
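For a cluster node, the recommendations above translate into a defaults section like the following sketch. Verify the exact values with your storage system vendor:

```
defaults {
    no_path_retry "fail"     # report I/O errors so cluster resources can fail over
    failback      "manual"   # avoid resource ping-pong when paths recover
}
```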
The /etc/multipath.conf
file should contain a
defaults
section where you can specify default behaviors
for polling, queuing, and failback. If the field is not otherwise specified
in a device
section, the default setting is applied for
that SAN configuration.
The following are the compiled default settings. They will be used unless
you overwrite these values by creating and configuring a personalized
/etc/multipath.conf
file.
defaults {
    verbosity 2
    # udev_dir is deprecated in SLES 11 SP3
    # udev_dir              /dev
    polling_interval        5
    # path_selector default value is service-time in SLES 11 SP3
    # path_selector         "round-robin 0"
    path_selector           "service-time 0"
    path_grouping_policy    failover
    # getuid_callout is deprecated in SLES 11 SP3 and replaced with uid_attribute
    # getuid_callout        "/usr/lib/udev/scsi_id --whitelisted --device=/dev/%n"
    # uid_attribute is new in SLES 11 SP3
    uid_attribute           "ID_SERIAL"
    prio                    "const"
    prio_args               ""
    features                "0"
    path_checker            "tur"
    alias_prefix            "mpath"
    rr_min_io_rq            1
    max_fds                 "max"
    rr_weight               "uniform"
    queue_without_daemon    "yes"
    flush_on_last_del       "no"
    user_friendly_names     "no"
    fast_io_fail_tmo        5
    bindings_file           "/etc/multipath/bindings"
    wwids_file              "/etc/multipath/wwids"
    log_checker_err         "always"
    retain_attached_hw_handler "no"
    detect_prio             "no"
    failback                "manual"
    no_path_retry           "fail"
}
For information about setting the polling, queuing, and failback policies, see the corresponding parameters in Section 18.10, “Configuring path failover policies and priorities”.
After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.
18.8 Blacklisting non-multipath devices #
The /etc/multipath.conf file can contain a blacklist section where all non-multipath devices are listed. You can blacklist devices by WWID (wwid keyword), device name (devnode keyword), or device type (device section). You can also use the blacklist_exceptions section to enable multipath for some devices that are blacklisted by the regular expressions used in the blacklist section.
Note: Preferred blacklisting methods
The preferred method for blacklisting devices is by WWID or by vendor and product. Blacklisting by devnode is not recommended, because device nodes can change and thus are not useful for persistent device identification.
Warning: Regular expressions in multipath.conf
Regular expressions in the /etc/multipath.conf file do not work in general. They only work if they are matched against common strings. However, the standard configuration of multipath already contains regular expressions for many devices and vendors. Matching regular expressions with other regular expressions does not work. Make sure that you are only matching against strings shown with multipath -t.
You can typically ignore non-multipathed devices, such as hpsa, fd, hd, md, dm, sr, scd, st, ram, raw, and loop. For example, local SATA hard disks and flash disks do not have multiple paths. If you want multipath to ignore single-path devices, put them in the blacklist section.
Note: Compatibility
The keyword devnode_blacklist has been deprecated and replaced with the keyword blacklist.
With SUSE Linux Enterprise Server 12, the glibc-provided regular expressions are used. To match an arbitrary string, you must now use ".*" rather than "*".
For example, to exclude local devices and all arrays attached to the hpsa driver from multipath management, the blacklist section looks like this:
blacklist {
    wwid "26353900f02796769"
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^sd[a-z][0-9]*"
}
You can also blacklist only the partitions from a driver instead of the entire array. For example, you can use the following regular expression to blacklist only partitions from the cciss driver and not the entire array:
blacklist {
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
You can blacklist specific device types by adding a device section in the blacklist, and using the vendor and product keywords.
blacklist {
    device {
        vendor "DELL"
        product ".*"
    }
}
You can use a blacklist_exceptions section to enable multipath for some devices that were blacklisted by the regular expressions used in the blacklist section. You add exceptions by WWID (wwid keyword), device name (devnode keyword), or device type (device section). You must specify the exceptions in the same way that you blacklisted the corresponding devices. That is, wwid exceptions apply to a wwid blacklist, devnode exceptions apply to a devnode blacklist, and device type exceptions apply to a device type blacklist.
For example, you can enable multipath for a desired device type when you have different device types from the same vendor. Blacklist all of the vendor’s device types in the blacklist section, and then enable multipath for the desired device type by adding a device section in a blacklist_exceptions section.
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
    device {
        vendor "DELL"
        product ".*"
    }
}
blacklist_exceptions {
    device {
        vendor "DELL"
        product "MD3220i"
    }
}
You can also use the blacklist_exceptions section to enable multipath only for specific devices. For example:
blacklist {
    wwid ".*"
}
blacklist_exceptions {
    wwid "3600d0230000000000e13955cc3751234"
    wwid "3600d0230000000000e13955cc3751235"
}
After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.
Following the reboot, the local devices should no longer be listed in the
multipath maps when you issue the multipath -ll
command.
Note: Using the find_multipaths option
Starting with SUSE Linux Enterprise Server 12 SP2, the multipath tools support the option
find_multipaths
in the defaults
section of /etc/multipath.conf
. This option prevents
multipath and multipathd
from
setting up multipath maps for devices with only a single path (see
man 5 multipath.conf
for details). In certain
configurations, this may save the administrator from needing to create
blacklist entries, for example for local SATA disks.
Convenient as it seems at first, using the
find_multipaths
option also has disadvantages. It
complicates and slows down the system boot, because for every device found,
the boot logic needs to wait until all devices have been discovered to see
whether a second path exists for the device. Additionally, problems can
arise when some paths are down or otherwise invisible at boot time—a
device can be falsely detected as a single-path device and activated,
causing later addition of more paths to fail.
find_multipaths
considers all devices that are listed in
/etc/multipath/wwids
with matching WWIDs as being
multipath devices. This is important when
find_multipaths
is first activated: Unless
/etc/multipath/wwids
is deleted or edited, activating
this option has no effect, because all previously existing multipath maps
(including single-path ones) are listed in the wwids file. On SAN-boot
systems with a multipathed root file system, make sure to keep
/etc/multipath/wwids
synchronized between the initial
RAM disk and the file system.
In summary, using find_multipaths
may be convenient in
certain use cases, but SUSE still recommends the default configuration
with a properly configured blacklist and blacklist exceptions.
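If you do decide to use the option despite this recommendation, enabling it is a one-line change in the defaults section (shown here as a sketch; remember the caveat above about a pre-existing /etc/multipath/wwids file):

```
defaults {
    find_multipaths yes
}
```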
18.9 Configuring user-friendly names or alias names #
A multipath device can be identified by its WWID, by a user-friendly name,
or by an alias that you assign for it. Device node names in the form of
/dev/sdn
and /dev/dm-n
can change
on reboot and might be assigned to different devices each time. A device’s
WWID, user-friendly name, and alias name persist across reboots, and are the
preferred way to identify the device.
Important: Using persistent names is recommended
Because device node names in the form of /dev/sdn
and
/dev/dm-n
can change on reboot, referring to multipath
devices by their WWID is preferred. You can also use a user-friendly name
or alias that is mapped to the WWID to identify the device uniquely across
reboots.
The following table describes the types of device names that can be used for
a device in the /etc/multipath.conf
file. For an
example of multipath.conf
settings, see the
/usr/share/doc/packages/multipath-tools/multipath.conf.synthetic
file.
Table 18.1: Comparison of multipath device name types #
- WWID (default)
The serial WWID (Worldwide Identifier) is an identifier for the multipath device that is guaranteed to be globally unique and unchanging. The default name used in multipathing is the ID of the logical unit as found in the /dev/disk/by-id directory.
- User-friendly
The Device Mapper Multipath device names in the /dev/mapper directory, in the form mpath<N>, mapped to the WWID via the bindings file. These names are unique on a given node, but a device is not guaranteed to get the same name on every node.
- Alias
An alias name is a globally unique name that the administrator provides for a multipath device. Alias names override the WWID and the user-friendly names.
If you are using user_friendly_names, do not set the alias to the mpathN format. This may conflict with an automatically assigned user-friendly name, and give you incorrect device node names.
The global multipath user_friendly_names
option in the
/etc/multipath.conf
file is used to enable or disable
the use of user-friendly names for multipath devices. If it is set to
no
(the default), multipath uses the WWID as the name of
the device. If it is set to yes
, multipath uses the
/var/lib/multipath/bindings
file to assign a persistent
and unique name to the device in the form of
mpath<N>
in the
/dev/mapper
directory. The bindings
file
option in the /etc/multipath.conf
file can
be used to specify an alternate location for the
bindings
file.
The global multipath alias
option in the
/etc/multipath.conf
file is used to explicitly assign a
name to the device. If an alias name is set up for a multipath device, the
alias is used instead of the WWID or the user-friendly name.
Using the user_friendly_names
option can be problematic
in the following situations:
- Root device is using multipath:
If the system root device is using multipath and you use the user_friendly_names option, the user-friendly settings in the /var/lib/multipath/bindings file are included in the initrd. If you later change the storage setup, such as by adding or removing devices, there is a mismatch between the bindings setting inside the initrd and the bindings settings in /var/lib/multipath/bindings.
Warning: Binding mismatches
A bindings mismatch between the initrd and /var/lib/multipath/bindings can lead to a wrong assignment of mount points to devices, which can result in file system corruption and data loss.
To avoid this problem, we recommend that you use the default WWID settings for the system root device. You should not use aliases for the system root device. Because the device name would differ, using an alias causes you to lose the ability to seamlessly switch off multipathing via the kernel command line.
- Mounting /var from another partition:
The default location of the user_friendly_names configuration file is /var/lib/multipath/bindings. If the /var data is not located on the system root device but mounted from another partition, the bindings file is not available when setting up multipathing.
Make sure that the /var/lib/multipath/bindings file is available on the system root device and multipath can find it. For example, this can be done as follows:
Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.
Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:
defaults {
    user_friendly_names yes
    bindings_file "/etc/multipath/bindings"
}
- Multipath is in the initrd:
Even if the system root device is not on multipath, it is possible for multipath to be included in the initrd. For example, this can happen if the system root device is on LVM. If you use the user_friendly_names option and multipath is in the initrd, you should boot with the parameter multipath=off to avoid problems.
This disables multipath only in the initrd during system boot. After the system boots, the boot.multipath and multipathd boot scripts can activate multipathing.
- Multipathing in HA clusters:
See Section 18.9.1, “Multipath device names in HA clusters” for details.
To enable user-friendly names or to specify aliases:
Open the /etc/multipath.conf file in a text editor with root privileges.
(Optional) Modify the location of the /var/lib/multipath/bindings file. The alternate path must be available on the system root device where multipath can find it.
Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.
Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:
defaults {
    user_friendly_names yes
    bindings_file "/etc/multipath/bindings"
}
(Optional, not recommended) Enable user-friendly names:
Uncomment the defaults section and its ending bracket.
Uncomment the user_friendly_names option, then change its value from no to yes. For example:
## Use user-friendly names, instead of using WWIDs as names.
defaults {
    user_friendly_names yes
}
(Optional) Specify your own names for devices by using the alias option in the multipath section. For example:
## Use alias names, instead of using WWIDs as names.
multipaths {
    multipath {
        wwid 36006048000028350131253594d303030
        alias blue1
    }
    multipath {
        wwid 36006048000028350131253594d303041
        alias blue2
    }
    multipath {
        wwid 36006048000028350131253594d303145
        alias yellow1
    }
    multipath {
        wwid 36006048000028350131253594d303334
        alias yellow2
    }
}
Important: WWID compared to WWN
When you define device aliases in the /etc/multipath.conf file, ensure that you use each device’s WWID (such as 3600508e0000000009e6baa6f609e7908) and not its WWN, which replaces the first character of a device ID with 0x, such as 0x600508e0000000009e6baa6f609e7908.
Save your changes, then close the file.
After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details.
To use the entire LUN directly (for example, if you are using the SAN
features to partition your storage), you can use the
/dev/disk/by-id/xxx
names for mkfs
,
/etc/fstab
, your application, and so on. Partitioned
devices have _part<n>
appended to the device
name, such as /dev/disk/by-id/xxx_part1
.
In the /dev/disk/by-id
directory, the multipath-mapped
devices are represented by the device’s dm-uuid*
name
or alias name (if you assign an alias for it in the
/etc/multipath.conf
file). The
scsi-
and wwn-
device names
represent physical paths to the devices.
18.9.1 Multipath device names in HA clusters #
Make sure that multipath devices have the same name across all nodes by doing the following:
Use UUID and alias names to ensure that multipath device names are consistent across all nodes in the cluster. Alias names must be unique across all nodes. Copy the /etc/multipath.conf file from the node to the /etc/ directory of all the other nodes in the cluster.
When using links to multipath-mapped devices, ensure that you specify the dm-uuid* name or alias name in the /dev/disk/by-id directory, and not a fixed path instance of the device. For information, see Section 18.9, “Configuring user-friendly names or alias names”.
Set the user_friendly_names configuration option to no to disable it. A user-friendly name is unique to a node, but a device might not be assigned the same user-friendly name on every node in the cluster.
Note: User-friendly names
If you really need to use user-friendly names, you can force the system-defined user-friendly names to be consistent across all nodes in the cluster by doing the following:
In the /etc/multipath.conf file on one node:
Set the user_friendly_names configuration option to yes to enable it.
Multipath uses the /var/lib/multipath/bindings file to assign a persistent and unique name to the device in the form of mpath<N> in the /dev/mapper directory.
(Optional) Set the bindings_file option in the defaults section of the /etc/multipath.conf file to specify an alternate location for the bindings file.
The default location is /var/lib/multipath/bindings.
Set up all of the multipath devices on the node.
Copy the /etc/multipath.conf file from the node to the /etc/ directory of all the other nodes in the cluster.
Copy the bindings file from the node to the bindings_file path on all of the other nodes in the cluster.
After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the node for the changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf modifications” for details. This applies to all affected nodes.
18.10 Configuring path failover policies and priorities #
In a Linux host, when there are multiple paths to a storage controller, each path appears as a separate block device, which results in multiple block devices for a single LUN. The Device Mapper Multipath service detects multiple paths with the same LUN ID, and creates a new multipath device with that ID.
For example, a host with two HBAs attached to a storage controller with two
ports via a single unzoned Fibre Channel switch sees four block devices:
/dev/sda
, /dev/sdb
,
/dev/sdc
, and /dev/sdd
. The Device
Mapper Multipath service creates a single block device,
/dev/mpath/mpath1
, that reroutes I/O through those four
underlying block devices.
This section describes how to specify policies for failover and configure
priorities for the paths. Note that after you have modified the
/etc/multipath.conf
file, you must run
dracut
-f
to re-create the
initrd
on your system, then restart the server for the
changes to take effect. See Section 18.6.4, “Applying /etc/multipath.conf
modifications”
for details.
18.10.1 Configuring the path failover policies #
Use the multipath
command with the -p
option to set the path failover policy:
> sudo multipath DEVICENAME -p POLICY
Replace POLICY with one of the following policy options:
Table 18.2: Group policy options for the multipath -p command #
- failover
(Default) One path per priority group.
- multibus
All paths in one priority group.
- group_by_serial
One priority group per detected serial number.
- group_by_prio
One priority group per path priority value. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf file.
- group_by_node_name
One priority group per target node name. Target node names are fetched in /sys/class/fc_transport/target*/node_name.
18.10.2 Configuring failover priorities #
You must manually enter the failover priorities for the device in the
/etc/multipath.conf
file. Examples for all settings
and options can be found in the
/usr/share/doc/packages/multipath-tools/multipath.conf.annotated
file.
18.10.2.1 Understanding priority groups and attributes #
A priority group is a collection of paths that go to
the same physical LUN. By default, I/O is distributed in a round-robin
fashion across all paths in the group. The multipath
command automatically creates priority groups for each LUN in the SAN
based on the path_grouping_policy
setting for that SAN.
The multipath
command multiplies the number of paths in
a group by the group’s priority to determine which group is the primary.
The group with the highest calculated value is the primary. When all paths
in the primary group are failed, the priority group with the next highest
value becomes active.
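As an illustration of this calculation, with made-up path counts and priorities:

```
Group A: 4 paths x priority 50 = 200   <- highest value: primary group
Group B: 2 paths x priority 10 =  20   <- becomes active only if all paths in A fail
```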
A path priority is an integer value assigned to a path. The higher the value, the higher the priority. An external program is used to assign priorities for each path. For a given device, the paths with the same priorities belong to the same priority group.
The prio setting is used in the defaults{} or devices{} section of the /etc/multipath.conf file. It is silently ignored when it is specified for an individual multipath definition in the multipaths{} section. The prio line specifies the prioritizer. If the prioritizer requires an argument, you specify the argument by using the prio_args keyword on a second line.
PRIO Settings for the Defaults or Devices Sections#
prio
Specifies the prioritizer program to call to obtain a path priority value. Weights are summed for each path group to determine the next path group to use in case of failure.
Use the prio_args keyword to specify arguments if the specified prioritizer requires arguments.
If no prio keyword is specified, all paths are equal. The default setting is const with a prio_args setting with no value.
prio "const"
prio_args ""
Example prioritizer programs include:
- alua
Generates path priorities based on the SCSI-3 ALUA settings.
- const
Generates the same priority for all paths.
- emc
Generates the path priority for EMC arrays.
- hdc
Generates the path priority for Hitachi HDS Modular storage arrays.
- hp_sw
Generates the path priority for Compaq/HP controllers in active/standby mode.
- ontap
Generates the path priority for NetApp arrays.
- random
Generates a random priority for each path.
- rdac
Generates the path priority for the LSI/Engenio RDAC controller.
- weightedpath
Generates the path priority based on the weighted values you specify in the arguments for prio_args.
- path_latency
Generates the path priority based on a latency algorithm, which is configured with the prio_args keyword.
prio_args arguments#
These are the arguments for the prioritizer programs that require arguments. Most prio programs do not need arguments. There is no default. The values depend on the prio setting and whether the prioritizer requires any of the following arguments:
- weighted
Requires a value of the form [hbtl|devname|serial|wwn] REGEX1 PRIO1 REGEX2 PRIO2 ...
The regex must be of SCSI H:B:T:L format, for example 1:0:.:. and *:0:0:., with a weight value, where H, B, T, L are the host, bus, target, and LUN IDs for a device. For example:
prio "weightedpath"
prio_args "hbtl 1:.:.:. 2 4:.:.:. 4"
- devname
The regex is in device name format. For example: sda, sd.e
- serial
The regex is in serial number format. For example: .*J1FR.*324. Look up your serial number with the multipathd show paths format %z command. (multipathd show wildcards displays all format wildcards.)
- alua
If exclusive_pref_bit is set for a device (alua exclusive_pref_bit), paths with the preferred path bit set will always be in their own path group.
- path_latency
path_latency adjusts latencies between remote and local storage arrays if both arrays use the same type of hardware. Usually the latency on the remote array is higher, so you can tune the latency to bring them closer together. This requires a value pair of the form io_num=20 base_num=10.
io_num is the number of read IOs sent continuously to the current path, used to calculate the average path latency. Valid values are integers from 2 to 200.
base_num is the logarithmic base number, used to partition different priority ranks. Valid values are integers from 2 to 10. The maximum average latency value is 100 s, the minimum is 1 μs. For example, if base_num=10, the paths are grouped into priority groups with path latency <= 1 μs, (1 μs, 10 μs], (10 μs, 100 μs], (100 μs, 1 ms], (1 ms, 10 ms], (10 ms, 100 ms], (100 ms, 1 s], (1 s, 10 s], (10 s, 100 s], > 100 s.
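Putting the prio and prio_args keywords together, a defaults entry using the path_latency prioritizer could look like the following sketch (the argument values are the illustrative ones from the text above):

```
defaults {
    prio      "path_latency"
    prio_args "io_num=20 base_num=10"   # 20 read IOs per sample, log base 10 rank partitioning
}
```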
Multipath Attributes#
Multipath attributes are used to control the behavior of multipath I/O for
devices. You can specify attributes as defaults for all multipath devices.
You can also specify attributes that apply only to a given multipath
device by creating an entry for that device in the
multipaths
section of the multipath configuration file.
user_friendly_names
Specifies whether to use world-wide IDs (WWIDs) or to use the /var/lib/multipath/bindings file to assign a persistent and unique alias to the multipath devices in the form of /dev/mapper/mpathN.
This option can be used in the devices section and the multipaths section.
- no
(Default) Use the WWIDs shown in the /dev/disk/by-id/ location.
- yes
Autogenerate user-friendly names as aliases for the multipath devices instead of the actual ID.
failback
Specifies whether to monitor the failed path recovery, and indicates the timing for group failback after failed paths return to service.
When the failed path recovers, the path is added back into the multipath-enabled path list based on this setting. Multipath evaluates the priority groups, and changes the active priority group when the priority of the primary path exceeds the secondary group.
- manual
(Default) The failed path is not monitored for recovery. The administrator runs the multipath command to update enabled paths and priority groups.
- followover
Only perform automatic failback when the first path of a path group becomes active. This keeps a node from automatically failing back when another node requested the failover.
- immediate
When a path recovers, enable the path immediately.
- N
When the path recovers, wait N seconds before enabling the path. Specify an integer value greater than 0.
We recommend a failback setting of manual for multipath in cluster environments to prevent multipath failover ping-pong.
failback "manual"
Important: Verification
Make sure that you verify the failback setting with your storage system vendor. Different storage systems can require different settings.
no_path_retry
Specifies the behaviors to use on path failure.
- N
Specifies the number of retries until multipath stops the queuing and fails the path. Specify an integer value greater than 0.
In a cluster, you can specify a value of “0” to prevent queuing and allow resources to fail over.
- fail
Specifies immediate failure (no queuing).
- queue
Never stop queuing (queue forever until the path comes alive).
We recommend a retry setting of fail or 0 in the /etc/multipath.conf file when working in a cluster. This causes the resources to fail over when the connection to storage is lost. Otherwise, the messages queue and the resource failover cannot occur.
no_path_retry "fail"
no_path_retry "0"
Important: Verification
Make sure that you verify the retry settings with your storage system vendor. Different storage systems can require different settings.
path_checker
Determines the state of the path.
Value
Description
directio
Reads the first sector using direct I/O. This is useful for DASD devices. Logs failure messages in the
systemd
journal (see Chapter 21, journalctl: Query the systemd journal).
tur
Issues a SCSI test unit ready command to the device. This is the preferred setting if the LUN supports it. On failure, the command does not fill up the
systemd
journal with messages.
CUSTOM_VENDOR_VALUE
Some SAN vendors provide custom path_checker options:
cciss_tur
: Checks the path state for HP Smart Storage Arrays.
emc_clariion
: Queries the EMC Clariion EVPD page 0xC0 to determine the path state.
hp_sw
: Checks the path state (Up, Down, or Ghost) for HP storage arrays with Active/Standby firmware.
rdac
: Checks the path state for the LSI/Engenio RDAC storage controller.
path_grouping_policy
Specifies the path grouping policy for a multipath device hosted by a given controller.
Value
Description
failover
(Default) One path is assigned per priority group so that only one path at a time is used.
multibus
All valid paths are in one priority group. Traffic is load-balanced across all active paths in the group.
group_by_prio
One priority group exists for each path priority value. Paths with the same priority are in the same priority group. Priorities are assigned by an external program.
group_by_serial
Paths are grouped by the SCSI target serial number (controller node WWN).
group_by_node_name
One priority group is assigned per target node name. Target node names are fetched in
/sys/class/fc_transport/target*/node_name
.
path_selector
Specifies the path-selector algorithm to use for load balancing.
Value
Description
round-robin 0
The load-balancing algorithm used to balance traffic across all active paths in a priority group.
queue-length 0
A dynamic load balancer that balances the number of in-flight I/O requests on paths, similar to the least-pending option.
service-time 0
(Default) A service-time oriented load balancer that balances I/O on paths according to the latency.
pg_timeout
Specifies path group timeout handling. No value can be specified; an internal default is set.
polling_interval
Specifies the time in seconds between the end of one path checking cycle and the beginning of the next path checking cycle.
Specify an integer value greater than 0. The default value is 5. Make sure that you verify the polling_interval setting with your storage system vendor. Different storage systems can require different settings.
rr_min_io_rq
Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath.
Specify an integer value greater than 0. The default value is 1.
rr_min_io_rq "1"
rr_weight
Specifies the weighting method to use for paths.
Value
Description
uniform
(Default) All paths have the same round-robin weights.
priorities
Each path’s weight is determined by the path’s priority times the rr_min_io_rq setting.
uid_attribute
A udev attribute that provides a unique path identifier. The default value is
ID_SERIAL
.
18.10.2.2 Configuring for round-robin load balancing #
All paths are active. I/O is routed to a path for a configured number of seconds or a configured number of I/O transactions before moving to the next open path in the sequence.
18.10.2.3 Configuring for single path failover #
A single path with the highest priority (lowest value setting) is active for traffic. Other paths are available for failover, but are not used unless failover occurs.
18.10.2.4 Grouping I/O paths for round-robin load balancing #
Multiple paths with the same priority fall into the active group. When all paths in that group fail, the device fails over to the next highest priority group. All paths in the group share the traffic load in a round-robin load balancing fashion.
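As a sketch of how these policies are combined in /etc/multipath.conf, the following devices entry groups paths by priority and uses the default service-time selector (the vendor and product strings are placeholders, not values from this guide):

```
devices {
    device {
        vendor                "EXAMPLE"
        product               "EXAMPLE-LUN"
        path_grouping_policy  group_by_prio
        path_selector         "service-time 0"
    }
}
```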
18.10.3 Reporting target path groups #
Use the SCSI Report Target Port Groups (sg_rtpg(8)
)
command. For information, see the man page for
sg_rtpg(8)
.
18.11 Configuring multipath I/O for the root device #
Device Mapper Multipath I/O (DM-MPIO) is available and supported for
/boot
and /root
in SUSE Linux Enterprise Server.
In addition, the YaST partitioner in the YaST installer supports
enabling multipath during the install.
18.11.1 Enabling multipath I/O at install time #
To install the operating system on a multipath device, the multipath
software must be running at install time. The
multipathd
daemon is not
automatically active during the system installation. You can start it by
using the Configure Multipath option in the YaST
Partitioner.
18.11.1.1 Enabling multipath I/O at install time on an active/active multipath storage LUN #
Choose Expert Partitioner on the Suggested Partitioning screen during the installation.
Select the Hard Disks main icon, click the Configure button, then select Configure Multipath.
Start multipath.
YaST starts to rescan the disks and shows available multipath devices (such as
/dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65
). This is the device that should be used for all further processing.
Click Next to continue with the installation.
18.11.1.2 Enabling multipath I/O at install time on an active/passive multipath storage LUN #
The multipathd
daemon is not
automatically active during the system installation. You can start it by
using the Configure Multipath option in the YaST
Partitioner.
To enable multipath I/O at install time for an active/passive multipath storage LUN:
Choose Expert Partitioner on the Suggested Partitioning screen during the installation.
Select the Hard Disks main icon, click the Configure button, then select Configure Multipath.
Start multipath.
YaST starts to rescan the disks and shows available multipath devices (such as
/dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65
). This is the device that should be used for all further processing. Write down the device path and UUID; you will need them later.
Click Next to continue with the installation.
After all settings are done and the installation is finished, YaST starts to write the boot loader information and displays a countdown to restart the system. Stop the counter by clicking the Stop button, then press Ctrl–Alt–F5 to access a console.
Use the console to determine if a passive path was entered in the
/boot/grub2/device.map
file for the
hd0
entry. This is necessary because the installation does not distinguish between active and passive paths.
Mount the root device to
/mnt
by entering>
sudo
mount /dev/disk/by-id/UUID_part2 /mnt
For example, enter
>
sudo
mount /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part2 /mnt
Mount the boot device to
/mnt/boot
by entering>
sudo
mount /dev/disk/by-id/UUID_part1 /mnt/boot
For example, enter
>
sudo
mount /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part1 /mnt/boot
In the
/mnt/boot/grub2/device.map
file, determine if the
hd0
entry points to a passive path, then do one of the following:
Active path: No action is needed. Skip all remaining steps, return to the YaST graphical environment by pressing Ctrl–Alt–F7, and continue with the installation.
Passive path: The configuration must be changed and the boot loader must be reinstalled.
If the
hd0
entry points to a passive path, change the configuration and reinstall the boot loader:
Enter the following commands at the console prompt:
mount -o bind /dev /mnt/dev
mount -o bind /sys /mnt/sys
mount -o bind /proc /mnt/proc
chroot /mnt
At the console, run
multipath -ll
, then check the output to find the active path. Passive paths are flagged as
ghost
.
In the
/boot/grub2/device.map
file, change the
hd0
entry to an active path, save the changes, and close the file.
Reinstall the boot loader by entering
grub-install /dev/disk/by-id/UUID_part1 /mnt/boot
For example, enter
grub-install /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part1 /mnt/boot
Enter the following commands:
exit
umount /mnt/*
umount /mnt
Return to the YaST graphical environment by pressing Ctrl–Alt–F7.
Click OK to continue with the installation reboot.
18.11.2 Enabling multipath I/O for an existing root device #
Install Linux with only a single path active, preferably one where the
by-id
symbolic links are listed in the partitioner.
Mount the devices by using the
/dev/disk/by-id
path used during the install.
Open or create
/etc/dracut.conf.d/10-mp.conf
and add the following line (mind the leading blank space):
force_drivers+=" dm-multipath"
For IBM Z, before running
dracut
, edit the
/etc/zipl.conf
file to change the by-path information in
zipl.conf
to the by-id information that was used in
/etc/fstab
.
Run
dracut
-f
to update the
initrd
image.
For IBM Z, after running
dracut
, run
zipl
.
Reboot the server.
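The dracut drop-in file from the procedure above can also be generated and checked from a script. The following sketch writes to a temporary directory instead of /etc/dracut.conf.d so that it can be tried safely; only the file name and the force_drivers line come from this guide:

```shell
# Sketch: create the dracut drop-in that forces dm-multipath into the initrd.
# A real system would use /etc/dracut.conf.d/10-mp.conf instead of a temp dir.
conf_dir=$(mktemp -d)
printf 'force_drivers+=" dm-multipath"\n' > "$conf_dir/10-mp.conf"
# The leading space inside the quotes is required: dracut appends these
# values to an existing list, and a missing space would fuse module names.
grep -q 'force_drivers+=" dm-multipath"' "$conf_dir/10-mp.conf" && echo "drop-in ok"
```

After placing the real file under /etc/dracut.conf.d, `dracut -f` rebuilds the initrd with the module included.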
18.11.3 Disabling multipath I/O on the root device #
Add multipath=off
to the kernel command line. This can
be done with the YaST Boot Loader module: open the Kernel Parameters tab and add the parameter to both command lines.
This affects only the root device. All other devices are not affected.
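If you prefer the command line over the YaST Boot Loader module, the same parameter can be appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The following sketch edits a copy of the file so the idea can be tested without touching the system; the existing variable content is an example, not taken from this guide:

```shell
# Sketch: append multipath=off to GRUB_CMDLINE_LINUX_DEFAULT in a copy of
# /etc/default/grub. On a real system, regenerate the boot configuration
# afterwards, for example with grub2-mkconfig -o /boot/grub2/grub.cfg.
grub_cfg=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet"' > "$grub_cfg"
# Insert the parameter just before the closing quote of the value.
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 multipath=off"/' "$grub_cfg"
cat "$grub_cfg"
```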
18.12 Configuring multipath I/O for an existing software RAID #
Ideally, you should configure multipathing for devices before you use them
as components of a software RAID device. If you add multipathing after
creating any software RAID devices, the DM-MPIO service might be starting
after the multipath
service on reboot, which makes
multipathing appear not to be available for RAIDs. You can use the procedure
in this section to get multipathing running for a previously existing
software RAID.
For example, you might need to configure multipathing for devices in a software RAID under the following circumstances:
If you create a new software RAID as part of the Partitioning settings during a new install or upgrade.
If you did not configure the devices for multipathing before using them in the software RAID as a member device or spare.
If you grow your system by adding new HBA adapters to the server or expanding the storage subsystem in your SAN.
Note: Assumptions
The following instructions assume the software RAID device is
/dev/mapper/mpath0
, which is its device name as
recognized by the kernel. It assumes you have enabled user-friendly names
in the /etc/multipath.conf
file as described in
Section 18.9, “Configuring user-friendly names or alias names”.
Make sure to modify the instructions for the device name of your software RAID.
Open a terminal console.
Except where otherwise directed, use this console to enter the commands in the following steps.
If any software RAID devices are currently mounted or running, enter the following commands for each device to unmount the device and stop it.
>
sudo
umount /dev/mapper/mpath0
>
sudo
mdadm --misc --stop /dev/mapper/mpath0
Stop the
md
service by entering
>
sudo
systemctl stop mdmonitor
Start the
multipathd
daemon by entering the following command:
>
sudo
systemctl start multipathd
After the multipathing service has been started, verify that the software RAID’s component devices are listed in the
/dev/disk/by-id
directory. Do one of the following:
Devices are listed: The device names should now have symbolic links to their Device Mapper Multipath device names, such as
/dev/dm-1
.
Devices are not listed: Force the multipath service to recognize them by flushing and rediscovering the devices by entering
>
sudo
multipath -F
>
sudo
multipath -v0
The devices should now be listed in
/dev/disk/by-id
, and have symbolic links to their Device Mapper Multipath device names. For example:
lrwxrwxrwx 1 root root 10 2011-01-06 11:42 dm-uuid-mpath-36006016088d014007e0d0d2213ecdf11 -> ../../dm-1
Restart the
mdmonitor
service and the RAID device by entering
>
sudo
systemctl start mdmonitor
Check the status of the software RAID by entering
>
sudo
mdadm --detail /dev/mapper/mpath0
The RAID’s component devices should match their Device Mapper Multipath device names that are listed as the symbolic links of devices in the
/dev/disk/by-id
directory.
In case the root (
/
) device or any parts of it (such as
/var
,
/etc
,
/log
) are on the SAN and multipath is needed to boot, rebuild the
initrd
:
>
dracut -f --add-multipath
Reboot the server to apply the changes.
Verify that the software RAID array comes up properly on top of the multipathed devices by checking the RAID status. Enter
>
sudo
mdadm --detail /dev/mapper/mpath0
For example:
Number Major Minor RaidDevice State
0 253 0 0 active sync /dev/dm-0
1 253 1 1 active sync /dev/dm-1
2 253 2 2 active sync /dev/dm-2
Note: Using mdadm with multipath devices
The mdadm
tool requires that the devices be accessed by
the ID rather than by the device node path. Refer to
Section 18.4.3, “MD RAID on multipath devices” for details.
18.13 Using LVM2 on multipath devices #
When using multipath, all paths to a resource are present as devices in the
device tree. By default LVM checks if there is a multipath device on top of
any device in the device tree. If LVM finds a multipath device on top, it
assumes that the device is a multipath component and ignores the
(underlying) device. This is most likely the desired behavior, but it can be
changed in /etc/lvm/lvm.conf
. When
multipath_component_detection is set to 0, LVM scans multipath
component devices. The default entry in lvm.conf is:
# By default, LVM2 will ignore devices used as component paths
# of device-mapper multipath devices.
# 1 enables; 0 disables.
multipath_component_detection = 1
18.14 Best practice #
18.14.1 Scanning for new devices without rebooting #
If your system has already been configured for multipathing and you later
need to add storage to the SAN, you can use the
rescan-scsi-bus.sh
script to scan for the new devices.
By default, this script scans all HBAs with typical LUN ranges. The general
syntax for the command looks like the following:
>
sudo
rescan-scsi-bus.sh [options] [host [host ...]]
For most storage subsystems, the script can be run successfully without
options. However, some special cases might need to use one or more options.
Run rescan-scsi-bus.sh --help
for details.
Warning: EMC PowerPath environments
In EMC PowerPath environments, do not use the
rescan-scsi-bus.sh
utility provided with the
operating system or the HBA vendor scripts for scanning the SCSI buses. To
avoid potential file system corruption, EMC requires that you follow the
procedure provided in the vendor documentation for EMC PowerPath for
Linux.
Use the following procedure to scan the devices and make them available to multipathing without rebooting the system.
On the storage subsystem, use the vendor’s tools to allocate the device and update its access control settings to allow the Linux system access to the new storage. Refer to the vendor’s documentation for details.
Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter
>
sudo
rescan-scsi-bus.sh
Depending on your setup, you might need to run
rescan-scsi-bus.sh
with optional parameters. Refer to
rescan-scsi-bus.sh --help
for details.
Check for scanning progress in the
systemd
journal (see Chapter 21, journalctl: Query the systemd journal, for details). At a terminal console prompt, enter
>
sudo
journalctl -r
This command displays the last lines of the log. For example:
>
sudo
journalctl -r
Feb 14 01:03 kernel: SCSI device sde: 81920000
Feb 14 01:03 kernel: SCSI device sdf: 81920000
Feb 14 01:03 multipathd: sde: path checker registered
Feb 14 01:03 multipathd: sdf: path checker registered
Feb 14 01:03 multipathd: mpath4: event checker started
Feb 14 01:03 multipathd: mpath5: event checker started
Feb 14 01:03 multipathd: mpath4: remaining active paths: 1
Feb 14 01:03 multipathd: mpath5: remaining active paths: 1
[...]
Repeat the previous steps to add paths through other HBA adapters on the Linux system that are connected to the new device.
Run the
multipath
command to recognize the devices for DM-MPIO configuration. At a terminal console prompt, enter
>
sudo
multipath
You can now configure the new device for multipathing.
18.14.2 Scanning for new partitioned devices without rebooting #
Use the example in this section to detect a newly added multipathed LUN without rebooting.
Warning: EMC PowerPath environments
In EMC PowerPath environments, do not use the
rescan-scsi-bus.sh
utility provided with the
operating system or the HBA vendor scripts for scanning the SCSI buses. To
avoid potential file system corruption, EMC requires that you follow the
procedure provided in the vendor documentation for EMC PowerPath for
Linux.
Open a terminal console.
Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter
>
rescan-scsi-bus.sh
Depending on your setup, you might need to run
rescan-scsi-bus.sh
with optional parameters. Refer to
rescan-scsi-bus.sh --help
for details.
Verify that the device is seen (for example, the link has a new time stamp) by entering
>
ls -lrt /dev/dm-*
You can also verify the devices in
/dev/disk/by-id
by entering
>
ls -l /dev/disk/by-id/
Verify the new device appears in the log by entering
>
sudo
journalctl -r
Use a text editor to add a new alias definition for the device in the
/etc/multipath.conf
file, such as
data_vol3
.
For example, if the UUID is
36006016088d014006e98a7a94a85db11
, make the following changes:
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        wwid 36006016088d014006e98a7a94a85db11
        alias data_vol3
    }
}
Create a partition table for the device by entering
>
fdisk /dev/disk/by-id/dm-uuid-mpath-<UUID>
Replace UUID with the device WWID, such as
36006016088d014006e98a7a94a85db11
.
Trigger udev by entering
>
sudo
echo 'add' > /sys/block/DM_DEVICE/uevent
For example, to generate the device-mapper devices for the partitions on
dm-8
, enter
>
sudo
echo 'add' > /sys/block/dm-8/uevent
Create a file system on the device
/dev/disk/by-id/dm-uuid-mpath-UUID_partN
. Depending on your choice for the file system, you may use one of the following commands for this purpose:
mkfs.btrfs
,
mkfs.ext3
,
mkfs.ext4
, or
mkfs.xfs
. Refer to the respective man pages for details. Replace
UUID_partN
with the actual UUID and partition number, such as 36006016088d014006e98a7a94a85db11_part1.
Create a label for the new partition by entering the following command:
>
sudo
tune2fs -L LABELNAME /dev/disk/by-id/dm-uuid-UUID_partN
Replace
UUID_partN
with the actual UUID and partition number, such as 36006016088d014006e98a7a94a85db11_part1. Replace LABELNAME with a label of your choice.
Reconfigure DM-MPIO to let it read the aliases by entering
>
sudo
multipathd -k'reconfigure'
Verify that the device is recognized by
multipathd
by entering
>
sudo
multipath -ll
Use a text editor to add a mount entry in the
/etc/fstab
file.
At this point, the alias you created in a previous step is not yet in the
/dev/disk/by-label
directory. Add a mount entry for the
/dev/dm-9
path, then change the entry before the next time you reboot to
LABEL=LABELNAME
Create a directory to use as the mount point, then mount the device.
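Putting the last two steps together, the temporary and final /etc/fstab entries could look like the following sketch (the mount point and file system type are illustrative, not taken from this guide):

```
# Initial entry, while the label is not yet visible under /dev/disk/by-label:
/dev/dm-9         /mnt/data_vol3  ext4  defaults  0 2
# Before the next reboot, replace it with the label-based entry:
# LABEL=LABELNAME /mnt/data_vol3  ext4  defaults  0 2
```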
18.14.3 Viewing multipath I/O status #
Querying the multipath I/O status outputs the current status of the multipath maps.
The multipath -l
option displays the current path status
as of the last time that the path checker was run. It does not run the path
checker.
The multipath -ll
option runs the path checker, updates
the path information, then displays the current status information. This
command always displays the latest information about the path status.
>
sudo
multipath -ll
3600601607cf30e00184589a37a31d911 [size=127 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active][first]
 \_ 1:0:1:2 sdav 66:240 [ready ][active]
 \_ 0:0:1:2 sdr  65:16  [ready ][active]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdag 66:0   [ready ][active]
 \_ 0:0:0:2 sdc  8:32   [ready ][active]
For each device, it shows the device’s ID, size, features, and hardware handlers.
Paths to the device are automatically grouped into priority groups on device discovery. Only one priority group is active at a time. For an active/active configuration, all paths are in the same group. For an active/passive configuration, the passive paths are placed in separate priority groups.
The following information is displayed for each group:
Scheduling policy used to balance I/O within the group, such as round-robin
Whether the group is active, disabled, or enabled
Whether the group is the first (highest priority) group
Paths contained within the group
The following information is displayed for each path:
The physical address as HOST:BUS:TARGET:LUN, such as 1:0:1:2
Device node name, such as
sda
Major:minor numbers
Status of the device
Note: Using iostat
in multipath setups
In multipath environments, the iostat
command might
lead to unexpected results. By default, iostat
filters out all block devices with no I/O. To make iostat
show all devices, use:
iostat -p ALL
18.14.4 Managing I/O in error situations #
You might need to configure multipathing to queue I/O if all paths fail concurrently by enabling queue_if_no_path. Otherwise, I/O fails immediately when all paths are gone. In certain scenarios, where the driver, the HBA, or the fabric experiences spurious errors that lead to a loss of all paths, DM-MPIO should be configured to queue all I/O and never propagate errors upward.
When you use multipathed devices in a cluster, you might choose to disable queue_if_no_path. This automatically fails the path instead of queuing the I/O, and escalates the I/O error to cause a failover of the cluster resources.
Because enabling queue_if_no_path leads to I/O being queued indefinitely
unless a path is reinstated, ensure that multipathd
is
running and works for your scenario. Otherwise, I/O might be stalled
indefinitely on the affected multipathed device until reboot or until you
manually return to failover instead of queuing.
To test the scenario:
Open a terminal console.
Activate queuing instead of failover for the device I/O by entering
>
sudo
dmsetup message DEVICE_ID 0 queue_if_no_path
Replace the DEVICE_ID with the ID for your device. The 0 value represents the sector and is used when sector information is not needed.
For example, enter:
>
sudo
dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
Return to failover for the device I/O by entering
>
sudo
dmsetup message DEVICE_ID 0 fail_if_no_path
This command immediately causes all queued I/O to fail.
Replace the DEVICE_ID with the ID for your device. For example, enter
>
sudo
dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path
To set up queuing I/O for scenarios where all paths fail:
Open a terminal console.
Open the
/etc/multipath.conf
file in a text editor.
Uncomment the defaults section and its ending bracket, then add the
default_features
setting, as follows:
defaults {
  default_features "1 queue_if_no_path"
}
After you modify the
/etc/multipath.conf
file, you must run
dracut -f
to re-create the
initrd
on your system, then reboot for the changes to take effect.
When you are ready to return to failover for the device I/O, enter
>
sudo
dmsetup message MAPNAME 0 fail_if_no_path
Replace the MAPNAME with the mapped alias name or the device ID for the device. The 0 value represents the sector and is used when sector information is not needed.
This command immediately causes all queued I/O to fail and propagates the error to the calling application.
18.14.5 Resolving stalled I/O #
If all paths fail concurrently and I/O is queued and stalled, do the following:
Enter the following command at a terminal console prompt:
>
sudo
dmsetup message MAPNAME 0 fail_if_no_path
Replace
MAPNAME
with the correct device ID or mapped alias name for the device. The 0 value represents the sector and is used when sector information is not needed.
This command immediately causes all queued I/O to fail and propagates the error to the calling application.
Reactivate queuing by entering the following command:
>
sudo
dmsetup message MAPNAME 0 queue_if_no_path
18.14.6 Configuring default settings for IBM Z devices #
Testing of the IBM Z device with multipathing has shown that the
dev_loss_tmo
parameter should be set to infinity
(2147483647), and the fast_io_fail_tmo
parameter should
be set to 5 seconds. If you are using IBM Z devices, modify the
/etc/multipath.conf
file to specify the values as
follows:
defaults {
  dev_loss_tmo 2147483647
  fast_io_fail_tmo 5
}
The dev_loss_tmo
parameter sets the number of seconds to
wait before marking a multipath link as bad. When the path fails, any
current I/O on that failed path fails. The default value varies according
to the device driver being used. To use the driver’s internal timeouts,
set the value to zero (0). It can also be set to "infinity" or 2147483647,
which sets it to the max value of 2147483647 seconds (68 years).
The fast_io_fail_tmo
parameter sets the length of time
to wait before failing I/O when a link problem is detected. I/O that
reaches the driver fails. If I/O is in a blocked queue, the I/O does not
fail until the dev_loss_tmo
time elapses and the queue
is unblocked.
If you modify the /etc/multipath.conf
file, the
changes are not applied until you update the multipath maps, or until the
multipathd
daemon is restarted
(systemctl restart multipathd
).
18.14.7 Using multipath with NetApp devices #
When using multipath for NetApp devices, we recommend the following
settings in the /etc/multipath.conf
file:
Set the default values for the following parameters globally for NetApp devices:
max_fds max
queue_without_daemon no
Set the default values for the following parameters for NetApp devices in the hardware table:
dev_loss_tmo infinity
fast_io_fail_tmo 5
features "3 queue_if_no_path pg_init_retries 50"
18.14.8 Using --noflush with multipath devices #
The --noflush
option should always be used when running on
multipath devices.
For example, in scripts where you perform a table reload, you use the
--noflush
option on resume to ensure that any
outstanding I/O is not flushed, because you need the multipath topology
information.
load resume --noflush
18.14.9 SAN timeout settings when the root device is multipathed #
A system with root (/
) on a multipath device might
stall when all paths have failed and are removed from the system because a
dev_loss_tmo
timeout is received from the storage
subsystem (such as Fibre Channel storage arrays).
If the system device is configured with multiple paths and the multipath
no_path_retry
setting is active, you should modify the
storage subsystem’s dev_loss_tmo
setting accordingly
to ensure that no devices are removed during an all-paths-down scenario. We
strongly recommend that you set the dev_loss_tmo
value
to be equal to or higher than the no_path_retry
setting
from multipath.
The recommended setting for the storage subsystem’s
dev_loss_tmo
is
<dev_loss_tmo> = <no_path_retry> * <polling_interval>
where the following definitions apply for the multipath values:
no_path_retry
is the number of retries for multipath I/O until the path is considered to be lost, and queuing of I/O is stopped.
polling_interval
is the time in seconds between path checks.
Each of these multipath values should be set from the
/etc/multipath.conf
configuration file. For
information, see
Section 18.6, “Multipath configuration”.
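The relationship above can be checked with simple shell arithmetic. The following sketch uses example values; on a real system, read the actual values from /etc/multipath.conf:

```shell
# Sketch: derive the minimum storage-array dev_loss_tmo from the multipath
# settings (example values, not recommendations).
no_path_retry=5        # retries before multipath stops queuing
polling_interval=5     # seconds between path checker runs
dev_loss_tmo=$(( no_path_retry * polling_interval ))
echo "set dev_loss_tmo to at least ${dev_loss_tmo} seconds"
```

With no_path_retry 5 and polling_interval 5, the storage subsystem's dev_loss_tmo should therefore be at least 25 seconds.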
18.15 Troubleshooting MPIO #
This section describes some known issues and possible solutions for MPIO.
18.15.1 Installing GRUB2 on multipath devices #
On legacy BIOS systems with Btrfs, grub2-install
can
fail with a permission denied error. To fix this, make sure
that the
/boot/grub2/SUBDIR/
subvolume is mounted in read-write (rw) mode.
SUBDIR can be x86_64-efi
or
i386-pc
.
18.15.2 The system exits to emergency shell at boot when multipath is enabled #
During boot the system exits into the emergency shell with messages similar to the following:
[  OK  ] Listening on multipathd control socket.
         Starting Device-Mapper Multipath Device Controller...
[  OK  ] Listening on Device-mapper event daemon FIFOs.
         Starting Device-mapper event daemon...
         Expecting device dev-disk-by\x2duuid-34be48b2\x2dc21...32dd9.device...
         Expecting device dev-sda2.device...
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on udev Control Socket.
         Starting udev Coldplug all Devices...
         Expecting device dev-disk-by\x2duuid-1172afe0\x2d63c...5d0a7.device...
         Expecting device dev-disk-by\x2duuid-c4a3d1de\x2d4dc...ef77d.device...
[  OK  ] Started Create list of required static device nodes ...current kernel.
         Starting Create static device nodes in /dev...
[  OK  ] Started Collect Read-Ahead Data.
[  OK  ] Started Device-mapper event daemon.
[  OK  ] Started udev Coldplug all Devices.
         Starting udev Wait for Complete Device Initialization...
[  OK  ] Started Replay Read-Ahead Data.
         Starting Load Kernel Modules...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Started Create static devices
[*     ] (1 of 4) A start job is running for dev-disk-by\x2du...(7s / 1min 30s)
[*     ] (1 of 4) A start job is running for dev-disk-by\x2du...(7s / 1min 30s)
...
Timed out waiting for device dev-disk-by\x2duuid-c4a...cfef77d.device.
[DEPEND] Dependency failed for /opt.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Postfix Mail Transport Agent.
Welcome to emergency shell
Give root password for maintenance
(or press Control-D to continue):
At this stage, you are working in a temporary dracut
emergency shell from the initrd environment. To make the configuration
changes described below persistent, you need to perform them in the
environment of the installed system.
Identify what the system root (
/
) file system is. Inspect the content of
/proc/cmdline
and look for the
root=
parameter.
Verify whether the root file system is mounted:
>
sudo
systemctl status sysroot.mount
Tip
dracut
mounts the root file system under
/sysroot
by default.
From now on, let us assume that the root file system is mounted under
/sysroot
.
Mount system-required file systems under
/sysroot
,
chroot
into it, then mount all file systems. For example:
>
sudo
for x in proc sys dev run; do mount --bind /$x /sysroot/$x; done
>
sudo
chroot /sysroot /bin/bash
>
sudo
mount -a
Refer to Section 48.5.2.3, “Accessing the installed system” for more details.
Make changes to the multipath or dracut configuration as suggested in the procedures below. Remember to rebuild
initrd
to include the modifications.
Exit the
chroot
environment by entering the
exit
command, then exit the emergency shell and reboot the server by pressing Ctrl–D.
Procedure 18.1: Emergency shell: blacklist file systems #
This fix is required if the root file system is not on multipath but
multipath is enabled. In such a setup, multipath tries to set its paths
for all devices that are not blacklisted. Since the device with the root
file system is already mounted, it is inaccessible for multipath and
causes it to fail. Fix this issue by configuring multipath correctly by
blacklisting the root device in /etc/multipath.conf
:
Run
multipath -v2
in the emergency shell and identify the device for the root file system. It will result in an output similar to:
#
multipath -v2
Dec 18 10:10:03 | 3600508b1001030343841423043300400: ignoring map
The string between
|
and
:
is the WWID needed for blacklisting.
Open
/etc/multipath.conf
and add the following:
blacklist {
  wwid "WWID"
}
Replace WWID with the ID you retrieved in the previous step. For more information see Section 18.8, “Blacklisting non-multipath devices”.
Rebuild the
initrd
using the following command:
>
dracut -f --add-multipath
Procedure 18.2: Emergency shell: rebuild the initrd
#
This fix is required if the multipath status (enabled or disabled) differs
between initrd
and system. To fix this, rebuild the
initrd
:
If multipath has been enabled in the system, rebuild the initrd with multipath support with this command:
>
dracut --force --add multipath
In case multipath has been disabled in the system, rebuild the initrd without multipath support with this command:
>
dracut --force -o multipath
Procedure 18.3: Emergency shell: rebuild the initrd
#
This fix is required if the initrd does not contain drivers to access network attached storage. This may, for example, be the case when the system was installed without multipath or when the respective hardware was added or replaced.
Add the required driver(s) to the variable
force_drivers
in the file
/etc/dracut.conf.d/01-dist.conf
. For example, if your system contains a RAID controller accessed by the
hpsa
driver and multipathed devices connected to a QLogic controller accessed by the driver qla23xx, this entry would look like:
force_drivers+="hpsa qla23xx"
Rebuild the
initrd
using the following command:
>
dracut -f --add-multipath
To prevent the system from booting into emergency mode if attaching the network storage fails, it is recommended to add the mount option
_netdev
to the respective entries in/etc/fstab
.
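For illustration, an /etc/fstab entry using _netdev might look as follows; the device name and mount point are placeholders and must match your setup:

```
/dev/mapper/3600508b1001030343841423043300400-part1  /data  xfs  defaults,_netdev  0 0
```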
18.15.3 PRIO settings for individual devices fail after upgrading to multipath 0.4.9 or later #
Multipath Tools from version 0.4.9 onward uses the prio setting in the defaults{} or devices{} section of the /etc/multipath.conf file. It silently ignores the keyword prio when it is specified for an individual multipath definition in the multipaths{} section.

Multipath Tools 0.4.8 allowed the prio setting in the individual multipath definition in the multipaths{} section to override the prio settings in the defaults{} or devices{} section.
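To illustrate, consider the following sketch of /etc/multipath.conf (the WWID reuses the example from above, and the prioritizer name alua is an example). With multipath-tools 0.4.9 and later, only the prio line in the defaults{} section takes effect:

```
defaults {
    # Honored by multipath-tools >= 0.4.9
    prio "alua"
}
multipaths {
    multipath {
        wwid "3600508b1001030343841423043300400"
        # Silently ignored by multipath-tools >= 0.4.9
        prio "alua"
    }
}
```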
18.15.4 PRIO settings with arguments fail after upgrading to multipath-tools-0.4.9 or later #
When you upgrade from multipath-tools-0.4.8 to multipath-tools-0.4.9, the prio settings in the /etc/multipath.conf file break for prioritizers that require an argument. In multipath-tools-0.4.9, the prio keyword is used to specify the prioritizer, and the prio_args keyword is used to specify the argument for prioritizers that require one. Previously, both the prioritizer and its argument were specified on the same prio line.

For example, in multipath-tools-0.4.8, the following line was used to specify a prioritizer and its arguments on the same line:

prio "weightedpath hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"

After upgrading to multipath-tools-0.4.9 or later, this setting causes an error. The message is similar to the following:

<Month day hh:mm:ss> | Prioritizer 'weightedpath hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20' not found in /lib64/multipath

To resolve this problem, use a text editor to modify the prio line in the /etc/multipath.conf file. Create two lines, with the prioritizer specified on the prio line and the prioritizer argument specified on the prio_args line below it:

prio "weightedpath"
prio_args "hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"

Restart the multipathd daemon for the changes to become active by running:

> sudo systemctl restart multipathd
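The conversion can also be scripted. The following is a minimal sketch, assuming the old-style line has the exact shape prio "NAME ARGS"; the sed expression (with GNU sed) is an illustration, not part of the multipath tooling:

```shell
# Split an old-style multipath-tools-0.4.8 prio line into the 0.4.9
# prio / prio_args form. The sample line is the example from above.
old='prio "weightedpath hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"'
new=$(printf '%s\n' "$old" | sed 's/^prio "\([^ "]*\) \(.*\)"$/prio "\1"\nprio_args "\2"/')
printf '%s\n' "$new"
```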
18.15.5 Technical information documents #
For information about troubleshooting multipath I/O issues on SUSE Linux Enterprise Server, see the following Technical Information Documents (TIDs) in the SUSE Knowledgebase: