Applies to SUSE Linux Enterprise Server 12 SP4

17 Managing Multipath I/O for Devices

This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices by using Multipath I/O (MPIO).

17.1 Understanding Multipath I/O

Multipathing is the ability of a server to communicate with the same physical or logical block storage device across multiple physical paths between the host bus adapters in the server and the storage controllers for the device, typically in Fibre Channel (FC) or iSCSI SAN environments. You can also achieve multiple connections with direct attached storage when multiple channels are available.

Linux multipathing provides connection fault tolerance and can provide load balancing across the active connections. When multipathing is configured and running, it automatically isolates and identifies device connection failures, and reroutes I/O to alternate connections.

Typical connection problems involve faulty adapters, cables, or controllers. When you configure multipath I/O for a device, the multipath driver monitors the active connection between devices. When the multipath driver detects I/O errors for an active path, it fails over the traffic to the device’s designated secondary path. When the preferred path becomes healthy again, control can be returned to the preferred path.

17.2 Hardware Support

The multipathing drivers and tools support all architectures for which SUSE Linux Enterprise Server is available. They support most storage arrays. The storage array that houses the multipathed device must support multipathing to use the multipathing drivers and tools. Some storage array vendors provide their own multipathing management tools. Consult the vendor’s hardware documentation to determine what settings are required.

17.2.1 Storage Arrays That Are Automatically Detected for Multipathing

The multipath-tools package automatically detects the following storage arrays:

3PARdata VV
AIX NVDISK
AIX VDASD
APPLE Xserve RAID
COMPELNT Compellent Vol
COMPAQ/HP HSV101, HSV111, HSV200, HSV210, HSV300, HSV400, HSV 450
COMPAQ/HP MSA, HSV
COMPAQ/HP MSA VOLUME
DataCore SANmelody
DDN SAN DataDirector
DEC HSG80
DELL MD3000
DELL MD3000i
DELL MD32xx
DELL MD32xxi
DGC
EMC Clariion
EMC Invista
EMC SYMMETRIX
EUROLOGC FC2502
FSC CentricStor
FUJITSU ETERNUS_DX, DXL, DX400, DX8000
HITACHI DF
HITACHI/HP OPEN
HP A6189A
HP HSVX700
HP LOGICAL VOLUME
HP MSA2012fc, MSA 2212fc, MSA2012i
HP MSA2012sa, MSA2312 fc/i/sa, MCA2324 fc/i/sa, MSA2000s VOLUME
HP P2000 G3 FC|P2000G3 FC/iSCSI|P2000 G3 SAS|P2000 G3 iSCSI
IBM 1722-600
IBM 1724
IBM 1726
IBM 1742
IBM 1745, 1746
IBM 1750500
IBM 1814
IBM 1815
IBM 1818
IBM 1820N00
IBM 2105800
IBM 2105F20
IBM 2107900
IBM 2145
IBM 2810XIV
IBM 3303 NVDISK
IBM 3526
IBM 3542
IBM IPR
IBM Nseries
IBM ProFibre 4000R
IBM S/390 DASD ECKD
IBM S/390 DASD FBA
Intel Multi-Flex
LSI/ENGENIO INF-01-00
NEC DISK ARRAY
NETAPP LUN
NEXENTA COMSTAR
Pillar Axiom
PIVOT3 RAIGE VOLUME
SGI IS
SGI TP9100, TP 9300
SGI TP9400, TP9500
STK FLEXLINE 380
STK OPENstorage D280
SUN CSM200_R
SUN LCSM100_[IEFS]
SUN STK6580, STK6780
SUN StorEdge 3510, T4
SUN SUN_6180

In general, most other storage arrays should work. When storage arrays are automatically detected, the default settings for multipathing apply. If you want non-default settings, you must manually create and configure the /etc/multipath.conf file. The same applies for hardware that is not automatically detected. For information, see Section 17.6, “Creating or Modifying the /etc/multipath.conf File”.


17.2.2 Tested Storage Arrays for Multipathing Support

Storage arrays from the following vendors have been tested with SUSE Linux Enterprise Server:

EMC
Hitachi
Hewlett-Packard/Compaq
IBM
NetApp
SGI

Most other vendor storage arrays should also work. Consult your vendor’s documentation for guidance. For a list of the default storage arrays recognized by the multipath-tools package, see Section 17.2.1, “Storage Arrays That Are Automatically Detected for Multipathing”.

17.2.3 Storage Arrays that Require Specific Hardware Handlers

Storage arrays that require special commands on failover from one path to the other or that require special nonstandard error handling might require more extensive support. Therefore, the Device Mapper Multipath service has hooks for hardware handlers. For example, one such handler for the EMC CLARiiON CX family of arrays is already provided.

Important
Important: For More Information

Consult the hardware vendor’s documentation to determine if its hardware handler must be installed for Device Mapper Multipath.

The multipath -t command shows an internal table of storage arrays that require special handling with specific hardware handlers. The displayed list is not an exhaustive list of supported storage arrays. It lists only those arrays that require special handling and that the multipath-tools developers had access to during the tool development.
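For example, to check whether the built-in table contains an entry for a particular array, you can filter the output by the vendor string (the vendor LSI is only an example):

sudo multipath -t | grep -A 10 'vendor "LSI"'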

Important
Important: Exceptions

Arrays with true active/active multipath support do not require special handling, so they are not listed for the multipath -t command.

A listing in the multipath -t table does not necessarily mean that SUSE Linux Enterprise Server was tested on that specific hardware. For a list of tested storage arrays, see Section 17.2.2, “Tested Storage Arrays for Multipathing Support”.

17.3 Planning for Multipathing

Use the guidelines in this section when planning your multipath I/O solution.

17.3.1 Prerequisites

  • Multipathing is managed at the device level.

  • The storage array you use for the multipathed device must support multipathing. For more information, see Section 17.2, “Hardware Support”.

  • You need to configure multipathing only if multiple physical paths exist between host bus adapters in the server and host bus controllers for the block storage device. You configure multipathing for the logical device as seen by the server.

  • For some storage arrays, the vendor provides its own multipathing software to manage multipathing for the array’s physical and logical devices. In this case, you should follow the vendor’s instructions for configuring multipathing for those devices.

  • When using multipathing in a virtualization environment, the multipathing is controlled in the host server environment. Configure multipathing for the device before you assign it to a virtual guest machine.

17.3.2 Disk Management Tasks

Perform the following disk management tasks before you attempt to configure multipathing for a physical or logical device that has multiple paths:

  • Use third-party tools to carve physical disks into smaller logical disks.

  • Use third-party tools to partition physical or logical disks. If you change the partitioning in the running system, the Device Mapper Multipath (DM-MP) module does not automatically detect and reflect these changes. DM-MPIO must be re-initialized, which usually requires a reboot.

  • Use third-party SAN array management tools to create and configure hardware RAID devices.

  • Use third-party SAN array management tools to create logical devices such as LUNs. Logical device types that are supported for a given array depend on the array vendor.

17.3.3 Software RAIDs

The Linux software RAID management software runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a software RAID, you must configure the device for multipathing before you attempt to create the software RAID device. Automatic discovery of multipathed devices is not available. The software RAID is not aware of the multipathing management running underneath.

For information about setting up multipathing for existing software RAIDs, see Section 17.12, “Configuring Multipath I/O for an Existing Software RAID”.

17.3.4 High-Availability Solutions

High-availability solutions for clustering storage resources run on top of the multipathing service on each node. Make sure that the configuration settings in the /etc/multipath.conf file on each node are consistent across the cluster.

Make sure that multipath devices have the same name across all devices. Refer to Section 17.9.1, “Multipath Device Names in HA Clusters” for details.

The Distributed Replicated Block Device (DRBD) high-availability solution for mirroring devices across a LAN runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a DRBD solution, you must configure the device for multipathing before you configure DRBD.

17.3.5 Always Keep the initrd in Synchronization with the System Configuration

One of the most important requirements when using Multipath is to make sure that the initrd and the installed system behave the same regarding the root file system and all other file systems required to boot the system. If Multipath is enabled in the system, it also needs to be enabled in the initrd and vice versa. See Section 17.5.1, “Enabling, Disabling, Starting and Stopping Multipath I/O Services” for details.

If the initrd and the system are not synchronized, the system will not properly boot and the start-up procedure will result in an emergency shell. See Section 17.15.1, “The System Exits to Emergency Shell at Boot When Multipath Is Enabled” for instructions on how to avoid or repair such a scenario.

17.4 Multipath Management Tools

The multipathing support in SUSE Linux Enterprise Server is based on the Device Mapper Multipath module of the Linux kernel and the multipath-tools user space package. You can use the Multiple Devices Administration utility (multipath) to view the status of multipathed devices.

17.4.1 Device Mapper Multipath Module

The Device Mapper Multipath (DM-MP) module provides the multipathing capability for Linux. DM-MPIO is the preferred solution for multipathing on SUSE Linux Enterprise Server. It is the only multipathing option shipped with the product that is completely supported by SUSE.

DM-MPIO features automatic configuration of the multipathing subsystem for a large variety of setups. Configurations of up to eight paths to each device are supported. Configurations are supported for active/passive (one path active, others passive) or active/active (all paths active with round-robin load balancing).

The DM-MPIO framework is extensible in two ways: it supports hardware handlers for storage arrays that require special failover handling (see Section 17.2.3, “Storage Arrays that Require Specific Hardware Handlers”), and it supports load-balancing algorithms that are more sophisticated than round-robin.

The user space component of DM-MPIO takes care of automatic path discovery and grouping, and automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.

DM-MPIO protects against failures in the paths to the device, and not failures in the device itself. If one of the active paths is lost (for example, a network adapter breaks or a fiber-optic cable is removed), I/O is redirected to the remaining paths. If the configuration is active/passive, then the path fails over to one of the passive paths. If you are using the round-robin load-balancing configuration, the traffic is balanced across the remaining healthy paths. If all active paths fail, inactive secondary paths must be woken up, so failover occurs with a delay of approximately 30 seconds.

If a disk array has more than one storage processor, ensure that the SAN switch has a connection to the storage processor that owns the LUNs you want to access. On most disk arrays, all LUNs belong to both storage processors, so both connections are active.

Note
Note: Storage Processors

On some disk arrays, the storage array manages the traffic through storage processors so that it presents only one storage processor at a time. One processor is active and the other one is passive until there is a failure. If you are connected to the wrong storage processor (the one with the passive path) you might not see the expected LUNs, or you might see the LUNs but get errors when you try to access them.

Table 17.1: Multipath I/O Features of Storage Arrays

Features of Storage Arrays

Description

Active/passive controllers

One controller is active and serves all LUNs. The second controller acts as a standby. The second controller also presents the LUNs to the multipath component so that the operating system knows about redundant paths. If the primary controller fails, the second controller takes over, and it serves all LUNs.

In some arrays, the LUNs can be assigned to different controllers. A given LUN is assigned to one controller to be its active controller. One controller does the disk I/O for any LUN at a time, and the second controller is the standby for that LUN. The second controller also presents the paths, but disk I/O is not possible. Servers that use that LUN are connected to the LUN’s assigned controller. If the primary controller for a set of LUNs fails, the second controller takes over, and it serves all LUNs.

Active/active controllers

Both controllers share the load for all LUNs, and can process disk I/O for any LUN. If one controller fails, the second controller automatically handles all traffic.

Load balancing

The Device Mapper Multipath driver automatically load balances traffic across all active paths.

Controller failover

When the active controller fails over to the passive, or standby, controller, the Device Mapper Multipath driver automatically activates the paths between the host and the standby, making them the primary paths.

Boot/Root device support

Multipathing is supported for the root (/) device in SUSE Linux Enterprise Server 10 and later. The host server must be connected to the currently active controller and storage processor for the boot device.

Multipathing is supported for the /boot device in SUSE Linux Enterprise Server 11 and later.

Device Mapper Multipath detects every path for a multipathed device as a separate SCSI device. The SCSI device names take the form /dev/sdN, where N is an autogenerated letter for the device, beginning with a and issued sequentially as the devices are created, such as /dev/sda, /dev/sdb, and so on. If the number of devices exceeds 26, the letters are duplicated so that the next device after /dev/sdz will be named /dev/sdaa, /dev/sdab, and so on.

If multiple paths are not automatically detected, you can configure them manually in the /etc/multipath.conf file. The multipath.conf file does not exist until you create and configure it. For information, see Section 17.6, “Creating or Modifying the /etc/multipath.conf File”.

17.4.2 Multipath I/O Management Tools

The packages multipath-tools and kpartx provide tools that take care of automatic path discovery and grouping. They automatically test the path periodically, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.

Table 17.2: Tools in the multipath-tools Package

Tool

Description

multipath

Scans the system for multipathed devices and assembles them.

multipathd

Waits for map events, then executes multipath.

kpartx

Maps linear devmaps to partitions on the multipathed device, which makes it possible to create multipath monitoring for partitions on the device.

mpathpersist

Manages SCSI persistent reservations on Device Mapper Multipath devices.

17.4.3 Using MDADM for Multipathed Devices

Udev is the default device handler, and devices are automatically known to the system by the Worldwide ID instead of by the device node name. This resolves problems in previous releases of MDADM and LVM where the configuration files (mdadm.conf and lvm.conf) did not properly recognize multipathed devices.

As with LVM2, MDADM requires that the devices be accessed by the ID rather than by the device node path. Therefore, the DEVICE entry in /etc/mdadm.conf should be set as follows:

DEVICE /dev/disk/by-id/*

If you are using user-friendly names, specify the path as follows so that only the device mapper names are scanned after multipathing is configured:

DEVICE /dev/disk/by-id/dm-uuid-.*-mpath-.*

17.4.4 The multipath Command

Use the Linux multipath(8) command to configure and manage multipathed devices. The general syntax for the command looks like the following:

multipath [-v verbosity_level] [-b bindings_file] [-d] [-h|-l|-ll|-f|-F|-B|-c|-q|-r|-w|-W|-t] [-p failover|multibus|group_by_serial|group_by_prio|group_by_node_name] [DEVICENAME]

Refer to man 8 multipath for details.

General Examples

multipath

Configures all multipath devices.

multipath DEVICENAME

Configures a specific multipath device.

Replace DEVICENAME with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format. The device may alternatively be a multipath map name.

multipath -f

Selectively suppresses a multipath map, and its device-mapped partitions.

multipath -d

Dry run. Displays potential multipath devices, but does not create any devices and does not update device maps.

multipath -v2 -d

Displays multipath map information for potential multipath devices in a dry run. The -v2 option shows only local disks. This verbosity level prints only the created or updated multipath names, which can then be fed to other tools like kpartx.

There is no output if the device already exists and there are no changes. Use multipath -ll to see the status of configured multipath devices.

multipath -v2 DEVICENAME

Configures a specific potential multipath device and displays multipath map information for it. This verbosity level prints only the created or updated multipath names, which can then be fed to other tools like kpartx.

There is no output if the device already exists and there are no changes. Use multipath -ll to see the status of configured multipath devices.

Replace DEVICENAME with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format. The device may alternatively be a multipath map name.

multipath -v3

Configures potential multipath devices and displays multipath map information for them. This verbosity level prints all detected paths, multipaths, and device maps. Both WWID and devnode blacklisted devices are displayed.

multipath -v3 DEVICENAME

Configures a specific potential multipath device and displays information for it. The -v3 option shows the full path list. This verbosity level prints all detected paths, multipaths, and device maps. Both WWID and devnode blacklisted devices are displayed.

Replace DEVICENAME with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format. The device may alternatively be a multipath map name.

multipath -ll

Displays the status of all multipath devices.

multipath -ll DEVICENAME

Displays the status of a specified multipath device.

Replace DEVICENAME with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format. The device may alternatively be a multipath map name.

multipath -F

Flushes all unused multipath device maps. This removes the multipath maps but does not delete the underlying devices.

multipath -F DEVICENAME

Flushes the unused multipath device map for a specified multipath device. This removes the multipath map but does not delete the underlying device.

Replace DEVICENAME with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format. The device may alternatively be a multipath map name.

multipath -p [ failover | multibus | group_by_serial | group_by_prio | group_by_node_name ]

Sets the group policy by specifying one of the group policy options that are described in the following table:

Table 17.3: Group Policy Options for the multipath -p Command

Policy Option

Description

failover

(Default) One path per priority group. You can use only one path at a time.

multibus

All paths in one priority group.

group_by_serial

One priority group per detected SCSI serial number (the controller node worldwide number).

group_by_prio

One priority group per path priority value. Paths with the same priority are in the same priority group. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.

group_by_node_name

One priority group per target node name. Target node names are fetched from /sys/class/fc_transport/target*/node_name.

multipath -t

Shows internal hardware table and active configuration of multipath. Refer to man multipath for details about the configuration parameters.

17.4.5 The mpathpersist Utility

The mpathpersist utility can be used to manage SCSI persistent reservations on Device Mapper Multipath devices. The general syntax for the command looks like the following:

mpathpersist [options] [device]

Refer to man 8 mpathpersist for details.

Use this utility with the service action reservation key (reservation_key attribute) in the /etc/multipath.conf file to set persistent reservations for SCSI devices. The attribute is not used by default. If it is not set, the multipathd daemon does not check for persistent reservation for newly discovered paths or reinstated paths.

reservation_key <RESERVATION_KEY>

You can add the attribute to the defaults section or the multipaths section. For example:

multipaths {
  multipath {
    wwid   XXXXXXXXXXXXXXXX
    alias      yellow
    reservation_key  0x123abc
  }
}

Set the reservation_key parameter for all mpath devices applicable for persistent management, then restart the multipathd daemon by running the following command:

sudo systemctl restart multipathd

After it is set up, you can specify the reservation key in the mpathpersist commands.

Examples

mpathpersist --out --register --param-sark=123abc --prout-type=5 -d /dev/mapper/mpath9

Register the Service Action Reservation Key for the /dev/mapper/mpath9 device.

mpathpersist -i -k -d /dev/mapper/mpath9

Read the Service Action Reservation Key for the /dev/mapper/mpath9 device.

mpathpersist --out --reserve --param-sark=123abc --prout-type=8 -d /dev/mapper/mpath9

Reserve the Service Action Reservation Key for the /dev/mapper/mpath9 device.

mpathpersist -i -s -d /dev/mapper/mpath9

Read the reservation status of the /dev/mapper/mpath9 device.

17.5 Configuring the System for Multipathing

17.5.1 Enabling, Disabling, Starting and Stopping Multipath I/O Services

To enable multipath services to start at boot time, run the following command:

sudo systemctl enable multipathd

To manually start the service in the running system or to check its status, enter one of the following commands:

sudo systemctl start multipathd
sudo systemctl status multipathd

To stop the multipath services in the current session and to disable it, so it will not be started the next time the system is booted, run the following commands:

sudo systemctl stop multipathd
sudo systemctl disable multipathd
Important
Important: Rebuilding the initrd

Whenever you enable or disable the multipath services, you must also rebuild the initrd; otherwise, the system may no longer boot. When enabling the multipath services, also run the following command to rebuild the initrd:

dracut --force --add multipath

When disabling the services, run the following command to rebuild the initrd:

dracut --force -o multipath

Additionally, if you want to make sure that multipath devices are not set up even when multipath is started manually, add the following lines to the end of /etc/multipath.conf before rebuilding the initrd:

blacklist {
    wwid ".*"
}

17.5.2 Preparing SAN Devices for Multipathing

Before configuring multipath I/O for your SAN devices, prepare the SAN devices, as necessary, by doing the following:

  • Configure and zone the SAN with the vendor’s tools.

  • Configure permissions for host LUNs on the storage arrays with the vendor’s tools.

  • Install the Linux HBA driver module. Upon module installation, the driver automatically scans the HBA to discover any SAN devices that have permissions for the host. It presents them to the host for further configuration.

    Note
    Note: No Native Multipathing

    Make sure that the HBA driver you are using does not have native multipathing enabled.

    See the vendor’s specific instructions for more details.

  • After the driver module is loaded, discover the device nodes assigned to specific array LUNs or partitions.

  • If the SAN device will be used as the root device on the server, modify the timeout settings for the device as described in Section 17.14.9, “SAN Timeout Settings When the Root Device Is Multipathed”.

If the LUNs are not seen by the HBA driver, lsscsi can be used to check whether the SCSI devices are seen correctly by the operating system. When the LUNs are not seen by the HBA driver, check the zoning setup of the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
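For example, the following commands list the SCSI devices that the operating system currently sees; the -t option additionally shows the transport (FC or iSCSI) addresses, which helps when comparing against the SAN zoning:

lsscsi
lsscsi -t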

If the LUNs are seen by the HBA driver, but there are no corresponding block devices, additional kernel parameters are needed to change the SCSI device scanning behavior, such as to indicate that LUNs are not numbered consecutively. For information, see TID 3955167: Troubleshooting SCSI (LUN) Scanning Issues in the SUSE Knowledgebase at https://www.suse.com/support/kb/doc.php?id=3955167.

17.5.3 Partitioning Multipath Devices

Partitioning devices that have multiple paths is not recommended, but it is supported. You can use the kpartx tool to create partitions on multipath devices without rebooting. You can also partition the device before you attempt to configure multipathing by using the Partitioner function in YaST, or by using a third-party partitioning tool.

Multipath devices are device-mapper devices. Modifying device-mapper devices with command line tools (such as parted, kpartx, or fdisk) works, but it does not necessarily generate the udev events that are required to update other layers. After you partition the device-mapper device, you should check the multipath map to make sure the device-mapper devices were mapped. If they are missing, you can remap the multipath devices or reboot the server to pick up all of the new partitions in the multipath map.
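For example, assuming a multipath map named mpatha (a hypothetical name), you can list and create its partition mappings without a reboot:

sudo kpartx -l /dev/mapper/mpatha   # list the partition mappings that would be created
sudo kpartx -a /dev/mapper/mpatha   # add the partition mappings for the map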

The device-mapper device for a partition on a multipath device is not the same as an independent device. When you create an LVM logical volume using the whole device, you must specify a device that contains no partitions. If you specify a multipath partition as the target device for the LVM logical volume, LVM recognizes that the underlying physical device is partitioned and the create fails. If you need to subdivide a SAN device, you can carve LUNs on the SAN device and present each LUN as a separate multipath device to the server.

17.6 Creating or Modifying the /etc/multipath.conf File

The /etc/multipath.conf file does not exist unless you create it. Default multipath device settings are applied automatically when the multipathd daemon runs unless you create the multipath configuration file and personalize the settings.

Important
Important: Testing and Permanently Applying Changes from /etc/multipath.conf

Whenever you create or modify the /etc/multipath.conf file, the changes are not automatically applied when you save the file. This allows you to perform a dry run to verify your changes before they are committed. When you are satisfied with the revised settings, you can update the multipath maps as described in Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps”.

17.6.1 Creating the /etc/multipath.conf File

  1. Create an empty /etc/multipath.conf file with an editor of your choice.

  2. Make sure to add an appropriate device section for your SAN. Most vendors provide documentation on the proper setup of the device section. Note that different SANs require individual device sections (a minimal skeleton is sketched after this list).

    If you are using a storage subsystem that is automatically detected (see Section 17.2.1, “Storage Arrays That Are Automatically Detected for Multipathing”), adding a device is not required—the default settings for this device will be applied in this case.

  3. Create a blacklist section containing all non-multipath devices of your machine. Refer to Section 17.8, “Blacklisting Non-Multipath Devices” for details.

  4. If required, add more sections to the configuration file. Refer to Section 17.6.2, “Sections in the /etc/multipath.conf File” for a brief introduction. More details are available when running man 5 multipath.conf.

  5. When finished, save /etc/multipath.conf and test your settings as described in Section 17.6.3, “Verifying the Multipath Setup in the /etc/multipath.conf File”.

  6. When you have successfully verified the configuration, apply it as described in Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps”.
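For illustration, a minimal /etc/multipath.conf skeleton created along these steps might look like the following sketch. The device entry uses placeholder vendor and product strings; replace them with the values documented by your array vendor, or omit the section entirely for automatically detected arrays:

defaults {
    user_friendly_names no
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
}

devices {
    device {
        vendor  "VENDOR"     # placeholder; use the vendor string reported by your array
        product "PRODUCT"    # placeholder; use the product string reported by your array
    }
}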

17.6.2 Sections in the /etc/multipath.conf File

The /etc/multipath.conf file is organized into the following sections. See man 5 multipath.conf for details.

defaults

General default settings for multipath I/O. These values are used if no values are given in the appropriate device or multipath sections. For information, see Section 17.7, “Configuring Default Policies for Polling, Queuing, and Failback”.

blacklist

Lists the device names to discard as not multipath candidates. Devices can be identified by their device node name (devnode), their WWID (wwid), or their vendor or product strings (device). You can typically ignore non-multipathed devices, such as hpsa, fd, hd, md, dm, sr, scd, st, ram, raw, and loop. For more information and examples, see Section 17.8, “Blacklisting Non-Multipath Devices”.

blacklist_exceptions

Lists the device names of devices to be treated as multipath candidates even if they are on the blacklist. Devices can be identified by their device node name (devnode), their WWID (wwid), or their vendor or product strings (device). You must specify the excepted devices by using the same keyword that you used in the blacklist. For example, if you used the devnode keyword for devices in the blacklist, you use the devnode keyword to exclude some devices in the blacklist exceptions. It is not possible to blacklist devices by using the devnode keyword and to exclude some of them by using the wwid keyword. For more information and examples, see Section 17.8, “Blacklisting Non-Multipath Devices”.

multipaths

Specifies settings for individual multipath devices. Except for settings that cannot be set per device, these values override what is specified in the defaults and devices sections of the configuration file.

devices

Specifies settings for individual storage controllers. These values overwrite values specified in the defaults section of the configuration file. If you use a storage array that is not supported by default, you can create a devices subsection to specify the default settings for it. These values can be overwritten by settings for individual multipath devices if the keyword allows it.


17.6.3 Verifying the Multipath Setup in the /etc/multipath.conf File

Whenever you create or modify the /etc/multipath.conf file, the changes are not automatically applied when you save the file. You can perform a dry run of the setup to verify the multipath setup before you update the multipath maps.

At the server command prompt, enter

sudo multipath -v2 -d

This command scans the devices, then displays what the setup would look like if you commit the changes. It is assumed that the multipathd daemon is already running with the old (or default) multipath settings when you modify the /etc/multipath.conf file and perform the dry run. If the changes are acceptable, continue with the next step.

The output is similar to the following:

26353900f02796769
[size=127 GB]
[features="0"]
[hwhandler="1    emc"]

\_ round-robin 0 [first]
  \_ 1:0:1:2 sdav 66:240  [ready ]
  \_ 0:0:1:2 sdr  65:16   [ready ]

\_ round-robin 0
  \_ 1:0:0:2 sdag 66:0    [ready ]
  \_ 0:0:0:2 sdc   8:32   [ready ]

Paths are grouped into priority groups. Only one priority group is in active use at a time. To model an active/active configuration, all paths end in the same group. To model an active/passive configuration, the paths that should not be active in parallel are placed in several distinct priority groups. This normally happens automatically on device discovery.

The output shows the order, the scheduling policy used to balance I/O within the group, and the paths for each priority group. For each path, its physical address (host:bus:target:lun), device node name, major:minor number, and state is shown.

By using a verbosity level of -v3 in the dry run, you can see all detected paths, multipaths, and device maps. Both WWID and device node blacklisted devices are displayed.

The following is an example of -v3 output on a 64-bit SLES 11 SP2 server with two Qlogic HBAs connected to a Xiotech Magnitude 3000 SAN. Some multiple entries have been omitted to shorten the example.

tux > sudo multipath -v3 -d
dm-22: device node name blacklisted
< content omitted >
loop7: device node name blacklisted
< content omitted >
md0: device node name blacklisted
< content omitted >
dm-0: device node name blacklisted
sdf: not found in pathvec
sdf: mask = 0x1f
sdf: dev_t = 8:80
sdf: size = 105005056
sdf: subsystem = scsi
sdf: vendor = XIOtech
sdf: product = Magnitude 3D
sdf: rev = 3.00
sdf: h:b:t:l = 1:0:0:2
sdf: tgt_node_name = 0x202100d0b2028da
sdf: serial = 000028DA0014
sdf: getuid= "/lib/udev/scsi_id --whitelisted --device=/dev/%n" (config file default)
sdf: uid = 200d0b2da28001400 (callout)
sdf: prio = const (config file default)
sdf: const prio = 1
[...]
ram15: device node name blacklisted
[...]
===== paths list =====
uuid              hcil    dev dev_t pri dm_st  chk_st  vend/prod/rev
200d0b2da28001400 1:0:0:2 sdf 8:80  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28005400 1:0:0:1 sde 8:64  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28004d00 1:0:0:0 sdd 8:48  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28001400 0:0:0:2 sdc 8:32  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28005400 0:0:0:1 sdb 8:16  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28004d00 0:0:0:0 sda 8:0   1   [undef][undef] XIOtech,Magnitude 3D
params = 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:32 1000
status = 2 0 0 0 2 1 A 0 1 0 8:80 A 0 E 0 1 0 8:32 A 0
sdf: mask = 0x4
sdf: path checker = directio (config file default)
directio: starting new request
directio: async io getevents returns 1 (errno=Success)
directio: io finished 4096/0
sdf: state = 2
[...]

17.6.4 Applying the /etc/multipath.conf File Changes to Update the Multipath Maps

Changes to the /etc/multipath.conf file cannot take effect when multipathd is running. After you make changes, save and close the file, then run the following commands to apply the changes and update the multipath maps:

  1. Apply your configuration changes:

    tux > sudo multipathd reconfigure
  2. Run dracut -f to re-create the initrd image on your system, then reboot for the changes to take effect.

17.6.5 Generating a WWID

To identify a device over different paths, multipath uses a World Wide Identification (WWID) for each device. If the WWID is the same for two device paths, they are assumed to represent the same device. We recommend not changing the method of WWID generation, unless there is a compelling reason to do so. For more details, see man multipath.conf.
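To see the WWID that is used for a given path device, you can call the same udev helper that the default configuration references (the device name /dev/sda is an example; on some installations the helper resides in /lib/udev instead of /usr/lib/udev):

sudo /usr/lib/udev/scsi_id --whitelisted --device=/dev/sda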

17.7 Configuring Default Policies for Polling, Queuing, and Failback

The goal of multipath I/O is to provide connectivity fault tolerance between the storage system and the server. The desired default behavior depends on whether the server is a stand-alone server or a node in a high-availability cluster.

When you configure multipath I/O for a stand-alone server, the no_path_retry setting protects the server operating system from receiving I/O errors as long as possible. It queues messages until a multipath failover occurs and provides a healthy connection.

When you configure multipath I/O for a node in a high-availability cluster, you want multipath to report the I/O failure to trigger the resource failover instead of waiting for a multipath failover to be resolved. In cluster environments, you must modify the no_path_retry setting so that the cluster node receives an I/O error in relation to the cluster verification process (recommended to be 50% of the heartbeat tolerance) if the connection is lost to the storage system. In addition, set the multipath failback policy to manual to avoid a ping-pong of resources because of path failures.
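For a cluster node, the relevant part of the defaults section might look like the following sketch. The retry count of 5 is only an illustration; derive the actual value from your cluster's heartbeat tolerance:

defaults {
    no_path_retry  5         # return I/O errors after 5 failed checks (about 5 * polling_interval seconds)
    failback       "manual"  # avoid resource ping-pong caused by automatic failback
}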

The /etc/multipath.conf file should contain a defaults section where you can specify default behaviors for polling, queuing, and failback. If the field is not otherwise specified in a device section, the default setting is applied for that SAN configuration.

The following are the compiled default settings. They will be used unless you overwrite these values by creating and configuring a personalized /etc/multipath.conf file.

defaults {
  verbosity 2
#  udev_dir is deprecated in SLES 11 SP3
#  udev_dir              /dev
  polling_interval      5
#  path_selector default value is service-time in SLES 11 SP3
#  path_selector         "round-robin 0"
  path_selector         "service-time 0"
  path_grouping_policy  failover
#  getuid_callout is deprecated in SLES 11 SP3 and replaced with uid_attribute
#  getuid_callout        "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
#  uid_attribute is new in SLES 11 SP3
  uid_attribute         "ID_SERIAL"
  prio                  "const"
  prio_args             ""
  features              "0"
  path_checker          "tur"
  alias_prefix          "mpath"
  rr_min_io_rq          1
  max_fds               "max"
  rr_weight             "uniform"
  queue_without_daemon  "yes"
  flush_on_last_del     "no"
  user_friendly_names   "no"
  fast_io_fail_tmo      5
  bindings_file         "/etc/multipath/bindings"
  wwids_file            "/etc/multipath/wwids"
  log_checker_err       "always"

  retain_attached_hw_handler  "no"
  detect_prio           "no"
  failback              "manual"
  no_path_retry         "fail"
  }

For information about setting the polling, queuing, and failback policies, see the following parameters in Section 17.10, “Configuring Path Failover Policies and Priorities”:

After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps” for details.

17.8 Blacklisting Non-Multipath Devices

The /etc/multipath.conf file can contain a blacklist section where all non-multipath devices are listed. You can blacklist devices by WWID (wwid keyword), device name (devnode keyword), or device type (device section). You can also use the blacklist_exceptions section to enable multipath for some devices that are blacklisted by the regular expressions used in the blacklist section.

Note
Note: Preferred Blacklisting Methods

The preferred method for blacklisting devices is by WWID or by vendor and product. Blacklisting by devnode is not recommended, because device nodes can change and thus are not useful for persistent device identification.

Warning
Warning: Regular Expressions in multipath.conf

Regular expressions in the /etc/multipath.conf do not work in general. They only work if they are matched against common strings. However, the standard configuration of multipath already contains regular expressions for many devices and vendors. Matching regular expressions with other regular expressions does not work. Make sure that you are only matching against strings shown with multipath -t.

You can typically ignore non-multipathed devices, such as hpsa, fd, hd, md, dm, sr, scd, st, ram, raw, and loop. For example, local SATA hard disks and flash disks do not have multiple paths. If you want multipath to ignore single-path devices, put them in the blacklist section.

Note
Note: Compatibility

The keyword devnode_blacklist has been deprecated and replaced with the keyword blacklist.

With SUSE Linux Enterprise Server 12 the glibc-provided regular expressions are used. To match an arbitrary string, you must now use ".*" rather than "*".

For example, to blacklist local devices and all arrays from the hpsa driver from being managed by multipath, the blacklist section looks like this:

blacklist {
      wwid "26353900f02796769"
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
      devnode "^sd[a-z][0-9]*"
}

You can also blacklist only the partitions from a driver instead of the entire array. For example, you can use the following regular expression to blacklist only partitions from the cciss driver and not the entire array:

blacklist {
      devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}

You can blacklist by specific device types by adding a device section in the blacklist, and using the vendor and product keywords.

blacklist {
      device {
           vendor  "DELL"
           product ".*"
       }
}

You can use a blacklist_exceptions section to enable multipath for some devices that were blacklisted by the regular expressions used in the blacklist section. You add exceptions by WWID (wwid keyword), device name (devnode keyword), or device type (device section). You must specify the exceptions in the same way that you blacklisted the corresponding devices. That is, wwid exceptions apply to a wwid blacklist, devnode exceptions apply to a devnode blacklist, and device type exceptions apply to a device type blacklist.

For example, you can enable multipath for a desired device type when you have different device types from the same vendor. Blacklist all of the vendor’s device types in the blacklist section, and then enable multipath for the desired device type by adding a device section in a blacklist_exceptions section.

blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
      device {
           vendor  "DELL"
           product ".*"
       }
}

blacklist_exceptions {
      device {
           vendor  "DELL"
           product "MD3220i"
       }
}

You can also use the blacklist_exceptions to enable multipath only for specific devices. For example:

blacklist {
      wwid ".*"
}

blacklist_exceptions {
        wwid "3600d0230000000000e13955cc3751234"
        wwid "3600d0230000000000e13955cc3751235"
}

After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps” for details.

Following the reboot, the local devices should no longer be listed in the multipath maps when you issue the multipath -ll command.

Note
Note: Using the find_multipaths Option

Starting with SUSE Linux Enterprise Server 12 SP2, the multipath tools support the option find_multipaths in the defaults section of /etc/multipath.conf. Simply put, this option prevents multipath and multipathd from setting up multipath maps for devices with only a single path (see the man 5 multipath.conf for details). In certain configurations, this may save the administrator from the effort of creating blacklist entries, for example for local SATA disks.
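For reference, the option is enabled in the defaults section as follows; consider the caveats described below before using it:

defaults {
    find_multipaths yes
}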

Convenient as it seems at first, using the find_multipaths option also has disadvantages. It complicates and slows down the system boot, because for every device found, the boot logic needs to wait until all devices have been discovered to see whether a second path exists for the device. Additionally, problems can arise when some paths are down or otherwise invisible at boot time—a device can be falsely detected as a single-path device and activated, causing later addition of more paths to fail.

find_multipaths considers all devices that are listed in /etc/multipath/wwids with matching WWIDs as being multipath devices. This is important when find_multipaths is first activated: Unless /etc/multipath/wwids is deleted or edited, activating this option has no effect, because all previously existing multipath maps (including single-path ones) are listed in the wwids file. On SAN-boot systems with a multipathed root file system, make sure to keep /etc/multipath/wwids synchronized between the initial RAM disk and the file system.

In summary, using find_multipaths may be convenient in certain use cases, but SUSE still recommends the default configuration with a properly configured blacklist and blacklist exceptions.

17.9 Configuring User-Friendly Names or Alias Names

A multipath device can be identified by its WWID, by a user-friendly name, or by an alias that you assign for it. Device node names in the form of /dev/sdn and /dev/dm-n can change on reboot and might be assigned to different devices each time. A device’s WWID, user-friendly name, and alias name persist across reboots, and are the preferred way to identify the device.

Important
Important: Using Persistent Names is Recommended

Because device node names in the form of /dev/sdn and /dev/dm-n can change on reboot, referring to multipath devices by their WWID is preferred. You can also use a user-friendly name or alias that is mapped to the WWID to identify the device uniquely across reboots.

The following table describes the types of device names that can be used for a device in the /etc/multipath.conf file. For an example of multipath.conf settings, see the /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file.

Table 17.4: Comparison of Multipath Device Name Types

Name Types

Description

WWID (default)

The serial WWID (Worldwide Identifier) is an identifier for the multipath device that is guaranteed to be globally unique and unchanging. The default name used in multipathing is the ID of the logical unit as found in the /dev/disk/by-id directory. For example, a device with the WWID of 3600508e0000000009e6baa6f609e7908 is listed as /dev/disk/by-id/scsi-3600508e0000000009e6baa6f609e7908.

User-friendly

The Device Mapper Multipath device names in the /dev/mapper directory also reference the ID of the logical unit. These multipath device names are user-friendly names in the form of /dev/mapper/mpathN, such as /dev/mapper/mpath0. The names are unique and persistent because they use the /var/lib/multipath/bindings file to track the association between the UUID and user-friendly names.

Alias

An alias name is a globally unique name that the administrator provides for a multipath device. Alias names override the WWID and the user-friendly /dev/mapper/mpathN names.

If you are using user_friendly_names, do not set the alias to mpathN format. This may conflict with an automatically assigned user-friendly name, and give you incorrect device node names.

The global multipath user_friendly_names option in the /etc/multipath.conf file is used to enable or disable the use of user-friendly names for multipath devices. If it is set to no (the default), multipath uses the WWID as the name of the device. If it is set to yes, multipath uses the /var/lib/multipath/bindings file to assign a persistent and unique name to the device in the form of mpath<N> in the /dev/mapper directory. The bindings_file option in the /etc/multipath.conf file can be used to specify an alternate location for the bindings file.

The global multipath alias option in the /etc/multipath.conf file is used to explicitly assign a name to the device. If an alias name is set up for a multipath device, the alias is used instead of the WWID or the user-friendly name.

Using the user_friendly_names option can be problematic in the following situations:

Root Device Is Using Multipath:

If the system root device is using multipath and you use the user_friendly_names option, the user-friendly settings in the /var/lib/multipath/bindings file are included in the initrd. If you later change the storage setup, such as by adding or removing devices, there is a mismatch between the bindings setting inside the initrd and the bindings settings in /var/lib/multipath/bindings.

Warning
Warning: Binding Mismatches

A bindings mismatch between initrd and /var/lib/multipath/bindings can lead to a wrong assignment of mount points to devices, which can result in file system corruption and data loss.

To avoid this problem, we recommend that you use the default WWID settings for the system root device. You should not use aliases for the system root device. Because the device name would differ, using an alias causes you to lose the ability to seamlessly switch off multipathing via the kernel command line.

Mounting /var from Another Partition:

The default location of the user_friendly_names configuration file is /var/lib/multipath/bindings. If the /var data is not located on the system root device but mounted from another partition, the bindings file is not available when setting up multipathing.

Make sure that the /var/lib/multipath/bindings file is available on the system root device and multipath can find it. For example, this can be done as follows:

  1. Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.

  2. Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:

    defaults {
                   user_friendly_names yes
                   bindings_file "/etc/multipath/bindings"
    }
Multipath Is in the initrd:

Even if the system root device is not on multipath, it is possible for multipath to be included in the initrd. For example, this can happen if the system root device is on LVM. If you use the user_friendly_names option and multipath is in the initrd, you should boot with the parameter multipath=off to avoid problems.

This disables multipath only in the initrd during system boots. After the system boots, the boot.multipath and multipathd boot scripts can activate multipathing.

Multipathing in HA Clusters:

See Section 17.9.1, “Multipath Device Names in HA Clusters” for details.

To enable user-friendly names or to specify aliases:

  1. Open the /etc/multipath.conf file in a text editor with root privileges.

  2. (Optional) Modify the location of the /var/lib/multipath/bindings file.

    The alternate path must be available on the system root device where multipath can find it.

    1. Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.

    2. Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:

      defaults {
                user_friendly_names yes
                bindings_file "/etc/multipath/bindings"
      }
  3. (Optional, not recommended) Enable user-friendly names:

    1. Uncomment the defaults section and its ending bracket.

    2. Uncomment the user_friendly_names option, then change its value from No to Yes.

      For example:

      ## Use user-friendly names, instead of using WWIDs as names.
      defaults {
        user_friendly_names yes
      }
  4. (Optional) Specify your own names for devices by using the alias option in the multipath section.

    For example:

    ## Use alias names, instead of using WWIDs as names.
    multipaths {
           multipath {
                   wwid           36006048000028350131253594d303030
                   alias             blue1
           }
           multipath {
                   wwid           36006048000028350131253594d303041
                   alias             blue2
           }
           multipath {
                   wwid           36006048000028350131253594d303145
                   alias             yellow1
           }
           multipath {
                   wwid           36006048000028350131253594d303334
                   alias             yellow2
           }
    }
    Important
    Important: WWID Compared to WWN

    When you define device aliases in the /etc/multipath.conf file, ensure that you use each device’s WWID (such as 3600508e0000000009e6baa6f609e7908) and not its WWN, which replaces the first character of a device ID with 0x, such as 0x600508e0000000009e6baa6f609e7908.

  5. Save your changes, then close the file.

  6. After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps” for details.

To use the entire LUN directly (for example, if you are using the SAN features to partition your storage), you can use the /dev/disk/by-id/xxx names for mkfs, fstab, your application, and so on. Partitioned devices have _part<n> appended to the device name, such as /dev/disk/by-id/xxx_part1.
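For example, using the WWID 3600508e0000000009e6baa6f609e7908 shown in Table 17.4, creating a file system on the whole multipath device and mounting it persistently could look like the following sketch (the mount point /data and the XFS file system are arbitrary examples):

sudo mkfs.xfs /dev/disk/by-id/dm-uuid-mpath-3600508e0000000009e6baa6f609e7908

# /etc/fstab entry
/dev/disk/by-id/dm-uuid-mpath-3600508e0000000009e6baa6f609e7908  /data  xfs  defaults  0  0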

In the /dev/disk/by-id directory, the multipath-mapped devices are represented by the device’s dm-uuid* name or alias name (if you assign an alias for it in the /etc/multipath.conf file). The scsi- and wwn- device names represent physical paths to the devices.

17.9.1 Multipath Device Names in HA Clusters

Make sure that multipath devices have the same name across all devices by doing the following:

  • Use UUID and alias names to ensure that multipath device names are consistent across all nodes in the cluster. Alias names must be unique across all nodes. Copy the /etc/multipath.conf file from the node to the /etc/ directory for all of the other nodes in the cluster.

  • When using links to multipath-mapped devices, ensure that you specify the dm-uuid* name or alias name in the /dev/disk/by-id directory, and not a fixed path instance of the device. For information, see Section 17.9, “Configuring User-Friendly Names or Alias Names”.

  • Set the user_friendly_names configuration option to no to disable it. A user-friendly name is unique to a node, but a device might not be assigned the same user-friendly name on every node in the cluster.

Note
Note: User-Friendly Names

If you really need to use user-friendly names, you can force the system-defined user-friendly names to be consistent across all nodes in the cluster by doing the following:

  1. In the /etc/multipath.conf file on one node:

    1. Set the user_friendly_names configuration option to yes to enable it.

      Multipath uses the /var/lib/multipath/bindings file to assign a persistent and unique name to the device in the form of mpath<N> in the /dev/mapper directory.

    2. (Optional) Set the bindings_file option in the defaults section of the /etc/multipath.conf file to specify an alternate location for the bindings file.

      The default location is /var/lib/multipath/bindings.

  2. Set up all of the multipath devices on the node.

  3. Copy the /etc/multipath.conf file from the node to the /etc/ directory of all the other nodes in the cluster.

  4. Copy the bindings file from the node to the bindings_file path on all of the other nodes in the cluster (see the example after this list).

  5. After you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the node for the changes to take effect. See Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps” for details. This applies to all affected nodes.
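For example, steps 3 and 4 could be performed with scp from the node on which multipath was set up. The node name node2 is a placeholder, and the bindings file location assumes it was relocated to /etc/multipath/bindings as shown earlier:

scp /etc/multipath.conf node2:/etc/
scp /etc/multipath/bindings node2:/etc/multipath/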

17.10 Configuring Path Failover Policies and Priorities

In a Linux host, when there are multiple paths to a storage controller, each path appears as a separate block device, and results in multiple block devices for a single LUN. The Device Mapper Multipath service detects multiple paths with the same LUN ID, and creates a new multipath device with that ID. For example, a host with two HBAs attached to a storage controller with two ports via a single unzoned Fibre Channel switch sees four block devices: /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. The Device Mapper Multipath service creates a single block device, /dev/mpath/mpath1, that reroutes I/O through those four underlying block devices.

This section describes how to specify policies for failover and configure priorities for the paths. Note that after you have modified the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then restart the server for the changes to take effect. See Section 17.6.4, “Applying the /etc/multipath.conf File Changes to Update the Multipath Maps” for details.

17.10.1 Configuring the Path Failover Policies

Use the multipath command with the -p option to set the path failover policy:

sudo multipath DEVICENAME -p POLICY

Replace POLICY with one of the following policy options:

Table 17.5: Group Policy Options for the multipath -p Command

Policy Option        Description
failover             (Default) One path per priority group.
multibus             All paths in one priority group.
group_by_serial      One priority group per detected serial number.
group_by_prio        One priority group per path priority value. Priorities are
                     determined by callout programs specified as a global,
                     per-controller, or per-multipath option in the
                     /etc/multipath.conf configuration file.
group_by_node_name   One priority group per target node name. Target node names
                     are fetched from /sys/class/fc_transport/target*/node_name.
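
For example, to place all paths of a multipath map into a single priority group, you could set the multibus policy; the map name shown is the example device ID used elsewhere in this chapter:

sudo multipath 3600601607cf30e00184589a37a31d911 -p multibus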

17.10.2 Configuring Failover Priorities

You must manually enter the failover priorities for the device in the /etc/multipath.conf file. Examples for all settings and options can be found in the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

17.10.2.1 Understanding Priority Groups and Attributes

A priority group is a collection of paths that go to the same physical LUN. By default, I/O is distributed in a round-robin fashion across all paths in the group. The multipath command automatically creates priority groups for each LUN in the SAN based on the path_grouping_policy setting for that SAN. The multipath command multiplies the number of paths in a group by the group’s priority to determine which group is the primary. The group with the highest calculated value is the primary. When all paths in the primary group are failed, the priority group with the next highest value becomes active.

A path priority is an integer value assigned to a path. The higher the value, the higher the priority. An external program is used to assign priorities for each path. For a given device, the paths with the same priorities belong to the same priority group.

The prio setting is used in the defaults{} or devices{} section of the /etc/multipath.conf file. It is silently ignored when it is specified for an individual multipath definition in the multipaths{} section. The prio line specifies the prioritizer. If the prioritizer requires an argument, you specify the argument by using the prio_args keyword on a second line.

PRIO Settings for the Defaults or Devices Sections

prio

Specifies the prioritizer program to call to obtain a path priority value. Weights are summed for each path group to determine the next path group to use in case of failure.

Use the prio_args keyword to specify arguments if the specified prioritizer requires arguments.

If no prio keyword is specified, all paths are equal. The default setting is const with an empty prio_args value:

prio "const"
prio_args ""

Example prioritizer programs include:

Prioritizer Program   Description
alua                  Generates path priorities based on the SCSI-3 ALUA settings.
const                 Generates the same priority for all paths.
emc                   Generates the path priority for EMC arrays.
hds                   Generates the path priority for Hitachi HDS Modular storage arrays.
hp_sw                 Generates the path priority for Compaq/HP controller in active/standby mode.
ontap                 Generates the path priority for NetApp arrays.
random                Generates a random priority for each path.
rdac                  Generates the path priority for LSI/Engenio RDAC controller.
weightedpath          Generates the path priority based on the weighted values you specify in the arguments for prio_args.
path_latency          Generates the path priority based on a latency algorithm, which is configured with the prio_args keyword.

prio_args arguments

These are the arguments for the prioritizer programs that require arguments. Most prio programs do not need arguments. There is no default. The values depend on the prio setting and whether the prioritizer requires any of the following arguments:

weighted

Requires a value of the form [hbtl|devname|serial|wwn] REGEX1 PRIO1 REGEX2 PRIO2...

hbtl

Regex must be of SCSI H:B:T:L format, for example 1:0:.:. and *:0:0:., with a weight value, where H, B, T, L are the host, bus, target, and LUN IDs for a device. For example:

prio "weightedpath"
prio_args "hbtl 1:.:.:. 2 4:.:.:. 4"
devname

Regex is in device name format. For example: sda, sd.e

serial

Regex is in serial number format. For example: .*J1FR.*324. Look up your serial number with the multipathd show paths format %z command. (multipathd show wildcards displays all format wildcards.)

alua

If exclusive_pref_bit is set for a device (alua exclusive_pref_bit), paths with the preferred path bit set will always be in their own path group.

path_latency

path_latency adjusts latencies between remote and local storage arrays, if both arrays use the same type of hardware. Usually the latency on the remote array will be higher, so you can tune the latency to bring them closer together. This requires a value pair of the form io_num=20 base_num=10.

io_num is the number of read I/Os sent continuously to the current path, which are used to calculate the average path latency. Valid values are integers from 2 to 200.

base_num is the logarithmic base number, used to partition different priority ranks. Valid values are integers from 2 to 10. The maximum average latency value is 100s, the minimum is 1us. For example, if base_num=10, the paths are grouped into priority groups with path latency <=1us, (1us, 10us], (10us, 100us], (100us, 1ms], (1ms, 10ms], (10ms, 100ms], (100ms, 1s], (1s, 10s], (10s, 100s], >100s.
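
Putting this together, a defaults-section sketch that selects the path_latency prioritizer with the example argument values above could look like the following:

defaults {
       prio       "path_latency"
       prio_args  "io_num=20 base_num=10"
}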

Multipath Attributes

Multipath attributes are used to control the behavior of multipath I/O for devices. You can specify attributes as defaults for all multipath devices. You can also specify attributes that apply only to a given multipath device by creating an entry for that device in the multipaths section of the multipath configuration file.

user_friendly_names

Specifies whether to use world-wide IDs (WWIDs) or to use the /var/lib/multipath/bindings file to assign a persistent and unique alias to the multipath devices in the form of /dev/mapper/mpathN.

This option can be used in the devices section and the multipaths section.

Value

Description

no

(Default) Use the WWIDs shown in the /dev/disk/by-id/ location.

yes

Autogenerate user-friendly names as aliases for the multipath devices instead of the actual ID.

failback

Specifies whether to monitor the failed path recovery, and indicates the timing for group failback after failed paths return to service.

When the failed path recovers, the path is added back into the multipath-enabled path list based on this setting. Multipath evaluates the priority groups, and changes the active priority group when the priority of the primary group exceeds that of the secondary group.

Value

Description

manual

(Default) The failed path is not monitored for recovery. The administrator runs the multipath command to update enabled paths and priority groups.

followover

Only perform automatic failback when the first path of a pathgroup becomes active. This keeps a node from automatically failing back when another node requested the failover.

immediate

When a path recovers, enable the path immediately.

N

When the path recovers, wait N seconds before enabling the path. Specify an integer value greater than 0.

We recommend a failback setting of manual for multipath in cluster environments to prevent multipath failover ping-pong.

failback "manual"
Important
Important: Verification

Make sure that you verify the failback setting with your storage system vendor. Different storage systems can require different settings.

no_path_retry

Specifies the behaviors to use on path failure.

Value

Description

N

Specifies the number of retries until multipath stops the queuing and fails the path. Specify an integer value greater than 0.

In a cluster, you can specify a value of “0” to prevent queuing and allow resources to fail over.

fail

Specifies immediate failure (no queuing).

queue

Never stop queuing (queue forever until the path comes alive).

We recommend a retry setting of fail or 0 in the /etc/multipath.conf file when working in a cluster. This causes the resources to fail over when the connection to the storage is lost. Otherwise, I/O is queued indefinitely and the resource failover cannot occur.

no_path_retry "fail"
no_path_retry "0"
Important
Important: Verification

Make sure that you verify the retry settings with your storage system vendor. Different storage systems can require different settings.

path_checker

Determines the state of the path.

Value

Description

directio

Reads the first sector of the device using direct I/O. This is useful for DASD devices. Logs failure messages in the systemd journal (see Chapter 15, journalctl: Query the systemd Journal).

tur

Issues a SCSI test unit ready command to the device. This is the preferred setting if the LUN supports it. On failure, the command does not fill up the systemd journal with messages.

CUSTOM_VENDOR_VALUE

Some SAN vendors provide custom path_checker options:

  • cciss_tur Checks the path state for HP Smart Storage Arrays.

  • emc_clariion Queries the EMC Clariion EVPD page 0xC0 to determine the path state.

  • hp_sw Checks the path state (Up, Down, or Ghost) for HP storage arrays with Active/Standby firmware.

  • rdac Checks the path state for the LSI/Engenio RDAC storage controller.

path_grouping_policy

Specifies the path grouping policy for a multipath device hosted by a given controller.

Value

Description

failover

(Default) One path is assigned per priority group so that only one path at a time is used.

multibus

All valid paths are in one priority group. Traffic is load-balanced across all active paths in the group.

group_by_prio

One priority group exists for each path priority value. Paths with the same priority are in the same priority group. Priorities are assigned by an external program.

group_by_serial

Paths are grouped by the SCSI target serial number (controller node WWN).

group_by_node_name

One priority group is assigned per target node name. Target node names are fetched from /sys/class/fc_transport/target*/node_name.
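
For example, the attribute can be set per storage array in a devices section; the vendor and product strings below are placeholders that must match the values reported by your hardware:

devices {
       device {
              vendor                "VENDOR"
              product               "PRODUCT"
              path_grouping_policy  "multibus"
       }
}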

path_selector

Specifies the path-selector algorithm to use for load balancing.

Value

Description

round-robin 0

Balances traffic across all active paths in a priority group by looping through them in turn.

queue-length 0

A dynamic load balancer that balances the number of in-flight I/O on paths similar to the least-pending option.

service-time 0

(Default) A service-time oriented load balancer that balances I/O on paths according to the latency.

pg_timeout

Specifies path group timeout handling. No value can be specified; an internal default is set.

polling_interval

Specifies the time in seconds between the end of one path checking cycle and the beginning of the next path checking cycle.

Specify an integer value greater than 0. The default value is 5. Make sure that you verify the polling_interval setting with your storage system vendor. Different storage systems can require different settings.

rr_min_io_rq

Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath.

Specify an integer value greater than 0. The default value is 1.

rr_min_io_rq "1"
rr_weight

Specifies the weighting method to use for paths.

Value

Description

uniform

(Default) All paths have the same round-robin weights.

priorities

Each path’s weight is determined by the path’s priority times the rr_min_io_rq setting.

uid_attribute

A udev attribute that provides a unique path identifier. The default value is ID_SERIAL.

17.10.2.2 Configuring for Round-Robin Load Balancing

All paths are active. I/O is sent to a path for a configured number of seconds or a configured number of I/O transactions before moving to the next open path in the sequence.
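
A sketch of how this could be expressed with the attributes described above in the defaults section of /etc/multipath.conf:

defaults {
       path_grouping_policy  "multibus"
       path_selector         "round-robin 0"
       rr_min_io_rq          "1"
}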

17.10.2.3 Configuring for Single Path Failover

A single path with the highest priority (lowest value setting) is active for traffic. Other paths are available for failover, but are not used unless failover occurs.

17.10.2.4 Grouping I/O Paths for Round-Robin Load Balancing

Multiple paths with the same priority fall into the active group. When all paths in that group fail, the device fails over to the next highest priority group. All paths in the group share the traffic load in a round-robin load balancing fashion.
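
A sketch of a corresponding configuration, assuming that the paths report usable priorities (for example, through the alua prioritizer):

defaults {
       path_grouping_policy  "group_by_prio"
       prio                  "alua"
       failback              "immediate"
}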

17.10.3 Reporting Target Path Groups

Use the SCSI Report Target Port Groups (sg_rtpg(8)) command. For information, see the man page for sg_rtpg(8).
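
For example, assuming a path device /dev/sdc that belongs to an ALUA-capable LUN, a decoded report could be requested with a command similar to the following:

sudo sg_rtpg -d /dev/sdc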

17.11 Configuring Multipath I/O for the Root Device

Device Mapper Multipath I/O (DM-MPIO) is available and supported for /boot and the root (/) partition in SUSE Linux Enterprise Server. In addition, the YaST partitioner in the YaST installer supports enabling multipath during the install.

17.11.1 Enabling Multipath I/O at Install Time

To install the operating system on a multipath device, the multipath software must be running at install time. The multipathd daemon is not automatically active during the system installation. You can start it by using the Configure Multipath option in the YaST Partitioner.

17.11.1.1 Enabling Multipath I/O at Install Time on an Active/Active Multipath Storage LUN

  1. Choose Expert Partitioner on the Suggested Partitioning screen during the installation.

  2. Select the Hard Disks main icon, click the Configure button, then select Configure Multipath.

  3. Start multipath.

    YaST starts to rescan the disks and shows available multipath devices (such as /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65). This is the device that should be used for all further processing.

  4. Click Next to continue with the installation.

17.11.1.2 Enabling Multipath I/O at Install Time on an Active/Passive Multipath Storage LUN

The multipathd daemon is not automatically active during the system installation. You can start it by using the Configure Multipath option in the YaST Partitioner.

To enable multipath I/O at install time for an active/passive multipath storage LUN:

  1. Choose Expert Partitioner on the Suggested Partitioning screen during the installation.

  2. Select the Hard Disks main icon, click the Configure button, then select Configure Multipath.

  3. Start multipath.

    YaST starts to rescan the disks and shows available multipath devices (such as /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65). This is the device that should be used for all further processing. Write down the device path and UUID; you will need it later.

  4. Click Next to continue with the installation.

  5. After all settings are done and the installation is finished, YaST starts to write the boot loader information, and displays a countdown to restart the system. Stop the counter by clicking the Stop button and press Ctrl-Alt-F5 to access a console.

  6. Use the console to determine if a passive path was entered in the /boot/grub2/device.map file for the hd0 entry.

    This is necessary because the installation does not distinguish between active and passive paths.

    1. Mount the root device to /mnt by entering

      mount /dev/disk/by-id/UUID_part2 /mnt

      For example, enter

      mount /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part2 /mnt
    2. Mount the boot device to /mnt/boot by entering

      mount /dev/disk/by-id/UUID_part1 /mnt/boot

      For example, enter

      mount /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part1 /mnt/boot
    3. In the /mnt/boot/grub2/device.map file, determine if the hd0 entry points to a passive path, then do one of the following:

      • Active path:  No action is needed. Skip all remaining steps, return to the YaST graphical environment by pressing Ctrl-Alt-F7, and continue with the installation.

      • Passive path:  The configuration must be changed and the boot loader must be reinstalled.

  7. If the hd0 entry points to a passive path, change the configuration and reinstall the boot loader:

    1. Enter the following commands at the console prompt:

                mount -o bind /dev /mnt/dev
                mount -o bind /sys /mnt/sys
                mount -o bind /proc /mnt/proc
                chroot /mnt
    2. At the console, run multipath -ll, then check the output to find the active path.

      Passive paths are flagged as ghost.

    3. In the /boot/grub2/device.map file, change the hd0 entry to an active path, save the changes, and close the file.

    4. Reinstall the boot loader by entering

                grub-install /dev/disk/by-id/UUID_part1 /mnt/boot

      For example, enter

      grub-install /dev/disk/by-id/dm-uuid-mpath-3600a0b80000f4593000012ae4ab0ae65_part1 /mnt/boot
    5. Enter the following commands:

      exit
      umount /mnt/*
      umount /mnt
  8. Return to the YaST graphical environment by pressing Ctrl-Alt-F7.

  9. Click OK to continue with the installation reboot.

17.11.2 Enabling Multipath I/O for an Existing Root Device

  1. Install Linux with only a single path active, preferably one where the by-id symbolic links are listed in the partitioner.

  2. Mount the devices by using the /dev/disk/by-id path used during the install.

  3. Open or create /etc/dracut.conf.d/10-mp.conf and add the following line (mind the leading whitespace):

    force_drivers+=" dm-multipath"
  4. For IBM Z, before running dracut, edit the /etc/zipl.conf file to replace the by-path information in zipl.conf with the same by-id information that was used in /etc/fstab.

  5. Run dracut -f to update the initrd image.

  6. For IBM Z, after running dracut, run zipl.

  7. Reboot the server.

17.11.3 Disabling Multipath I/O on the Root Device

Add multipath=off to the kernel command line. This can be done with the YaST Boot Loader module. Open Boot Loader Installation › Kernel Parameters and add the parameter to both command lines.

This affects only the root device. All other devices are not affected.

17.12 Configuring Multipath I/O for an Existing Software RAID

Ideally, you should configure multipathing for devices before you use them as components of a software RAID device. If you add multipathing after creating any software RAID devices, the DM-MPIO service might start after the MD RAID service on reboot, which makes multipathing appear not to be available for RAIDs. You can use the procedure in this section to get multipathing running for a previously existing software RAID.

For example, you might need to configure multipathing for devices in a software RAID under the following circumstances:

  • If you create a new software RAID as part of the Partitioning settings during a new install or upgrade.

  • If you did not configure the devices for multipathing before using them in the software RAID as a member device or spare.

  • If you grow your system by adding new HBA adapters to the server or expanding the storage subsystem in your SAN.

Note
Note: Assumptions

The following instructions assume that the software RAID device is /dev/mapper/mpath0, which is its device name as recognized by the kernel. They assume that you have enabled user-friendly names in the /etc/multipath.conf file as described in Section 17.9, “Configuring User-Friendly Names or Alias Names”.

Make sure to modify the instructions for the device name of your software RAID.

  1. Open a terminal console.

    Except where otherwise directed, use this console to enter the commands in the following steps.

  2. If any software RAID devices are currently mounted or running, enter the following commands for each device to unmount the device and stop it.

    sudo umount /dev/mapper/mpath0
    sudo mdadm --misc --stop /dev/mapper/mpath0
  3. Stop the md service by entering

    sudo systemctl stop mdmonitor
  4. Start the multipathd daemon by entering the following command:

    systemctl start multipathd
  5. After the multipathing service has been started, verify that the software RAID’s component devices are listed in the /dev/disk/by-id directory. Do one of the following:

    • Devices Are Listed:  The device names should now have symbolic links to their Device Mapper Multipath device names, such as /dev/dm-1.

    • Devices Are Not Listed:  Force the multipath service to recognize them by flushing and rediscovering the devices by entering

      sudo multipath -F
      sudo multipath -v0

      The devices should now be listed in /dev/disk/by-id, and have symbolic links to their Device Mapper Multipath device names. For example:

      lrwxrwxrwx 1 root root 10 2011-01-06 11:42 dm-uuid-mpath-36006016088d014007e0d0d2213ecdf11 -> ../../dm-1
  6. Restart the mdmonitor service and the RAID device by entering

    systemctl start mdmonitor
  7. Check the status of the software RAID by entering

    mdadm --detail /dev/mapper/mpath0

    The RAID’s component devices should match their Device Mapper Multipath device names that are listed as the symbolic links of devices in the /dev/disk/by-id directory.

  8. In case the root (/) device or any parts of it (such as /var, /etc, /log) are on the SAN and multipath is needed to boot, rebuild the initrd:

    dracut -f --add multipath
  9. Reboot the server to apply the changes.

  10. Verify that the software RAID array comes up properly on top of the multipathed devices by checking the RAID status. Enter

    mdadm --detail /dev/mapper/mpath0

    For example:

    Number Major Minor RaidDevice State
    0 253 0 0 active sync /dev/dm-0
    1 253 1 1 active sync /dev/dm-1
    2 253 2 2 active sync /dev/dm-2
Note
Note: Using mdadm with Multipath Devices

The mdadm tool requires that the devices be accessed by the ID rather than by the device node path. Refer to Section 17.4.3, “Using MDADM for Multipathed Devices” for details.

17.13 Using LVM2 on Multipath Devices

When using multipath, all paths to a resource are present as devices in the device tree. By default, LVM checks if there is a multipath device on top of any device in the device tree. If LVM finds a multipath device on top, it assumes that the device is a multipath component and ignores the (underlying) device. This is usually the desired behavior, but it can be changed in /etc/lvm/lvm.conf. When multipath_component_detection is set to 0, LVM scans multipath component devices. The default entry in lvm.conf is:

    # By default, LVM2 will ignore devices used as component paths
    # of device-mapper multipath devices.
    # 1 enables; 0 disables.
    multipath_component_detection = 1
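
If your LVM2 version provides the lvmconfig tool, you can display the currently effective value, for example with:

sudo lvmconfig devices/multipath_component_detection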

17.14 Best Practice

17.14.1 Scanning for New Devices without Rebooting

If your system has already been configured for multipathing and you later need to add more storage to the SAN, you can use the rescan-scsi-bus.sh script to scan for the new devices. By default, this script scans all HBAs with typical LUN ranges. The general syntax for the command looks like the following:

rescan-scsi-bus.sh [options] [host [host ...]]

For most storage subsystems, the script can be run successfully without options. However, some special cases might need to use one or more options. Run rescan-scsi-bus.sh --help for details.
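
For example, to limit the scan to SCSI hosts 0 and 1 (the host numbers are examples and correspond to the /sys/class/scsi_host/hostN entries on your system), you could enter:

sudo rescan-scsi-bus.sh 0 1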

Warning
Warning: EMC PowerPath Environments

In EMC PowerPath environments, do not use the rescan-scsi-bus.sh utility provided with the operating system or the HBA vendor scripts for scanning the SCSI buses. To avoid potential file system corruption, EMC requires that you follow the procedure provided in the vendor documentation for EMC PowerPath for Linux.

Use the following procedure to scan the devices and make them available to multipathing without rebooting the system.

  1. On the storage subsystem, use the vendor’s tools to allocate the device and update its access control settings to allow the Linux system access to the new storage. Refer to the vendor’s documentation for details.

  2. Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter

    sudo rescan-scsi-bus.sh

    Depending on your setup, you might need to run rescan-scsi-bus.sh with optional parameters. Refer to rescan-scsi-bus.sh --help for details.

  3. Check for scanning progress in the systemd journal (see Chapter 15, journalctl: Query the systemd Journal for details). At a terminal console prompt, enter

    sudo journalctl -r

    This command displays the last lines of the log. For example:

    tux > sudo journalctl -r
    Feb 14 01:03 kernel: SCSI device sde: 81920000
    Feb 14 01:03 kernel: SCSI device sdf: 81920000
    Feb 14 01:03 multipathd: sde: path checker registered
    Feb 14 01:03 multipathd: sdf: path checker registered
    Feb 14 01:03 multipathd: mpath4: event checker started
    Feb 14 01:03 multipathd: mpath5: event checker started
    Feb 14 01:03 multipathd: mpath4: remaining active paths: 1
    Feb 14 01:03 multipathd: mpath5: remaining active paths: 1
    [...]
  4. Repeat the previous steps to add paths through other HBA adapters on the Linux system that are connected to the new device.

  5. Run the multipath command to recognize the devices for DM-MPIO configuration. At a terminal console prompt, enter

    sudo multipath

    You can now configure the new device for multipathing.

17.14.2 Scanning for New Partitioned Devices without Rebooting

Use the example in this section to detect a newly added multipathed LUN without rebooting.

Warning
Warning: EMC PowerPath Environments

In EMC PowerPath environments, do not use the rescan-scsi-bus.sh utility provided with the operating system or the HBA vendor scripts for scanning the SCSI buses. To avoid potential file system corruption, EMC requires that you follow the procedure provided in the vendor documentation for EMC PowerPath for Linux.

  1. Open a terminal console.

  2. Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter

    rescan-scsi-bus.sh

    Depending on your setup, you might need to run rescan-scsi-bus.sh with optional parameters. Refer to rescan-scsi-bus.sh --help for details.

  3. Verify that the device is seen (such as if the link has a new time stamp) by entering

    ls -lrt /dev/dm-*

    You can also verify the devices in /dev/disk/by-id by entering

    ls -l /dev/disk/by-id/
  4. Verify the new device appears in the log by entering

    sudo journalctl -r
  5. Use a text editor to add a new alias definition for the device in the /etc/multipath.conf file, such as data_vol3.

    For example, if the UUID is 36006016088d014006e98a7a94a85db11, make the following changes:

    defaults {
         user_friendly_names   yes
      }
    multipaths {
         multipath {
              wwid    36006016088d014006e98a7a94a85db11
              alias  data_vol3
              }
      }
  6. Create a partition table for the device by entering

    fdisk /dev/disk/by-id/dm-uuid-mpath-<UUID>

    Replace UUID with the device WWID, such as 36006016088d014006e98a7a94a85db11.

  7. Trigger udev by entering

    echo 'add' | sudo tee /sys/block/DM_DEVICE/uevent

    For example, to generate the device-mapper devices for the partitions on dm-8, enter

    echo 'add' | sudo tee /sys/block/dm-8/uevent
  8. Create a file system on the device /dev/disk/by-id/dm-uuid-mpath-UUID_partN. Depending on your choice for the file system, you may use one of the following commands for this purpose: mkfs.btrfs, mkfs.ext3, mkfs.ext4, or mkfs.xfs. Refer to the respective man pages for details. Replace UUID_partN with the actual UUID and partition number, such as 36006016088d014006e98a7a94a85db11_part1.

  9. Create a label for the new partition by entering the following command:

    sudo tune2fs -L LABELNAME /dev/disk/by-id/dm-uuid-mpath-UUID_partN

    Replace UUID_partN with the actual UUID and partition number, such as 36006016088d014006e98a7a94a85db11_part1. Replace LABELNAME with a label of your choice.

  10. Reconfigure DM-MPIO to let it read the aliases by entering

    sudo multipathd -k'reconfigure'
  11. Verify that the device is recognized by multipathd by entering

    sudo multipath -ll
  12. Use a text editor to add a mount entry in the /etc/fstab file.

    At this point, the alias you created in a previous step is not yet in the /dev/disk/by-label directory. Add a mount entry for the /dev/dm-9 path, then, before the next time you reboot, change the entry to

    LABEL=LABELNAME
  13. Create a directory to use as the mount point, then mount the device.

17.14.3 Viewing Multipath I/O Status

Querying the multipath I/O status outputs the current status of the multipath maps.

The multipath -l option displays the current path status as of the last time that the path checker was run. It does not run the path checker.

The multipath -ll option runs the path checker, updates the path information, then displays the current status information. This command always displays the latest information about the path status.

tux > sudo multipath -ll
3600601607cf30e00184589a37a31d911
[size=127 GB][features="0"][hwhandler="1 emc"]

\_ round-robin 0 [active][first]
  \_ 1:0:1:2 sdav 66:240  [ready ][active]
  \_ 0:0:1:2 sdr  65:16   [ready ][active]

\_ round-robin 0 [enabled]
  \_ 1:0:0:2 sdag 66:0    [ready ][active]
  \_ 0:0:0:2 sdc  8:32    [ready ][active]

For each device, it shows the device’s ID, size, features, and hardware handlers.

Paths to the device are automatically grouped into priority groups on device discovery. Only one priority group is active at a time. For an active/active configuration, all paths are in the same group. For an active/passive configuration, the passive paths are placed in separate priority groups.

The following information is displayed for each group:

  • Scheduling policy used to balance I/O within the group, such as round-robin

  • Whether the group is active, disabled, or enabled

  • Whether the group is the first (highest priority) group

  • Paths contained within the group

The following information is displayed for each path:

  • The physical address as HOST:BUS:TARGET:LUN, such as 1:0:1:2

  • Device node name, such as sda

  • Major:minor numbers

  • Status of the device

17.14.4 Managing I/O in Error Situations

You might need to configure multipathing to queue I/O if all paths fail concurrently by enabling queue_if_no_path. Otherwise, I/O fails immediately if all paths are gone. In certain scenarios, where the driver, the HBA, or the fabric experience spurious errors, DM-MPIO should be configured to queue all I/O where those errors lead to a loss of all paths, and never propagate errors upward.

When you use multipathed devices in a cluster, you might choose to disable queue_if_no_path. This automatically fails the path instead of queuing the I/O, and escalates the I/O error to cause a failover of the cluster resources.

Because enabling queue_if_no_path leads to I/O being queued indefinitely unless a path is reinstated, ensure that multipathd is running and works for your scenario. Otherwise, I/O might be stalled indefinitely on the affected multipathed device until reboot or until you manually return to failover instead of queuing.

To test the scenario:

  1. Open a terminal console.

  2. Activate queuing instead of failover for the device I/O by entering

    sudo dmsetup message DEVICE_ID 0 queue_if_no_path

    Replace the DEVICE_ID with the ID for your device. The 0 value represents the sector and is used when sector information is not needed.

    For example, enter:

    sudo dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
  3. Return to failover for the device I/O by entering

    sudo dmsetup message DEVICE_ID 0 fail_if_no_path

    This command immediately causes all queued I/O to fail.

    Replace the DEVICE_ID with the ID for your device. For example, enter

    sudo dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path

To set up queuing I/O for scenarios where all paths fail:

  1. Open a terminal console.

  2. Open the /etc/multipath.conf file in a text editor.

  3. Uncomment the defaults section and its ending bracket, then add the features setting, as follows:

    defaults {
      features "1 queue_if_no_path"
    }
  4. After you modify the /etc/multipath.conf file, you must run dracut -f to re-create the initrd on your system, then reboot for the changes to take effect.

  5. When you are ready to return to failover for the device I/O, enter

    sudo dmsetup message MAPNAME 0 fail_if_no_path

    Replace the MAPNAME with the mapped alias name or the device ID for the device. The 0 value represents the sector and is used when sector information is not needed.

    This command immediately causes all queued I/O to fail and propagates the error to the calling application.

17.14.5 Resolving Stalled I/O

If all paths fail concurrently and I/O is queued and stalled, do the following:

  1. Enter the following command at a terminal prompt:

    sudo dmsetup message MAPNAME 0 fail_if_no_path

    Replace MAPNAME with the correct device ID or mapped alias name for the device. The 0 value represents the sector and is used when sector information is not needed.

    This command immediately causes all queued I/O to fail and propagates the error to the calling application.

  2. Reactivate queuing by entering the following command:

    sudo dmsetup message MAPNAME 0 queue_if_no_path

17.14.6 Configuring Default Settings for IBM Z Devices

Testing of IBM Z devices with multipathing has shown that the dev_loss_tmo parameter should be set to infinity (2147483647), and the fast_io_fail_tmo parameter should be set to 5 seconds. If you are using IBM Z devices, modify the /etc/multipath.conf file to specify the values as follows:

defaults {
       dev_loss_tmo 2147483647
       fast_io_fail_tmo 5
}

The dev_loss_tmo parameter sets the number of seconds to wait before marking a multipath link as bad. When the path fails, any current I/O on that failed path fails. The default value varies according to the device driver being used. To use the driver’s internal timeouts, set the value to zero (0). It can also be set to "infinity" or 2147483647, which sets it to the max value of 2147483647 seconds (68 years).

The fast_io_fail_tmo parameter sets the length of time to wait before failing I/O when a link problem is detected. I/O that reaches the driver fails. If I/O is in a blocked queue, the I/O does not fail until the dev_loss_tmo time elapses and the queue is unblocked.

If you modify the /etc/multipath.conf file, the changes are not applied until you update the multipath maps, or until the multipathd daemon is restarted (systemctl restart multipathd).

17.14.7 Using Multipath with NetApp Devices

When using multipath for NetApp devices, we recommend the following settings in the /etc/multipath.conf file (a combined example follows this list):

  • Set the default values for the following parameters globally for NetApp devices:

    max_fds max
    queue_without_daemon no
  • Set the default values for the following parameters for NetApp devices in the hardware table:

    dev_loss_tmo infinity
    fast_io_fail_tmo 5
    features "3 queue_if_no_path pg_init_retries 50"

17.14.8 Using --noflush with Multipath Devices

The --noflush option should always be used when running on multipath devices.

For example, in scripts where you perform a table reload, you use the --noflush option on resume to ensure that any outstanding I/O is not flushed, because you need the multipath topology information.

load
resume --noflush

17.14.9 SAN Timeout Settings When the Root Device Is Multipathed

A system with root (/) on a multipath device might stall when all paths have failed and are removed from the system because a dev_loss_tmo timeout is received from the storage subsystem (such as Fibre Channel storage arrays).

If the system device is configured with multiple paths and the multipath no_path_retry setting is active, you should modify the storage subsystem’s dev_loss_tmo setting accordingly to ensure that no devices are removed during an all-paths-down scenario. We strongly recommend that you set the dev_loss_tmo value to be equal to or higher than the no_path_retry setting from multipath.

The recommended setting for the storage subsystem’s dev_loss_tmo is

<dev_loss_tmo> = <no_path_retry> * <polling_interval>

where the following definitions apply for the multipath values:

  • no_path_retry is the number of retries for multipath I/O until the path is considered to be lost, and queuing of I/O is stopped.

  • polling_interval is the time in seconds between path checks.

Each of these multipath values should be set from the /etc/multipath.conf configuration file. For information, see Section 17.6, “Creating or Modifying the /etc/multipath.conf File”.
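
For example, with hypothetical multipath settings of no_path_retry 5 and polling_interval 5 in /etc/multipath.conf, the storage subsystem's dev_loss_tmo should be set to at least 5 * 5 = 25 seconds:

defaults {
       no_path_retry     5
       polling_interval  5
}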

17.15 Troubleshooting MPIO

This section describes some known issues and possible solutions for MPIO.

17.15.1 The System Exits to Emergency Shell at Boot When Multipath Is Enabled

During boot the system exits into the emergency shell with messages similar to the following:

[  OK  ] Listening on multipathd control socket.
         Starting Device-Mapper Multipath Device Controller...
[  OK  ] Listening on Device-mapper event daemon FIFOs.
         Starting Device-mapper event daemon...
         Expecting device dev-disk-by\x2duuid-34be48b2\x2dc21...32dd9.device...
         Expecting device dev-sda2.device...
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on udev Control Socket.
         Starting udev Coldplug all Devices...
         Expecting device dev-disk-by\x2duuid-1172afe0\x2d63c...5d0a7.device...
         Expecting device dev-disk-by\x2duuid-c4a3d1de\x2d4dc...ef77d.device...
[  OK  ] Started Create list of required static device nodes ...current kernel.
         Starting Create static device nodes in /dev...
[  OK  ] Started Collect Read-Ahead Data.
[  OK  ] Started Device-mapper event daemon.
[  OK  ] Started udev Coldplug all Devices.
         Starting udev Wait for Complete Device Initialization...
[  OK  ] Started Replay Read-Ahead Data.
         Starting Load Kernel Modules...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Started Create static devices
[   13.682489] floppy0: no floppy controllers found
[*     ] (1 of 4) A start job is running for dev-disk-by\x2du...(7s / 1min 30s)
[*     ] (1 of 4) A start job is running for dev-disk-by\x2du...(7s / 1min 30s)

...

Timed out waiting for device dev-disk-by\x2duuid-c4a...cfef77d.device.
[DEPEND] Dependency failed for /opt.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Postfix Mail Transport Agent.
Welcome to emergency shell
Give root password for maintenance
(or press Control-D to continue):

At this stage, you are working in a temporary dracut emergency shell from the initrd environment. To make the configuration changes described below persistent, you need to perform them in the environment of the installed system.

  1. Identify what the system root (/) file system is. Inspect the content of /proc/cmdline and look for the root= parameter.

  2. Verify whether the root file system is mounted:

    tux > sudo systemctl status sysroot.mount
    Tip
    Tip

    dracut mounts the root file system under /sysroot by default.

    From now on, let us assume that the root file system is mounted under /sysroot.

  3. Mount system-required file systems under /sysroot, chroot into it, then mount all file systems. For example:

    tux > for x in proc sys dev run; do sudo mount --bind /$x /sysroot/$x; done
    tux > sudo chroot /sysroot /bin/bash
    tux > sudo mount -a

    Refer to Section 41.6.2.3, “Accessing the Installed System” for more details.

  4. Make changes to the multipath or dracut configuration as suggested in the procedures below. Remember to rebuild initrd to include the modifications.

  5. Exit the chroot environment by entering the exit command, then exit the emergency shell and reboot the server by pressing Ctrl-D.

Procedure 17.1: Emergency Shell: Blacklist File Systems

This fix is required if the root file system is not on multipath but multipath is enabled. In such a setup, multipath tries to set up its paths for all devices that are not blacklisted. Since the device with the root file system is already mounted, it is inaccessible for multipath and causes multipath to fail. Fix this issue by blacklisting the root device in /etc/multipath.conf:

  1. Run multipath -v2 in the emergency shell and identify the device for the root file system. It will result in an output similar to:

    root # multipath -v2
    Dec 18 10:10:03 | 3600508b1001030343841423043300400: ignoring map

    The string between | and : is the WWID needed for blacklisting.

  2. Open /etc/multipath.conf and add the following:

    blacklist {
      wwid "WWID"
    }

    Replace WWID with the ID you retrieved in the previous step. For more information see Section 17.8, “Blacklisting Non-Multipath Devices”.

  3. Rebuild the initrd using the following command:

    tux > dracut -f --add multipath
Procedure 17.2: Emergency Shell: Rebuild the initrd

This fix is required if the multipath status (enabled or disabled) differs between initrd and system. To fix this, rebuild the initrd:

  • If multipath has been enabled in the system, rebuild the initrd with multipath support with this command:

    dracut --force --add multipath

  • If multipath has been disabled in the system, rebuild the initrd without multipath support with this command:

    dracut --force -o multipath
Procedure 17.3: Emergency Shell: Rebuild the initrd

This fix is required if the initrd does not contain drivers to access network attached storage. This may, for example, be the case when the system was installed without multipath or when the respective hardware was added or replaced.

  1. Add the required driver(s) to the variable force_drivers in the file /etc/dracut.conf.d/01-dist.conf. For example, if your system contains a RAID controller accessed by the hpsa driver and multipathed devices connected to a QLogic controller accessed by the driver qla2xxx, this entry would look like:

    force_drivers+="hpsa qla2xxx"
  2. Rebuild the initrd using the following command:

    dracut -f --add multipath
  3. To prevent the system from booting into emergency mode if attaching the network storage fails, it is recommended to add the mount option _netdev to the respective entries in /etc/fstab.
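
    For example, a matching /etc/fstab entry could look similar to the following line; the device link, mount point, and file system type are placeholders:

    /dev/disk/by-id/dm-uuid-mpath-36006016088d014006e98a7a94a85db11_part1  /data  xfs  defaults,_netdev  0 0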

17.15.2 PRIO Settings for Individual Devices Fail After Upgrading to Multipath 0.4.9 or Later

Multipath Tools from version 0.4.9 onward uses the prio setting in the defaults{} or devices{} section of the /etc/multipath.conf file. It silently ignores the keyword prio when it is specified for an individual multipath definition in the multipaths{} section.

Multipath Tools 0.4.8 allowed the prio setting in the individual multipath definition in the multipaths{} section to override the prio settings in the defaults{} or devices{} section.

17.15.3 PRIO Settings with Arguments Fail After Upgrading to multipath-tools-0.4.9 or Later

When you upgrade from multipath-tools-0.4.8 to multipath-tools-0.4.9, the prio settings in the /etc/multipath.conf file are broken for prioritizers that require an argument. In multipath-tools-0.4.9, the prio keyword is used to specify the prioritizer, and the prio_args keyword is used to specify the argument for prioritizers that require an argument. Previously, both the prioritizer and its argument were specified on the same prio line.

For example, in multipath-tools-0.4.8, the following line was used to specify a prioritizer and its arguments on the same line.

prio "weightedpath hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"

After upgrading to multipath-tools-0.4.9 or later, the command causes an error. The message is similar to the following:

<Month day hh:mm:ss> | Prioritizer 'weightedpath hbtl [1,3]:.:.+:.+ 260
[0,2]:.:.+:.+ 20' not found in /lib64/multipath

To resolve this problem, use a text editor to modify the prio line in the /etc/multipath.conf file. Create two lines with the prioritizer specified on the prio line, and the prioritizer argument specified on the prio_args line below it:

prio "weightedpath"
prio_args "hbtl [1,3]:.:.+:.+ 260 [0,2]:.:.+:.+ 20"

Restart the multipathd daemon for the changes to become active by running sudo systemctl restart multipathd.

17.15.4 Technical Information Documents

For information about troubleshooting multipath I/O issues on SUSE Linux Enterprise Server, see the following Technical Information Documents (TIDs) in the SUSE Knowledgebase: