Applies to SUSE Linux Enterprise Server 11 SP4

10 Managing Software RAIDs 6 and 10 with mdadm

This section describes how to create software RAID 6 and 10 devices, using the Multiple Devices Administration (mdadm(8)) tool. You can also use mdadm to create RAIDs 0, 1, 4, and 5. The mdadm tool provides the functionality of legacy programs mdtools and raidtools.

10.1 Creating a RAID 6

10.1.1 Understanding RAID 6

RAID 6 is essentially an extension of RAID 5 that allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity). Even if two of the hard disk drives fail during the data recovery process, the system continues to be operational, with no data loss.

RAID 6 provides for extremely high data fault tolerance by sustaining multiple simultaneous drive failures. It handles the loss of any two devices without data loss. Accordingly, it requires N+2 drives to store N drives worth of data. It requires a minimum of 4 devices.

The performance for RAID 6 is slightly lower but comparable to RAID 5 in normal mode and single disk failure mode. It is very slow in dual disk failure mode.

Table 10.1: Comparison of RAID 5 and RAID 6

Feature             RAID 5                               RAID 6

Number of devices   N+1, minimum of 3                    N+2, minimum of 4
Parity              Distributed, single                  Distributed, dual
Performance         Medium impact on write and rebuild   More impact on sequential write than RAID 5
Fault-tolerance     Failure of one component device      Failure of two component devices

10.1.2 Creating a RAID 6

The procedure in this section creates a RAID 6 device /dev/md0 with four devices: /dev/sda1, /dev/sdb1, /dev/sdc1, and /dev/sdd1. Ensure that you modify the procedure to use your actual device nodes.

  1. Open a terminal console, then log in as the root user or equivalent.

  2. Create a RAID 6 device. At the command prompt, enter

    mdadm --create /dev/md0 --run --level=raid6 --chunk=128 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    The default chunk size is 64 KB.

  3. Create a file system on the RAID 6 device /dev/md0, such as a Reiser file system (reiserfs). For example, at the command prompt, enter

    mkfs.reiserfs /dev/md0

    Modify the command if you want to use a different file system.

  4. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md0. Sample /etc/mdadm.conf and /etc/fstab entries are shown after this procedure.

  5. Edit the /etc/fstab file to add an entry for the RAID 6 device /dev/md0.

  6. Reboot the server.

    The RAID 6 device is mounted to /local.

  7. (Optional) Add a hot spare to service the RAID array. For example, at the command prompt enter:

    mdadm /dev/md0 -a /dev/sde1
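
The entries for steps 4 and 5 depend on your devices, file system, and mount point. The following is a minimal sketch for the devices used above, assuming the reiserfs file system from step 3 and the /local mount point; you can also identify the array by UUID instead of listing its component devices.

    # /etc/mdadm.conf
    DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

    # /etc/fstab
    /dev/md0   /local   reiserfs   defaults   1 2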

10.2 Creating Nested RAID 10 Devices with mdadm

10.2.1 Understanding Nested RAID Devices

A nested RAID device consists of a RAID array that uses another RAID array as its basic element, instead of using physical disks. The goal of this configuration is to improve the performance and fault tolerance of the RAID.

Linux supports nesting of RAID 1 (mirroring) and RAID 0 (striping) arrays. Generally, this combination is referred to as RAID 10. To distinguish the order of the nesting, this document uses the following terminology:

  • RAID 1+0:  RAID 1 (mirror) arrays are built first, then combined to form a RAID 0 (stripe) array.

  • RAID 0+1:  RAID 0 (stripe) arrays are built first, then combined to form a RAID 1 (mirror) array.

The following table describes the advantages and disadvantages of RAID 10 nesting as 1+0 versus 0+1. It assumes that the storage objects you use reside on different disks, each with a dedicated I/O capability.

Table 10.2: Nested RAID Levels

10 (1+0): RAID 0 (stripe) built with RAID 1 (mirror) arrays

RAID 1+0 provides high levels of I/O performance, data redundancy, and disk fault tolerance. Because each member device in the RAID 0 is mirrored individually, multiple disk failures can be tolerated and data remains available as long as the disks that fail are in different mirrors.

You can optionally configure a spare for each underlying mirrored array, or configure a spare to serve a spare group that serves all mirrors.

10 (0+1): RAID 1 (mirror) built with RAID 0 (stripe) arrays

RAID 0+1 provides high levels of I/O performance and data redundancy, but slightly less fault tolerance than a 1+0. If multiple disks fail on one side of the mirror, the other mirror is still available. However, if disks are lost concurrently on both sides of the mirror, all data is lost.

This solution offers less disk fault tolerance than a 1+0 solution, but if you need to perform maintenance or maintain the mirror on a different site, you can take an entire side of the mirror offline and still have a fully functional storage device. Also, if you lose the connection between the two sites, either site operates independently of the other. That is not true if you stripe the mirrored segments, because the mirrors are managed at a lower level.

If a device fails, that side of the mirror fails because RAID 0 is not fault-tolerant. Create a new RAID 0 to replace the failed side, then resynchronize the mirrors.
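
The table above mentions a spare that serves a spare group for all mirrors. One way to set this up (a sketch, not part of the procedures below) is to give the underlying RAID 1 arrays the same spare-group name in /etc/mdadm.conf; the group name nestgroup is an arbitrary example:

    ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdc1 spare-group=nestgroup
    ARRAY /dev/md1 devices=/dev/sdd1,/dev/sde1 spare-group=nestgroup

A spare is moved between arrays in the same spare group only while mdadm is running in monitor mode, for example with mdadm --monitor --scan.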

10.2.2 Creating Nested RAID 10 (1+0) with mdadm

A nested RAID 1+0 is built by creating two or more RAID 1 (mirror) devices, then using them as component devices in a RAID 0.

Important

If you need to manage multiple connections to the devices, you must configure multipath I/O before configuring the RAID devices. For information, see Chapter 7, Managing Multipath I/O for Devices.

The procedure in this section uses the device names shown in the following table. Ensure that you modify the device names with the names of your own devices.

Table 10.3: Scenario for Creating a RAID 10 (1+0) by Nesting

Raw Devices             RAID 1 (mirror)   RAID 1+0 (striped mirrors)

/dev/sdb1, /dev/sdc1    /dev/md0          /dev/md2
/dev/sdd1, /dev/sde1    /dev/md1          /dev/md2

  1. Open a terminal console, then log in as the root user or equivalent.

  2. Create two software RAID 1 devices, using two different devices for each RAID 1 device. At the command prompt, enter these two commands:

    mdadm --create /dev/md0 --run --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mdadm --create /dev/md1 --run --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
  3. Create the nested RAID 1+0 device. At the command prompt, enter the following command using the software RAID 1 devices you created in Step 2:

    mdadm --create /dev/md2 --run --level=0 --chunk=64 --raid-devices=2 /dev/md0 /dev/md1

    The default chunk size is 64 KB.

  4. Create a file system on the RAID 1+0 device /dev/md2, such as a Reiser file system (reiserfs). For example, at the command prompt, enter

    mkfs.reiserfs /dev/md2

    Modify the command if you want to use a different file system.

  5. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md2.

  6. Edit the /etc/fstab file to add an entry for the RAID 1+0 device /dev/md2.

  7. Reboot the server.

    The RAID 1+0 device is mounted to /local.
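
Before you rely on the nested array, you can check that both mirrors and the stripe assembled as expected. For example, the following commands show the state of the top-level device and of all md arrays:

    mdadm --detail /dev/md2
    cat /proc/mdstat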

10.2.3 Creating Nested RAID 10 (0+1) with mdadm

A nested RAID 0+1 is built by creating two to four RAID 0 (striping) devices, then mirroring them as component devices in a RAID 1.

Important

If you need to manage multiple connections to the devices, you must configure multipath I/O before configuring the RAID devices. For information, see Chapter 7, Managing Multipath I/O for Devices.

In this configuration, spare devices cannot be specified for the underlying RAID 0 devices because RAID 0 cannot tolerate a device loss. If a device fails on one side of the mirror, you must create a replacement RAID 0 device, then add it into the mirror (a sketch of this replacement sequence follows the procedure below).

The procedure in this section uses the device names shown in the following table. Ensure that you modify the device names with the names of your own devices.

Table 10.4: Scenario for Creating a RAID 10 (0+1) by Nesting

Raw Devices             RAID 0 (stripe)   RAID 0+1 (mirrored stripes)

/dev/sdb1, /dev/sdc1    /dev/md0          /dev/md2
/dev/sdd1, /dev/sde1    /dev/md1          /dev/md2

  1. Open a terminal console, then log in as the root user or equivalent.

  2. Create two software RAID 0 devices, using two different devices for each RAID 0 device. At the command prompt, enter these two commands:

    mdadm --create /dev/md0 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mdadm --create /dev/md1 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdd1 /dev/sde1

    The default chunk size is 64 KB.

  3. Create the nested RAID 0+1 device. At the command prompt, enter the following command using the software RAID 0 devices you created in Step 2:

    mdadm --create /dev/md2 --run --level=1 --raid-devices=2 /dev/md0 /dev/md1
  4. Create a file system on the RAID 0+1 device /dev/md2, such as a Reiser file system (reiserfs). For example, at the command prompt, enter

    mkfs.reiserfs /dev/md2

    Modify the command if you want to use a different file system.

  5. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md2.

  6. Edit the /etc/fstab file to add an entry for the RAID 0+1 device /dev/md2.

  7. Reboot the server.

    The RAID 0+1 device is mounted to /local.
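
If one side of the mirror fails later, the replacement sequence mentioned above looks roughly as follows, assuming /dev/md0 is the failed side and that two replacement partitions are available (shown here as the hypothetical /dev/sdf1 and /dev/sdg1):

    mdadm /dev/md2 --fail /dev/md0 --remove /dev/md0
    mdadm --stop /dev/md0
    mdadm --create /dev/md0 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdf1 /dev/sdg1
    mdadm /dev/md2 --add /dev/md0

After the add, the RAID 1 resynchronizes the new stripe from the surviving side; you can watch the progress in /proc/mdstat.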

10.3 Creating a Complex RAID 10

10.3.1 Understanding the Complex RAID10

In mdadm, the RAID10 level creates a single complex software RAID that combines features of both RAID 0 (striping) and RAID 1 (mirroring). Multiple copies of all data blocks are arranged on multiple drives following a striping discipline. Component devices should be the same size.

10.3.1.1 Comparing the Complex RAID10 and Nested RAID 10 (1+0)

The complex RAID 10 is similar in purpose to a nested RAID 10 (1+0), but differs in the following ways:

Table 10.5: Complex vs. Nested RAID 10

Number of devices
  Complex RAID10: Allows an even or odd number of component devices
  Nested RAID 10 (1+0): Requires an even number of component devices

Component devices
  Complex RAID10: Managed as a single RAID device
  Nested RAID 10 (1+0): Managed as a nested RAID device

Striping
  Complex RAID10: Striping occurs in the near or far layout on component devices. The far layout provides sequential read throughput that scales by the number of drives, rather than the number of RAID 1 pairs.
  Nested RAID 10 (1+0): Striping occurs consecutively across component devices

Multiple copies of data
  Complex RAID10: Two or more copies, up to the number of devices in the array
  Nested RAID 10 (1+0): Copies on each mirrored segment

Hot spare devices
  Complex RAID10: A single spare can service all component devices
  Nested RAID 10 (1+0): Configure a spare for each underlying mirrored array, or configure a spare to serve a spare group that serves all mirrors

10.3.1.2 Number of Replicas in the Complex RAID10

When configuring a complex RAID10 array, you must specify the number of replicas of each data block that are required. The default number of replicas is two, but the value can be anything from two up to the number of devices in the array.

10.3.1.3 Number of Devices in the Complex RAID10

You must use at least as many component devices as the number of replicas you specify. However, the number of component devices in a RAID10 array does not need to be a multiple of the number of replicas of each data block. The effective storage size is the number of devices divided by the number of replicas, times the size of a component device.

For example, if you specify 2 replicas for an array created with 5 component devices, a copy of each block is stored on two different devices. The effective storage size for one copy of all data is 5/2 or 2.5 times the size of a component device.

10.3.1.4 Near Layout

With the near layout, copies of a block of data are striped near each other on different component devices. That is, multiple copies of one data block are at similar offsets in different devices. Near is the default layout for RAID10. For example, if you use an odd number of component devices and two copies of data, some copies are perhaps one chunk further into the device.

The near layout for the mdadm RAID10 yields read and write performance similar to RAID 0 over half the number of drives.

Near layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    0    1    1
  2    2    3    3
  4    4    5    5
  6    6    7    7
  8    8    9    9

Near layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    0    1    1    2
  2    3    3    4    4
  5    5    6    6    7
  7    8    8    9    9
  10   10   11   11   12

10.3.1.5 Far Layout

The far layout stripes data over the early part of all drives, then stripes a second copy of the data over the later part of all drives, making sure that all copies of a block are on different drives. The second set of values starts halfway through the component drives.

With a far layout, the read performance of the mdadm RAID10 is similar to a RAID 0 over the full number of drives, but write performance is substantially slower than a RAID 0 because there is more seeking of the drive heads. It is best used for read-intensive operations such as for read-only file servers.

The speed of the raid10 for writing is similar to other mirrored RAID types, like raid1 and raid10 using the near layout, because the elevator of the file system schedules the writes in a more optimal way than raw writing. Using raid10 in the far layout is well-suited for mirrored writing applications.

Far layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    1    2    3
  4    5    6    7       
  . . .
  3    0    1    2
  7    4    5    6

Far layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    1    2    3    4
  5    6    7    8    9
  . . .
  4    0    1    2    3
  9    5    6    7    8

10.3.1.6 Offset Layout

The offset layout duplicates stripes so that the multiple copies of a given chunk are laid out on consecutive drives and at consecutive offsets. Effectively, each stripe is duplicated and the copies are offset by one device. This should give similar read characteristics to a far layout if a suitably large chunk size is used, but without as much seeking for writes.

Offset layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    1    2    3
  3    0    1    2       
  4    5    6    7
  7    4    5    6
  8    9   10   11
 11    8    9   10

Offset layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    1    2    3    4
  4    0    1    2    3
  5    6    7    8    9
  9    5    6    7    8
 10   11   12   13   14
 14   10   11   12   13
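
The layout and the number of replicas are selected when the array is created, using the mdadm --layout (or -p) option. The value combines a layout letter with the replica count: n2 (near, the default), f2 (far), or o2 (offset) for two replicas. For example, the following sketch creates a far-layout RAID10 from four example partitions; substitute your own device names:

    mdadm --create /dev/md3 --run --level=10 --layout=f2 --raid-devices=4 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1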

10.3.2 Creating a Complex RAID 10 with mdadm

The RAID10 option for mdadm creates a RAID 10 device without nesting. For information about RAID10, see Section 10.3.1, “Understanding the Complex RAID10”.

The procedure in this section uses the device names shown in the following table. Ensure that you modify the device names with the names of your own devices.

Table 10.6: Scenario for Creating a RAID 10 Using the mdadm RAID10 Option

Raw Devices                                     RAID10 (near or far striping scheme)

/dev/sdf1, /dev/sdg1, /dev/sdh1, /dev/sdi1      /dev/md3

  1. In YaST, create a 0xFD Linux RAID partition on the devices you want to use in the RAID, such as /dev/sdf1, /dev/sdg1, /dev/sdh1, and /dev/sdi1.

  2. Open a terminal console, then log in as the root user or equivalent.

  3. Create a RAID 10 device. At the command prompt, enter (all on the same line):

    mdadm --create /dev/md3 --run --level=10 --chunk=4 --raid-devices=4 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
  4. Create a Reiser file system on the RAID 10 device /dev/md3. At the command prompt, enter

    mkfs.reiserfs /dev/md3
  5. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md3. For example:

    DEVICE /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
    ARRAY /dev/md3 devices=/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1

    An alternative that generates the ARRAY line for you is shown after this procedure.
  6. Edit the /etc/fstab file to add an entry for the RAID 10 device /dev/md3.

  7. Reboot the server.

    The RAID10 device is mounted to /raid10.
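
As an alternative to writing the /etc/mdadm.conf entries by hand in step 5, you can append the array definitions that mdadm detects for the currently running arrays, then review the file before rebooting:

    mdadm --detail --scan >> /etc/mdadm.conf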

10.3.3 Creating a Complex RAID10 with the YaST Partitioner

  1. Launch YaST as the root user, then open the Partitioner.

  2. Select Hard Disks to view the available disks, such as sda, sdb, sdc, and sdd.

  3. For each disk that you will use in the software RAID, create a RAID partition on the device. Each partition should be the same size. For a RAID 10 device, you need at least four partitions.

    1. Under Hard Disks, select the device, then select the Partitions tab in the right panel.

    2. Click Add to open the Add Partition wizard.

    3. Under New Partition Type, select Primary Partition, then click Next.

    4. For New Partition Size, specify the desired size of the RAID partition on this disk, then click Next.

    5. Under Formatting Options, select Do not format partition, then select 0xFD Linux RAID from the File system ID drop-down list.

    6. Under Mounting Options, select Do not mount partition, then click Finish.

    7. Repeat these steps until you have defined a RAID partition on the disks you want to use in the RAID 10 device.

  4. Create a RAID 10 device:

    1. Select RAID, then select Add RAID in the right panel to open the Add RAID wizard.

    2. Under RAID Type, select RAID 10 (Mirroring and Striping).

    3. In the Available Devices list, select the desired Linux RAID partitions, then click Add to move them to the Selected Devices list.

    4. (Optional) Click Classify to specify the preferred order of the disks in the RAID array.

      For RAID types where the order of added disks matters, you can specify the order in which the devices will be used to ensure that one half of the array resides on one disk subsystem and the other half of the array resides on a different disk subsystem. For example, if one disk subsystem fails, the system keeps running from the second disk subsystem.

      1. Select each disk in turn and click one of the Class X buttons, where X is the letter you want to assign to the disk. Available classes are A, B, C, D, and E, but in many cases fewer classes are needed (for example, only A and B). Assign all available RAID disks this way.

        You can press the Ctrl or Shift key to select multiple devices. You can also right-click a selected device and choose the appropriate class from the context menu.

      2. Specify the order of the devices by selecting one of the sorting options:

        Sorted:  Sorts all devices of class A before all devices of class B and so on. For example: AABBCC.

        Interleaved:  Sorts devices by taking the first device of class A, then the first device of class B, and so on through all classes with assigned devices; then the second device of class A, the second device of class B, and so on. All devices without a class are sorted to the end of the device list. For example: ABCABC.

        Pattern File:  Select an existing file that contains multiple lines, where each line is a regular expression and a class name ("sda.* A"). All devices that match the regular expression are assigned to the specified class for that line. The regular expression is matched against the kernel name (/dev/sda1), the udev path name (/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0-part1) and then the udev ID (/dev/disk/by-id/ata-ST3500418AS_9VMN8X8L-part1). If a device’s name matches more than one regular expression, the first match determines the class. A minimal example of a pattern file is shown after this procedure.

      3. At the bottom of the dialog box, click OK to confirm the order.

    5. Click Next.

    6. Under RAID Options, specify the Chunk Size and Parity Algorithm, then click Next.

      For a RAID 10, the parity options are n (near), f (far), and o (offset). The number indicates how many replicas of each data block are required; two is the default. For information, see Section 10.3.1, “Understanding the Complex RAID10”.

    7. Add a file system and mount options to the RAID device, then click Finish.

  5. Select RAID, select the newly created RAID device, then click Used Devices to view its partitions.

  6. Click Next.

  7. Verify the changes to be made, then click Finish to create the RAID.
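
If you use the Pattern File option in the Classify dialog, the file is plain text with one regular expression and one class letter per line, in the format shown above ("sda.* A"). The following is a minimal sketch that assigns three disks to class A and three to class B; the device names are placeholders only:

    sd[abc].* A
    sd[def].* B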

10.4 Creating a Degraded RAID Array

A degraded array is one in which some devices are missing. Degraded arrays are supported only for RAID 1, RAID 4, RAID 5, and RAID 6. These RAID types are designed to withstand some missing devices as part of their fault-tolerance features. Typically, degraded arrays occur when a device fails. It is possible to create a degraded array on purpose.

RAID Type   Allowable Number of Slots Missing

RAID 1      All but one device
RAID 4      One slot
RAID 5      One slot
RAID 6      One or two slots

To create a degraded array in which some devices are missing, simply give the word missing in place of a device name. This causes mdadm to leave the corresponding slot in the array empty.

When creating a RAID 5 array, mdadm automatically creates a degraded array with an extra spare drive. This is because building the spare into a degraded array is generally faster than resynchronizing the parity on a non-degraded, but not clean, array. You can override this feature with the --force option.
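
For example, the following sketch uses --force so that the RAID 5 array is created with all three member drives active (followed by a full parity resynchronization) instead of as a degraded array with a spare; the device names are placeholders:

    mdadm --create /dev/md1 --force --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1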

Creating a degraded array might be useful if you want to create a RAID, but one of the devices you want to use already has data on it. In that case, you create a degraded array with the other devices, copy the data from the in-use device to the RAID that is running in degraded mode, add the device to the RAID, then wait while the RAID is rebuilt so that the data is spread across all devices. An example of this process is given in the following procedure:

  1. To create a degraded RAID 1 device /dev/md0 using the single drive /dev/sda1, enter the following at the command prompt:

    mdadm --create /dev/md0 -l 1 -n 2 /dev/sda1 missing

    The device that you add later should be the same size as or larger than this device.

  2. If the device you want to add to the mirror contains data that you want to move to the RAID array, copy it now to the RAID array while it is running in degraded mode.

  3. Add a device to the mirror. For example, to add /dev/sdb1 to the RAID, enter the following at the command prompt:

    mdadm /dev/md0 -a /dev/sdb1

    You can add only one device at a time. You must wait for the kernel to build the mirror and bring it fully online before you add another mirror.

  4. Monitor the build progress by entering the following at the command prompt:

    cat /proc/mdstat

    To see the rebuild progress while being refreshed every second, enter

    watch -n 1 cat /proc/mdstat