13 Tuning I/O Performance #
I/O scheduling controls how input/output operations are submitted to storage. SUSE Linux Enterprise Server offers various I/O algorithms—called elevators—suiting different workloads. Elevators can help reduce seek operations and can prioritize I/O requests.

Choosing the best-suited I/O elevator depends not only on the workload, but also on the hardware. Single ATA disk systems, SSDs, RAID arrays, or network storage systems, for example, each require different tuning strategies.
13.1 Switching I/O Scheduling #
SUSE Linux Enterprise Server picks a default I/O scheduler at boot-time, which can be changed on the fly per block device. This makes it possible to set different algorithms, for example, for the device hosting the system partition and the device hosting a database.
The default I/O scheduler is chosen for each device based on whether the device reports to be a rotational disk or not. For rotational disks, the BFQ I/O scheduler is picked. Other devices default to MQ-DEADLINE or NONE.
To change the elevator for a specific device in the running system, run the following command (note that a plain sudo echo SCHEDULER > … would fail, because the redirection is performed by the unprivileged shell; piping through tee avoids this):

> echo SCHEDULER | sudo tee /sys/block/DEVICE/queue/scheduler
Here, SCHEDULER is one of bfq, none, kyber, or mq-deadline, and DEVICE is the block device (sda, for example). Note that this change does not persist across reboots. To change the I/O scheduler permanently for a particular device, copy /usr/lib/udev/rules.d/60-io-scheduler.rules to /etc/udev/rules.d/60-io-scheduler.rules, and edit the latter file to suit your needs.
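For illustration, a rule in that file could look like the following sketch. The match keys are standard udev syntax, but the exact device matches and the scheduler choice are assumptions to adapt to your hardware:

```
# Hypothetical example rule: select the "none" elevator for non-rotational disks
ACTION=="add|change", KERNEL=="sd*|nvme*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
```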
On IBM Z, the default I/O scheduler for a storage device is set by the device driver.
elevator
boot parameter removed
The elevator
boot parameter has been removed. The blk-mq I/O path replaces cfq, and does not include the
elevator
boot parameter.
13.2 Available I/O Elevators with blk-mq I/O path #
Below is a list of elevators available on SUSE Linux Enterprise Server for devices that use the blk-mq I/O path. If an elevator has tunable parameters, they can be set with the following command (run as root, since writing to sysfs requires root privileges):

echo VALUE > /sys/block/DEVICE/queue/iosched/TUNABLE

In the command above, VALUE is the desired value for the TUNABLE and DEVICE is the block device.
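The echo pattern above can be wrapped in small helper functions for scripting. Everything named below (set_tunable, get_tunable, and the SYSFS_BASE variable) is illustrative rather than part of the system; SYSFS_BASE exists only so the helpers can be exercised against a scratch directory, and on a real system (where it defaults to /sys/block) writing requires root:

```shell
#!/bin/sh
# Illustrative helpers around /sys/block/*/queue/iosched/* tunables.
# SYSFS_BASE defaults to the real sysfs location; override it for testing.
SYSFS_BASE=${SYSFS_BASE:-/sys/block}

# set_tunable DEVICE TUNABLE VALUE  -- write a tunable value
set_tunable() {
    echo "$3" > "$SYSFS_BASE/$1/queue/iosched/$2"
}

# get_tunable DEVICE TUNABLE  -- read a tunable back
get_tunable() {
    cat "$SYSFS_BASE/$1/queue/iosched/$2"
}
```

For example, get_tunable sda read_expire would print the current read deadline (in milliseconds) for sda.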
To find out which elevators are available for a device (sda, for example), run the following command (the currently selected scheduler is listed in brackets):

> cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
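Because the active scheduler is the bracketed entry, scripts often need just that token. The following helper (its name is invented for this example) extracts it with sed:

```shell
#!/bin/sh
# active_scheduler FILE -- print only the scheduler name between the brackets
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}
```

For the output above, active_scheduler /sys/block/sda/queue/scheduler prints mq-deadline.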
When switching from the legacy block to the blk-mq I/O path for a device, the none option is roughly comparable to noop, mq-deadline is comparable to deadline, and bfq is comparable to cfq.
13.2.1 MQ-DEADLINE #

MQ-DEADLINE is a latency-oriented I/O scheduler. MQ-DEADLINE has the following tunable parameters:
MQ-DEADLINE tunable parameters #

File | Description
---|---
writes_starved | Controls how many times reads are preferred over writes. A higher value means that more read batches can be dispatched before a pending write is served. Default is 2.
read_expire | Sets the deadline (current time plus the read_expire value) by which a read request should be served. Default is 500 (milliseconds).
write_expire | Sets the deadline (current time plus the write_expire value) by which a write request should be served. Default is 5000 (milliseconds).
front_merges | Enables (1) or disables (0) attempts to front merge requests. Default is 1.
fifo_batch | Sets the maximum number of requests per batch (deadline expiration is only checked for batches). This parameter allows balancing between latency and throughput. When set to 1, expiration is checked after every request, which leads to "first come, first served" behavior and minimal latency at the cost of throughput. Default is 16.
13.2.2 NONE #

When NONE is selected as the I/O elevator option for blk-mq, no I/O scheduler is used, and I/O requests are passed down to the device without further I/O scheduling interaction.

NONE is the default for NVM Express devices. With no overhead compared to other I/O elevator options, it is considered the fastest way of passing down I/O requests on multiple queues to such devices.

There are no tunable parameters for NONE.
13.2.3 BFQ (Budget Fair Queueing) #

BFQ is a fairness-oriented scheduler. It is described as "a proportional-share storage-I/O scheduling algorithm based on the slice-by-slice service scheme of CFQ. But BFQ assigns budgets, measured in number of sectors, to processes instead of time slices." (Source: linux-4.12/block/bfq-iosched.c)

BFQ allows assigning I/O priorities to tasks, which are taken into account during scheduling decisions (see Section 9.3.3, “Prioritizing Disk Access with ionice”).
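For example, a long-running bulk job can be started with a lowered best-effort priority via ionice, and BFQ takes the resulting priority into account when arbitrating between queues. The class and level values below are illustrative:

```shell
# Run a command in the best-effort class (-c 2) at the lowest priority
# level (-n 7), so its I/O yields to default-priority tasks under BFQ.
ionice -c 2 -n 7 echo "running with lowered I/O priority"
```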
The BFQ scheduler has the following tunable parameters:
BFQ tunable parameters #

File | Description
---|---
slice_idle | Value in milliseconds specifies how long to idle, waiting for the next request on an empty queue. Default is 8.
slice_idle_us | Same as slice_idle, but in microseconds. Default is 8000.
low_latency | Enables (1) or disables (0) BFQ's low-latency mode. Default is 1 (enabled).
back_seek_max | Maximum value (in Kbytes) for backward seeking. Default is 16384.
back_seek_penalty | Used to compute the cost of backward seeking. Default is 2.
fifo_expire_async | Value (in milliseconds) used to set the timeout of asynchronous requests. Default is 250.
fifo_expire_sync | Value in milliseconds specifies the timeout of synchronous requests. Default is 125.
timeout_sync | Maximum time in milliseconds that a task (queue) is serviced after it has been selected. Default is 124.
max_budget | Limit for the number of sectors that are served at maximum within timeout_sync. If set to 0, the budget is computed automatically. Default is 0.
strict_guarantees | Enables (1) or disables (0) BFQ's strict service guarantees. Default is 0 (disabled).
13.2.4 KYBER #

KYBER is a latency-oriented I/O scheduler. It makes it possible to set target latencies for reads and synchronous writes, and it throttles I/O requests in order to try to meet these target latencies.
KYBER tunable parameters #

File | Description
---|---
read_lat_nsec | Sets the target latency for read operations in nanoseconds. Default is 2000000.
write_lat_nsec | Sets the target latency for write operations in nanoseconds. Default is 10000000.
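Since both tunables take nanoseconds, unit mistakes are easy when thinking in milliseconds. A tiny conversion helper (the name is made up for this sketch) keeps the conversion explicit:

```shell
#!/bin/sh
# ms_to_ns MILLISECONDS -- print the equivalent number of nanoseconds
ms_to_ns() {
    echo $(( $1 * 1000000 ))
}

# A 2 ms read target corresponds to Kyber's default of 2000000 ns.
ms_to_ns 2
```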
13.3 I/O Barrier Tuning #
Some file systems (for example, Ext3 or Ext4) send write barriers to disk after fsync or during transaction commits. Write barriers enforce proper ordering of writes, making volatile disk write caches safe to use (at some performance penalty). If your disks are battery-backed in one way or another, disabling barriers can safely improve performance.
nobarrier is deprecated in XFS

Note that the nobarrier option has been completely deprecated for XFS, and it is not a valid mount option in SUSE Linux Enterprise 15 SP2 and upward. Any XFS mount command that explicitly specifies the flag will fail to mount the file system. To prevent this from happening, make sure that no scripts and fstab entries contain the nobarrier option.
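A quick check like the following can be used to spot leftover entries. The function name is invented for this sketch, and it takes the file as an argument so that any fstab copy can be scanned:

```shell
#!/bin/sh
# find_nobarrier FILE -- list lines that still contain the nobarrier option
find_nobarrier() {
    grep -n 'nobarrier' "$1" || echo "no nobarrier entries in $1"
}
```

Running find_nobarrier /etc/fstab prints any offending line with its line number.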
Sending write barriers can be disabled using the nobarrier mount option.
Disabling barriers when disks cannot guarantee caches are properly written in case of power failure can lead to severe file system corruption and data loss.