SUSE Linux Enterprise Real Time 15 SP5

Setup Guide #

Publication Date: 01/16/2025

SUSE Linux Enterprise Real Time is part of the SUSE® Linux Enterprise family. It allows you to run tasks which require deterministic real-time processing in a SUSE Linux Enterprise environment.

To meet this requirement, SUSE Linux Enterprise Real Time offers several options for CPU and I/O scheduling, CPU shielding, and for setting CPU affinities of processes.

Revision History: SUSE Linux Enterprise Real Time Documentation

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.

For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.

1 Product overview #

If your business can respond more quickly to new information and changing market conditions, you have a distinct advantage over those that cannot. Running your time-sensitive mission-critical applications using SUSE Linux Enterprise Real Time reduces process dispatch latencies and gives you the time advantage you need to increase profits, or avoid further financial losses, ahead of your competitors.

1.1 Key features #

Some of the key features for SUSE Linux Enterprise Real Time are:

Pre-emptible real-time kernel.
Ability to assign high-priority processes.
Greater predictability to complete critical processes on time, every time.
In comparison to the normal Linux kernel, which is optimized for overall system performance regardless of individual process response time, the SUSE Linux Enterprise Real Time kernel is tuned toward predictable process response time.
Increased reliability.
Lower infrastructure costs.
Tracing and debugging tools that help you analyze and identify bottlenecks in mission-critical applications.

1.2 Specific scenario #

SUSE Linux Enterprise Real Time 15 SP5 supports virtualization and Docker usage as a Technology Preview (best-effort support). For reference, see the Virtualization Guide.

2 Installing SUSE Linux Enterprise Real Time #

Keep the following points in mind:

Boot from the quarterly update medium. SUSE Linux Enterprise Real Time is only available from the quarterly update medium because this product is released roughly three months later than the rest of SLE.
Refer to the Installation Quick Start of SUSE Linux Enterprise Server 15 SP5 https://documentation.suse.com/sles/15-SP4/#redirectmsg to install SUSE Linux Enterprise Real Time.
In the Language, keyboard, and product selection, select the entry SUSE Linux Enterprise Real Time.

To install SUSE Linux Enterprise Real Time 15 SP5, proceed as follows:

Start the normal SUSE Linux Enterprise 15 SP5 installation.
On the boot command line, add start_shell=1.

When the prompt shows up, open the file control.xml and add the following lines:

<base_product>
  <display_name>SUSE Linux Enterprise Real Time 15 SP5</display_name>
  <name>SLE_RT</name>
  <version>15.5</version>
  <register_target>sle-15-$arch</register_target>
  <archs>x86_64</archs>
</base_product>

Save the file, exit the editor, and restart YaST.
Select SUSE Linux Enterprise Real Time 15 SP5 from the product selection list.
Continue with the normal installation. The remaining steps are the same as for the other products.
Reboot and select the real time kernel.

The following sections provide a brief introduction to the tools and possibilities of SUSE Linux Enterprise Real Time.

3 Managing CPU sets with `cset` #

In some circumstances, it is beneficial to be able to run specific tasks only on defined CPUs. For this reason, the Linux kernel provides a feature called cpuset. The cpuset feature provides the means to do so-called “soft partitioning” of the system. This enables you to dedicate CPUs, together with some predefined memory, to work on particular tasks.

Note: CPUs, processors, and cores

Modern servers are typically built around multi-core CPUs, which means that a single processor socket typically contains many separate processor units. For example, a low-end processor might have four cores, while a high-end one may have from tens to hundreds of cores.

In addition, some vendors support simultaneous multithreading (SMT), which enables a single core to run two or more execution threads that can partially overlap. The processor makes this visible to the operating system by reporting each thread as an additional core.

The cpuset feature works on the level of logical processors: individual cores or SMT units, not processor sockets. When this document refers to “a CPU”, this denotes a logical processor.

cset consists of one “super command” called shield and the “regular commands” set and proc. The purpose of the super command shield is to create a common CPU shielding setup within one step by combining regular commands.

For more information about the options and parameters of the shield subcommand, view its help by running:

cset help shield

3.1 Setting up a CPU shield for a single CPU #

The command cset provides the high level functionality to set up and manipulate CPU Sets. An example for setting up a CPU shield is:

cset shield --cpu=3

On a machine with four CPUs, this will shield CPU #3. CPUs #0-2 are unshielded.

3.2 Setting up CPU shields for multiple CPUs #

If you need to shield more than one CPU, the argument of the --cpu option accepts comma-separated lists of CPUs, including range specifications:

cset shield --cpu=1,3,5-7

On a machine with eight CPUs, this command will shield CPUs #1, #3, and #5-7. CPUs #0, #2, and #4 will remain unshielded.
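To double-check which CPUs a list such as 1,3,5-7 covers before applying it, the list can be expanded in a small shell helper. This is only an illustrative sketch: expand_cpu_list is a hypothetical function name, not part of cset, and it assumes a well-formed list of numbers and ascending ranges.

```shell
# Hypothetical helper (not part of cset): expand a CPU list such as
# "1,3,5-7" into the individual CPU numbers it covers.
expand_cpu_list() (
    # Split the argument on commas into positional parameters.
    IFS=,
    set -- $1
    IFS=' '
    out=
    for part in "$@"; do
        case $part in
            *-*)
                # Range specification, for example 5-7 (inclusive).
                cpu=${part%-*}
                last=${part#*-}
                while [ "$cpu" -le "$last" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done
                ;;
            *)
                out="$out $part"
                ;;
        esac
    done
    # Unquoted on purpose: drops the leading space.
    echo $out
)

expand_cpu_list 1,3,5-7
```

The last line prints the CPUs that the example shield would cover.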

Existing CPU shields can be extended by the same command. For example, to add CPU #4 to the CPU set described above, use this command:

cset shield --cpu=1,3-7

This command updates the current CPU shield scheme. CPUs #1, #3, and #5-7 were already shielded. Afterward, CPU #4 will also be shielded.

To reduce the number of shielded CPUs, redefine the scheme so as to exclude the CPUs you wish to unshield. For example, to unshield CPU #1, use the following command:

cset shield --cpu=3-7

Now only CPUs #3-7 are shielded. CPUs #0-2 are available for system usage.

3.3 Showing CPU shields #

After the CPU shielding is set up you can display the current configuration by running cset shield without additional options:

cset shield
cset: --> shielding system active with
cset: "system" cpuset of: 0-2 cpu, with: 47
cset: "user" cpuset of:  3-7 cpu,  with: 0

By default, CPU shielding consists of at least three cpusets:

root exists always and contains all available CPUs.
system is the cpuset of unshielded CPUs.
user is the cpuset of shielded CPUs.

3.4 Shielding processes #

After a shielded CPU set is created, certain processes or groups of processes can be assigned to the shielded cpuset. To start a new process in the shielded CPU set, use the --exec option:

cset shield --exec APPLICATION

To move already-running processes to the shielded CPU set, use the --shield and --pid options. The --pid option accepts a comma-separated list of PIDs and range specifications:

cset shield --shield --pid=1,2,600-700

This moves the processes with PID 1, PID 2, and PIDs from 600 to 700 to the shielded CPU set. If there are gaps in the range from 600 to 700, only the processes that actually exist are moved to the shield, and no warning is shown. The cset command handles threads like processes and will also interpret TIDs and assign them to the required CPU set.
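Because gaps in a PID range are skipped silently, it can help to check beforehand which PIDs in the range exist. The following is a minimal sketch that relies on the /proc file system; existing_pids is a hypothetical helper name, and the result is only a snapshot, since processes can start or exit at any time.

```shell
# Hypothetical helper: print every PID (or TID) between FIRST and LAST
# that currently has an entry in /proc.
existing_pids() (
    pid=$1
    while [ "$pid" -le "$2" ]; do
        [ -d "/proc/$pid" ] && echo "$pid"
        pid=$((pid + 1))
    done
)

# Example: list the tasks that a range of 600-700 would actually match.
existing_pids 600 700
```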

Warning

The --shield option does not check the processes you request to move into the shield. This means that the command will move any processes that are bound to specific CPUs, even kernel threads. You can cause a complete system lockup by indiscriminately passing arbitrary PIDs to the --shield option.
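One way to avoid moving kernel threads by accident is to test each PID before passing it to cset. The sketch below is not a cset feature: is_kernel_thread is a hypothetical helper that reads the task flags from /proc/PID/stat and tests the PF_KTHREAD bit (0x00200000 in the kernel headers, assumed here to be unchanged on your kernel).

```shell
# Hypothetical helper: succeed if PID is a kernel thread.
# Returns 2 if the task does not exist.
is_kernel_thread() (
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 2
    # Field 2 (comm) may contain spaces or parentheses, so cut
    # everything up to and including the last ") ".
    rest=${stat##*") "}
    # The remaining fields start with field 3 (state), so the task
    # flags (field 9) are the seventh word.
    set -- $rest
    flags=$7
    # 2097152 is PF_KTHREAD (0x00200000).
    [ $(( flags & 2097152 )) -ne 0 ]
)

# Example: the current shell is a normal user-space process.
if is_kernel_thread $$; then
    echo "PID $$ is a kernel thread, do not shield it"
else
    echo "PID $$ is not a kernel thread"
fi
```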

3.5 Showing shielded processes #

Use the cset shield command to show the number of currently shielded processes. (The same command can be used to show the current CPU shield setup.) To list shielded and unshielded processes, add the --verbose option:

cset shield --verbose
cset: --> shielding system active with
cset: "system" cpuset of: 0-2,4-15 cpu, with:
   USER       PID  PPID S TASK NAME
      -------- ----- ----- - ---------
         root         1     0 S init [3]
[...]

cset: "user" cpuset of:    3 cpu, with: 1
   USER       PID  PPID S TASK NAME
      -------- ----- ----- - ---------
         root     10202 10170 S application

3.6 Unshielding processes #

To remove a process (or group of processes) from the CPU shield, use the --unshield option. The argument for --unshield is similar to the --shield option. This option accepts a comma-separated list of PIDs/TIDs and range specifications:

cset shield --unshield --pid=2,650-655

This command will unshield the process with the PID 2 and the processes in the range between 650 and 655.

3.7 Resetting CPU sets #

To delete the CPU sets created by shielding, use the --reset option of cset shield. This will unshield all CPUs and migrate the previously shielded processes so that they can run on all available CPUs again.

4 Managing tree-like structures with `cset` #

More detailed configuration of cpusets can be done with the cset commands set and proc.

The subcommand set is used to create, modify and destroy cpusets. Compared to the supercommand shield, the set subcommand can additionally assign memory nodes for NUMA machines.

Besides assigning memory nodes, the subcommand set creates cpusets in a tree-like structure, rooted at the root cpuset.

To create a cpuset with the subcommand set you need to specify the CPUs which should be used. Either use a comma-separated list or a range specification:

cset set --cpu=1-7 "/one"

This command will create a cpuset called one with assigned CPUs from #1 to #7. To specify a new cpuset called two that is a subset of one, proceed as follows:

cset set --cpu=6 "/one/two"

Cpusets follow certain rules. Children can only include CPUs that the parents already have. If you try to specify a different cpuset, the kernel cpuset subsystem will not let you create that cpuset. For example, if you create a cpuset that contains CPU3, and then attempt to create a child of that cpuset with a CPU other than 3, you will get an error, and the cpuset will not be created. The resulting error is somewhat cryptic and is usually “Permission denied”.
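Since the kernel error is cryptic, a script can verify the subset rule itself before calling cset set. The following sketch assumes that both CPU lists have already been expanded into space-separated numbers; cpus_within_parent is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every CPU in the child list
# also appears in the parent list (both lists are space-separated).
cpus_within_parent() (
    child=$1
    parent=$2
    for cpu in $child; do
        case " $parent " in
            *" $cpu "*) ;;
            *)
                echo "CPU $cpu is not part of the parent cpuset" >&2
                return 1
                ;;
        esac
    done
)

# CPU 6 is within "/one" (CPUs 1-7), so this check succeeds.
cpus_within_parent "6" "1 2 3 4 5 6 7" && echo "child cpuset is valid"
```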

To show a table containing useful information, such as CPU lists and memory lists, use the -r parameter. The “-X” column shows the exclusive state of CPU or memory. The “path” column shows the real path in the virtual cpuset file system.

cset set -r

On NUMA machines, memory nodes can be assigned to a cpuset similar to CPUs. The --mem option of the subcommand set allows a comma-separated and inclusive range specification of memory nodes. This example will assign MEM1, MEM3, MEM4, MEM5 and MEM6 to the cpuset new_set:

cset set --mem=1,3-6 new_set

Additionally, the --cpu_exclusive and --mem_exclusive options (without any additional arguments) make the CPUs or memory nodes exclusive to a cpuset:

cset set --cpu_exclusive "/one"

The status of exclusive state of CPU or memory is shown in the -X column when running:

cset set -r

For more detailed information about options and parameters of the subcommand set, view the cset help:

cset help set

After the cpuset is initialized, the subcommand proc can start processes on certain cpusets with the --exec option. The following will start the application fastapp within the cpuset new_set:

cset proc --exec --set new_set fastapp

To move an already-running process inside an already-existing cpuset, use the option --move. It accepts a comma-separated list and range specifications of PIDs. The following command will move processes with PID 2442 and within the range between 3000 and 3200 into the cpuset new_set:

cset proc --move 2442,3000-3200 new_set

Listing processes running within a specific cpuset can be done by using the option --list.

cset proc --list new_set

The subcommand proc can also move all processes from one cpuset to another cpuset by using the options --fromset and --toset. The following command moves all processes assigned to old_set and assigns them to new_set:

cset proc --move --fromset old_set \
   --toset new_set

For more detailed information about options and parameters of the subcommand proc, view the help:

cset help proc

5 Setting real-time attributes of a process with `chrt` #

Use the chrt command to manipulate the real-time attributes of an already-running process (such as scheduling policy and priority), or to execute a new process with specified real-time attributes.

Using chrt is highly recommended for applications that do not set real-time attributes themselves but should nevertheless benefit fully from the real-time kernel. To achieve this, start these applications with the chrt command and a suitable scheduler policy and priority.

The following command shows all running processes with their real-time attributes. The class column shows the current scheduler policy and the rtprio column shows the real-time priority:

ps -eo pid,tid,class,rtprio,comm
...
 1437  1437 FF      40  fastapp

The truncated example above shows the fastapp process with PID 1437 running and with scheduler policy SCHED_FIFO and priority 40. Scheduler policy abbreviations are:

TS - SCHED_OTHER
FF - SCHED_FIFO
RR - SCHED_RR
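To reduce the ps listing to tasks that run with a real-time policy, the class column can be filtered. This is a small sketch using awk; filter_rt_tasks is a hypothetical helper name, and it assumes the column order of the ps command shown above (class in the third column).

```shell
# Hypothetical helper: keep only lines whose third column (class)
# is FF (SCHED_FIFO) or RR (SCHED_RR).
filter_rt_tasks() {
    awk '$3 == "FF" || $3 == "RR"'
}

# Usage on a live system:
#   ps -eo pid,tid,class,rtprio,comm | filter_rt_tasks
```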

It is also possible to obtain the current scheduler policy, and the priority of single processes, by specifying the PID of the process with the -p parameter. For example:

chrt -p 1437

Scheduler policies have different minimum and maximum priority values. Minimum and maximum values for each available scheduler policy can be retrieved with chrt:

chrt -m

To change the scheduler policy and the priority of a running process, chrt provides the options --fifo for SCHED_FIFO, --rr for SCHED_RR and --other for SCHED_OTHER. The following example will change the scheduler policy to SCHED_FIFO with priority 42 for PID 1437:

chrt --fifo -p 42 1437
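A script that changes priorities may want to validate the requested values before calling chrt. The sketch below only builds and prints the command line, it does not run it. chrt_command is a hypothetical helper name, and the priority range of 1 to 99 for SCHED_FIFO and SCHED_RR (0 for SCHED_OTHER) is an assumption based on typical Linux values; confirm the limits on your system with chrt -m.

```shell
# Hypothetical helper: validate POLICY (fifo, rr, other) and PRIORITY,
# then print the matching chrt command line for PID.
# Assumed limits: 1-99 for fifo/rr, 0 for other (verify with chrt -m).
chrt_command() (
    policy=$1
    prio=$2
    pid=$3
    case $prio in
        ''|*[!0-9]*)
            echo "priority must be a non-negative integer" >&2
            return 1
            ;;
    esac
    case $policy in
        fifo|rr)
            if [ "$prio" -lt 1 ] || [ "$prio" -gt 99 ]; then
                echo "priority for --$policy must be between 1 and 99" >&2
                return 1
            fi
            ;;
        other)
            if [ "$prio" -ne 0 ]; then
                echo "priority for --other must be 0" >&2
                return 1
            fi
            ;;
        *)
            echo "unknown policy: $policy" >&2
            return 1
            ;;
    esac
    echo "chrt --$policy -p $prio $pid"
)

chrt_command fifo 42 1437
```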

Warning

Handle the changing of real-time attributes of processes with care. Increasing the priority of certain processes can harm the entire system, depending on the behavior of the process. In some cases, this can lead to a complete system lockup or bad influences on certain devices.

For more information about chrt, see the chrt man page with man 1 chrt.

6 Specifying CPU affinity with `taskset` #

The default behavior of the kernel is to keep a process running on the same CPU if the system load is balanced over the available CPUs. Otherwise, the kernel tries to improve the load balancing by moving processes to an idling CPU. In some situations, however, it is desirable to set a CPU affinity for a given process. In this case, the kernel will not move the process away from the selected CPUs. For example, if you use shielding, the shielded CPUs will not run any processes that do not have an affinity to the shielded CPUs. Another possibility to remove load from the other CPUs is to run all low priority tasks on a selected CPU.

If a task is running inside a specific cpuset, the affinity mask must match at least one of the CPUs available in this set. The taskset command will not move a process outside the cpuset it is running in.

To set or retrieve the CPU affinity of a task, use a bitmask. This mask is represented by a hexadecimal number. If you count the bits of this bitmask, the lowest bit represents the first logical CPU as found in /proc/cpuinfo. For example:

0x00000001 means CPU 0.
0x00000002 means CPU 1.
0x00000003 means CPUs 0 and 1.
0xFFFFFFFE means all but the first CPU.
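The mask values above can be derived from CPU numbers by setting one bit per CPU. A minimal sketch, assuming CPU numbers below 63 so that the result fits into shell arithmetic; cpus_to_mask is a hypothetical helper name, not part of taskset.

```shell
# Hypothetical helper: build a hexadecimal affinity mask from a list
# of logical CPU numbers. Bit N of the mask stands for CPU N.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
)

cpus_to_mask 0 1
```

The call prints the mask for CPUs 0 and 1, which can then be passed to taskset in place of a hand-written value.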

If a given mask does not contain any valid CPU on the system, the taskset command will return an error. If taskset returns without an error, the given program has been scheduled to the specified list of CPUs.

The taskset command can be used to start a new process with a given CPU affinity, or to redefine the CPU affinity of an already-running process.

Examples #

taskset -p PID: Retrieves the current CPU affinity of the process with the process ID PID.
taskset -p MASK PID: Sets the CPU affinity of the process with the process ID PID to MASK.
taskset MASK COMMAND: Runs COMMAND with a CPU affinity of MASK.

For more detailed information about taskset, see the man page man 1 taskset.

7 Changing I/O priorities with `ionice` #

Handling I/O is one of the critical issues for all high-performance systems. If a task has lots of CPU power available, but must wait for the disk, it will not work as efficiently as it could. The Linux kernel provides three different scheduling classes to determine the I/O handling for a process. All of these classes can be fine-tuned with a nice level.

The Best Effort scheduler

The Best Effort scheduler is the default I/O scheduler, and is used for all processes that do not specify a different I/O scheduler class. By default, this scheduler sets its nice level according to the nice value of the running process.

There are eight different nice levels available for this scheduler. The lowest priority is represented by a nice level of 7, and the highest priority is 0.

This scheduler has the scheduling class number 2.

The Real Time scheduler

The real-time I/O class always gets the highest priority for disk access. The other schedulers will only be served if no real-time request is present. This scheduling class may easily lock up the system if not implemented with care.

The real-time scheduler defines nice levels (similar to the Best Effort scheduler).

This scheduler has the scheduling class number 1.

The Idle scheduler

The Idle scheduler does not define any nice levels. I/O is only done in this class if no other scheduler is running an I/O request. This scheduler has the lowest available priority and can be used for processes that are not time-critical.

This scheduler has the scheduling class number 3.

To change I/O schedulers and nice values, use the ionice command. This provides a means to tune the scheduler of already-running processes, or to start new processes with specific I/O settings.

Examples #

ionice -c3 -p$$: Sets the scheduler of the current shell to Idle.
ionice: Without additional parameters, this prints the I/O scheduler settings of the current shell.
ionice -c1 -p42 -n2: Sets the scheduler of the process with process ID 42 to Real Time, and its nice value to 2.
ionice -c3 /bin/bash: Starts the Bash shell with the Idle I/O scheduler.
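The class numbers and nice levels described above can be wrapped in a small helper so that scripts can use class names instead of numbers. The sketch below only prints the resulting ionice options and does not run anything. ionice_command and the class names realtime, best-effort, and idle are hypothetical choices for this example.

```shell
# Hypothetical helper: translate a class name and an optional nice
# level (0-7) into ionice options. Class numbers as described above:
# 1 = Real Time, 2 = Best Effort, 3 = Idle (no nice levels).
ionice_command() (
    class=$1
    level=$2
    case $class in
        realtime)    num=1 ;;
        best-effort) num=2 ;;
        idle)
            echo "ionice -c3"
            return 0
            ;;
        *)
            echo "unknown I/O class: $class" >&2
            return 1
            ;;
    esac
    case $level in
        [0-7]) echo "ionice -c$num -n$level" ;;
        *)
            echo "nice level must be between 0 and 7" >&2
            return 1
            ;;
    esac
)

ionice_command realtime 2
```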

For more detailed information about ionice, see the ionice man page with man 1 ionice.

8 Changing the I/O scheduler for block devices #

The Linux kernel provides several block device schedulers that can be selected individually for each block device. All but the noop scheduler perform a kind of ordering of requested blocks to reduce head movements on the hard disk. If you use an external storage system that has its own scheduler, you should disable the Linux internal reordering by selecting the noop scheduler.

The Linux I/O schedulers #

noop

The noop scheduler is a very simple scheduler that only performs basic merging of I/O requests and does not reorder them. This scheduler is mainly used for specialized environments that run their own schedulers optimized for the used hardware, such as storage systems or hardware RAID controllers.

deadline

The main point of deadline scheduling is to try hard to answer a request before a given deadline. This results in very good I/O for a random single I/O in real-time environments.

In principle, the deadline scheduler uses two lists with all requests. One is sorted by block sequences to reduce seeking latencies, the other is sorted by expire times for each request. Normally, requests are served according to the block sequence, but if a request reaches its deadline, the scheduler starts to work on this request.

cfq

The Completely Fair Queuing scheduler uses a separate I/O queue for each process. All of these queues get a similar time slice for disk access. With this procedure, the CFQ tries to divide the bandwidth evenly between all requesting processes. This scheduler allows throughput similar to the anticipatory scheduler, but the maximum latency is much shorter.

For the average system, this scheduler yields the best results, and thus it is the default I/O scheduler on SUSE Linux Enterprise systems.

To print the current scheduler of a block device such as /dev/sda, use the following command:

cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

In this case, the scheduler for /dev/sda is set to cfq, the Completely Fair Queuing scheduler. This is the default scheduler on SUSE Linux Enterprise Real Time.
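In scripts, the active scheduler can be extracted from this output by picking the name in square brackets. A minimal sketch using sed; active_scheduler is a hypothetical helper name, and it assumes the bracketed output format shown above.

```shell
# Hypothetical helper: print the scheduler name enclosed in square
# brackets. Reads the given file, or standard input if no file is given.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$@"
}

# Example with the output shown above:
echo "noop deadline [cfq]" | active_scheduler

# Usage on a live system:
#   active_scheduler /sys/block/sda/queue/scheduler
```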

To change the scheduler, echo one of the names noop, deadline, or cfq into /sys/block/<device>/queue/scheduler. For example, if you want to set the I/O scheduler of the device /dev/sda to noop, use the following command:

echo "noop" > /sys/block/sda/queue/scheduler

To set other variables in the /sys file system, use a similar approach.

9 Tuning the block device I/O scheduler #

All schedulers, except for the noop scheduler, have several common parameters that may be tuned for each block device. You can access these parameters with sysfs in the /sys/block/<device>/queue/iosched/ directory. The following parameters are tuneable for the respective scheduler:

Anticipatory scheduler

read_batch_expire: If write requests are scheduled, this is the time in milliseconds that reads are served before pending writes get a time slice. If writes are more important than reads, set this value lower than read_expire.
write_batch_expire: Similar to read_batch_expire for write requests.

Deadline scheduler

read_expire: The main focus of this scheduler is to limit the start latency for a request to a given time. Therefore, for each request, a deadline is calculated from the current time plus the value of read_expire in milliseconds.
write_expire: Similar to read_expire for write requests.
fifo_batch: If a request hits its deadline, it is necessary to move the request from the sorted I/O scheduler list to the dispatch queue. The variable fifo_batch controls how many requests are moved, depending on the cost of each request.
front_merges: The scheduler normally tries to find contiguous I/O requests and merges them. There are two kinds of merges: The new I/O request may be in front of the existing I/O request (front merge), or it may follow behind the existing request (back merge). Most merges are back merges. Therefore, you can disable the front merge functionality by setting front_merges to 0.
writes_starved: If both read and write requests hit their deadline, the scheduler prefers the read requests by default. To prevent write requests from being postponed forever, the variable writes_starved controls how many times read requests are preferred before write requests are served.

CFQ Scheduler

back_seek_max and back_seek_penalty

The CFQ scheduler normally uses a strict ascending elevator. When needed, it also allows small backward seeks, but it puts some penalty on them. The maximum backward sector seek is defined with back_seek_max, and the multiplier for the penalty is set by back_seek_penalty.

fifo_expire_async and fifo_expire_sync

The fifo_expire_* variables define the timeout in milliseconds for asynchronous and synchronous I/O requests. To prefer synchronous operations over asynchronous ones, fifo_expire_sync value should be lower than fifo_expire_async.

quantum

Defines the number of I/O requests to be dispatched at once by the block device. This parameter is used for synchronous requests.

slice_async, slice_async_rq, slice_sync, and slice_idle

These variables define the time slices a block device gets for synchronous or asynchronous operations.

slice_async and slice_sync serve as a base value in milliseconds for asynchronous or synchronous disk slice length calculations.
slice_async_rq defines how many requests an asynchronous disk slice can accommodate.
slice_idle defines how long the I/O scheduler idles before servicing the next thread.
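To review the current values of these tunables, the files in the iosched directory can be printed in one pass. This is a minimal sketch: show_iosched_tunables is a hypothetical helper name, and the set of files it finds depends on the scheduler that is active for the device.

```shell
# Hypothetical helper: print every tunable in an iosched directory
# as name=value, one per line.
show_iosched_tunables() (
    dir=$1
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        printf '%s=%s\n' "${file##*/}" "$(cat "$file")"
    done
)

# Usage on a live system:
#   show_iosched_tunables /sys/block/sda/queue/iosched
```

A value can then be changed by writing to the corresponding file, in the same way as the scheduler itself is selected.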

The system default block device I/O scheduler can also be set with the kernel parameter elevator=. For example, elevator=deadline changes the I/O scheduler to deadline.

10 More information #

A lot of information about real-time implementations and administration can be found on the Internet. The following list contains several selected links:

More detailed information about real-time Linux development and an introduction to writing a real-time application can be found in the real-time Linux community Wiki. https://rt.wiki.kernel.org, https://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application
The cpuset feature of the kernel is explained in /usr/src/linux/Documentation/cgroups/cpusets.txt. More detailed documentation is available from https://lwn.net/Articles/127936/.
For more information about the deadline I/O scheduler, refer to https://en.wikipedia.org/wiki/Deadline_scheduler. In your installed system, find further information in /usr/src/linux/Documentation/block/deadline-iosched.txt.
The CFQ I/O scheduler is covered in detail in https://en.wikipedia.org/wiki/CFQ and /usr/src/linux/Documentation/block/cfq-iosched.txt.