3 Shielding with systemd
Since SUSE Linux Enterprise Real Time 15 SP4, systemd has native support for the cpuset controller. A sensitive workload can be shielded by configuring the respective units appropriately. This is only supported with the unified cgroup hierarchy (cgroup v2), so the division into shielded and unshielded parts follows the structure of the typical systemd cgroup tree.
3.1 Setup of the shield
The general idea is to have one cpuset for the main sensitive workload and a complementary cpuset for the supporting tasks. Resources are distributed top-down, so to ensure a proper allocation for the main workload we must take into account all top-level cgroups on the system.
By default, systemd creates the following units: init.scope, system.slice, user.slice, and machine.slice.
We must configure all of these units so that they do not stand in the way of the main workload, for instance with the following drop-in files:
root # cat /etc/systemd/system/init.scope.d/40-shielding.conf
[Scope]
AllowedCPUs=0-1
root # cat /etc/systemd/system/system.slice.d/40-shielding.conf
[Slice]
AllowedCPUs=0-1
This confines the supporting system workload to the first two CPUs (0 and 1).
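The listing above shows drop-ins for only two of the four default top-level units, although all of them need to be constrained. The following is a minimal sketch that writes the same kind of drop-in for init.scope, system.slice, user.slice, and machine.slice. The function name write_shield_dropins and its DIR and CPUS parameters are hypothetical; the file name 40-shielding.conf is simply reused from the examples.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): write_shield_dropins DIR CPUS
# Writes DIR/<unit>.d/40-shielding.conf for every top-level unit that
# systemd creates by default, restricting each of them to CPUS.
write_shield_dropins() {
    dir=$1
    cpus=$2
    for unit in init.scope system.slice user.slice machine.slice; do
        # Scope units take a [Scope] section, slice units a [Slice] section.
        case $unit in
            *.scope) section=Scope ;;
            *)       section=Slice ;;
        esac
        mkdir -p "$dir/$unit.d" || return 1
        printf '[%s]\nAllowedCPUs=%s\n' "$section" "$cpus" \
            > "$dir/$unit.d/40-shielding.conf" || return 1
    done
}
```

After sourcing the function in a root shell, running write_shield_dropins /etc/systemd/system 0-1 followed by systemctl daemon-reload would produce the same files as shown above, plus the corresponding ones for user.slice and machine.slice. Taking the directory as a parameter makes it easy to inspect the generated files in a scratch directory first.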
Finally, we create a dedicated slice for the sensitive workload with all the remaining CPUs (here on a system with 16 CPUs):
root # cat /etc/systemd/system/workload.slice
[Slice]
AllowedCPUs=2-15
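The values 0-1 and 2-15 above are specific to a 16-CPU machine with two housekeeping CPUs. As a sketch, the two complementary AllowedCPUs= values can be derived from the CPU count instead. The helpers shield_ranges and fmt_range are hypothetical, and they assume CPUs are numbered contiguously from 0 with none offline.

```shell
#!/bin/sh
# Hypothetical helpers: derive complementary AllowedCPUs= values.
# Assumes CPUs are numbered contiguously from 0 (no offline CPUs).

# fmt_range FIRST LAST -> "FIRST" if both are equal, else "FIRST-LAST"
fmt_range() {
    if [ "$1" -eq "$2" ]; then
        printf '%s' "$1"
    else
        printf '%s-%s' "$1" "$2"
    fi
}

# shield_ranges NCPUS NHOUSEKEEPING
# Prints "<housekeeping CPUs> <workload CPUs>" separated by a space.
shield_ranges() {
    ncpus=$1
    nhk=$2
    # Both sets must be non-empty.
    if [ "$nhk" -lt 1 ] || [ "$nhk" -ge "$ncpus" ]; then
        echo "shield_ranges: need 1 <= NHOUSEKEEPING < NCPUS" >&2
        return 1
    fi
    printf '%s %s\n' "$(fmt_range 0 $((nhk - 1)))" \
        "$(fmt_range "$nhk" $((ncpus - 1)))"
}
```

On a machine with 16 online CPUs, shield_ranges "$(getconf _NPROCESSORS_ONLN)" 2 prints 0-1 2-15, the values used in the drop-ins above.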
The setup can also be changed at runtime, for example for debugging. Changes made with --runtime are not persistent and are lost on reboot:
root # systemctl set-property --runtime workload.slice AllowedCPUs=4-15
root # systemctl set-property --runtime init.scope AllowedCPUs=0-3
root # systemctl set-property --runtime system.slice AllowedCPUs=0-3
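When the split is changed by hand, it is easy to leave the housekeeping and workload sets overlapping. A minimal sketch of a sanity check to run before applying new values follows. The helpers expand_cpulist and cpulists_disjoint are hypothetical, and they handle only comma-separated entries of the form N or N-M (systemd also accepts whitespace as a separator).

```shell
#!/bin/sh
# Hypothetical helpers for checking a CPU split before applying it.

# expand_cpulist LIST -> one CPU number per line, e.g. "0-2,5" -> 0 1 2 5
expand_cpulist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -n "$lo" ] || continue       # skip empty entries
        [ -n "$hi" ] || hi=$lo         # a single CPU is a range of one
        seq "$lo" "$hi"
    done
}

# cpulists_disjoint A B -> succeeds if no CPU appears in both lists
cpulists_disjoint() {
    # Deduplicate each list, then report numbers present in both.
    common=$( { expand_cpulist "$1" | sort -u; expand_cpulist "$2" | sort -u; } \
        | sort | uniq -d )
    [ -z "$common" ]
}
```

For the runtime example above, cpulists_disjoint 0-3 4-15 succeeds, whereas cpulists_disjoint 0-3 2-15 fails because CPUs 2 and 3 would be shared.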
3.2 Running jobs in the shield
Once workload.slice is prepared as described in the previous section, running sensitive jobs is as simple as assigning their service to that slice, for example with a drop-in file:
root # cat /etc/systemd/system/sensitive.service.d/40-shielding.conf
[Service]
Slice=workload.slice
After adding the drop-in, reload the configuration with systemctl daemon-reload. Note that the Slice= directive only takes effect when the service is (re)started.
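After the service has been restarted, you may want to confirm that it actually runs inside workload.slice. A minimal sketch, assuming the unified (v2) hierarchy, where /proc/<PID>/cgroup contains a line of the form 0::/path; the helper name in_slice is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: in_slice CGROUP_FILE SLICE
# Succeeds if the cgroup v2 entry ("0::/path") in CGROUP_FILE, such as
# /proc/<PID>/cgroup, lies somewhere under the slice SLICE.
in_slice() {
    path=$(sed -n 's/^0:://p' "$1") || return 1
    # Append "/" so that a path ending in the slice itself also matches.
    case $path/ in
        */"$2"/*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, after systemctl restart sensitive.service, the main process ID can be obtained with systemctl show -p MainPID --value sensitive.service and checked with in_slice /proc/<PID>/cgroup workload.slice.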
If the sensitive job is not a service but an ad-hoc command, you can start it in a transient systemd scope:
root # systemd-run --scope -p Slice=workload.slice command arg1 ...
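A quick way to see the effect of the scope is to let the scoped command report its own CPU affinity, for example systemd-run --scope -p Slice=workload.slice grep Cpus_allowed_list /proc/self/status, which with the configuration above should list the workload CPUs (2-15). For checking other processes, a minimal sketch of a parser for the Cpus_allowed_list field of /proc/<PID>/status is shown below; the helper name allowed_cpus is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: allowed_cpus STATUS_FILE
# Prints the Cpus_allowed_list value from a /proc/<PID>/status file,
# i.e. the list of CPUs the task is allowed to run on.
allowed_cpus() {
    sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "$1"
}
```

For a process started in the scope, allowed_cpus /proc/<PID>/status should then print the same list as AllowedCPUs= of workload.slice.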
Existing processes cannot be moved under the shield, because that would require migrating them between cgroups, which distorts the resource accounting state. In any case, a sensitive workload should be started with its resources already secured.