Applies to SUSE Linux Enterprise Server 11 SP4

10 Kernel Control Groups

Kernel Control Groups (abbreviated as cgroups) are a kernel feature that allows aggregating or partitioning tasks (processes) and all their children into hierarchically organized groups. These groups can be configured to show specialized behavior that helps tune the system to make the best use of available hardware and network resources.

10.1 Technical Overview and Definitions

The following terms are used in this chapter:

  • cgroup is another name for Control Groups.

  • A cgroup associates a set of tasks (processes) with a set of subsystems; the subsystem parameters form the environment for these tasks.

  • Subsystems provide the parameters that can be assigned. They define CPU sets, the freezer, or, more generally, resource controllers for memory, disk I/O, network traffic, and so on.

  • cgroups are organized in a tree-structured hierarchy. There can be more than one hierarchy in the system, so you can use a different or alternate hierarchy to handle specific situations, for example to group the same tasks differently for different subsystems.

  • Every task running in the system is in exactly one cgroup of each hierarchy.

10.2 Scenario

See the following resource planning scenario for a better understanding (source: /usr/src/linux/Documentation/cgroups/cgroups.txt):

Figure 10.1: Resource Planning

Web browsers such as Firefox will be part of the Web network class, while NFS daemons such as (k)nfsd will be part of the NFS network class. At the same time, Firefox will belong to the appropriate CPU and memory classes depending on whether a professor or a student started it.

10.3 Control Group Subsystems

The following subsystems are available. They can be classified into two types:

Isolation and Special Controllers

cpuset, freezer, devices, checkpoint/restart

Resource Controllers

cpu (scheduler), cpuacct, memory, disk I/O, network

Either mount each subsystem separately:

mount -t cgroup -o cpu none /cpu
mount -t cgroup -o cpuset none /cpuset

or all subsystems in one go; you can use an arbitrary device name (e.g., none), which will appear in /proc/mounts:

mount -t cgroup none /sys/fs/cgroup
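To keep a setup script from mounting the same hierarchy twice, it can first look for an existing cgroup mount in /proc/mounts. The following is a minimal sketch, assuming cgroup v1 and the usual /proc/mounts layout (device, mount point, type, options, ...); the function name is_cgroup_mounted is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a cgroup (v1) file system is mounted at $1.
# $2 = mounts file to inspect (defaults to /proc/mounts)
is_cgroup_mounted() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    # Fields: device mountpoint fstype options dump pass
    awk -v m="$mnt" '$2 == m && $3 == "cgroup" { found = 1 } END { exit !found }' "$mounts"
}

# Example (requires root):
# is_cgroup_mounted /cpu || mount -t cgroup -o cpu none /cpu
```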

Some additional information on available subsystems:

Cpuset (Isolation)

Use cpuset to tie processes to subsets of the system's CPUs and memory (memory nodes). For an example, see Section 10.4.3, “Example: Cpusets”.

Freezer (Control)

The freezer subsystem is useful for high-performance computing (HPC) clusters. Use it to freeze (stop) all tasks in a group, or to stop tasks when they reach a defined checkpoint. For more information, see /usr/src/linux/Documentation/cgroups/freezer-subsystem.txt.

Here are basic commands to use the freezer subsystem:

mount -t cgroup -o freezer freezer /freezer
# Create a child cgroup:
mkdir /freezer/0
# Put a task into this cgroup:
echo $task_pid > /freezer/0/tasks
# Freeze it:
echo FROZEN > /freezer/0/freezer.state
# Unfreeze (thaw) it:
echo THAWED > /freezer/0/freezer.state
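Freezing is not always instantaneous: the freezer documentation referenced above describes an intermediate FREEZING state. A minimal sketch, assuming the /freezer hierarchy from the commands above, that requests FROZEN and polls until the kernel reports it; the function name freeze_cgroup and the default retry count are illustrative choices:

```shell
#!/bin/sh
# Hypothetical helper: freeze all tasks of the cgroup in $1 and wait until
# freezer.state reads FROZEN. $2 = maximum number of checks (default 10).
freeze_cgroup() {
    cg=$1
    tries=${2:-10}
    echo FROZEN > "$cg/freezer.state" || return 1
    while [ "$tries" -gt 0 ]; do
        state=$(cat "$cg/freezer.state")
        if [ "$state" = FROZEN ]; then
            return 0
        fi
        # Still FREEZING: request FROZEN again and wait before rechecking.
        echo FROZEN > "$cg/freezer.state"
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Example (requires root):
# freeze_cgroup /freezer/0 || echo "freezing did not complete"
```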

Checkpoint/Restart (Control)

Save the state of all processes in a cgroup to a dump file. Restart it later (or just save the state and continue).

Move a saved container between physical machines, as is possible with virtual machines.

Dump all process images of a cgroup to a file.

Devices (Isolation)

A system administrator can provide a list of devices that processes in a cgroup may access.

The controller restricts access to a device, or to a file system on a device, to the tasks that belong to the specified cgroup. For more information, see /usr/src/linux/Documentation/cgroups/devices.txt.
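Entries written to devices.allow and devices.deny have the form "type major:minor access", where type is a (all), c (character), or b (block), and access combines r (read), w (write), and m (mknod). A minimal sketch, assuming GNU coreutils stat, that derives such an entry from an existing device node; device_rule and the cgroup path in the example are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a devices.allow/devices.deny entry for node $1
# with access flags $2 (any combination of r, w, m).
device_rule() {
    node=$1
    access=$2
    case $(LC_ALL=C stat -L -c '%F' "$node") in
        "character special file") type=c ;;
        "block special file")     type=b ;;
        *) echo "not a device node: $node" >&2; return 1 ;;
    esac
    # %t and %T print the major and minor numbers in hexadecimal.
    maj=$(printf '%d' "0x$(stat -L -c '%t' "$node")")
    min=$(printf '%d' "0x$(stat -L -c '%T' "$node")")
    echo "$type $maj:$min $access"
}

# Example (requires root and a mounted devices hierarchy):
# device_rule /dev/null rw > /cgroups/mygroup/devices.allow
```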

Cpuacct (Control)

The CPU accounting controller groups tasks using cgroups and accounts the CPU usage of these groups. For more information, see /usr/src/linux/Documentation/cgroups/cpuacct.txt.
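cpuacct.usage reports the total CPU time consumed by the group in nanoseconds, summed over all CPUs. A minimal sketch of turning two readings taken a known interval apart into an average utilization; cpuacct_percent and the cgroup path in the example are hypothetical, and results above 100 mean the group used more than one CPU on average:

```shell
#!/bin/sh
# Hypothetical helper: average CPU utilization in percent of one CPU.
# $1 = first cpuacct.usage reading, $2 = second reading (nanoseconds)
# $3 = seconds between the two readings
cpuacct_percent() {
    echo $(( ($2 - $1) * 100 / ($3 * 1000000000) ))
}

# Example:
# a=$(cat /cgroups/mygroup/cpuacct.usage); sleep 5
# b=$(cat /cgroups/mygroup/cpuacct.usage)
# cpuacct_percent "$a" "$b" 5
```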

CPU (Resource Control)

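Share CPU bandwidth between groups with the group scheduling function of CFS (the Completely Fair Scheduler). Its mechanics are comparatively complicated; for the relative weights set with cpu.shares, see Section 10.4.4, “Example: cgroups”.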

Memory (Resource Control)
  • Limits memory usage of user space processes.

  • Controls swap usage if swapaccount=1 is set as a kernel boot parameter.

  • Limits LRU (Least Recently Used) pages.

  • Accounts for both anonymous memory and the file cache.

  • Does not limit kernel memory; this may be handled by another subsystem if needed.

For more information, see /usr/src/linux/Documentation/cgroups/memory.txt.
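memory.limit_in_bytes accepts values with a k, m, or g suffix but reports the limit in bytes when read back. A minimal sketch of a converter that can be used to verify a limit; to_bytes and the cgroup path in the example are hypothetical, and because the kernel may round a limit to a multiple of the page size, compare page-aligned values such as 4M:

```shell
#!/bin/sh
# Hypothetical helper: convert a size with an optional k/m/g suffix to bytes.
to_bytes() {
    n=${1%[kKmMgG]}   # strip a trailing suffix, if any
    case $1 in
        *[kK]) echo $((n * 1024)) ;;
        *[mM]) echo $((n * 1024 * 1024)) ;;
        *[gG]) echo $((n * 1024 * 1024 * 1024)) ;;
        *)     echo "$n" ;;
    esac
}

# Example (requires root and a mounted memory hierarchy):
# echo 4M > /cgroups/mygroup/memory.limit_in_bytes
# [ "$(cat /cgroups/mygroup/memory.limit_in_bytes)" = "$(to_bytes 4M)" ] && echo "limit set"
```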

Blkio (Resource Control)

The blkio (Block IO) controller is now available as a disk I/O controller. With the blkio controller you can currently set policies for proportional bandwidth and for throttling.

These are the basic commands to configure proportional weight division of bandwidth by setting weight values in blkio.weight:

# Setup in /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
# Create two cgroups
mkdir -p /sys/fs/cgroup/blkio/group1 /sys/fs/cgroup/blkio/group2
# Set weights
echo 1000 > /sys/fs/cgroup/blkio/group1/blkio.weight
echo  500 > /sys/fs/cgroup/blkio/group2/blkio.weight
# Write the PIDs of the processes to be controlled to the
# appropriate groups
command1 &
echo $! > /sys/fs/cgroup/blkio/group1/tasks

command2 &
echo $! > /sys/fs/cgroup/blkio/group2/tasks
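Weights are relative: under contention, each group receives a share of disk time in proportion to its weight divided by the sum of the weights of the competing groups, so the example gives group1 about two thirds and group2 about one third. This only applies while both groups actually issue I/O to the same disk; the blkio documentation states that proportional weights are implemented in the CFQ I/O scheduler, so the sketch assumes CFQ is active for that device. weight_share is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: expected share (percent, rounded) for the first weight
# when all listed groups compete for the same device.
# Usage: weight_share WEIGHT OTHER_WEIGHT...
weight_share() {
    own=$1
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    echo $(( (own * 100 + total / 2) / total ))
}

# Example: weight_share 1000 500   (group1 from the commands above)
```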

These are the basic commands to configure throttling or upper limit policy by setting values in blkio.throttle.read_bps_device for reads and blkio.throttle.write_bps_device for writes:

# Setup in /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
# Bandwidth rate of a device for the root group; format:
# <major>:<minor>  <bytes_per_second>
echo "8:16  1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
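The value is in bytes per second, so 1048576 corresponds to 1 MiB/s, and 8:16 is the device number of the disk (usually /dev/sdb, as shown by ls -l or /proc/partitions). A minimal sketch that builds such a line from a limit given in MiB/s; throttle_line is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "major:minor bytes_per_second" for
# blkio.throttle.read_bps_device or blkio.throttle.write_bps_device.
# $1 = major:minor of the device, $2 = limit in MiB per second
throttle_line() {
    echo "$1 $(( $2 * 1024 * 1024 ))"
}

# Example: limit writes to device 8:16 to 10 MiB/s (requires root)
# throttle_line 8:16 10 > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
```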

For more information about caveats, usage scenarios, and additional parameters, see /usr/src/linux/Documentation/cgroups/blkio-controller.txt.

Network Traffic (Resource Control)

With cgroup_tc, a network traffic controller is available. It can be used to manage traffic that is associated with the tasks in a cgroup. Additionally, cls_flow can classify packets based on the tc_classid field in the packet.

For example, to limit the traffic from all tasks in a file_transfer cgroup to 100 Mbps, proceed as follows:

# create a file_transfer cgroup and assign it a unique classid
# of 0x10 - this will be used later to direct packets.
mkdir -p /dev/cgroup
mount -t cgroup tc -otc /dev/cgroup
mkdir /dev/cgroup/file_transfer
echo 0x10 > /dev/cgroup/file_transfer/tc.classid
echo $PID_OF_FILE_XFER_PROCESS > /dev/cgroup/file_transfer/tasks

# Now create an HTB class that rate-limits traffic to 100 mbits and attach
# a filter to direct all traffic from the file_transfer cgroup
# to this new class.
tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit ceil 100mbit
tc filter add dev eth0 parent 1: handle 800 protocol ip prio 1 \
  flow map key cgroup-classid baseclass 1:10

This example is taken from https://lwn.net/Articles/291161/, where you can find more information about this feature.
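The tc.classid file above belongs to the cgroup_tc patch discussed in the LWN article. The mainline net_cls subsystem, listed among the available subsystems in Section 10.4.2, instead uses a net_cls.classid file whose value 0xAAAABBBB encodes the tc class handle AAAA:BBBB, with both parts in hexadecimal. A minimal sketch of the conversion; tc_to_classid and the cgroup path in the example are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a tc class handle such as "1:10" (both parts
# hexadecimal, as tc interprets them) into a net_cls.classid value.
tc_to_classid() {
    maj=${1%%:*}
    min=${1#*:}
    printf '0x%04x%04x\n' "0x$maj" "0x$min"
}

# Example (requires root and a mounted net_cls hierarchy):
# tc_to_classid 1:10 > /cgroups/file_transfer/net_cls.classid
```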

10.4 Using Control Groups

10.4.1 Prerequisites

To conveniently use cgroups, install the following additional packages:

  • libcgroup1 — basic user space tools to simplify resource management

  • cpuset — contains the cset tool to manipulate cpusets

  • libcpuset1 — C API to cpusets

  • kernel-source — only needed for documentation purposes

  • lxc — Linux container implementation

10.4.2 Checking the Environment

The kernel shipped with SUSE Linux Enterprise Server supports cgroups; there is no need to apply additional patches. Run lxc-checkconfig to check the cgroups environment. The output should be similar to the following:

--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
Multiple /dev/pts instances: enabled

--- Control groups ---
Cgroup: enabled
Cgroup namespace: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled

To find out which subsystems are available, proceed as follows:

mkdir /cgroups
mount -t cgroup none /cgroups
grep cgroup /proc/mounts

On this example system, the following subsystems are available: perf_event, blkio, net_cls, freezer, devices, memory, cpuacct, cpu, cpuset.
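Mounting a hierarchy is not strictly necessary just to see which subsystems the kernel provides: /proc/cgroups lists every subsystem with the columns subsys_name, hierarchy, num_cgroups, and enabled. A subsystem can be disabled at boot, for example with cgroup_disable=memory. A minimal sketch that prints only the enabled ones; enabled_subsystems is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the names of enabled subsystems.
# $1 = file to read (defaults to /proc/cgroups)
enabled_subsystems() {
    # Skip the "#subsys_name ..." header; column 4 is 1 when enabled.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# Example:
# enabled_subsystems
```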

10.4.3 Example: Cpusets

With the command line proceed as follows:

  1. To determine the number of CPUs and memory nodes, see /proc/cpuinfo and /proc/zoneinfo.

  2. Create the cpuset hierarchy as a virtual file system (source: /usr/src/linux/Documentation/cgroups/cpusets.txt):

    mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
    cd /sys/fs/cgroup/cpuset
    mkdir Charlie
    cd Charlie
    # List of CPUs in this cpuset:
    echo 2-3 > cpuset.cpus
    # List of memory nodes in this cpuset (node 1 exists only on NUMA
    # systems with at least two nodes; use 0 otherwise):
    echo 1 > cpuset.mems
    echo $$ > tasks
    # The current shell is now running in cpuset Charlie
    # The next line should display '/Charlie'
    cat /proc/self/cpuset
  3. Remove the cpuset using shell commands:

    rmdir /sys/fs/cgroup/cpuset/Charlie

    This fails as long as the cpuset is in use. First remove any child cpusets and move the tasks (processes) that belong to it to another cpuset, for example the root cpuset. Check for remaining tasks with:

    cat /sys/fs/cgroup/cpuset/Charlie/tasks
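The rmdir succeeds only when the cpuset has no tasks and no child cpusets. In step 2 the current shell itself was moved into Charlie, so it must be moved out again first. A minimal sketch of the check; cgroup_is_empty is a hypothetical name, and it reads the tasks file instead of testing its size because cgroup control files report a size of 0:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the cgroup directory $1 has no tasks and
# no child cgroups, so that rmdir can remove it.
cgroup_is_empty() {
    dir=$1
    [ -z "$(cat "$dir/tasks")" ] || return 1
    # Any subdirectory is a child cgroup.
    for entry in "$dir"/*; do
        if [ -d "$entry" ]; then
            return 1
        fi
    done
    return 0
}

# Example: move the current shell back to the root cpuset, then remove Charlie
# echo $$ > /sys/fs/cgroup/cpuset/tasks
# cgroup_is_empty /sys/fs/cgroup/cpuset/Charlie && rmdir /sys/fs/cgroup/cpuset/Charlie
```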

For background information and additional configuration flags, see /usr/src/linux/Documentation/cgroups/cpusets.txt.

With the cset tool, proceed as follows:

# Determine the number of CPUs and memory nodes
cset set --list
# Creating the cpuset hierarchy
cset set --cpu=2-3 --mem=1 --set=Charlie
# Starting processes in a cpuset
cset proc --set Charlie --exec -- stress -c 1 &
# Moving existing processes to a cpuset
cset proc --move --pid PID --toset=Charlie
# List tasks in a cpuset
cset proc --list --set Charlie
# Removing a cpuset
cset set --destroy Charlie

10.4.4 Example: cgroups

Using shell commands, proceed as follows:

  1. Create the cgroups hierarchy:

    mount -t cgroup cgroup /sys/fs/cgroup
    cd /sys/fs/cgroup
    mkdir priority
    cd priority
    cat cpu.shares
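  2. Understanding cpu.shares: the value is a relative weight that only matters when groups compete for CPU time. The percentages below assume that one other group with the default value of 1024 competes for the same CPUs:

    • 1024 is the default (for more information, see /Documentation/scheduler/sched-design-CFS.txt) = 50% utilization

    • 1524 = 60% utilization

    • 2048 = 67% utilization

    • 512 = 33% utilization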

  3. Changing cpu.shares

    echo 1024 > cpu.shares
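Because cpu.shares is a relative weight, a group's share against one competitor is shares / (shares + other). Solving this for shares gives the value needed for a target percentage. A minimal sketch, assuming a single competing group with the default 1024 unless another value is given; shares_for_percent is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: cpu.shares value that yields roughly $1 percent of CPU
# time against one competing group with $2 shares (default 1024).
# Solves p/100 = s / (s + other) for s and rounds to the nearest integer.
shares_for_percent() {
    p=$1
    other=${2:-1024}
    echo $(( (p * other + (100 - p) / 2) / (100 - p) ))
}

# Example: shares_for_percent 75 > cpu.shares
```

For a 60% target this yields 1536, which the value 1524 in the list above approximates.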

10.5 For More Information

  • Kernel documentation (package kernel-source): files in /usr/src/linux/Documentation/cgroups:

    • /usr/src/linux/Documentation/cgroups/blkio-controller.txt

    • /usr/src/linux/Documentation/cgroups/cgroups.txt

    • /usr/src/linux/Documentation/cgroups/cpuacct.txt

    • /usr/src/linux/Documentation/cgroups/cpusets.txt

    • /usr/src/linux/Documentation/cgroups/devices.txt

    • /usr/src/linux/Documentation/cgroups/freezer-subsystem.txt

    • /usr/src/linux/Documentation/cgroups/memcg_test.txt

    • /usr/src/linux/Documentation/cgroups/memory.txt

    • /usr/src/linux/Documentation/cgroups/resource_counter.txt

  • For Linux Containers (LXC) based on cgroups, see Virtualization with Linux Containers (LXC).

  • Corbet, Jonathan: Controlling memory use in containers (2007). http://lwn.net/Articles/243795/

  • Corbet, Jonathan: Process containers (2007). http://lwn.net/Articles/236038/
