Getting started with Control Groups (Cgroups)

Preamble

I had already tested an operating system resource scheduler with HP Process Resource Manager (PRM). When we migrated from HP-UX to Linux, Oracle was not yet certified on RedHat 6 so we had to use RHEL 5, and nothing existed on this release to control resource usage…

You can read here and there that the ancestor of cgroups is supposed to be /etc/security/limits.conf, but to me the two have little in common: limits.conf (with its associated pam_limits module) puts soft and hard limits on processes, while cgroups aim to prioritize server resources (CPU, memory, I/O, network, …)…

Control groups (cgroups) is a kernel feature introduced with kernel 2.6.24, so it is available on every Linux distribution using this kernel or above…

I have tested this functionality on Oracle Linux Server release 6.4.

Installation

Start by installing libcgroup package:

[root@server1 ~]# yum install libcgroup
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package libcgroup.x86_64 0:0.37-7.2.el6_4 will be installed
--> Finished Dependency Resolution
 
Dependencies Resolved
 
===============================================================================================================================================================================================================
 Package                                        Arch                                        Version                                               Repository                                              Size
===============================================================================================================================================================================================================
Installing:
 libcgroup                                      x86_64                                      0.37-7.2.el6_4                                        public_ol6_latest                                      110 k
 
Transaction Summary
===============================================================================================================================================================================================================
Install       1 Package(s)
 
Total download size: 110 k
Installed size: 272 k
Is this ok [y/N]: y
Downloading Packages:
libcgroup-0.37-7.2.el6_4.x86_64.rpm                                                                                                                                                     | 110 kB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : libcgroup-0.37-7.2.el6_4.x86_64                                                                                                                                                             1/1
  Verifying  : libcgroup-0.37-7.2.el6_4.x86_64                                                                                                                                                             1/1
 
Installed:
  libcgroup.x86_64 0:0.37-7.2.el6_4
 
Complete!

It creates default configuration files:

[root@server1 ~]# ll /etc/cg*
-rw-r--r-- 1 root root  812 Jun 25 17:04 /etc/cgconfig.conf
-rw-r--r-- 1 root root 1705 Jun 25 17:04 /etc/cgrules.conf
-rw-r--r-- 1 root root  161 Jun 25 17:04 /etc/cgsnapshot_blacklist.conf

But no services are started by default (cgconfig and cgred):

[root@server1 ~]# chkconfig --list | grep cg
cgconfig        0:off   1:off   2:off   3:off   4:off   5:off   6:off
cgred           0:off   1:off   2:off   3:off   4:off   5:off   6:off

From official documentation:

The cgconfig service installed with the libcgroup package provides a convenient way to create hierarchies, attach subsystems to hierarchies, and manage cgroups within those hierarchies.

Cgred is a service (which starts the cgrulesengd daemon) that moves tasks into cgroups according to parameters set in the /etc/cgrules.conf file.

Configuration

To list the available subsystems (it differs from one kernel to another):

[root@server1 ~]# uname -a
Linux server1.domain.com 2.6.39-400.210.2.el6uek.x86_64 #1 SMP Thu Oct 17 16:28:13 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@server1 ~]# lssubsys -am
cpuset
cpu
cpuacct
memory
devices
freezer
net_cls
blkio

The RedHat 6 official documentation lists a few more subsystems than my Oracle Unbreakable Enterprise Kernel provides:

Subsystem: Description
blkio: The Block I/O (blkio) subsystem controls and monitors access to I/O on block devices by tasks in cgroups. Writing values to some of these pseudofiles limits access or bandwidth, and reading values from some of these pseudofiles provides information on I/O operations.
cpu: The cpu subsystem schedules CPU access to cgroups. Access to CPU resources can be scheduled using two schedulers:

  • Completely Fair Scheduler (CFS): a proportional share scheduler which divides the CPU time (CPU bandwidth) proportionately between groups of tasks (cgroups) depending on the priority/weight of the tasks or shares assigned to cgroups.
  • Real-Time scheduler (RT): a task scheduler that provides a way to specify the amount of CPU time that real-time tasks can use.

cpuacct: The CPU Accounting (cpuacct) subsystem generates automatic reports on CPU resources used by the tasks in a cgroup, including tasks in child groups.
cpuset: The cpuset subsystem assigns individual CPUs and memory nodes to cgroups.
devices: The devices subsystem allows or denies access to devices by tasks in a cgroup.
freezer: The freezer subsystem suspends or resumes tasks in a cgroup.
memory: The memory subsystem generates automatic reports on memory resources used by the tasks in a cgroup, and sets limits on memory use by those tasks.
net_cls: The net_cls subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup. The traffic controller can be configured to assign different priorities to packets from different cgroups.
net_prio: The Network Priority (net_prio) subsystem provides a way to dynamically set the priority of network traffic per network interface for applications within various cgroups.
ns: The ns subsystem provides a way to group processes into separate namespaces. Within a particular namespace, processes can interact with each other but are isolated from processes running in other namespaces. These separate namespaces are sometimes referred to as containers when used for operating-system-level virtualization.
perf_event: When the perf_event subsystem is attached to a hierarchy, all cgroups in that hierarchy can be used to group processes and threads, which can then be monitored with the perf tool, as opposed to monitoring each process or thread separately or per CPU.

In the default configuration, subsystems are only mounted in a virtual filesystem and obviously no rules have been created:

[root@server1 etc]# grep -v ^# cgconfig.conf
 
mount {
        cpuset  = /cgroup/cpuset;
        cpu     = /cgroup/cpu;
        cpuacct = /cgroup/cpuacct;
        memory  = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio   = /cgroup/blkio;
}
 
[root@server1 etc]# grep -v ^# cgrules.conf

I create two groups: a low_profile with 25% of CPU and a high_profile with 75% of CPU (I emulated NUMA in my configuration to mix the cpu and cpuset subsystems):

group low_profile {
  perm {
    task {
      uid = yjaquier;
      gid = users;
    }
    admin {
      uid = root;
      gid = root;
    }
  }
  cpuset {
    cpuset.cpus = "0";
    cpuset.mems = "0";
  }
  cpu {
    cpu.shares = "25";
  }
}
 
group high_profile {
  perm {
    task {
      uid = oracle;
      gid = dba;
    }
    admin {
      uid = root;
      gid = root;
    }
  }
  cpuset {
    cpuset.cpus = "0";
    cpuset.mems = "0";
  }
  cpu {
    cpu.shares = "75";
  }
}

On my small virtual machine I have only one CPU and memory is interleaved (one memory band):

[root@server1 ~]# numactl --show
policy: default
preferred node: current
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 0

Remark:
If you want to give NUMA affinity to your applications then cpuset is the subsystem to use, but as already said it is complex, and if you tune it badly you lose its added value.

I associate my personal account (yjaquier) with low_profile and the oracle account with high_profile:

[root@server1 ~]# grep -v ^# /etc/cgrules.conf
yjaquier        *       low_profile
oracle          *       high_profile
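
For reference, /etc/cgrules.conf also accepts a per-command form (<user>:<process name>) and Unix group names prefixed with @. A hedged sketch, with purely illustrative entries (the sqlplus process match and the catch-all rule below are not part of my actual configuration):

```
# Format: <user>[:<process name>]   <controllers>     <destination>
oracle:sqlplus                      cpu,cpuset        high_profile
@dba                                *                 high_profile
*                                   *                 low_profile
```

Rules are evaluated top-down and the first match wins, so a catch-all "*" line, if used, must come last.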

Finally I start the two cgroup services:

[root@server1 etc]# service cgconfig start
Starting cgconfig service:                                 [  OK  ]
[root@server1 etc]# service cgred start
Starting CGroup Rules Engine Daemon:                       [  OK  ]

Testing

To test it I’m using the C program (eat_cpu.c) I developed while testing HP Process Resource Manager (PRM). I have improved it to remove all compilation warnings (again, the idea is to create a CPU-intensive executable):

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#define FALSE 0
#define TRUE  1
 
/* To compile: gcc eat_cpu.c -o eat_cpu -lm */
int main(int argc, char *argv[])
{
  pid_t pid;
  int i;
  double x, y;
 
  if (argc <= 1) {
    printf("Please provide number of CPU\n");
    exit(1);
  }
  else
    printf("%s running for %d CPU(s)\n", argv[0], atoi(argv[1]));
 
  /* Fork one CPU-burning child per requested CPU */
  for (i = 1; i <= atoi(argv[1]); i++) {
    pid = fork();
    if (pid == 0) {
      printf("Creating child %d\n", i);
      x = 0.000001;
      while (TRUE) {
        y = tan(x);
        x = x + 0.000001;
      }
    }
  }
  return 0;
}

Before giving a few results: if I execute the eat_cpu program once per account (yjaquier and oracle) without cgroups in the loop, I obviously get the following (each process uses half of the only available CPU):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5143 oracle    20   0  6544   80    0 R 49.9  0.0   1:23.51 eat_cpu
 5145 yjaquier  20   0  6544   76    0 R 49.6  0.0   0:28.28 eat_cpu

Before going further: like many people, I have asked myself why there is a difference in CPU usage between the ps and top commands. From the ps manual:

CPU usage is currently expressed as the percentage of time spent running during the entire lifetime of a process. This is not ideal, and it does not conform to the standards that ps otherwise conforms to. CPU usage is unlikely to add up to exactly 100%.

In other words, the CPU percentage of the ps command is an average CPU usage over the whole lifetime of the process, while top gives the current (real-time) CPU usage:

[root@server1 ~]# top -b -n 1 | head -n 9
top - 14:32:00 up  4:11,  4 users,  load average: 2.00, 1.55, 0.79
Tasks: 117 total,   3 running, 114 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.6%us,  0.3%sy,  0.0%ni, 94.5%id,  0.6%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   1020820k total,   638148k used,   382672k free,    46312k buffers
Swap:  4194300k total,     3740k used,  4190560k free,   460788k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6882 oracle    20   0  6544   76    0 R 50.0  0.0   0:09.25 eat_cpu
 6884 yjaquier  20   0  6544   80    0 R 50.0  0.0   0:06.13 eat_cpu
[root@server1 ~]# ps -o %cpu,cgroup,euser,cmd 6882 6884
%CPU CGROUP                              EUSER    CMD
50.4 -                                   oracle   /tmp/eat_cpu 1
49.5 -                                   yjaquier /tmp/eat_cpu 1

In the above example cgroups are not in action and the ps/top commands give similar results. This is not the case when you put cgroups in the loop with a non-balanced allocation…
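
To illustrate what ps is doing, its %CPU is essentially total CPU time consumed divided by elapsed wall-clock time. A minimal sketch of that arithmetic; the utime/stime tick counts below are made-up sample figures, not taken from a real process:

```shell
# ps-style %CPU = 100 * ((utime + stime) / CLK_TCK) / elapsed_seconds
# utime/stime come from /proc/<pid>/stat (fields 14 and 15, in clock ticks);
# the figures below are hypothetical values for illustration only.
utime=2500    # ticks spent in user mode
stime=500     # ticks spent in kernel mode
hertz=100     # ticks per second (see: getconf CLK_TCK)
elapsed=60    # seconds since process start
awk -v u="$utime" -v s="$stime" -v hz="$hertz" -v e="$elapsed" \
    'BEGIN { printf "%.1f%%\n", 100 * ((u + s) / hz) / e }'
```

A process that burned CPU early and then went idle keeps a high ps %CPU for a while, whereas top recomputes over its refresh interval, which is why the two drift apart once cgroups reshape the instantaneous allocation.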

When the two cgroup services are running I get the expected result. Either you stop and restart the two processes to have them associated with their respective cgroups, or you use the cgclassify command:

[root@server1 ~]# cgclassify -g cpu,cpuset:low_profile 6884
[root@server1 ~]# cgclassify -g cpu,cpuset:high_profile 6882
[root@server1 ~]# top -b -n 1 | head -n 9
top - 14:36:57 up  4:16,  4 users,  load average: 2.07, 1.88, 1.14
Tasks: 118 total,   3 running, 115 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.2%us,  0.3%sy,  0.0%ni, 92.9%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   1020820k total,   638724k used,   382096k free,    46456k buffers
Swap:  4194300k total,     3740k used,  4190560k free,   460840k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6882 oracle    20   0  6544   76    0 R 75.2  0.0   6:28.53 eat_cpu
 6884 yjaquier  20   0  6544   80    0 R 25.7  0.0   5:42.64 eat_cpu
[root@server1 ~]# ps -o %cpu,cgroup,euser,cmd 6882 6884
%CPU CGROUP                              EUSER    CMD
53.7 blkio:/;net_cls:/;freezer:/;devices:/;memory:/;cpuacct:/;cpu:/high_profile;cpuset:/high_profile oracle /tmp/eat_cpu 1
46.0 blkio:/;net_cls:/;freezer:/;devices:/;memory:/;cpuacct:/;cpu:/low_profile;cpuset:/low_profile yjaquier /tmp/eat_cpu 1

Remark:
We see here that CPU monitoring using the ps command is no longer reliable…

If I want to dynamically change the CPU allocation to 80%/20% I have two alternatives:

[root@server1 ~]# cgset -r cpu.shares=80 high_profile
[root@server1 ~]# cgset -r cpu.shares=20 low_profile

Or:

[root@server1 ~]# echo 80 > /cgroup/cpu/high_profile/cpu.shares
[root@server1 ~]# echo 20 > /cgroup/cpu/low_profile/cpu.shares
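
Note that cpu.shares values are relative weights, not percentages: under CPU contention each group receives shares/sum(shares) of the CPU (an idle group simply leaves its share to the others). A quick sanity check of the 80/20 split set above:

```shell
# cpu.shares are relative weights: under contention each group receives
# shares / sum(shares) of the CPU. With the values 80 and 20 set above:
high=80
low=20
awk -v h="$high" -v l="$low" \
    'BEGIN { printf "high_profile=%.0f%%  low_profile=%.0f%%\n",
             100 * h / (h + l), 100 * l / (h + l) }'
```

This also means 80/20 and 800/200 are equivalent configurations; only the ratio between the groups matters.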

And it is dynamically reflected:

[root@server1 ~]# top -b -n 1 | head -n 9
top - 16:55:04 up  6:34,  4 users,  load average: 2.00, 1.53, 0.77
Tasks: 119 total,   3 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.7%us,  0.3%sy,  0.0%ni, 85.5%id,  0.4%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   1020820k total,   640996k used,   379824k free,    48196k buffers
Swap:  4194300k total,     3732k used,  4190568k free,   460952k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9974 oracle    20   0  6544   76    0 R 79.8  0.0   5:30.02 eat_cpu
 9972 yjaquier  20   0  6544   80    0 R 20.0  0.0   1:31.46 eat_cpu

One concrete example I can see in the Oracle database world is when, to take a simple case, two databases are running on my single-CPU box. Without cgroups and with three CPU-consuming processes you get the below balanced distribution:

[root@server1 ~]# top -b -n 1 | head -n 10
top - 17:02:46 up  6:42,  4 users,  load average: 2.15, 1.93, 1.26
Tasks: 120 total,   4 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.1%us,  0.3%sy,  0.0%ni, 84.0%id,  0.4%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   1020820k total,   641444k used,   379376k free,    48352k buffers
Swap:  4194300k total,     3732k used,  4190568k free,   461088k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9974 oracle    20   0  6544   76    0 R 32.9  0.0  11:34.30 eat_cpu
10382 yjaquier  20   0  6544   80    0 R 32.9  0.0   0:04.88 eat_cpu
 9972 yjaquier  20   0  6544   80    0 R 31.0  0.0   3:08.90 eat_cpu

If I configure cgroups to allocate to each database running on the server 100/(number of databases running on the server)% of the CPU resource (so 50% in my simple example), I get the below distribution and no single database can kill the whole server's performance:

[root@server1 ~]# top -b -n 1 | head -n 10
top - 17:04:44 up  6:44,  4 users,  load average: 2.89, 2.29, 1.47
Tasks: 120 total,   4 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.5%us,  0.3%sy,  0.0%ni, 83.7%id,  0.4%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   1020820k total,   641764k used,   379056k free,    48408k buffers
Swap:  4194300k total,     3732k used,  4190568k free,   461088k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9974 oracle    20   0  6544   76    0 R 49.1  0.0  12:21.79 eat_cpu
10382 yjaquier  20   0  6544   80    0 R 27.5  0.0   0:40.05 eat_cpu
 9972 yjaquier  20   0  6544   80    0 R 25.5  0.0   3:44.07 eat_cpu
