Weekly report (13th week) - Debian GNU/Hurd Debianish initialization
.. tags: gsoc, debian, hurd
.. date: 2013/09/13 17:59:49
.. title: cgroupfs is as cgroupy as it gets...
.. slug: cgroupfs-is-as-cgroupy-as-it-gets
\... at least until the cgroup interface is fixed. So, what can it do?

* There are `tasks` and `cgroup.procs`. There are no thread IDs on
  the Hurd, so cgroupfs works only on a per-process basis, not on a
  per-thread one. Consequently `tasks` has the same semantics as
  `cgroup.procs`. Seeing that PIDs and TIDs can be used (mostly)
  interchangeably on Linux, I think this is okay.
* You can create and destroy cgroups, child processes are properly
* You can register a `release_agent`, and it is executed whenever the
  last process in a cgroup dies.
* There is `notify_on_release` to enable or disable the use of the
  `release_agent`.
* There is `cgroup.clone_children`, and one can toggle this bit, but it is
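
The per-process semantics listed above can be summed up in a small toy model. This is a hypothetical sketch, not cgroupfs's actual code: `Cgroup`, `Hierarchy`, `attach` and `exit` are made-up names, child cgroups are not modelled, and, as on Linux, the `release_agent` is assumed to be called with the path of the cgroup that has just become empty.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Callable, Dict, Set


    @dataclass(eq=False)
    class Cgroup:
        path: str
        notify_on_release: bool = False
        pids: Set[int] = field(default_factory=set)

        def read_procs(self) -> str:
            # One PID per line, like reading `cgroup.procs`.
            return "".join(f"{pid}\n" for pid in sorted(self.pids))

        def read_tasks(self) -> str:
            # No thread IDs on the Hurd: `tasks` is the very same list.
            return self.read_procs()


    class Hierarchy:
        """Hypothetical bookkeeping: which cgroup each PID lives in, plus
        the release agent to run when a cgroup becomes empty."""

        def __init__(self, release_agent: Callable[[str], None]) -> None:
            self.release_agent = release_agent
            self.cgroups: Dict[str, Cgroup] = {"/": Cgroup("/")}
            self.member_of: Dict[int, Cgroup] = {}

        def mkdir(self, path: str, notify_on_release: bool = False) -> Cgroup:
            cg = Cgroup(path, notify_on_release)
            self.cgroups[path] = cg
            return cg

        def attach(self, pid: int, path: str) -> None:
            """Like writing `pid` into <path>/cgroup.procs."""
            self._leave(pid)
            cg = self.cgroups[path]
            cg.pids.add(pid)
            self.member_of[pid] = cg

        def exit(self, pid: int) -> None:
            """The process died."""
            self._leave(pid)

        def _leave(self, pid: int) -> None:
            cg = self.member_of.pop(pid, None)
            if cg is None:
                return
            cg.pids.discard(pid)
            # Last process gone and the flag is set: run the release agent.
            if not cg.pids and cg.notify_on_release:
                self.release_agent(cg.path)

Because there are no thread IDs, `read_tasks()` simply returns what `read_procs()` does; on Linux, `tasks` would list thread IDs instead. In this sketch a cgroup is also released when its last process is moved elsewhere, not only when it dies.
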

So, what's missing?

* There are no controllers. I haven't looked into this, and resource
  accounting is one of the Hurd's weakest points, but it is conceivable
  that one could, e.g., advise the scheduler inside the Mach kernel based
  on the state of the cgroups if the cgroupfs process is
  sufficiently privileged (did I mention that any user can use
* The notification API, aka `cgroup.event_control`. The Hurd lacks
  `eventfd(2)`, but even if that were implemented, this interface would
  still be impossible to implement. Rant below.
* A patch for gnumach to make this bulletproof. I made some
  encouraging progress with that one this week, but there's nothing

So, what's wrong with the Linux cgroup API?

Well, for one thing, the whole API is underspecified. Yes, there is
that is not a specification, that's a howto at best. Second, the
notification API is not particularly nice::

    To register a new notification handler you need to:
    - create a file descriptor for event notification using eventfd(2);
    - open a control file to be monitored (e.g. memory.usage_in_bytes);
    - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
      Interpretation of args is defined by control file implementation;

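
For comparison, the three quoted steps look roughly like this from user space on Linux. This is a minimal sketch assuming a cgroup v1 hierarchy with the memory controller; `register_threshold` is a hypothetical helper name, and `os.eventfd` needs Python 3.10+ on Linux. Note that step 3 is nothing but text: the descriptors are written out as decimal numbers.

.. code-block:: python

    import os
    from typing import Tuple


    def register_threshold(cgroup_dir: str, control: str, args: str) -> Tuple[int, int]:
        """Hypothetical helper: register an eventfd-based notification by
        following the three steps from the Linux cgroup documentation."""
        efd = os.eventfd(0)  # 1. an eventfd the kernel will signal
        cfd = -1
        try:
            # 2. open the control file to be monitored
            cfd = os.open(os.path.join(cgroup_dir, control), os.O_RDONLY)
            # 3. write "<event_fd> <control_fd> <args>"; the descriptors
            #    travel as decimal text, meaningful only in *this* process.
            with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
                f.write(f"{efd} {cfd} {args}")
        except OSError:
            os.close(efd)
            if cfd != -1:
                os.close(cfd)
            raise
        return efd, cfd

On a real hierarchy one would then block in `os.eventfd_read(efd)` until the kernel signals the eventfd; for `memory.usage_in_bytes` the `args` field is a threshold in bytes.
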

Seriously? There is a POSIX way to pass file descriptors around
(`sendmsg(2)` with `SCM_RIGHTS` ancillary data over a Unix domain
socket), but smashing the decimal representation of a descriptor into
a string is not the way to do that. Linux gets away with this hack
because the kernel knows which process wrote(2) that string in the
first place, so it can parse the string into an integer and look it
up in that process's table of file descriptors.
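
The POSIX alternative can be sketched in Python, whose `socket.send_fds()` and `socket.recv_fds()` (3.9+, Unix only) wrap `sendmsg(2)`/`recvmsg(2)` with `SCM_RIGHTS`. The helper name `pass_fd_and_read` is made up for this illustration: it sends the read end of a pipe through a socket pair and then reads from whatever descriptor arrives.

.. code-block:: python

    import os
    import socket
    from typing import Tuple


    def pass_fd_and_read(payload: bytes) -> Tuple[int, int, bytes]:
        """Pass a pipe's read end over an AF_UNIX socket pair with
        SCM_RIGHTS and read `payload` through the received descriptor.

        Returns (sent_fd, received_fd, data_read_via_received_fd).
        """
        read_end, write_end = os.pipe()
        sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # At least one byte of ordinary data must accompany the fd.
            socket.send_fds(sender, [b"x"], [read_end])
            # recv_fds() returns (data, fds, msg_flags, address).
            _data, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
            received = fds[0]
            try:
                os.write(write_end, payload)
                data = os.read(received, len(payload))
            finally:
                os.close(received)
            return read_end, received, data
        finally:
            os.close(read_end)
            os.close(write_end)
            sender.close()
            receiver.close()

Even within a single process, the descriptor that arrives has a different number from the one that was sent, yet it refers to the same pipe: the kernel translates the descriptor in transit. A decimal string written into `cgroup.event_control` bypasses exactly this translation.
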

Now the trouble for cgroupfs is that it is not the kernel, and even if
it were, that wouldn't solve the problem, because on the Hurd there are
no file descriptors (well, there are, but only to appease all the
POSIX programs out there). Instead, the Hurd has ports: you can send
messages to ports, and that is pretty much everything you can do
on a Mach system. Reading a file works roughly like this:


1. You open a file and get a port X.
2. You send a message like "I'm like really interested in the first Y
   bytes of that file" to X.
3. Whoever has the receiving end of X (probably the one who gave you X
   in the first place) answers your request.

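
The three steps above can be mimicked with a toy request/reply model. Everything here is hypothetical and not the Mach interface: a `Port` is just a Python queue, `FileServer` stands in for a translator, and the server handles one request synchronously instead of receiving messages in a loop.

.. code-block:: python

    import queue
    from dataclasses import dataclass, field
    from typing import Dict


    @dataclass(eq=False)  # eq=False keeps identity hashing, so ports can be dict keys
    class Port:
        """Toy stand-in for a Mach port: a message queue."""
        inbox: queue.Queue = field(default_factory=queue.Queue)


    class FileServer:
        """Hypothetical server that hands out one port per opened file and
        answers read requests sent to those ports."""

        def __init__(self, files: Dict[str, bytes]) -> None:
            self.files = files
            self.port_to_path: Dict[Port, str] = {}

        def open(self, path: str) -> Port:
            port = Port()
            self.port_to_path[port] = path  # only the server knows what X means
            return port

        def serve_one(self, port: Port) -> None:
            """Step 3: the holder of the receiving end answers one request."""
            op, nbytes, reply_port = port.inbox.get_nowait()
            if op == "read":
                reply_port.inbox.put(self.files[self.port_to_path[port]][:nbytes])


    def read_file(server: FileServer, path: str, nbytes: int) -> bytes:
        x = server.open(path)                 # step 1: open, get port X
        reply = Port()                        # where the answer should go
        x.inbox.put(("read", nbytes, reply))  # step 2: "the first Y bytes, please"
        server.serve_one(x)                   # step 3, run synchronously here
        return reply.inbox.get_nowait()

Note that the server, not the client, knows which file a port stands for; that is what would let a server like cgroupfs map a port it handed out back to a control file.
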

Ports look pretty much like file descriptors: they are (usually small)
integers, and you can make them, destroy them, and pass them around
easily (yes, ports are first-class objects in the Mach messaging
system). Everything is implemented on top of this mechanism. It is
transport-agnostic: the other end could be on another machine and you
wouldn't even know. You can create proxies or filters (in fact, that
is exactly how the firewall `eth-filter` is implemented). It's
beautiful and extensible at its heart, like


So if X were a port to, e.g., `memory.usage_in_bytes`, and the cgroups
interface were less braindead^W^Wmore carefully designed, so that
on the Hurd it could be transported like ports usually are, then cgroupfs
could in fact use port X' to look up which file the caller is
interested in (this is possible because cgroupfs was the one handing
out the port in the first place) and generate notifications for that
file. This is not possible when X is "serialized for transport" using
sprintf, because port names are specific to each process, so X !=
X'. The kernel would do the translation while sending the message, but
it obviously cannot do that if the number is carried in a character
string.
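
The name-translation argument can be made concrete with a toy model of per-task port name spaces. This is not the Mach API: `ToyKernel`, `allocate_port` and `send` are made-up names, and the only Mach behaviour it mimics is that a task holds a single name per port and that rights carried in a message are translated into the receiver's names, while plain bytes are copied verbatim.

.. code-block:: python

    import itertools
    from typing import Dict, Optional, Tuple


    class ToyKernel:
        """Toy model: every task has its own table of port names (small
        integers) mapping to kernel-side port objects."""

        def __init__(self) -> None:
            self.spaces: Dict[str, Dict[int, object]] = {}

        def new_task(self, task: str) -> None:
            self.spaces[task] = {}

        def _name_in(self, task: str, port: object) -> int:
            """Return the task's name for `port`, inserting it if needed.
            A task has exactly one name per port."""
            space = self.spaces[task]
            for name, existing in space.items():
                if existing is port:
                    return name
            name = next(n for n in itertools.count(1) if n not in space)
            space[name] = port
            return name

        def allocate_port(self, task: str) -> int:
            return self._name_in(task, object())

        def send(self, sender: str, receiver: str, body: bytes,
                 right: Optional[int] = None) -> Tuple[bytes, Optional[int]]:
            """Deliver `body` (copied verbatim) and, optionally, the port
            named `right` in the sender's space, translated for the receiver."""
            translated = None
            if right is not None:
                translated = self._name_in(receiver, self.spaces[sender][right])
            return body, translated

If cgroupfs hands a client a port, the client generally knows it under a different name; sending the right back yields cgroupfs's own name again, while sending the client's name as text makes cgroupfs look up a name that either does not exist in its space or denotes an unrelated port.
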

I'm not sure what I'm going to do next week. The GSoC timeline
suggests a soft pencils-down: time to scrub code and write
documentation. I'm not sure that this applies to me, as I have pushed
most of my work upstream as early as possible. I guess I will nag
Samuel to merge the outstanding patches and continue working
on my gnumach patch.