On Wed, 2015-01-21 at 17:58 +0100, Matthias Urlichs wrote:
> You seem to have sent this email before you finished writing it.

Correct. It was late, the tone of the post was heading in the wrong direction, so I gave up for the night and pressed "Save to Drafts". This morning I discovered I had apparently missed the "Save to Drafts" button. The post wasn't going to continue in the same direction anyway.

Regardless, I stand by my assertion that you _will_ be running journald if you use systemd. Other people might make different design decisions, but if I wanted logging in a resource-constrained environment (think battery powered) where any unnecessary overhead is too much, and I used systemd, I wouldn't have much of a choice about which logging system to drop.

It was also true back when I first came across systemd: I was taken aback by the idea that the designers of an init system would consider it reasonable to force me to adopt their logging system, which was binary(?!?!). What on earth does a logging system have to do with init? Only later did I manage to put 2+2 together and come up with 4. Journald started life as a way to capture stdout and stderr from the things systemd started. And it just so happens that just about every program that runs in the background is started by systemd. That is how it sees its job, after all - a super-server that replaces init, cron, inetd and so on - so all background programs are started by it. So far from being unrelated, systemd and journald are addressing two facets of the same problem: systemd starts the background programs, and journald lets those same programs tell the sysadmin something.

And surprise, surprise, it turns out that the effects of design decisions like that ripple out across the system. Firstly, you notice that simply writing to stdout is far easier than using some logging API - even shell scripts can do it.
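To make that concrete: under systemd, the entire "logging API" available to a shell script is a sketch like this (the unit name and messages are made up; journald captures the streams and tags them with the unit):

```shell
#!/bin/sh
# Run as a systemd service, e.g. backup.service (name made up).
# Anything written to stdout or stderr lands in the journal,
# tagged with the unit name - no logging library needed.
echo "nightly backup starting"
echo "nightly backup failed: disk full" >&2
```

The sysadmin later pulls those lines back out with `journalctl -u backup.service`.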
Secondly, some metadata becomes automagically available - like the path of the program writing the message, and the systemd unit that started it. The users of the logging data, sysadmins, then notice that finding the log entries they are interested in becomes easier, because the thing they search on *is* the name of the files they use to control the system (unit file names and executable file names). No more guessing between [CRON] / [cron], or [ppp] / [pppd], or whatever that daemon's programmer decided on the day. And no more figuring out which of the many files under /var/log a message will end up in. It's nice when such minor irritants just vanish.

I'm not sure why they chose a binary format, and I remain somewhat suspicious of the decision. Writing the entries to text-only file(s), and adding a binary b-tree index that can be rebuilt at will, seems more robust than a binary format that does get corrupted on unplanned shutdowns. On the other hand, making it binary means only journalctl can look at it. Since they control journalctl, they can change the format to something more robust later, and none of us will be the wiser.

The bottom line is, I think it's fair to say you *are* locked into journald in a way you weren't locked into syslog. Claims that you could just write another implementation exporting the same API don't ring true, because I suspect the API will be about as stable as an internal kernel API - and tracking an internal kernel API from outside the tree is really, really hard work. That said, the lock-in looks to be an outcome of good design decisions that have yielded a (mostly) better system than we had before. And where it isn't better, it is designed in such a way that fixing it looks relatively easy. All I can say is: well done, boys (and girls?).

So now onto another lock-in. This one was driven home to me at this year's LCA, when a Google sysadmin expressed his opinion of systemd over dinner.
It was, literally, "we will be so fucked when it arrives". On inquiring, he claimed (I haven't checked) that cgroups were originally a Google initiative that was accepted by the upstream kernel. Google runs everything in its own container (think GMail, G+, Search, ...), and in an average week they spin up some 2 billion of them. Naturally they have written a large body of their own software to manage those containers, but systemd insists on managing cgroups through its own API, which is incompatible with everybody else's. I don't know enough about cgroups to say whether "so fucked" is an overly dramatic way of putting it, but I notice cgmanager exports its own cgroup API yet manages to co-exist with systemd.

Just like journald, there are design reasons for the lock-in. In this case I think systemd's end goal is to put every service and user in its own cgroup. Some tools for this already exist (see the Private* settings in systemd.exec(5)). There is no denying that putting each service and user in its own container has more than enough security advantages to justify it on its own. The problem is that this is all new, hot-off-the-press stuff - big changes in the cgroup implementation happened in the very kernel jessie is using, 3.16 - and systemd's way of using cgroups is just one of many early explorations of this space. (Many of its seemingly inexplicable add-ons arise from this. For example, you need to assign containers IP addresses, thus systemd ships with a DHCP server and client.) Currently systemd is one of the less inventive users of cgroups, doing far less with the concept than competitors like Docker and Rocket, and OSes built around it like OpenShift and CoreOS. So its insistence on being the one that manages control groups could well be problematic in the future.

An ironic outcome of the move to small, one-job containers is that the init system becomes less important. This is partially because the containers themselves don't need an init system.
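In fact, the entire "init system" a one-job container needs can be sketched in a few lines of shell (the daemon and the setup step here are made up for illustration):

```sh
#!/bin/sh
# The container's entire /sbin/init: any one-off setup,
# then exec the single job so it runs in the foreground as PID 1.
echo "container starting"    # one-off setup would go here
exec /usr/sbin/mydaemon      # hypothetical daemon; exec never returns
```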
Replacing /sbin/init with a simple shell script often suffices, and failing that, inittab with its ability to restart things is just fine. But the deeper reason is that, by definition, an init system starts and stops things on one box. In a container world, the containers live in a cloud of boxes, and you use something like Kubernetes to manage them. For Kubernetes read: an init system for the cloud.

And yes, it really is cloud based. It relies on etcd, which is a distributed key-value store. People familiar with Windows should recognise the concept immediately. It's like the registry, but redundantly distributed, so a box spinning up at some random place in the cloud can access its configuration information. (I can't help but smile at the thought of people reading this and it dawning on them that we are replacing the text configuration files in /etc with the Windows solution.) I can't say I was surprised to discover Kubernetes is an internal Google project they released to the world.

Finally, I am convinced this is all highly relevant to the Debian project. The move to containerisation is going to affect what our server users expect from us. Currently packaging concerns are largely ignored in the container world, but this surely has to be a passing phase, born of the need to prove the central concept before embarking on "side issues". Currently security updates, fixing compatibility issues between packages, repeatable builds, secure distribution - all things Debian has solved - are ignored. If we fixed those issues for the container people without too much effort on their part, we would be very, very popular. It doesn't seem like it would be too difficult.
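PS: for anyone who hasn't met etcd, the registry analogy is fairly literal - at bottom it is keys and values you set and get over the network, just redundantly distributed. Roughly like this (the key name is made up, and this is the older v2-style etcdctl usage):

```console
$ etcdctl set /config/web/listen-port 8080
8080
$ etcdctl get /config/web/listen-port
8080
```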