Re: Being part of a community and behaving

To: debian-devel@lists.debian.org
Subject: Re: Being part of a community and behaving
From: Matthias Urlichs <matthias@urlichs.de>
Date: Sat, 15 Nov 2014 17:21:18 +0100
Message-id: <20141115162118.GC16997@smurf.noris.de>
In-reply-to: <546748F3.9000402@gmail.com>
References: <CAK0OdpxmKyqxinUC+L+35X0GOLE04YE37WWV5UG3yNsZ4SMZRw@mail.gmail.com> <20141113122331.GC22272@auth.logic.tuwien.ac.at> <20141113131941.GA6898@pax.zz.de> <87r3x7f4oa.fsf@hope.eyrie.org> <1415897759.11764.44.camel@G3620.my.own.domain> <5464F146.6000000@ralfj.de> <546748F3.9000402@gmail.com>

Hi,

Raphaël Halimi:
> raph@arche:~$ journalctl | grep Forwarding

Try this instead:

$ journalctl _SYSTEMD_UNIT=systemd-journald.service

which will (most likely) also show messages like "Suppressed 1927 messages
from /PATH/FOO.slice". You can then use 

$ journalctl _SYSTEMD_SLICE=FOO.slice

to display the non-suppressed part of the spew that's responsible
for this overrun.

> nov. 10 20:14:34 arche systemd-journal[207]: Forwarding to syslog missed
> 42 messages.

Presumably, systemd is not capable of changing the speed of your syslog
daemon. So what would happen without journald in the middle? If it's
stdout/stderr logging, the information would not have been logged at all
(daemons redirect it to /dev/null when backgrounding), or it would have
been tossed when sendmsg() returned an error due to a full buffer. Or, if
it's an intermittent problem with syslog rather than message spew, the fact
that you now have one pipe to syslogd instead of N of them, each 4k big,
may be relevant.

Thus, the fix is to
(a) increase the kernel buffer size of the pipe to syslog,
(b) increase the speed of your syslogd,
(c) decrease that daemon's latency,
(d) teach whatever program logged so much to Not Do That, and/or
(e) decrease journald's RateLimitBurst= config variable so that
    it doesn't overload your syslog. Oh yes,
(f) if your syslog still sync()s a log file after every message, tell it
    to not Do That.

(a) should be a straightforward patch. (b) to (d) are not systemd's
problem. (e) defaults to 1000 messages in 30 seconds, which may be too
much for your syslog to keep up with.
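
To make the effect of a burst limit like (e) concrete, here is a minimal
sketch in Python of a fixed-window limiter: at most `burst` messages are
accepted per `interval` seconds, and the rest are counted as suppressed.
The class and method names are made up for illustration, and journald's
real rate limiting differs in detail (it works per service, among other
things), so treat this as a model of the idea only.

```python
class BurstLimiter:
    """Accept at most `burst` messages per `interval` seconds (simplified model)."""

    def __init__(self, burst=1000, interval=30.0):
        self.burst = burst
        self.interval = interval
        self.window_start = None  # start time of the current window
        self.accepted = 0         # messages accepted in the current window
        self.suppressed = 0       # messages dropped in the current window

    def offer(self, now):
        """Offer one message at time `now` (seconds).

        Returns (accepted, report), where `report` is the number of messages
        suppressed in the window that just ended, or 0 if no window ended.
        """
        report = 0
        if self.window_start is None or now - self.window_start >= self.interval:
            # A new window starts: report what the old one dropped, then reset.
            report = self.suppressed
            self.window_start = now
            self.accepted = 0
            self.suppressed = 0
        if self.accepted < self.burst:
            self.accepted += 1
            return True, report
        self.suppressed += 1
        return False, report


if __name__ == "__main__":
    limiter = BurstLimiter(burst=3, interval=30.0)
    for t in (0, 1, 2, 3, 4, 30):
        print(t, limiter.offer(t))
```

Lowering `burst` means fewer messages reach the consumer per window, which
is the trade-off behind (e): less load on syslog, more suppressed messages.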

> drop the message" (IIRC it was even more condescending, like "we don't
> have to wait for this" or something). Really ? The very piece of code
> which is supposed to talk to syslog... doesn't wait for syslog ?
> 
Do you want to buffer an unbounded number of messages in RAM instead,
hoping that syslog will catch up eventually? Thanks, but no thanks.
(Implementing a _bounded_ message buffer in systemd doesn't make sense,
because you can get the exact same effect by simply doing (a), above.)
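
A minimal sketch of why a bounded buffer only moves the problem, assuming
a producer that is faster than the consumer. The `BoundedForwarder` class
is hypothetical and is not how journald is implemented; it only shows that
once any fixed-size buffer is full, the choice is between dropping and
blocking, wherever that buffer lives.

```python
from collections import deque


class BoundedForwarder:
    """Queue of pending messages with a hard upper bound (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.missed = 0  # messages dropped because the queue was full

    def enqueue(self, message):
        """Queue a message; return False and count it as missed if full."""
        if len(self.queue) >= self.capacity:
            self.missed += 1
            return False
        self.queue.append(message)
        return True

    def drain(self, n):
        """Simulate a slow consumer reading at most `n` messages."""
        out = []
        while self.queue and len(out) < n:
            out.append(self.queue.popleft())
        return out


if __name__ == "__main__":
    fwd = BoundedForwarder(capacity=4)
    for i in range(10):          # burst of 10 messages
        fwd.enqueue(f"msg {i}")
    print("consumer got:", fwd.drain(2))
    print("missed:", fwd.missed)
```

A larger `capacity` absorbs a longer burst but does nothing for a consumer
that is permanently too slow, which is the same trade-off as (a).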

> So if one can't afford to have crippled logs, what's the solution ?

It's likely (though not certain) that your logs have been crippled in the
past, albeit in a different way, and you simply didn't notice because the
logging program didn't tell you. The standard syslog(3) code doesn't.

> Getting rid of syslog completely by turning on persistence in journald,
> and go with binary logs ? Thanks, but no thanks.
> 
Why not?

Seriously. I can do a whole lot more with this strange binary journal thing
than with a text file.

* All error messages from my web server setup, no matter which process
  logged them?
  One command.
* Get everything Joe User did last week (that resulted in a syslog entry)?
  One command.
* Post-process some logs without writing fragile regexps which need to make
  triple sure no random crap throws off your syslog parser?
  Export the entries you need (and only these) as JSON.
* Want a logger that will NOT fill your whole disk with logs, no matter what?
  No problem.
And you get all of this without sequentially scanning a couple of huge
syslog-written files with redundant data (just how many syslog files does a
WARN message from the kernel end up in?).
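
As a sketch of the JSON post-processing mentioned above: `journalctl -o json`
prints one JSON object per line, so a short Python script can filter entries
by field instead of by regexp. The function names and the sample entries
below are made up for illustration; MESSAGE, PRIORITY and _SYSTEMD_UNIT are
standard journal field names.

```python
import json

# Invented sample lines in the shape of `journalctl -o json` output.
SAMPLE = [
    '{"MESSAGE": "disk failure", "PRIORITY": "3", "_SYSTEMD_UNIT": "foo.service"}',
    '{"MESSAGE": "started", "PRIORITY": "6", "_SYSTEMD_UNIT": "foo.service"}',
    '{"MESSAGE": [104, 105], "PRIORITY": "2", "_SYSTEMD_UNIT": "bar.service"}',
]


def parse_journal_json(lines):
    """Yield one dict per non-empty line of JSON journal output."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def message_text(entry):
    """Return MESSAGE as text.

    Payloads that are not valid UTF-8 are exported as a list of byte values,
    so that case is decoded here; a missing or null MESSAGE becomes "".
    """
    msg = entry.get("MESSAGE")
    if msg is None:
        return ""
    if isinstance(msg, list):
        return bytes(msg).decode("utf-8", errors="replace")
    return msg


def errors_only(entries, max_priority=3):
    """Keep entries whose syslog priority is `max_priority` (err) or more severe."""
    for entry in entries:
        prio = entry.get("PRIORITY")
        if prio is not None and str(prio).isdigit() and int(prio) <= max_priority:
            yield entry


if __name__ == "__main__":
    for entry in errors_only(parse_journal_json(SAMPLE)):
        print(entry.get("_SYSTEMD_UNIT"), message_text(entry))
```

In real use the lines would come from standard input, for example by piping
`journalctl -o json` into the script and iterating over `sys.stdin` instead
of `SAMPLE`.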

Yes, binary logs are somewhat less crash-proof. In theory. But in my
experience, a random crash which doesn't even sync() will also leave a big
fat spot of NUL bytes in the text log, so you don't have any useful
information about the crash in either case. And if it does sync
successfully, well, the text log will be OK, but so will its binary
counterpart.

Yes, this may sound fanboy-ish. But let me tell you, the simple fact is
that this evil buggy monolithic systemd stuff some people complain about
saves me a lot of time, not all of which I then spend on Debian mailing
lists fanboy-ing. :-P  (I'm also somewhat too old to be called "boy". :-/ )

Besides, I'm not blind to the fact that not all is well in systemd land.
But that's a different topic.

-- 
-- Matthias Urlichs
