
Re: Journald's qualities (was: Selective rotation of journald logs)



On 2024-02-23, at 10:26:49,
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > [13:37:48]cthulhu:/var/log/journal☠ journalctl |dd of=/dev/zero bs=1M
> > 0+15442 records in
> > 0+15442 records out
> > 63643115 bytes (64 MB, 61 MiB) copied, 5,47791 s, 11,6 MB/s
> > du -h /var/log/journal/
> > 337M	/var/log/journal/44cf6f547971fc33309d1e9e000002e7
> > 337M	/var/log/journal/
> >
> > (I've raised a bug 8 years ago about it
> >  https://github.com/systemd/systemd/issues/2460 )  
> 
> Oh, that bug report is quite interesting, thanks.
> Makes one wonder why they don't use naive append-only "plain text"
> logs (tho with appropriate delimiters (maybe some kind of CSV) to make
> searches more reliable than with old-style plain text logs)?
> 
> What are the advantages of journald's representation?
> I mean, to justify the slow search and large disk space usage, there
> is presumably some upside for some use cases.  

The advantage of the binary format is that you have separate fields for
program, level, etc., and it is easier to filter by exactly what you
want. So "systemctl status servicename" just asks the log subsystem for
exactly what it needs.
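To make the difference concrete, here is a toy sketch in Python: the record layout and field names below are made up for illustration (this is not journald's actual on-disk format), but it shows why exact field matches beat regex-parsing flat text lines.

```python
import re

# Structured records: each field is separate, so filtering is an exact match.
# (Field names here are invented, not journald's real ones.)
records = [
    {"unit": "nginx.service", "priority": 3, "msg": "worker exited"},
    {"unit": "sshd.service", "priority": 6, "msg": "session opened"},
    {"unit": "nginx.service", "priority": 6, "msg": "reload done"},
]

# "Give me errors from nginx" needs no parsing at all:
errors = [r for r in records
          if r["unit"] == "nginx.service" and r["priority"] <= 3]

# With flat text lines you have to parse every line and hope the format
# is regular enough for a regex:
lines = ["nginx.service[3]: worker exited",
         "sshd.service[6]: session opened"]
parsed = [m.groupdict() for l in lines
          if (m := re.match(r"(?P<unit>\S+)\[(?P<prio>\d+)\]: (?P<msg>.*)", l))]
```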

But journald logs aren't built for query speed; there is, for example,
no index (in memory or on disk) that tells you where the last log
entries from each app are, so you get pathological cases like the one
in the ticket.

The problem is that you kinda do two types of queries on logs most often:

* get me all/most of what happened in last x lines/hours
* get me everything recent (or just everything period) that this
  particular service returned.

Which kinda excludes "simple" tricks like "just hash over the service
name so at worst you have to search only 1/64th of the files", as they
would make the "get me everything" case slower. But maybe not by all
that much.
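A sketch of what that trick could look like (the file naming, bucket count, and hash choice here are all hypothetical):

```python
import zlib

BUCKETS = 64

def bucket_for(service: str) -> int:
    # Any stable hash works; crc32 is just an arbitrary choice for
    # this sketch.
    return zlib.crc32(service.encode()) % BUCKETS

# "Everything from nginx" now touches a single file out of 64...
nginx_file = f"journal.{bucket_for('nginx.service'):02d}"

# ...but "everything from the last hour" still has to open all 64
# files and merge them by timestamp, which is the case this trick
# makes slower.
all_files = [f"journal.{b:02d}" for b in range(BUCKETS)]
```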

The size issue... it's an appendable binary format, which means that
any metadata systemd adds can't be deduplicated by a join against a
table with service info; it needs to be repeated on every single line,
and I'm guessing that's where the bloat comes from.
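Roughly what I mean by "a join against a table with service info", sketched with Python's bundled sqlite3 (the schema and column names are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Per-service metadata is stored exactly once...
    CREATE TABLE service (id INTEGER PRIMARY KEY, name TEXT, uid INTEGER,
                          cgroup TEXT);
    -- ...and each log line carries only a small integer reference.
    CREATE TABLE log (ts INTEGER, service_id INTEGER REFERENCES service(id),
                      msg TEXT);
""")
con.execute("INSERT INTO service VALUES "
            "(1, 'nginx.service', 33, '/system.slice/nginx.service')")
con.executemany("INSERT INTO log VALUES (?, 1, ?)",
                [(i, f"request {i}") for i in range(1000)])

# The join reattaches the metadata at query time, instead of repeating
# it on every stored line the way an append-only format has to.
rows = con.execute("""
    SELECT log.ts, service.name, service.cgroup, log.msg
    FROM log JOIN service ON log.service_id = service.id
    WHERE service.name = 'nginx.service'
""").fetchall()
```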

I rambled about SQLite as it is kinda the closest comparison for an
embedded binary format that can be searched, and I think that being
able to run SQL queries (SQLite now even has full-text search) against
your local logs with no database daemon running would be a killer
feature on some of the smaller devices.
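A minimal sketch of that idea with Python's bundled sqlite3 (the table and column names are invented, and it assumes the SQLite build has the FTS5 extension compiled in, as most do):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 full-text index over log messages; no daemon involved.
con.execute("CREATE VIRTUAL TABLE log_fts USING fts5(source, msg)")
con.executemany("INSERT INTO log_fts VALUES (?, ?)", [
    ("nginx", "upstream timed out while reading response header"),
    ("sshd", "Failed password for invalid user admin"),
    ("nginx", "worker process exited on signal 11"),
])

# Full-text search across all local logs with plain SQL:
hits = con.execute(
    "SELECT source, msg FROM log_fts WHERE log_fts MATCH 'timed'"
).fetchall()
```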


> argument against Sqlite based on the size of Sqlite, 

I'd argue that's not really technically relevant, as in most cases
you'd have some other app on the system using SQLite for *something*
and would have the library installed anyway. And the space savings get
eaten just by how verbose the journald format is.

> but what are the
> advantages of journald's representation compared to a naive one?

In short: queryability without text parsing. That's about it.

Which *is* a great feature, don't get me wrong, I use it all the time.
But the slowdown caused by its unfortunate implementation details also
shows up pretty often when I use it, as the most common use case is
"something broke, log in on a server that has been running on
automation for the last 3 months and see what is happening".

It does have some features around FSS (log sealing/signing), so for
folks that need that there aren't many alternatives, as it would be
hard to build a similar feature on top of a database.

But on the flip side, the inability to separate logs into rotation
groups is *nasty*. You basically can't have a verbose service on the
system, as it will rotate away the old logs of every other service.
And looking for those logs is also the worst case:

[17:57:56]cthulhu:~☠ echo 3 > /proc/sys/vm/drop_caches
[17:58:14]cthulhu:~☠ time systemctl status nginx |grep Notice
Notice: journal has been rotated since unit was started, output may be incomplete.

real	0m1,645s
user	0m0,000s
sys	0m0,094s
[17:58:19]cthulhu:~☠ echo 3 > /proc/sys/vm/drop_caches
[17:58:28]cthulhu:~☠ time sqlite /tmp/log_indexed.sqlite 'select * from log where source = "nginx";'  >/dev/null

real	0m0,374s
user	0m0,000s
sys	0m0,009s


I guess another advantage of an appendable log is that in case of a
crash it's easier to recover than an embedded database...

...or so I would say if I hadn't seen journald throw away binary logs
with errors after a crash.


> 
> 
>         Stefan
> 
> 
> PS: FWIW, the only time I thought about the design of a log format,
> I was thinking along the lines of reducing redundancy, i.e. keeping
> a kind of "suffix tree" of past messages so as to replace the many
> repeated chunks of text with references to past log entries.
> Then I realized that I could get the same result much more simply by
> gzipping the files, which would naturally be done as part of
> log rotation 🙂
> 

A one-service-per-file approach is honestly good enough for most stuff.
It's a PITA to get chronological order back though; every approach
really has some drawbacks and benefits.
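The chronological-order pain is basically a k-way merge across the per-service files; a sketch in Python with made-up data:

```python
import heapq

# Hypothetical per-service files, each already sorted by timestamp
# within itself (as append-only logs naturally are).
nginx = [(1, "nginx", "started"), (5, "nginx", "reload")]
sshd = [(2, "sshd", "session opened"), (4, "sshd", "session closed")]
cron = [(3, "cron", "job ran")]

# Getting one chronological view back is a k-way merge by timestamp --
# doable, but extra work that a single-file layout never needs.
merged = list(heapq.merge(nginx, sshd, cron, key=lambda e: e[0]))
```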

But honestly? Send it to a central logger and worry about it there if
you have the chance; especially if you do some parsing there, there
can be some amazing productivity gains to be had.

It took a bunch of work to get every firewall log parsed into the
right columns, but now answering questions like "which firewall is
blocking this traffic?" is a breeze.

Of course, that's not really an option for the smaller stuff.

-- 
Mariusz Gronczewski (XANi) <xani666@gmail.com>
GnuPG: 0xEA8ACE64
https://devrandom.eu

