Bug#759382: webalizer log needs
One of the arguments for keeping more apache logs is that web analysis
tools need them. Here is a survey of things I found in Debian and what
they might need. First some general comments:
* ideally the analyzer scans the log, generates stats, and then stores
the stats and doesn't need the log anymore. There still might be
sensitive data in the results, but the data stored is greatly reduced.
* even if logfile analyzers store results and are run often, we need
to consider machines with power management, which may be asleep when
the analyzer is scheduled to run. Hopefully anacron will help deal with
that, but we probably want to be conservative and make sure analyzers
have plenty of opportunity to scan the logs before they are rotated.
Has the ability to do incremental processing with a cache file
debian/rules sets CACHEDIR (is that sufficient?) (only DNS?)
Doesn't seem to run via cron.
Fork of webalizer, see below.
Gathers stats from logs every 10 minutes, updates HTML once a day.
Interactive user tool. Doesn't appear to store results and so probably
needs the logs to be useful?
Appears to use a database, has daily/weekly/monthly cronjobs. Need more
info on whether it stores stats.
Doesn't appear to store stats, probably requires logs?
Keeps track of where it is in the log files, maintains an
incremental history file, and runs via cron once a day.
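
To illustrate what "keeps track of where it is in the log files" can look
like, here is a minimal Python sketch. It is not taken from any of the
packages surveyed above; the function name, the JSON state file, and the
choice of counting status codes are all assumptions made for the example.
It also assumes the Apache combined log format.

```python
import json
import os
from collections import Counter


def scan_incrementally(log_path, state_path):
    """Count requests per status code, reading only bytes added since the last run.

    Hypothetical helper for illustration; the state file layout is made up.
    """
    # Load the saved offset and running totals, if a previous run left any.
    state = {"offset": 0, "inode": None, "counts": {}}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            state = json.load(fh)

    st = os.stat(log_path)
    # If the log was rotated (new inode) or truncated, start from the beginning.
    if state["inode"] != st.st_ino or state["offset"] > st.st_size:
        state["offset"] = 0
    state["inode"] = st.st_ino

    counts = Counter(state["counts"])
    with open(log_path, "rb") as fh:
        fh.seek(state["offset"])
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next run
            fields = raw.decode("utf-8", "replace").split('"')
            # In the combined format the status code follows the quoted request.
            if len(fields) >= 3:
                status = fields[2].split()
                if status:
                    counts[status[0]] += 1
            state["offset"] += len(raw)

    # Persist the totals and the new offset; the log itself is no longer needed.
    state["counts"] = dict(counts)
    with open(state_path, "w") as fh:
        json.dump(state, fh)
    return state["counts"]
```

Note that this sketch simply restarts at offset 0 when it sees a new inode,
so anything written to the old file after the last run is lost unless the
rotated file is still around to be read. That is the retention concern
in a nutshell.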
For the tools that do store data, I think 7 days should be enough to
ensure that they have a chance to process the logs before they get
rotated.
The above might be interesting for nginx log retention as well.
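
As a rough way to reason about the 7-day figure, the check below asks
whether an analyzer that last ran at a given time can still see every log
entry written since then. It is only a sketch: the function name and the
default of daily rotation with 7 rotations kept are assumptions for the
example, not settings taken from any package.

```python
from datetime import datetime, timedelta


def logs_still_available(last_scan, now, rotate_interval_days=1, rotations_kept=7):
    """Return True if everything logged since last_scan should still be on disk.

    Conservative estimate: counts only the rotated files, not the current log.
    """
    retention = timedelta(days=rotate_interval_days * rotations_kept)
    return now - last_scan <= retention


# A laptop asleep over a long weekend is fine; one asleep for ten days is not.
now = datetime(2014, 9, 1, 6, 25)
weekend_ok = logs_still_available(now - timedelta(days=3), now)
holiday_ok = logs_still_available(now - timedelta(days=10), now)
```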