Re: apache log retention


On Wed, May 11, 2011 at 08:14:35PM +0100, Stephen Gran wrote:
> I've noticed that we keep apache logs for a year.  This has two issues
> from my point of view, and I think we can attack each separately:
> a) disk space.  A machine like kassia (not in the www rotation, but
>    affected by our retention policy) consumes 30GB of disk space storing
>    just apache logs.  That is space that could better be used elsewhere.
> b) privacy.  We log the IP address of connecting clients, and a year's
>    worth of data may be enough to do some harm in some theoretical edge
>    case.  No one has, to my knowledge, ever asked for this data, but I'd
>    frankly prefer not to have it to give them should they come knocking.

Generating stats about users and how they use www.debian.org would be interesting
now that the website is hosted by the project.

And could we ensure these logs are readable by debwww?

Or hash the IP in a way that still lets us tell which log entries come from
the same client?
(I just found http://bug.st/mod_anonstats or https://github.com/franzs/mod_log_iphash)
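
To make the hashing idea concrete, here is a rough Python sketch. I have not
checked how the modules above do it; this is only an illustration. It assumes
a keyed hash (HMAC-SHA256) with a secret that is never written to the logs,
since a plain unsalted hash of an IPv4 address could be reversed by simply
trying all 2^32 addresses. The names pseudonymize_ip and anonymize_line and
the 12-character truncation are made up for the example.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, key: bytes, length: int = 12) -> str:
    """Return a keyed hash of the client address.

    The token is stable for a given key (so entries from one client can
    still be grouped) but cannot be mapped back to the IP without the key.
    """
    digest = hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def anonymize_line(line: str, key: bytes) -> str:
    """Replace the leading client field of a common/combined log line.

    Assumption: the client address is the first space-separated field,
    as with Apache's %h at the start of the LogFormat.
    """
    host, sep, rest = line.partition(" ")
    return pseudonymize_ip(host, key) + sep + rest
```

Entries from the same client get the same token as long as the key stays the
same; throwing the key away at each log rotation would prevent linking a
client across rotation periods.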

> So, I propose the following:
> We create an apache LogFormat that does not include client information,
> and start using it.  This seems like it will be fairly painless.

Client user-agent is useful too (for www, not for package mirrors).
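
For the user-agent side, a minimal sketch of the kind of stats I have in
mind, assuming the log lines still end with the quoted referer and
user-agent fields as in Apache's combined format (the function name
count_user_agents is hypothetical). It needs no client address at all.

```python
import re
from collections import Counter

# Match the last two quoted fields of a line: "referer" "user-agent".
# Backslash-escaped quotes inside a field are allowed.
UA_RE = re.compile(
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"\s*$'
)


def count_user_agents(lines):
    """Count requests per user-agent string; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts
```

Calling count_user_agents(open(path)) on a log file and then
.most_common(10) on the result would list the ten most frequent agents.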

> We change our log retention policy to something short, like a week or
> two.

2 weeks retention looks ok.

> I know people can be invested in web logs, so I want to solicit opinions
> before just doing this.  If people reading this mail think of others who
> should be involved in the discussion, by all means forward the mail to
> them.

[ cc debian-www ]


Simon Paillard
