
Re: logresolve

On Wed, Oct 26, 2005 at 10:15:34AM +0200, Maarten Vink wrote:
> Dan MacNeil wrote:
> >Right now it is sometimes taking longer than 24 hours to run reports 
> >(analog) on our apache logs.
> >
> >Does anyone have experience/opinions on putting logresolve into the
> >picture?
> >
> >How do you move the resolved log into the place of the unresolved one
> >without stopping apache for long?
> >
> >maybe approximately:
> >
> >    cd $RIGHT_DIR
> >    apachectl stop
> >    mv combined.log combined.log.tobe.resolved
> >    apachectl start
> >    logresolve < combined.log.tobe.resolved > combined.log.resolved
> >    apachectl stop
> >    cat combined.log >> combined.log.resolved
> >    mv combined.log.resolved combined.log
> >    apachectl start
> As far as I know, apache will continue writing to a logfile if it's moved 
> to another location on the same filesystem; you can then use "apachectl 
> graceful" to have apache open new logfiles.

yep, that's right.
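here is a minimal sketch of why that works. it assumes bash on a unix filesystem, involves no apache, and demo_rename_while_open is a made-up name. a shell file descriptor stands in for apache's open log handle. the descriptor follows the file, not its name, so lines written after the mv still land in the renamed file. nothing new appears under the old name until the writer reopens it, and reopening its logs is what "apachectl graceful" makes apache do.

```shell
#!/bin/bash
# demo_rename_while_open: hypothetical helper showing that a writer holding
# an open file descriptor keeps writing to a file after it is renamed on the
# same filesystem, as apache does with its logs.
# Prints the number of lines that ended up in the renamed file.
demo_rename_while_open() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        exec 3>>combined.log                   # stand-in for apache's open log handle
        echo "request before the move" >&3
        mv combined.log combined.log.tobe.resolved
        echo "request after the move" >&3      # still goes to the renamed file
        exec 3>&-                              # a graceful restart would reopen combined.log here
        wc -l < combined.log.tobe.resolved | tr -d ' '
    )
    rm -rf "$dir"
}

demo_rename_while_open    # prints 2
```

in a real rotation you'd run "apachectl graceful" right after the mv. then wait a few minutes before resolving the renamed file, because children still finishing old requests keep writing to it for a while.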

also, use jdresolve (which is packaged for debian) rather than
logresolve - it is much faster, because it runs many DNS lookups
concurrently instead of resolving one address at a time.

Package: jdresolve
Installed-Size: 112
Maintainer: Frederic Peters <fpeters@debian.org>
Architecture: all
Version: 0.6.1-4
MD5sum: 6e6c69bee495dfb638e89116043279ea
Description: fast alternative to apache logresolve
 The jdresolve application resolves IP addresses into hostnames. To
 reduce the time necessary to resolve large batches of addresses,
 jdresolve opens many concurrent connections to the DNS servers, and
 keeps a large number of text lines in memory. These lines can have
 any content, as long as the IP addresses are the first field to the
 left. This is usually the case with most formats of HTTP and FTP log files.
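you can roughly approximate the same idea with standard tools: batch the lines, resolve each distinct address once, and run many lookups at a time. this is a sketch, not how jdresolve works internally. it assumes the client address is the first space-separated field and that getent and xargs -P are available. the function names and the PARALLEL variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers approximating jdresolve's approach with standard tools.
# Assumes the client IP is the first space-separated field of each log line.

# unique_ips LOG...: print each distinct first-field address once.
unique_ips() {
    awk '{ print $1 }' "$@" | sort -u
}

# build_map LOG...: reverse-resolve every distinct address, PARALLEL lookups
# at a time (default 40), printing "ip hostname" pairs. Addresses that do not
# resolve map to themselves.
build_map() {
    unique_ips "$@" |
        xargs -P "${PARALLEL:-40}" -I{} sh -c '
            h=$(getent hosts "$1" | awk "{ print \$2; exit }")
            echo "$1 ${h:-$1}"' _ {}
}

# apply_map MAPFILE LOG: replace the first field of each LOG line using
# MAPFILE (which must be non-empty), keeping the rest of the line unchanged.
apply_map() {
    awk 'NR == FNR { host[$1] = $2; next }
         ($1 in host) { $0 = host[$1] substr($0, length($1) + 1) }
         { print }' "$1" "$2"
}
```

for example, `build_map access.log > map && apply_map map access.log > access.resolved`. apply_map splices the hostname onto the untouched remainder of each line, so the original spacing in the rest of the line survives.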

i rotate, resolve, and process my log files like so:

first, there is a virtual-hosts.conf file which lists each virtual host.
this file is used by numerous scripts in my virtual hosting system
- from generating apache/htdig/whatever config fragments to running
webalizer, htdig, linbot and other tools against each virtual host.

this virtual-hosts.conf file avoids a lot of annoying and unnecessary
duplication of information - just enter all the details into a single
line of this file and the system takes care of the rest.

in theory, it could easily be put into a database like postgres or
mysql, but i prefer to edit it with vi, and a database is overkill for
this job. i've run this vhosting system on everything from small boxes
with only one or two vhosts up to medium-sized boxes with nearly 1000
vhosts, and i don't see any reason (apart from machine capacity) why it
wouldn't work for 10,000 or more vhosts.

---cut here---
# boolean flags available are: cgi, mod_perl, ssi, linbot, allowtilde
# allowtilde suppresses redirection of http://hostname/~username/ to
# the VWS.  useful for testing only.
#   note: mod_perl must be enabled in /etc/apache/httpd.conf
# also note: /etc/apache/httpd.conf needs a line "include /etc/virtuals/include-virtuals.httpd"
# multi-valued flags available are: aliases=(alias1 alias2 alias3)

#ip.ip.ip.ip:hostname:username:cgi,aliases=(www ftp),ssi

# ftp),htdig,allowtilde
---cut here---
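here is a minimal sketch of splitting one such line into its fields. it assumes bash and the ip:hostname:username:flags layout described in the comments above, and it assumes comment and blank lines have already been filtered out. parse_vhost_line is hypothetical and is not part of the real scripts.

```shell
#!/bin/bash
# parse_vhost_line LINE: hypothetical parser for one virtual-hosts.conf line
# in the "ip:hostname:username:flags" layout. Prints the fixed fields, then
# one flag per line; a multi-valued flag like aliases=(www ftp) is printed
# as "aliases: www ftp".
parse_vhost_line() {
    local ip host user flags flag name values
    local -a flaglist
    # the last variable (flags) receives everything after the third colon
    IFS=: read -r ip host user flags <<< "$1"
    echo "ip=$ip host=$host user=$user"
    # flags are comma-separated; values inside (...) are space-separated
    IFS=, read -r -a flaglist <<< "$flags"
    for flag in "${flaglist[@]}"; do
        case $flag in
            *=\(*\))
                name=${flag%%=*}          # text before the "="
                values=${flag#*=\(}       # strip "name=("
                values=${values%\)}       # strip trailing ")"
                echo "$name: $values" ;;
            *)
                echo "flag: $flag" ;;
        esac
    done
}
```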

then there is the run-logs.pl script. it reads in a virtual-hosts.conf
file and parses it to populate the %virtual hash. to execute the
commands, it opens a pipe to bash - or, for testing, just prints the
commands to a file in /tmp. it uses webalizer at the moment, but it has
used analog and other tools in the past. it's easy to modify it to use
either or both - in fact, most of the script is generic and can be used
to apply any process to all vhosts.
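the generate-then-execute pattern is easy to reproduce in bash too. this is a sketch with hypothetical names (gen_commands, run_or_save) and an assumed DRY_RUN variable. the generator only prints commands. the runner either pipes them to bash or saves them to a file for inspection, like the commented-out /tmp/runlog.sh line in the script.

```shell
#!/bin/bash
# gen_commands: hypothetical generator. Reads "domain username" lines and
# only *prints* shell commands; nothing is executed here. printf %q quotes
# each value because the output will be run as shell code.
gen_commands() {
    local domain user
    while read -r domain user; do
        [ -n "$domain" ] || continue
        printf 'echo processing logs for %q\n' "$domain"
        printf 'test -d /home/%q/www_logs || echo no log dir for %q\n' "$user" "$domain"
    done
}

# run_or_save: pipe the generated script to bash, or, when DRY_RUN names a
# file, save the script there instead so it can be inspected first.
run_or_save() {
    if [ -n "${DRY_RUN:-}" ]; then
        cat > "$DRY_RUN"
    else
        bash
    fi
}
```

for example, `gen_commands < vhosts.txt | DRY_RUN=/tmp/runlog.sh run_or_save` writes the script without running it. dropping the DRY_RUN= prefix runs it.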

the resolved log files are renamed to access.YYYYMMDD and then gzipped.
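the per-log step can be sketched as a standalone bash function. process_log and the RESOLVER hook are hypothetical. RESOLVER defaults to the jdresolve command line from run-logs.pl, and you can set it to cat for a dry run. unlike the script, this sketch puts the original file back if resolution fails instead of deleting it.

```shell
#!/bin/bash
# process_log LOG: hypothetical per-log step modelled on run-logs.pl.
# Resolves LOG in place, gives the result the original's modification time,
# then gzips it. RESOLVER is an assumed hook (default: the jdresolve line).
process_log() {
    local log=$1
    local resolver=${RESOLVER:-"jdresolve -t 40 -s 64 -l 100000 -n -r -"}
    if [ ! -e "$log" ]; then
        echo "  no logs today...nothing to do for $log"
        return 0
    fi
    mv "$log" "$log.orig" || return 1
    # $resolver is deliberately unquoted so it splits into command + options
    if $resolver < "$log.orig" > "$log"; then
        touch -r "$log.orig" "$log"    # keep the original timestamp
        rm -f "$log.orig"
        gzip -9q "$log"                # leaves $log.gz
    else
        mv "$log.orig" "$log"          # resolution failed: restore the original
        return 1
    fi
}
```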

#! /usr/bin/perl

use Date::Format;

$now = time;
$date = time2str("%Y%m%d",$now);

# this script runs webalizer on each virtual server's log
# files.

# Copyright Craig Sanders, 1999.
# This script is licensed under the terms of the GNU GPL.

# Author: Craig Sanders <cas@taz.net.au>
# $Id: run-logs.pl,v 1.7 2004/05/28 03:31:35 root Exp $
# Revision History:
#   1999-04-13 - first version
#   2000-05-16 - modified to use savelog to rotate logfiles rather than
#                depend on cronolog to have automagically rotated them.
#                modified to remove analog processing.

# number of old error log files to keep.  the assignment was missing from
# this copy of the script; 7 is an assumed value (savelog's own default).
$numerrs = 7 ;

$confdir='/etc/virtuals' ;
$virtuals="$confdir/virtual-hosts.conf" ;

open(VIRT,"<$virtuals") || die "couldn't open $virtuals: $!\n" ;
while (<VIRT>) {
	# skip blank lines and comments
	next if (/#|^\s*$/) ;
	# strip the newline so a line without flags doesn't leave it on $username
	chomp ;

	($IP,$domain,$username,$flags) = split /:/;
	$virtual{$domain}=$username ;
    # flags aren't used by this particular script.  they're mostly used by the
    # apache and htdig generator scripts.
} ;
close(VIRT) ;

#open(SHELL,">/tmp/runlog.sh") || die "couldn't open /tmp/runlog.sh: $!\n" ;
open(SHELL,"|/bin/bash") || die "couldn't open pipe to bash: $!\n" ;

# first rotate the log files and restart apache, as quickly as possible.
foreach (sort keys %virtual) {
	$username=$virtual{$_} ;
	$home="/home/$username" ;
	$logdir="$home/www_logs" ;
	$logfile="$logdir/access.log" ;
	print SHELL <<__EOF__;
if [ -e $logfile ] ; then
  cd $logdir
  savelog -p -c $numerrs error.log
  mv access.log access.$date
  chown $username access.$date error.log.0
fi
__EOF__
}

print SHELL "PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'\n";
print SHELL "/etc/init.d/apache reload\n";

# then process the rotated log files.
foreach (sort keys %virtual) {
	$username=$virtual{$_} ;
	$home="/home/$username" ;
	$logdir="$home/www_logs" ;
	$outdir="$home/public_html/LOGS" ;
	$logfile="$logdir/access" ;
	#$errfile="$logdir/error.log" ;

	print SHELL <<__EOF__;
echo processing logs for $_
if [ -e $logfile.$date ] ; then
  mkdir -p $outdir/webalizer
  mv $logfile.$date $logfile.orig
  jdresolve -t 40 -s 64 -l 100000 -n -r - <$logfile.orig >$logfile.$date
  chown $username $logfile.$date
  # give the resolved log the original's modification time
  touch -r $logfile.orig $logfile.$date
  rm -f $logfile.orig
  webalizer -n $_ -o $outdir/webalizer $logfile.$date 2>/dev/null
  gzip -9q $logfile.$date
else
  echo "  no logs today...nothing to do for $_"
fi
__EOF__
}

close(SHELL) ;



finally, the main /var/log/apache/access.log file is rotated by
logrotate once a week - it's mostly empty anyway, since that vhost is
only used by me for querying the system.

it could be done in the script above, around where apache is reloaded,
but then you have to disable logrotate processing for apache. it's less
hassle to just let logrotate do it:

/var/log/apache/*.log {
	weekly
	#rotate 52
	rotate 7
	create 640 root adm
	# both access.log and error.log match the glob above; sharedscripts
	# makes postrotate run once rather than once per matched log.
	sharedscripts
	postrotate
	   if [ -f /var/run/apache.pid ]; then \
	     if [ -x /usr/sbin/invoke-rc.d ]; then \
		invoke-rc.d apache reload > /dev/null; \
	     else \
	        /etc/init.d/apache reload > /dev/null; \
	     fi; \
	   fi
	   /usr/bin/jdresolve -t 40 -s 64 -l 100000 -n -r - </var/log/apache/access.log.1 >/var/log/apache/access.`date +%Y%m%d`
	   /usr/bin/webalizer -n "ganesh" -o /var/www/webalizer /var/log/apache/access.`date +%Y%m%d` >/dev/null
	   /bin/gzip -9 /var/log/apache/access.`date +%Y%m%d` >/dev/null
	   /bin/rm -f /var/log/apache/access.log.1
	endscript
}
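one small weakness of those postrotate commands is that they call date three times. a run that crosses midnight would resolve into one file name and then report on and compress another. here is a sketch of the same steps as a bash function that computes the stamp once. postrotate_step, RESOLVER and REPORT are hypothetical names, and their defaults are copied from the commands above.

```shell
#!/bin/bash
# postrotate_step [DIR]: hypothetical version of the postrotate commands.
# The date stamp is computed once so every step uses the same file name.
# RESOLVER and REPORT are assumed hooks defaulting to jdresolve/webalizer.
postrotate_step() {
    local dir=${1:-/var/log/apache}
    local stamp out
    stamp=$(date +%Y%m%d)
    out="$dir/access.$stamp"
    [ -e "$dir/access.log.1" ] || return 0     # nothing was rotated
    # unquoted expansions below split the default strings into command + args
    ${RESOLVER:-/usr/bin/jdresolve -t 40 -s 64 -l 100000 -n -r -} \
        < "$dir/access.log.1" > "$out" || return 1
    ${REPORT:-/usr/bin/webalizer -n ganesh -o /var/www/webalizer} "$out" > /dev/null
    gzip -9 "$out"
    rm -f "$dir/access.log.1"
}
```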


craig sanders <cas@taz.net.au>           (part time cyborg)
