
Hitcount-sorted translation statistics



Hello,

This is about the order of files listed at pages such as
http://www.debian.org/devel/website/stats/pl.html

I felt that the alphabetical order of files in each section is mostly
useless. It would be much more useful to order the files by their
visitor count, so that translators know which files to focus on to
maximize the impact they can achieve with the time they have.

I have just committed a change to stattrans.pl that makes this possible.
You just need to pass an additional '-f' flag pointing the script at a
file of pre-calculated website hit statistics. Until this is integrated
into the periodic website build, you can try it at home:

cd ..../webwml
cvs up -dPR Perl english locale YOURLANG
wget http://people.debian.org/~porridge/hits.txt.gz
gunzip hits.txt.gz # these are from October at bellini.d.o
mkdir trans-stat
./stattrans.pl -h trans-stat -w . -v -f hits.txt
www-browser ./trans-stat/index.html # hover over filename to see hit count


If "-f" is not specified, the script behaves as it did until now.
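For illustration, here is one way the ordering could work: look up each
page's count in hits.txt and sort the pages by that count, highest first.
This is a minimal sketch, not the code in stattrans.pl. The sort_by_hits
function is hypothetical, and it assumes hits.txt has the layout produced
by the pipeline further down in this mail (a "N normalized URLs" header
line, then "count path" lines). Pages missing from hits.txt count as 0.

```sh
#!/bin/sh
# sort_by_hits HITS_FILE PAGE...
# Hypothetical helper (not part of webwml): prints "count page" for each
# PAGE, most-visited first, using the counts found in HITS_FILE.
# Assumes the "%8d %s" layout of hits.txt; its first line is a header
# ("N normalized URLs") and is skipped. Unknown pages get a count of 0.
sort_by_hits() {
    hits_file=$1
    shift
    for page in "$@"; do
        printf '%s\n' "$page"
    done | awk -v hits="$hits_file" '
        BEGIN {
            # Load "count path" pairs into an array keyed by path.
            while ((getline line < hits) > 0) {
                if (++n == 1) continue   # skip the header line
                split(line, f, " ")      # " " also ignores leading blanks
                count[f[2]] = f[1]
            }
            close(hits)
        }
        # Prefix each page with its count (0 if not in hits.txt).
        { printf "%d %s\n", (($0 in count) ? count[$0] : 0), $0 }
    ' | sort -k1,1nr -k2,2   # by count descending, then by path
}
```

For example, "sort_by_hits hits.txt /index /devel/website/stats/pl" would
list the two pages with their counts, the more visited one first, which
is the order a translator would want to work through them in.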

The hits.txt file was generated on bellini.debian.org with something
like the following:

(zcat /var/log/apache2/www.debian.org-access.log-201010??.gz)|
perl -n \
 -e '@f=split;' \
 -e '$s = $f[6];' \
 -e '$s =~ s,\...\.html,,;' \
 -e '$s =~ s,/$,/index,;' \
 -e '$S{$s} += 1;' \
 -e 'END{' \
 -e '  printf "%d normalized URLs\n", scalar keys %S;' \
 -e '  foreach my $k (sort { $S{$b} <=> $S{$a} } keys %S) {' \
 -e '    printf "%8d %s\n", $S{$k}, $k' \
 -e '  }' \
 -e '}' > hits.txt
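To check which hits.txt key a particular request path is counted under,
the two substitutions from the one-liner can be reproduced with sed.
This is a hypothetical helper (normalize_url is not part of webwml), a
minimal sketch that only mirrors the Perl regexes above and does not
improve on them.

```sh
#!/bin/sh
# normalize_url PATH...
# Hypothetical helper: applies the same two substitutions as the Perl
# one-liner, printing one normalized key per input path.
#   s,\...\.html,,  drops a ".xx.html" suffix, e.g. ".en.html"
#   s,/$,/index,    maps a trailing "/" to "/index"
normalize_url() {
    printf '%s\n' "$@" | sed -e 's,\...\.html,,' -e 's,/$,/index,'
}
```

For example, "normalize_url /devel/website/stats/pl.en.html /News/ /"
prints /devel/website/stats/pl, /News/index and /index. Note that a path
without a language tag, such as /index.html, is left unchanged, so its
hits end up under a different key than / and /index.en.html.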

The parsing could obviously be improved. Also, the statistics should be
regenerated periodically on an active Debian mirror and propagated to the
host that builds the website (www-master?).

-- 
Marcin Owsiany <porridge@debian.org>             http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216  FE67 DA2D 0ACA FC5E 3F75  D6F6 3A0D 8AA0 60F4 1216

