
Bug#696960: [reporting] Should we retire/replace or improve the packages_X.html pages?



Package: lintian
Version: 2.5.11
Severity: minor

In reporting/html_reports, there is the following comment for the part
that generates the "packages_X.html" files:

"""
# FIXME: Does anyone actually use these pages?  They're basically unreadable.
"""

I had the chance to grep around in the Apache access logs for
lintian.d.o and the comment appears to be mostly justified.  I had
access to about 14 days of logs (20121215 - today).

All in all there were about 89 000 lines in the log.  Of those, about
14k were requests for /tags/<something>.html[1], 27.5k were requests
for "full reports" and 20.5k were requests for "/maintainer/X.html"[2].

And the "packages_X.html" pages? There are 509 requests for those[3].
Filtering out bots (and wordpress^Wwebsite exploit attempts) we are
down to at most 340 accesses[4], so there are on average 24.3 daily
requests from users.
  Personally I find that daily average a bit surprising, given that the
user has to find his/her target in the middle of ~5000 other links.  I
generally find it easier to use the PTS as an indirection (but I do
have the PTS on "speed dial", so ymmv).
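
As a rough sketch of the filtering in [4] and the averaging above (this
is not what was actually run; the function names are made up for
illustration, and the input is assumed to be an iterable of plain access
log lines):

```python
import re

# Same case-insensitive markers as the "grep -vi" in [4].
BOT_MARKERS = ("crawler", "bot/", "/bot", "spider")
# Same pattern as the "grep -v" for exploit attempts in [4].
EXPLOIT_RE = re.compile(r"/wp-content/.*HTTP/1")


def is_user_request(line):
    """True for a packages_X request that is neither a bot nor a probe."""
    if "GET /packages_" not in line:
        return False
    lowered = line.lower()
    if any(marker in lowered for marker in BOT_MARKERS):
        return False
    return EXPLOIT_RE.search(line) is None


def daily_average(lines, days):
    """Average number of user requests per day over the logged period."""
    hits = sum(1 for line in lines if is_user_request(line))
    return hits / days
```

With the numbers above, 340 remaining requests over 14 days gives
340 / 14, i.e. roughly 24.3 per day.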

It seems to me that the primary purpose of these pages is to find the
report for the package you want to see when you cannot remember the
maintainer of the package (or any of its uploaders).

So basically it is used as a "source package -> report" mapping.  But even
if you know the source package you want, you have to access an index page
and then use your browser's "find/search" tool to locate the link.
  More importantly, you cannot do a blind "lintian.d.o/source/$pkg"
reference.  You have to know the maintainer (or an uploader) of a
package to get a reference to its report.

If we want to stick to entirely static serving, I suspect we could get
away with an Apache rewrite rule and a symlink farm.  A concrete example
would be something like:

  RewriteRule ^/source/((lib.).*)\.html$ /by-source/$2/$1.html#$1 [NE,L,R]
  RewriteRule ^/source/((.).*)\.html$ /by-source/$2/$1.html#$1 [NE,L,R]

And then have a symlink layout of:

  by-source/0/0ad.html -> ../../maintainer/<somebody>.html
  by-source/0/0ad-data.html -> ../../maintainer/<probably-same-as-above>.html
  ...
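
Generating such a symlink farm could look roughly like the sketch below.
It is only an illustration: build_symlink_farm, prefix_for and the
"source name -> maintainer page" dictionary are hypothetical and not part
of reporting/html_reports, and source names are assumed to be non-empty.
The prefix follows the rewrite rules above ("lib" plus one character for
library packages, otherwise the first character).

```python
import os
import re

# Mirrors the two rewrite rules: "lib" plus one character for library
# packages, otherwise just the first character of the name.
PREFIX_RE = re.compile(r"^(lib.|.)")


def prefix_for(source):
    """Return the by-source subdirectory for a source package name."""
    return PREFIX_RE.match(source).group(1)


def build_symlink_farm(root, maintainer_page_by_source):
    """Create by-source/<prefix>/<source>.html symlinks under root.

    maintainer_page_by_source maps a source package name to the file
    name of its maintainer report (a hypothetical input format).
    """
    for source, page in sorted(maintainer_page_by_source.items()):
        directory = os.path.join(root, "by-source", prefix_for(source))
        os.makedirs(directory, exist_ok=True)
        link = os.path.join(directory, source + ".html")
        # Relative target, so the whole tree can be moved around.
        target = os.path.join("..", "..", "maintainer", page)
        if os.path.islink(link):
            os.remove(link)  # replace a stale link from an earlier run
        os.symlink(target, link)
```

Splitting by prefix keeps each directory reasonably small, and the
separate "lib" rule avoids one huge "l" directory.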

Technically, we could also just dump an HTML file with a "meta
http-equiv" redirect for each source package in a directory.  That
said, I am not too happy with the idea of 18.5k+ files in a single
directory.
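
For comparison, such a redirect file could be produced along these
lines.  Again just a sketch: redirect_stub is a made-up name, the stub is
assumed to live one level below the site root (e.g. in a "source"
directory), and the anchor on the maintainer page is assumed to be the
source package name.

```python
import html


def redirect_stub(source, maintainer_page):
    """Return a tiny HTML page that redirects to the package's report."""
    # Assumption: the stub sits in a directory directly below the site
    # root and the maintainer page has an anchor named after the source
    # package.
    url = "../maintainer/{}#{}".format(maintainer_page, source)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta http-equiv="refresh" content="0; url={url}">\n'
        "<title>{title}</title>\n"
        "</head><body>\n"
        '<p><a href="{url}">{title}</a></p>\n'
        "</body></html>\n"
    ).format(url=html.escape(url, quote=True), title=html.escape(source))
```

The plain link in the body is a fallback for clients that ignore the
meta refresh.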

~Niels

[1] zgrep "GET /tags/" lintian.debian.org-access.log* | wc -l

[2] Respectively:
zgrep "GET /full/" lintian.debian.org-access.log* | wc -l
zgrep "GET /maintainer/" lintian.debian.org-access.log* | wc -l

[3] zgrep "GET /packages_" lintian.debian.org-access.log* | wc -l

[4] zgrep "GET /packages_" lintian.debian.org-access.log* | grep -vi -e crawler -e bot/ -e /bot -e spider | grep -v '/wp-content/.*HTTP/1' | wc -l

