
[Draft] Lintian-harness - supporting lintian.d.o-like setups



Hi,

It is time for another "major" lintian branch.

This time I am proposing we target making lintian.d.o-like setups
easier.  To my knowledge, only two such setups currently exist, and
both of them were done by (or with the help of) the Lintian
Maintainers.

Research indicates that Ubuntu considered improving the current
situation in 2008, but to my knowledge their work never really got
further than the blueprint (the spec has a link to the blueprint).

As written with "big fat letters", the below is a draft and it can be
changed.  It is based on a private TODO-list, the Ubuntu Blueprint
and what I picked up from discussions here and there.
  I hope we can get a good discussion on the topic.  After a bit of
discussion I will create a wiki-page for the specification.

I am also hoping that some of you are willing to invest a bit of
time coding and/or testing the solution as it develops.  Should you
be interested, doc/README.developers in the source code can hopefully
get you started.

A few of the parts included are mostly comments for the Lintian
Maintainers on some internals (like the part about
refactoring/removing "unpack/").
  Should there be parts you do not understand, please do ask about
them.  I may have unwittingly assumed you know the internals or
the workflow of the current code.  :)

~Niels



THIS SPECIFICATION IS A DRAFT and is subject to change.

This specification aims to provide official support for setting up
"lintian.debian.org"-like instances.

This is based on an existing Ubuntu Blueprint[UB] and the
private/TODO-file[TODO] in the Lintian source code.

[UB] https://wiki.ubuntu.com/LintianHarness

[TODO] http://anonscm.debian.org/gitweb/?p=lintian/lintian.git;a=blob;f=private/TODO;h=3936b340b730175ef98bd902785403b69d5437e5;hb=9505109c3006edf6f360da8da2d530c42337ee4f#l92

Requirements
============

Some of these are already supported in the current implementation;
they are listed here again for completeness.

 * A new frontend called "lintian-harness"
   - Must have a well-defined purpose and workflow.
   - Must be well documented.
     - it should not take a Lintian Maintainer to use it.
   - Must be cron-friendly
     - Expected to be the primary use method of the frontend.
   - Must (continue to) support incremental runs.
 * The resulting reports must be (re-)brandable.
   - (e.g.) Lintian may not be checking against the Debian
     Policy Manual, etc.
 * The lintian-harness will be shipped in a separate package
   that depends on the lintian package.
 * Must support fetching from http(s):// mirrors.

Ideas, Issues and Extensions
============================

 * The remaining scripts in unpack/ could be replaced by making the
   existing Laboratory code smarter.
   - reporting/ is one of the last consumers of unpack/
   - Zach suggested "sync'ing from a mirror" would be useful if
     Lintian was turned into a Static Analysis Framework.

 * Use locks when running.
   - Currently you have to manually disable the cron-job if you are
     doing an "out of band" lintian run.

 * Hooks:
   - Allow local system specific code to be run (e.g.) after the html
     site has been updated.
   - (hopefully) "everyone uses the same frontend"

 * Migrate to template-toolkit?
   - There was some talk about it; it should be done before this spec
     is implemented.

 * Adding support for displaying override comments.
   - This would probably be a good time.

 * Testing
   - The lintian frontend itself is used by 300+ tests, so we are
     fairly certain it is not obviously broken, if there are no test
     failures.

     We can unit test some of the code used by lintian-harness, but
     can we do better and actually test the lintian-harness frontend
     (in some sane manner)?
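
For the "Use locks when running" item above, here is a minimal sketch
of what a run lock could look like.  It is written in Python purely
for illustration (the real frontend would presumably be Perl), and the
names run_lock and AlreadyRunning as well as the lock file location
are assumptions of mine, not part of any existing code.

```python
import fcntl
import os
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when another run already holds the lock (hypothetical name)."""


@contextmanager
def run_lock(lock_path):
    """Hold an exclusive, non-blocking flock on lock_path for the duration."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes a second run fail at once instead of queueing up.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise AlreadyRunning(lock_path) from None
        yield
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

With something like this, both the cron-job and an "out of band" run
would wrap their work in "with run_lock(...)" on the same file (say,
somewhere under WORK_ROOT), and whichever starts second bails out
instead of requiring the cron-job to be disabled by hand.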



Proposed Solution
=================


File System Layout
------------------

The website setup currently uses the templates directly from
the LINTIAN_ROOT.  This complicates updating templates, since
LINTIAN_ROOT will be overwritten on upgrades.

This can be solved by splitting the setup into four distinct major
components: LINTIAN_ROOT, SITE_ROOT, WORK_ROOT and HTML_DIR.

 * LINTIAN_ROOT is the base of the Lintian installation.
   Usually this will be /usr/share/lintian.
   - This is read-only for the lintian processes.

 * SITE_ROOT is configuration/setup rules for the site.  This
   is not (by design at least) publicly available via the HTML
   site.
   - The local admin/user can deploy site specific templates
     and configuration here.
   - This is read-only for the lintian processes.

 * WORK_ROOT is the root dir for lintian to write its cache and
   its logs.
   - This needs to be readable and writable by the lintian
     processes.

 * HTML_DIR is where the html site is written on the machine.
   lintian-harness will generate all the data presented here.
   - This is "write-only" for the lintian processes.
   - lintian-harness may delete HTML_DIR and its entire
     contents.  HTML_DIR is not allowed to be a symlink.
   - lintian will need to be able to create a directory
     in the parent of HTML_DIR (see below on HTML_DIR)
   - Should lintian-harness have configuration options to
     modify permissions (etc.) on HTML_DIR?  Not needed if
     the proper hook exists.
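
The access rules listed above could be checked up front before a run
starts.  The following is only a sketch in Python; check_layout is a
hypothetical helper, and it only tests what the current user is able
to do, not whether the read-only roots are actually protected against
writes.

```python
import os
from pathlib import Path


def check_layout(lintian_root, site_root, work_root, html_dir):
    """Return a list of problems; an empty list means the layout looks sane."""
    problems = []
    # LINTIAN_ROOT and SITE_ROOT only need to be readable.
    for name, path in (("LINTIAN_ROOT", lintian_root), ("SITE_ROOT", site_root)):
        if not os.access(path, os.R_OK | os.X_OK):
            problems.append(f"{name} ({path}) is not readable")
    # WORK_ROOT holds the cache and the logs: readable and writable.
    if not os.access(work_root, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"WORK_ROOT ({work_root}) is not readable and writable")
    html = Path(html_dir)
    # HTML_DIR may be deleted wholesale, so it must not be a symlink.
    if html.is_symlink():
        problems.append(f"HTML_DIR ({html_dir}) must not be a symlink")
    # A directory has to be created next to HTML_DIR.
    if not os.access(html.parent, os.W_OK | os.X_OK):
        problems.append(f"parent of HTML_DIR ({html.parent}) is not writable")
    return problems
```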

Lintian will ship a base SITE_ROOT in LINTIAN_ROOT/reporting, and can
create a SITE_ROOT based on this.  The local admin can then modify the
SITE_ROOT to fit their needs and set up the cron job; the setup is then
complete.

The SITE_ROOT should have the following structure:

  SITE_ROOT/
    bin/
      ...
    config
    hooks/
      ...
    images/
      logo-small.png
      ...
    lintianrc
    lintian.css
    templates/
      index.tmpl
      ...

Any of the files or directories in SITE_ROOT may be a symlink, in
which case it is followed (regardless of where it points to).

"config" shall contain all the relevant configurations for
lintian-harness and "lintianrc" will be the configuration file for
lintian (if any).

templates/ will contain the relevant templates used by lintian-harness
to write the html output to HTML_DIR.  The contents of "images/" and
"lintian.css" will be copied (as is) to HTML_DIR.

hooks/ would contain executable scripts that will be run by
lintian-harness at the relevant point of the execution.
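
As a rough illustration of how hooks could be invoked (Python, for
illustration only): I am assuming here that each hook is a single
executable named after the point where it runs, e.g. a hypothetical
SITE_ROOT/hooks/post-html-update.  Neither that naming scheme nor the
run_hook helper is decided anywhere in this draft.

```python
import os
import subprocess
from pathlib import Path


def run_hook(site_root, point, *args):
    """Run SITE_ROOT/hooks/<point> if it exists and is executable.

    Returns the hook's exit status, or None when there is no such hook.
    """
    hook = Path(site_root) / "hooks" / point
    if not (hook.is_file() and os.access(hook, os.X_OK)):
        return None
    # check=False: the caller decides whether a failing hook is fatal.
    return subprocess.run([str(hook), *args], check=False).returncode
```

Whether a non-zero exit status from a hook should abort the run is an
open question; the sketch simply hands the status back to the caller.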

bin/ will be prepended to PATH by lintian-harness and can be used
to override some system commands.  In particular, symlinking
SITE_ROOT/bin/gpg to /bin/true can be used to disable gpg signature
checks (as done by dpkg-source, when extracting a source package).
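
The PATH handling could be as simple as the following sketch (Python,
for illustration; harness_env is a hypothetical name).  It builds the
environment that child processes such as lintian would be started
with.

```python
import os


def harness_env(site_root, base_env=None):
    """Return a copy of the environment with SITE_ROOT/bin first in PATH."""
    env = dict(os.environ if base_env is None else base_env)
    site_bin = os.path.join(site_root, "bin")
    old_path = env.get("PATH", "")
    # Prepending means SITE_ROOT/bin/gpg is found before the system gpg.
    env["PATH"] = site_bin + (os.pathsep + old_path if old_path else "")
    return env
```

The result would be passed along when spawning lintian (for example
as the env argument of subprocess.run), so the override applies to
lintian and everything it calls without touching the admin's own
environment.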

Can we do something to assist the local admins in upgrading their
existing SITE_ROOT?


The WORK_ROOT has the following (default) layout:

  WORK_ROOT/
    laboratory/
      ...
    logs/
      lintian.log
      ...
    ...

Unless otherwise specified in SITE_ROOT/config, the laboratory will
be placed in WORK_ROOT/laboratory.

The logs directory will store the logs and some statistical data
collected by lintian and lintian-harness.  "savelog" shall be used
to maintain some past logs.

The lintian.log file will be copied to the HTML_DIR and is also used
by lintian-harness to create the incremental runs (see "incremental
runs" below).

By default WORK_ROOT may be used for other temporary / auxiliary files
(or directories) that can be used in a subsequent run.  In particular, see
"Fetching packages" below on having a package cache.

WORK_ROOT and SITE_ROOT may point to the same directory, but lintian
will need to create and edit files in WORK_ROOT, so it may complicate
making SITE_ROOT read-only.


HTML_DIR is where lintian-harness will produce its final output to be
served by a webserver.  When replacing the existing HTML_DIR,
lintian-harness will create a temporary directory and populate it with
the new contents.  It will then swap the HTML_DIR and the temporary
directories, followed by a removal of the old (renamed) HTML_DIR.

The problem with this is that there is a "minimal" time where HTML_DIR
is absent ("mv HTML_DIR old && mv new HTML_DIR").  Can we use some
other approach that ensures that (the content in) HTML_DIR is always
present and consistent (without a ton of "mv -f new/.../file
HTML_DIR/.../file")?
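
To make the discussion concrete, the two-rename swap described above
can be sketched as follows (Python, for illustration only).  The name
replace_html_dir and the populate callback are hypothetical.  Note
that this does not solve the problem: the window between the two
renames is marked in the code.

```python
import os
import shutil
import tempfile


def replace_html_dir(html_dir, populate):
    """Build the new site next to HTML_DIR, then swap it in with two renames."""
    html_dir = os.path.abspath(html_dir)
    parent = os.path.dirname(html_dir)
    # Same parent, hence same file system, so rename() never has to copy.
    new_dir = tempfile.mkdtemp(prefix=".html-new-", dir=parent)
    try:
        populate(new_dir)
        os.chmod(new_dir, 0o755)  # mkdtemp creates the directory with 0700
    except BaseException:
        shutil.rmtree(new_dir, ignore_errors=True)
        raise
    old_dir = None
    if os.path.lexists(html_dir):
        # On POSIX, renaming a directory onto an empty directory replaces it.
        old_dir = tempfile.mkdtemp(prefix=".html-old-", dir=parent)
        os.rename(html_dir, old_dir)
    # <-- HTML_DIR does not exist at this point
    os.rename(new_dir, html_dir)
    if old_dir is not None:
        shutil.rmtree(old_dir)
```

This is also why lintian needs to be able to create a directory in
the parent of HTML_DIR.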


Fetching packages
-----------------

On lintian.debian.org there is a local mirror available on the file
system.  Other setups may not want to or have the capacity to have
the mirror locally (even as an NFS mount).

The harness frontend should therefore support more than one method
for fetching the packages to be processed.  Having a local cache
may be useful to avoid unnecessary bandwidth usage, when doing a
full run.

If such a cache is implemented, the layout may need another directory
to ensure that LINTIAN_ROOT and SITE_ROOT do not need to be writable
by the user running lintian-harness.

Fetching packages via HTTP sounds a lot like something APT or aptitude
can do already, so perhaps this solution should use APT (possibly via
libapt-pkg-perl).

The old code needs access to the Sources file, the Packages file and
the packages downloaded.  The two former can most likely be replaced
by using APT's API to access the package metadata.

The second advantage of using APT as a backend for pulling packages is
that lintian-harness would automatically support fetching from any
protocol (or setup) that APT supports.

It seems to be a fair assumption that anyone wanting to set up a
"lintian.d.o"-like machine will have basic knowledge about APT.  That
being said, lintian-harness should ship with some basic APT
configuration templates to be used by lintian-harness's APT module.

Incremental runs
----------------

The incremental runs work by lintian-harness analysing which
packages have changed, been removed or have appeared since the last
run.  It then filters out all tags for these packages from the
previous lintian.log.

Finally it instructs lintian to test the changed and new packages,
appending its output to the new lintian.log.  Once lintian has
terminated, lintian-harness will use the lintian.log to generate
the website.
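
The filtering step could look roughly like this (Python, for
illustration).  Two assumptions of mine: the package lists are given
as simple name -> version mappings, and every tag line in the log has
the shape "CODE: package[ type]: tag ...".  The real lintian.log
format may differ, and both function names are hypothetical.

```python
def packages_to_refresh(old, new):
    """Compare two name -> version mappings.

    Returns (stale, to_test): packages whose old tags must be dropped,
    and packages lintian has to be run on again.
    """
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    changed = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return removed | changed | added, changed | added


def filter_log(lines, stale):
    """Yield the lines of the previous log that are not about stale packages."""
    for line in lines:
        parts = line.split(": ", 2)
        if len(parts) == 3:
            # "pkg" or "pkg source": the package name is the first word.
            package = parts[1].split(" ", 1)[0]
            if package in stale:
                continue
        yield line
```

The filtered lines would be written to the new lintian.log first;
lintian is then run on the "to_test" packages and appends to it.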

Testing
-------

The lintian frontend itself is used by 300+ tests, so we are fairly
certain it is not obviously broken, if there are no test failures.

We can unit test some of the code used by lintian-harness, but can
we do better and actually test the lintian-harness frontend (in some
sane manner)?

