
Bug#641468: lintian: update the lab layout (i.e. use pools)



On 2011-09-13 18:04, Niels Thykier wrote:
> Package: lintian
> Severity: important
> 
> 
> Jakub realized the source of a lot of our errors on lintian.d.o are
> caused by limitations in the file-system.  We should probably use
> a pool or something similar to reduce the amount of elements in
> each dirs.
> 
> ~Niels
> 
> 
> 

I guess it might be a good time for a little status update here.

The lab-refactor branch is now working for "simple" use cases[1].
However, the lintian.d.o-style usage needs some attention.

In the master branch we use $lab/info/* as a list of "what was in the
mirror last time we checked".  Those files have been repurposed in the
lab-refactor branch, where their new meaning is "what is currently in
the lab".  This means that "dist" search[2] is currently broken.
  To my knowledge there are *2* known cases where "dist" searches make
sense - lintian.d.o and lintian.debathena.o.  I feel we should move that
functionality to a new frontend (such as the "lintian-harness"[3]) that
would focus on lintian.d.o-like setups.

Note that "repurposing" is not entirely complete and therefore
reporting/harness is more or less broken right now.  One of the issues
is that unpack/* still use the files in info/* as a dist list and not a
lab list.


I also considered adding a file in info/ to keep track of lab-wide
(meta)data, such as the lab-format.  In the old lab format, this is
stored in every entry.  This makes it slightly more difficult to check
if we are dealing with a compatible lab.
  Consider using an "old" lintian on a lab in the new style - the two do
not store the entries in the same place, so the old lintian has no
reliable way to detect that the lab is not compatible.  I would prefer
that an old lintian would always be able to say "The lab uses a newer
lab-format than this version of lintian supports" - even if this case
will "probably never happen".
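
A minimal sketch of such a lab-wide check, in Python purely for
illustration (lintian itself is written in Perl).  The file name
"lab-info", the "Key: value" layout and the format number are all
assumptions made for this example, not something the branch defines.

```python
import os

# Assumed value, for illustration only.
SUPPORTED_LAB_FORMAT = 11


def read_lab_format(lab_dir):
    """Return the lab format from the (hypothetical) info/lab-info file.

    Returns None if the file or the field is missing or unparsable.
    """
    path = os.path.join(lab_dir, "info", "lab-info")
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                key, sep, value = line.partition(":")
                if sep and key.strip().lower() == "lab-format":
                    return int(value.strip())
    except (FileNotFoundError, ValueError):
        return None
    return None


def check_lab(lab_dir, supported=SUPPORTED_LAB_FORMAT):
    """Classify a lab as 'ok', 'too-new', 'too-old' or 'unknown'."""
    fmt = read_lab_format(lab_dir)
    if fmt is None:
        return "unknown"
    if fmt > supported:
        return "too-new"
    if fmt < supported:
        return "too-old"
    return "ok"
```

The point of keeping this in one well-known place is that the check
does not depend on where the entries live, so it keeps working across
layout changes.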


I am also wondering what we need in the "per-entry" lintian-status file.
 In the master branch, we store Lintian-Version, Lab-Format, Package
(name), Version (package), Type (package) and Timestamp.
  When we read the status file, we compare lab-format, package version
and timestamp.  With the changes in the lab-refactor branch, the lab
always supports multiple versions of the same package, thus the package
version comparison is a no-op.

As I understand it, the timestamp is there to make lintian "re-unpack"
the package if it changed since the last run.  Currently it completely
removes the entry if the timestamp does not "match".  Though this code
only makes sense for "personal" static labs - in the lintian.d.o case,
the version of a package cannot be reused (at least not in general).
  The timestamp-part is not in the lab-refactor branch (yet?).
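
The comparison described above could be sketched roughly as follows
(Python for illustration only).  It assumes the status file is a set of
"Key: value" lines and that the timestamp is compared for exact
equality; the helper names are hypothetical.  The package version is
deliberately not compared, since that check is a no-op once the lab
holds multiple versions.

```python
def parse_status(text):
    """Parse 'Key: value' lines into a dict (assumed file layout)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def entry_is_current(fields, lab_format, package_timestamp):
    """Return True if the entry can be reused as-is.

    The entry is stale if it was written for another lab format or if
    the package file changed since the entry was created.
    """
    if fields.get("Lab-Format") != str(lab_format):
        return False
    return fields.get("Timestamp") == str(package_timestamp)
```

A caller would remove (or re-unpack) the entry whenever
entry_is_current() returns False.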

I am considering replacing the "Lab-format" value with an
"entry-format-version".  I am not sure it makes sense, but I think it
may make migration to newer formats easier.
  If I had not (ab)used the opportunity to do optimizations in the
.lintian-status file (see below), the migration from the current to the
new lab-format would basically just have been a bunch of "mv X Y" +
updating info/*.

Finally, I have added a "Collections" entry to the .lintian-status file.
 This is used to keep track of which collections have been run and
removes the need for ".$coll-$ver" files.
  This will reduce our (expected) file creation from 18 to 1 per binary
package[4].  For a full mirror run, 18 files per binary package roughly
translates to 630 000 files[5].  For udebs and sources we go from 10 and
8 to 1.
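
To illustrate how a single field can replace the per-collection marker
files, here is a small sketch (Python for illustration only).  The
"name=version" comma-separated encoding and the collection names used
in the test values are assumptions for the example, not necessarily
what the branch writes.

```python
def parse_collections(value):
    """Turn 'a=1, b=2' into {'a': 1, 'b': 2} (assumed encoding)."""
    done = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, version = item.partition("=")
        done[name.strip()] = int(version) if sep else 0
    return done


def format_collections(done):
    """Serialise the mapping back into a single field value."""
    return ", ".join(f"{name}={ver}" for name, ver in sorted(done.items()))


def needs_run(done, name, version):
    """A collection must run if it is absent or recorded at another version."""
    return done.get(name) != version
```

With this, marking a collection as finished is an update to one field
in one file rather than the creation of a new file in the entry.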


So to sum it up:  I am repurposing $lab/info/* files to be a manifest of
what is in the lab (rather than what is on the mirror).  I am breaking
"dist" search and suggest we create a separate frontend for
archive-checks that supports "dist" search.
  I am considering adding a metadata file in $lab/info/ to include stuff
like "Lab format" version.  I have removed data from the (per-entry)
.lintian-status files.  The (per-entry) ".$coll-$ver" files will be
removed and the .lintian-status file will track those.

Any comments?  If not I will (hopefully) get the branch ready to be
merged into master within 2-3 weeks - so if you have not reviewed the
branch yet, now would be a good time to start.  :)

~Niels

[1] That would be single package checks:
lintian $pkg

but also simple static-lab usage

lintian --lab $lab --setup-static
lintian --lab $lab --unpack $pkg[,..., $pkgN]
lintian --lab $lab -r $pkg[,..., $pkgN]
etc.

[2] The "check packages from mirror" search, i.e.

lintian --lab $lab $pkg[,...,$pkgN]

will first check the mirror and then fall back to the lab.  I suggest we
only check the lab in this case.

[3] http://lists.debian.org/debian-lint-maint/2011/08/msg00285.html

[4] 17 binary collections + 1 lintian status file.

[5]  Assumes 35 000 binary packages.  Though currently "only" 576 000
files are created due to the file system limitations (~32 000 binary
packages).
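
The numbers in [4] and [5] can be reproduced with a few lines (Python
for illustration only; the function name is made up for this example):

```python
def files_created(packages, files_per_package):
    """Total number of marker/status files for a given package count."""
    return packages * files_per_package


# 17 binary collections + 1 status file per binary package (see [4]).
OLD_FILES_PER_BINARY = 17 + 1
NEW_FILES_PER_BINARY = 1

full_mirror_old = files_created(35_000, OLD_FILES_PER_BINARY)
current_old = files_created(32_000, OLD_FILES_PER_BINARY)
full_mirror_new = files_created(35_000, NEW_FILES_PER_BINARY)
```

That is 630 000 files for an assumed 35 000 binary packages in the old
scheme, 576 000 for the ~32 000 currently processed, and 35 000 with one
status file per package.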



