
Re: detailed lists with archive contents - more than just Contents



Since Paul Wise already advertised dedup.debian.net, here are a few more
details.

This was an afternoon proof-of-concept thingy that kind of accidentally
got a debian.net pointer, but so be it. It seems to be somewhat useful.
The service basically records checksums of all regular files in Debian
sid main and provides a web interface to this database.
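
To illustrate the recording step, here is a minimal sketch. It is not the
code from the repository: the function name hash_regular_files is made up,
and it assumes the package payload is already available as a seekable tar
archive (for example an extracted data.tar). It emits one record per regular
file; symlinks, directories and hard links are skipped.

```python
import hashlib
import io
import tarfile


def hash_regular_files(fileobj):
    """Yield (path, size, sha512 hex digest) for each regular file in a tar archive.

    Hypothetical helper for illustration; fileobj must be seekable.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            # isreg() is false for directories, symlinks and hard links.
            if not member.isreg():
                continue
            digest = hashlib.sha512()
            data = tar.extractfile(member)
            # Hash in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: data.read(65536), b""):
                digest.update(chunk)
            yield member.name, member.size, digest.hexdigest()
```

Calling list(hash_regular_files(open("data.tar", "rb"))) on an unpacked
payload would give the tuples to store. Switching to another hash would only
mean changing the hashlib constructor.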

On Thu, Feb 21, 2013 at 02:34:16PM +0100, Andreas Beckmann wrote:
> What would be much more helpful for archive wide problem analysis
> (without requiring you to download and unpack the whole archive)
> would be a full-detail list of all package content:
> 
> * package
> * fullpathname
Both stored.
> * type (file, directory, symlink, ...)
Everything except regular files (even hard links) is ignored.
> * owner
> * group
> * permissions
Missing, but easily extensible.
> * size (for regular files)
Stored.
> * symlink target (for symlinks)
Missing.
> * md5sum (for regular files)
Currently storing sha512, but easily changeable.
> * marked-as-conffile
> * ...
Missing.

> Are there tools to generate them?

The software is available in
git://murkel.subdivi.de/~helmut/debian-dedup.git, but currently lacks
any documentation. (Hey it was just a proof-of-concept, right?)

All data is obtained by examining binary packages and stored in a sqlite
database. I am currently importing sid main. A snapshot of the database,
taken when package names from "a" through "libm" had been imported, is
available at http://murkel.subdivi.de/~helmut/dedup.sql.gz (175MB). I intend to
replace this dump with a larger one once the import is complete.
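
For a rough idea of how such a database can be queried, here is a sketch.
The table and column names (content, package, path, size, sha512) are
assumptions made for this example; the actual schema in the dump may well
differ. The query lists file contents that occur in more than one package,
largest first.

```python
import sqlite3

# Hypothetical schema: the real layout of dedup.sql.gz may differ.
SCHEMA = """
CREATE TABLE content (
    package TEXT NOT NULL,
    path    TEXT NOT NULL,
    size    INTEGER NOT NULL,
    sha512  TEXT NOT NULL
);
CREATE INDEX content_sha512 ON content (sha512);
"""


def shared_files(conn):
    """Return (sha512, size, package count) for content found in several packages."""
    return conn.execute(
        """
        SELECT sha512, size, COUNT(DISTINCT package) AS packages
        FROM content
        GROUP BY sha512, size
        HAVING COUNT(DISTINCT package) > 1
        ORDER BY size DESC
        """
    ).fetchall()


# Tiny in-memory example with made-up rows and shortened digests.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?)",
    [
        ("foo", "usr/share/doc/foo/COPYING", 100, "aaa"),
        ("bar", "usr/share/doc/bar/COPYING", 100, "aaa"),
        ("bar", "usr/bin/bar", 4096, "bbb"),
    ],
)
print(shared_files(conn))
```

The missing fields (owner, group, permissions, symlink target) would just be
additional columns in a layout like this.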

And finally to answer Zack's mail: I added the missing service
information in git, but I have to hold off on deploying it until the
import run is complete.

So yeah, bug reports, comments and of course patches are welcome.

Helmut

