Hi all,

Now that the debdiff security issues have been fixed, I've enabled daily generation of patches between Debian and our derivatives. If you are interested in helping fix some of the issues with this process, please take a look at the FIXMEs in the script. If you are a Debian member you can look at the raw data on stabile; if not, you can look at the daily rsynced output of patches smaller than 15MB on alioth.

http://anonscm.debian.org/gitweb/?p=dex/census.git;a=blob;f=bin/compare-source-package-list
sftp://stabile.debian.org:/srv/qa.debian.org/export/derivs/census/var
http://dex.alioth.debian.org/census/patches/
sftp://alioth.debian.org:/home/groups/dex/census/var/patches

This is made possible by snapshot.debian.org, which uses a PostgreSQL database for metadata and a hash-based (SHA-1) filesystem structure to store all source and binary packages uploaded to Debian, as well as all the apt metadata.

The patch generation works like this:

 1. Download the Sources files using apt-get, run on the sources.list snippets from the census wiki pages of all derivatives.
 2. For each source package in each derivative:
    a. Check whether the .dsc file has ever been in Debian; if not, check whether the other parts of the package have, and use that to decide whether the package is unmodified. Unmodified source packages are skipped; they include those with exactly the same .dsc file and those where all the non-.dsc parts are identical.
    b. Try some heuristics (name, version, changelog entries) to find out whether the package could be based on a package that is or was in Debian.
    c. If not, make a note and skip to the next package, since Debian might want to know about source packages that are missing from Debian.
    d. If so, use debdiff to create a diff, and filterdiff to create a diff of just the debian/ directory.
    e. Use the lsdiff cache to decide whether the patch should be displayed (e.g. on the PTS). I think I will drop this lsdiff step and move it to a future interface to the patches that is yet to be worked on.
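The per-package decision in the steps above can be sketched roughly as follows. This is a hypothetical Python illustration, not code taken from compare-source-package-list: the SourcePackage class, the classify() function and its inputs are made up, an in-memory set of SHA-1 hashes stands in for queries against the snapshot.debian.org database, and a plain name lookup stands in for the real name/version/changelog heuristics.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class SourcePackage:
    """Hypothetical view of one entry in a derivative's Sources file."""
    name: str
    version: str
    files: Dict[str, str]  # file name (.dsc, orig.tar.gz, diff.gz, ...) -> SHA-1

    def dsc_sha1(self) -> str:
        # Assumes exactly one .dsc file per source package.
        return next(h for f, h in self.files.items() if f.endswith(".dsc"))

    def non_dsc_sha1s(self) -> Set[str]:
        return {h for f, h in self.files.items() if not f.endswith(".dsc")}


def classify(pkg: SourcePackage,
             debian_sha1s: Set[str],
             debian_versions: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (status, Debian base version) for one derivative source package.

    debian_sha1s: stand-in for "has this file ever been in Debian?" lookups.
    debian_versions: hypothetical name -> version map used as a crude
    "could this be based on a Debian package?" heuristic.
    """
    # Unmodified: the exact same .dsc file was in Debian ...
    if pkg.dsc_sha1() in debian_sha1s:
        return ("unmodified", None)
    # ... or every non-.dsc part of the package was in Debian.
    non_dsc = pkg.non_dsc_sha1s()
    if non_dsc and non_dsc <= debian_sha1s:
        return ("unmodified", None)
    # Otherwise look for a Debian package this one could be based on.
    base = debian_versions.get(pkg.name)
    if base is None:
        # Make a note: Debian may want to know about packages it is missing.
        return ("missing-from-debian", None)
    # Modified: this is where debdiff and filterdiff would be run.
    return ("modified", base)
```

For instance, a derivative package that reuses Debian's orig.tar.gz but ships its own diff.gz falls through to the "modified" branch, which is the point where a patch would be generated.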
Here are some stats about the last run: Ubuntu took 3 hours and all the rest finished in less than 1 hour, mainly due to the extensive caching done by the script:

 * 3.0M symlinks mapping MD5/SHA-256 hashes to SHA-1 hashes, for those files where the derivative's apt metadata does not have any SHA-1 hashes. If you are responsible for the archive of any derivative whose apt metadata is missing SHA-1 hashes, we would greatly appreciate it if you could fix your metadata.
 * 27M symlinks mapping human-readable patch names to the patches directory, which uses SHA-1 hashes for file/dir names. The human-readable names look like this:
   Debian_icu_4.4.2-2_Ubuntu_icu_4.4.2-2ubuntu0.11.04.1.patch
 * 145M changelog, source package and version number cache for the modified packages from derivatives (JSON format).
 * 1.1G lsdiff output for all the patches.
 * 57G of files that were never in Debian (according to the snapshot database), including orig.tar.gz, diff.gz etc.
 * 164G of patches, most of which is taken up by 204 patches larger than 100MB each. These are created due to deficiencies in the script (see the FIXMEs) and, in some cases, unnecessary divergence or changes.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise