Announcing derivatives patches and call for help and feedback

Hi all,

Up to now, the only options for pulling patches from distributions
derived from Debian have been Ubuntu's Debian patches repository[1] and
manual downloads of source packages from derivatives. In my estimation,
a more general way to do this would be desirable.

     1. http://patches.ubuntu.com/

As part of my ongoing work on integrating[2] information about our
derivatives into Debian, I have been working on a solution to this. The
scripts I wrote gather the apt sources.list snippets for each
distribution in the derivatives census[3][4], download the apt
repository metadata, iterate over each source package, use
snapshot.debian.org data to determine whether the package is new,
modified or unmodified, and generate patches for the modified packages.

     2. http://wiki.debian.org/Derivatives/Integration
     3. http://wiki.debian.org/Derivatives/Census
     4. http://wiki.debian.org/Derivatives/CensusFull
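The status check could look roughly like the following minimal Python
sketch. It is not the census code: the rules (unmodified when every file
is already archived on snapshot.debian.org, modified when Debian has ever
had a source package of that name, new otherwise) and all names
(SourcePackage, classify, md5_to_sha1) are assumptions made for
illustration. The MD5-to-SHA1 lookup reflects that the Files field of an
apt Sources index lists MD5 sums, while snapshot.debian.org identifies
files by SHA1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePackage:
    # Hypothetical record for one source package from a derivative's Sources index.
    name: str
    version: str
    md5sums: frozenset  # MD5 of each file (.dsc, tarballs, diffs) from the Files field


def classify(pkg, debian_names, md5_to_sha1, snapshot_sha1s):
    """Return 'unmodified', 'modified' or 'new' for a derivative source package.

    debian_names   -- source package names that have ever existed in Debian
    md5_to_sha1    -- hypothetical mapping from file MD5 to file SHA1
    snapshot_sha1s -- SHA1s of all files known to snapshot.debian.org
    """
    sha1s = [md5_to_sha1.get(md5) for md5 in pkg.md5sums]
    # Every file is byte-identical to one Debian has archived: nothing to diff.
    if sha1s and all(s is not None and s in snapshot_sha1s for s in sha1s):
        return "unmodified"
    # Debian knows the package name, so the derivative changed something.
    if pkg.name in debian_names:
        return "modified"
    # The derivative introduced a package Debian never had.
    return "new"


if __name__ == "__main__":
    pkg = SourcePackage("hello", "2.8-1ubuntu1", frozenset({"aa", "bb"}))
    print(classify(pkg, {"hello"}, {"aa": "11", "bb": "22"}, {"11"}))  # modified
```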

It is not yet ready to be run on a regular basis, but I have done a
full run across the derivatives in the census. It downloaded 36G of
files and generated 150G of patches (a bug inflates this figure), 100M
of changelog name/version tuples in JSON format, 619M of lsdiff cache,
2.4M of MD5-to-SHA1 mappings and 24M of human-readable patch names.
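One plausible use for the changelog name/version tuples is choosing what
to diff against: walk the derivative's changelog from newest to oldest
and stop at the first (name, version) pair that Debian actually
uploaded. The sketch below assumes that heuristic; pick_debian_base is a
hypothetical name, and this is not necessarily what the census scripts
do.

```python
def pick_debian_base(changelog_entries, debian_versions):
    """Pick the Debian source package to diff a derivative package against.

    changelog_entries -- (name, version) tuples from the derivative's
                         debian/changelog, newest entry first
    debian_versions   -- set of (name, version) pairs known to snapshot.debian.org
    Returns the newest changelog entry that Debian shipped, or None if the
    derivative's history never passes through a Debian upload.
    """
    for name, version in changelog_entries:
        if (name, version) in debian_versions:
            return (name, version)
    return None


if __name__ == "__main__":
    entries = [
        ("hello", "2.8-1ubuntu2"),
        ("hello", "2.8-1ubuntu1"),
        ("hello", "2.8-1"),
        ("hello", "2.7-2"),
    ]
    known = {("hello", "2.8-1"), ("hello", "2.7-2")}
    print(pick_debian_base(entries, known))  # ('hello', '2.8-1')
```

Keeping the source name in each pair matters because a derivative may
rename a package, and every debian/changelog entry records the source
name it was uploaded under.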

The raw results can be seen at [5], [6] and [7]. The data at [5] and [6]
contain only patches of 15MB or smaller; larger ones are at [7]. A small
number of gigantic patches were removed from [7] because of their size.
You can browse the patches by Debian package name in the patches subdir;
the patch filenames indicate which distribution and source packages were
compared. To find the patches for a specific distribution, enter its
subdir and then its patches subdir. Each patch is accompanied by a patch
of just the debian/ directory. A YAML index of modified source packages
is in the sources.links files in each derivative's subdir, with similar
indices for new source packages (sources.new) and "useful" patches
(sources.patches). The sources.log files contain debugging output from
the script that generated the results.

     5. http://dex.alioth.debian.org/census/
     6. wagner.debian.org:/home/groups/dex/census/var/
     7. stabile.debian.org:/home/pabs/census/var/
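The debian/-only companion patch can in principle be cut out of the full
patch. This minimal Python sketch assumes `diff -ruN`-style output, in
which each file section starts with a line beginning `diff ` and every
path begins with a top-level directory such as foo-1.0/; debian_only is
a hypothetical helper, not part of the census scripts.

```python
def _new_path(section):
    """Return the '+++' path of a file section with its first component stripped."""
    for line in section:
        if line.startswith("+++ "):
            # Drop any tab-separated timestamp, then strip the top-level directory.
            path = line[4:].split("\t", 1)[0].rstrip("\n")
            return path.split("/", 1)[1] if "/" in path else path
    return ""


def debian_only(patch_text):
    """Keep only the file sections of a diff -ruN patch that touch debian/."""
    sections, current = [], []
    for line in patch_text.splitlines(keepends=True):
        # In diff -r output a line starting with "diff " opens a new file
        # section; hunk body lines start with ' ', '+', '-' or '\\'.
        if line.startswith("diff ") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)
    return "".join(
        "".join(sec) for sec in sections if _new_path(sec).startswith("debian/")
    )
```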

Please note that these are *raw* results only. The next step from here
is to determine how to filter these raw patches and how to present them
to Debian maintainers and other interested parties.

For the presentation side of things, I am thinking one approach might be
to move UbuntuDiff[8] to the QA infrastructure, generalise it and
enhance it for this purpose. This would necessarily include mechanisms
to mark patches as dealt with or ignorable.

     8. http://ubuntudiff.debian.net/

For the filtering side of things, I need some insight into the patches
themselves to decide which ones are useful to present to maintainers and
which are not. I have looked at some of these patches myself, but that
just does not scale. So I invite the Debian community to help me out
here: please take a look at a few patches and reply if you find one that
could be filtered out *automatically* by some future code, with some
indication of how to do the automatic filtering. No need to mention
changelog-only patches; I am well aware of those. Since most of you will
probably only want to look at the patches for packages you maintain, I
manually generated a dd-list[9] of the patched packages. Because Ubuntu
is the source of the majority of these patches and Ubuntu patches are
already presented on the PTS, I also generated a dd-list with Ubuntu
patches excluded[10].

     9. http://dex.alioth.debian.org/census/patched-packages-dd-list
    10. http://dex.alioth.debian.org/census/patched-packages-except-ubuntu-dd-list
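Automatic filters of the kind requested here could often be phrased in
terms of which files a patch touches, which is the information the
lsdiff cache records. Here is a minimal Python sketch of such a filter,
using the changelog-only case as the example; touched_files and
only_touches are hypothetical names, and it assumes unified diffs whose
paths carry one leading directory component. It counts hunk lines
instead of trusting every `+++` line, because an added line that itself
begins with `++` would otherwise look like a file header.

```python
import re

# "@@ -a,b +c,d @@": the counts b and d default to 1 when omitted.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def _strip_first_component(path):
    path = path.split("\t", 1)[0]  # drop any timestamp after a tab
    return path.split("/", 1)[1] if "/" in path else path


def touched_files(patch_text):
    """List the files a unified diff modifies, roughly as lsdiff would."""
    files, old_left, new_left = [], 0, 0
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: consume the line against the remaining counts.
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file"
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        m = HUNK_RE.match(line)
        if m:
            old_left = int(m.group(1) or 1)
            new_left = int(m.group(2) or 1)
        elif line.startswith("+++ "):
            files.append(_strip_first_component(line[4:]))
    return files


def only_touches(patch_text, allowed):
    """True if the patch modifies at least one file and only files in `allowed`."""
    files = touched_files(patch_text)
    return bool(files) and set(files) <= set(allowed)
```

A patch is then changelog-only when
only_touches(patch, {"debian/changelog"}) is true; other suggested
filters could be written as further predicates over touched_files.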

I would welcome any other feedback you have on this effort. In
particular, I am interested in any instances where the script has made
poor choices about what to diff.
