
Re: Mapping Reproducibility Bug Reports to Commits




I am a researcher at the University of Waterloo, conducting a project to study reproducibility issues in Debian packages.

The first step for me is to link each Reproducibility-related bug at this link: https://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-builds@lists.alioth.debian.org to the corresponding commit that fixed the bug.

However, I am unable to find an explicit way of doing so programmatically. Please assist.

There is no explicit link.

Most (but not all) Debian packages are maintained in a VCS, and there are fields in the source package
that identify the location and type of the VCS (almost certainly git nowadays). However, multiple
different workflows are in use: git-buildpackage is the most common and normally uses a "patches-unapplied"
git tree, but there is also dgit, which uses a "patches-applied" git tree. Git trees may or may not
contain the upstream source. At least one language community uses a system where the git tree stores
files that are used to generate the Debian packaging rather than the final Debian packaging itself.
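
As a rough illustration of reading those fields, here is a minimal Python sketch that pulls the Vcs-* fields out of the source stanza of a debian/control file. The function name vcs_fields, the sample package and the example.org URLs are made up for illustration. The sketch assumes the Vcs-* values sit on a single line and does not handle extras such as a "-b branch" suffix.

```python
import re


def vcs_fields(control_text):
    """Return the Vcs-* fields of the first (source) stanza as a dict."""
    # Stanzas are separated by blank lines; the source stanza comes first.
    source_stanza = re.split(r"\n[ \t]*\n", control_text.strip(), maxsplit=1)[0]
    fields = {}
    for line in source_stanza.splitlines():
        # Skip comments, continuation lines and anything that is not "Name: value".
        if line.startswith(("#", " ", "\t")) or ":" not in line:
            continue
        name, value = line.split(":", 1)
        if name.lower().startswith("vcs-"):
            # Field names are case-insensitive, so normalise e.g. "vcs-git" to "Vcs-Git".
            fields[name.title()] = value.strip()
    return fields


# Hypothetical control file, for illustration only.
SAMPLE_CONTROL = """\
Source: hello
Maintainer: Example Maintainer <maint@example.org>
Vcs-Git: https://example.org/git/hello.git
Vcs-Browser: https://example.org/git/hello

Package: hello
Architecture: any
"""

if __name__ == "__main__":
    print(vcs_fields(SAMPLE_CONTROL))
```

An empty result would correspond to the "no VCS" case, where the bug is best referred to a human.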

Maintainer practices for structuring commits also vary: some maintainers update the changelog at the same
time as making the actual changes, while others update the changelog in a batch later.

Sometimes bugs aren't closed from the changelog at all but are instead closed by the maintainer
after the upload, particularly if the maintainer is not sure whether a change will fix the bug.

With all that said, it's probably doable to develop heuristics that map bug numbers to commits in most
cases. An outline might be:

* Check whether the package has a VCS and whether the relevant changelog can be found in it. If there is no VCS, give up and refer the bug for human attention.
* Map the bug number to a changelog line. If there is no such mapping, give up and refer the bug for human attention.
* Determine which commit added the changelog line (e.g. with git blame) and see whether there are actual code changes in that commit.
  If so, take it as the probable commit; if not, search backwards a bit for a commit message that matches
  the changelog line.
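
The second and third steps of the outline above could be sketched roughly as follows in Python. This is a sketch under assumptions rather than a tested tool: the function names are made up, the changelog is assumed to live at debian/changelog in the git tree, and the regular expression is the "Closes:" pattern documented in Debian Policy. A "Closes:" that wraps onto a continuation line is reported at the line where it appears, and merge commits are not handled.

```python
import re
import subprocess

# "Closes:" pattern as documented in Debian Policy (case-insensitive).
CLOSES_RE = re.compile(
    r"closes:\s*(?:bug)?\#?\s?\d+(?:,\s*(?:bug)?\#?\s?\d+)*", re.IGNORECASE
)


def lines_closing_bug(changelog_text, bug_number):
    """Return (line_number, line) pairs whose Closes: list names bug_number."""
    hits = []
    for lineno, line in enumerate(changelog_text.splitlines(), start=1):
        for match in CLOSES_RE.finditer(line):
            # Compare whole numbers so that 12345 does not match 123456.
            if str(bug_number) in re.findall(r"\d+", match.group(0)):
                hits.append((lineno, line.strip()))
                break
    return hits


def commit_that_added_line(repo_dir, lineno, path="debian/changelog"):
    """Ask git blame which commit last touched the given line of path."""
    out = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{lineno},{lineno}", "--", path],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    # The porcelain output starts with the commit hash.
    return out.split()[0]


def files_changed(repo_dir, commit):
    """List the paths touched by a (non-merge) commit."""
    out = subprocess.run(
        ["git", "diff-tree", "--root", "--no-commit-id", "--name-only", "-r", commit],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def has_non_changelog_changes(changed_paths, changelog_path="debian/changelog"):
    """True if the commit changed anything besides the changelog itself."""
    return any(p != changelog_path for p in changed_paths)


# Hypothetical changelog entry, for illustration only.
SAMPLE_CHANGELOG = """\
hello (2.10-3) unstable; urgency=medium

  * Sort file lists so the build is reproducible. (Closes: #123456)
  * Update Standards-Version.

 -- Example Maintainer <maint@example.org>  Mon, 01 Jan 2024 00:00:00 +0000
"""

if __name__ == "__main__":
    print(lines_closing_bug(SAMPLE_CHANGELOG, 123456))
```

If has_non_changelog_changes returns False for the blamed commit, that is the case where one would search backwards through the history for a commit message matching the changelog line.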

Another option, having guessed a range of commits from the changelog and/or from comparing the VCS to the
source packages, may be to run a bisection. This would likely require some effort to detect which workflow
is in use, though.

