Re: Include git commit id and git tree id in *.changes files when uploading?
Hi!
On Mon, 2025-12-15 at 20:26:51 -0800, Otto Kekäläinen wrote:
> In *.changes files we already have the Vcs-Git line as metadata
> showing where the packaging sources are maintained with an exact URL
> and a `-b <branch>` identified if the upload was not from the default
> branch.
>
> To be better able to audit the software supply-chain I have been
> thinking that we should have more git info in the changes file, namely
> the git commit id it was generated from, and just in case also the git
> tree id as well.
>
> The git commit id (`git rev-parse HEAD`) is derived from what the
> chain of contents+commit messages was, and the git tree id (`git
> rev-parse HEAD^{tree}`) is derived from the file contents, so we can't
> embed either into the packaging repository itself (e.g. as extra lines
> in d/changelog) as they would reference themselves in a circular manner.
> They must be put in some file that describes the upload _after_ the
> final git commit was made, and I think the changes file would be
> ideal. It already has the Vcs-Git header anyway, and it is the file
> any system processing the upload will see and can then act upon as
> needed.
>
> Has somebody else already been thinking about the same? Do others see
> value in this?
I think integrating this somehow into dpkg-buildpackage (in its current
form) and in a universal way might be tricky, due to the multitude of
git workflows, and because it is not in charge of fetching the source,
so it can only trust what is in the source tree. But let me try to do
a shallow pass over it (which means I might miss stuff!), to see what
this could look like.

If this were to be added, I think .dsc would be the more appropriate
file, because .changes is a file that gets processed during uploads
(including binary-only ones) and its information then gets set aside.
Also, the file that contains the Vcs-* fields is .dsc, not .changes.

If dpkg-source were to add that kind of information, it should be
reliable and usable. But my hunch is that this tool cannot easily
guarantee that. Things that come to mind (some of which have already
been mentioned in the thread):

  - If you keep your home under git, doing a «dpkg-source -x» under it
    and then a «git rev-parse» will print an ID for a repo that has
    nothing to do with the source package. I think this also means
    that monorepos cannot be supported, because trying to find their
    root without confusing it with something else is going to be
    tricky. And anything that is not going to end up as part of .dsc
    (or its referenced files) cannot be validated.
    (I guess the equivalent of --git-dir=srcpkg-root/ and/or
    --git-dir=srcpkg-root/debian/ should be used.)

  - If you do a variant or equivalent of «apt source --download-only»,
    «dpkg-source --skip-patches -x», «git init», «git add -A»,
    «git commit -m Import», to avoid the mess that is dealing with
    random git workflows, then you'd get information for a local
    throwaway repo.
    (I guess the code should check whether there's a remote that
    matches the Vcs-Git field, and whether the upstream branches
    match the local one.)

  - The code would need to check that the repo is clean, and that's
    going to be annoying to do with a mix of patches applied/unapplied
    git workflows, and with dpkg-source only being called to build the
    source (but obviously not to extract it).
    (I guess repos with patches applied could be declared
    unsupported, and then dpkg-source could check for cleanliness
    before preparing the source tree and record that somewhere.)
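
To make the three checks above a bit more concrete, here is a rough
Python sketch. It is only an illustration and not dpkg code: all the
function names are made up, comparing remote URLs after stripping a
trailing «.git» is a simplifying assumption, and the comparison of
upstream branches against the local one is left out entirely.

```python
import subprocess
from pathlib import Path


def git(srcdir, *args):
    """Run git inside srcdir and return its stripped stdout.

    Raises subprocess.CalledProcessError if git fails, for example
    when srcdir is not inside any repository.
    """
    result = subprocess.run(
        ["git", "-C", str(srcdir), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def parse_vcs_git(value):
    """Split a Vcs-Git value 'URL [-b BRANCH]' into (url, branch)."""
    words = value.split()
    branch = None
    if "-b" in words[1:-1]:
        branch = words[words.index("-b", 1) + 1]
    return words[0], branch


def normalize_url(url):
    """Assumption: URLs differing only by '/' or '.git' suffix are equal."""
    url = url.rstrip("/")
    return url[:-4] if url.endswith(".git") else url


def is_own_repo(srcdir, toplevel):
    """True if the repo root is the source root, not some parent dir."""
    return Path(srcdir).resolve() == Path(toplevel).resolve()


def matches_vcs_git(vcs_git, remote_urls):
    """True if any configured remote points at the Vcs-Git URL."""
    url, _branch = parse_vcs_git(vcs_git)
    return normalize_url(url) in {normalize_url(u) for u in remote_urls}


def is_clean(porcelain_output):
    """True if «git status --porcelain» reported nothing."""
    return porcelain_output.strip() == ""


def recordable_ids(srcdir, vcs_git):
    """Return (commit id, tree id) if all checks pass, else None."""
    if not is_own_repo(srcdir, git(srcdir, "rev-parse", "--show-toplevel")):
        return None  # e.g. a home directory kept under git
    names = git(srcdir, "remote").split()
    urls = [git(srcdir, "remote", "get-url", name) for name in names]
    if not matches_vcs_git(vcs_git, urls):
        return None  # e.g. a local throwaway import repo
    if not is_clean(git(srcdir, "status", "--porcelain")):
        return None  # uncommitted changes or applied patches
    return (git(srcdir, "rev-parse", "HEAD"),
            git(srcdir, "rev-parse", "HEAD^{tree}"))
```

Calling «recordable_ids(".", value)» from the root of the source tree,
with the Vcs-Git value from debian/control, would return the two IDs
only when all three checks pass, and None otherwise.
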
So, barring other problems I might have missed (and happy to hear them
if someone can come up with new ones), I guess it might not be too
onerous after all to add this kind of information for a specific set of
git workflows, but certainly not in a universal way. I think it would
also need to be added in a new field, because the way the tag2upload
fields are specified does not allow for other such generators.

Thanks,
Guillem