On 01.07.24 12:46, Aigars Mahinovs wrote:
> Yes and no. See, what the git tag actually contains, and what the GPG signature actually signs, is just the one hash of the commit object. This commit object then refers to the other files of the repo, but the GPG signature does not directly sign those.
So it signs them indirectly instead. I don't consider that to be a problem. There's no material difference between the tag signing a commit that hashes a tree that (eventually) hashes the files, signing a list of the files plus their hashes, or signing a tarball of the files in question (except that the way we do the latter is too brittle: it depends on the file order and the compression used).
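For a SHA-1 repository, the "indirect" chain can be spelled out in a few lines. The following Python sketch computes blob and tree IDs the way git does. The helper names are hypothetical, and the sketch only handles a single directory of ordinary, non-executable files. Each object is hashed together with a "<type> <size>\0" header, and each tree entry embeds the binary ID of the blob it names. A commit in turn embeds its tree's ID, and the signed tag embeds the commit's ID. Altering any file therefore changes every ID up to the one the signature covers.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 object ID as git computes it: hash of "<type> <size>\\0" + payload."""
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content: bytes) -> str:
    """ID of a file's content (what `git hash-object` prints for it)."""
    return git_object_id("blob", content)


def flat_tree_id(files: dict[str, bytes]) -> str:
    """ID of a tree holding only regular, non-executable files (mode 100644).

    Simplified: no subdirectories, symlinks or executables. For such entries
    git orders the tree by the raw bytes of the file names.
    """
    entries = b""
    for name in sorted(files, key=lambda n: n.encode()):
        raw_id = bytes.fromhex(blob_id(files[name]))  # 20 binary bytes, not hex
        entries += b"100644 " + name.encode() + b"\0" + raw_id
    return git_object_id("tree", entries)
```

For example, `blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391` and `flat_tree_id({})` gives `4b825dc642cb6eb9a060e54bf8d69288fbee4904`. These are git's well-known empty blob and empty tree IDs.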
The only advantage of including a file list would be if it included the files' SHA-256-or-better hashes, but given the difficulty of finding *and* exploiting a SHA-1 collision, it's a judgment call whether that's worth the effort.
Creating an oversized tag object isn't a good idea IMHO. For reference, the list of files in the Linux kernel is 3.2 MB (git would compress that down to 450 kB or so), plus 10 MB of sha512sums (compressible to about 5 MB by definition, since each hash is written out as hex, i.e. twice its binary size). Granted, the kernel is an extreme example, but it's not the only one.
Also, these tags are not just pushed to Salsa, pulled by the t2u server, and subsequently ignored. Anybody who clones or pulls from an archive is likely to also fetch its tags. I suspect (but would need to verify) that git does not use delta encoding when it sends tag objects; in any case, we shouldn't depend on delta encoding to keep this reasonably efficient.
If we do decide that a second hash is worth the effort, I *strongly* recommend simply adding an (optional) field to the tag containing the output of "git ls-files -z | xargs -0 sha512sum | sort | sha512sum". This has exactly the same security implications as a list of paths and their sha512sums, but is orders of magnitude smaller.
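Here is a minimal Python sketch of what that one-liner computes. The function names are hypothetical. It assumes three things:
- GNU coreutils' default text-mode output (`<digest>  <path>`).
- Paths without newlines or backslashes, which `sha512sum` would escape.
- Byte-wise sorting, which is what `sort` does under `LC_ALL=C`.

Pinning the locale in the real pipeline (e.g. `LC_ALL=C sort`) seems prudent, since `sort` otherwise collates according to the current locale. Unlike the shell version, the sketch returns only the final hex digest, without the trailing `  -` that `sha512sum` prints for standard input.

```python
import hashlib


def sha512sum_line(path: str, content: bytes) -> bytes:
    # GNU sha512sum text-mode format: "<hex digest>  <path>\n" (two spaces).
    # Assumption: the path has no newline or backslash (sha512sum escapes those).
    return hashlib.sha512(content).hexdigest().encode() + b"  " + path.encode() + b"\n"


def tree_digest(files: dict[str, bytes]) -> str:
    """Sketch of `git ls-files -z | xargs -0 sha512sum | sort | sha512sum`.

    `files` maps each tracked path to its content (hypothetical input format).
    """
    lines = [sha512sum_line(path, content) for path, content in files.items()]
    lines.sort()  # byte-wise order, matching `sort` under LC_ALL=C
    return hashlib.sha512(b"".join(lines)).hexdigest()
```

Whatever the size of the tree, the result is a single 128-character hex string. That is where the size advantage over a full path-plus-hash listing comes from.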
One slight disadvantage of this scheme is that you'd need the full list on both sides to figure out exactly what went wrong. However, if a mismatch ever happens, people need to look at the situation on both ends anyway; re-running this command without the trailing pipe to "sha512sum" would be the least of our problems. Alternatively, the originator could keep the output around for a few days.
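If the digest in the tag and a locally computed one disagree, the intermediate output from both sides (the pipeline minus its trailing `sha512sum`) is enough to pinpoint the cause. The Python sketch below is a hypothetical helper for comparing two such listings. It assumes text-mode `sha512sum` lines and no escaped file names.

```python
def parse_sha512sum(listing: str) -> dict[str, str]:
    """Map each path to its hex digest, given `sha512sum` text-mode output.

    Assumes the two-space separator and no escaped (backslash-prefixed) names.
    """
    digests = {}
    for line in listing.splitlines():
        if line:
            digest, path = line.split("  ", 1)
            digests[path] = digest
    return digests


def compare_listings(ours: str, theirs: str) -> dict[str, list[str]]:
    """Report which paths differ between two listings (hypothetical helper)."""
    a, b = parse_sha512sum(ours), parse_sha512sum(theirs)
    return {
        "only_ours": sorted(a.keys() - b.keys()),
        "only_theirs": sorted(b.keys() - a.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


if __name__ == "__main__":
    # Digests shortened for readability; real ones are 128 hex characters.
    ours = "aa11  README\nbb22  src/main.c\n"
    theirs = "aa11  README\ncc33  src/main.c\ndd44  debian/rules\n"
    print(compare_listings(ours, theirs))
    # {'only_ours': [], 'only_theirs': ['debian/rules'], 'changed': ['src/main.c']}
```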
I agree that we might want a way of working with git objects that is independent of git-the-C-binary, but gitoxide[0] already exists (written in Rust).
> doing things this way provides for a rather simple verification pathway that does not involve running git or parsing git objects.
If we really want to do the bare-bones thing, the above "git ls-files -z | …" can easily be replaced with something like "find . -path ./.git -prune -o -type f -printf '%P\0' | …".
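For the git-free route, here is a Python sketch of the file enumeration done by that GNU `find` command. The function names are hypothetical, and unlike `find` the sketch sorts its output. The two enumerations are not interchangeable:
- `find` also picks up untracked and ignored files in the working tree.
- `git ls-files` lists tracked symlinks, which `sha512sum` then follows.

```python
import os
import stat
import tempfile


def list_files(root: str) -> list[str]:
    """Approximate `find . -path ./.git -prune -o -type f -printf '%P\\0'`
    run inside `root`: regular files only, relative paths, with only the
    top-level .git skipped. Results are sorted; find itself does not sort.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root and ".git" in dirnames:
            dirnames.remove(".git")  # the -prune: do not descend into ./.git
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if rel == ".git":
                continue  # ./.git may also be a plain file (e.g. in a worktree)
            # -type f without -L: regular files only, symlinks are skipped
            if stat.S_ISREG(os.lstat(full).st_mode):
                found.append(rel)
    return sorted(found)


def build_demo_tree(root: str) -> None:
    """Hypothetical sample checkout used only for the demonstration below."""
    os.makedirs(os.path.join(root, ".git"))
    os.makedirs(os.path.join(root, "debian"))
    for rel in (".git/HEAD", "debian/rules", "README"):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x\n")
    os.symlink("README", os.path.join(root, "README.link"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(tmp)
        print(list_files(tmp))  # .git/HEAD and the symlink are left out
```

The resulting paths could then be hashed and sorted in the same way as in the `git ls-files` pipeline.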
[0] https://github.com/Byron/gitoxide
-- 
-- regards
-- 
-- Matthias Urlichs