
Re: t2u in the archive



On 01.07.24 12:46, Aigars Mahinovs wrote:
> Yes and no. See what the git tag actually contains and what the GPG
> signature actually signs is just the one hash of the commit object.
> This commit object then refers to the other files of the repo, but the
> GPG signature does not directly sign those.

So it signs them indirectly instead. I don't consider that to be a problem.

There's no material difference whether the tag signs a commit that hashes a tree that (eventually) hashes the files, or a list of the files plus their hashes, or a tarball of the files in question (except that the way we do the latter is too brittle – it depends on the file order and compression used).
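To make the lowest link of that chain concrete: git names a file's contents by hashing a short header plus the file's bytes; a tree object then lists those IDs, and the commit names the tree. A rough sketch that reproduces a blob ID without git, assuming a SHA-1 repository (the function name git_blob_id is made up for illustration):

```shell
# git_blob_id FILE: print the ID git would assign to FILE's contents
# in a SHA-1 repository, computed without running git.
# (Illustrative helper, not an existing tool.)
git_blob_id() {
    # Size of the content in bytes; tr strips padding some wc builds add.
    size=$(wc -c < "$1" | tr -d ' ')
    # Object = "blob <size>" + NUL byte + raw content, hashed with SHA-1.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Running "git hash-object FILE" on the same file should print the same value, which is the sense in which a signature over the commit hash covers the files indirectly.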

The single advantage of including a file list would be if it included the files' SHA256-or-better hashes, but given the difficulty of finding *and* exploiting a SHA1 collision it's a judgment call whether that's worth the effort.


Creating an oversized tag object isn't a good idea IMHO. For reference, the list of files in the Linux kernel is 3.2 MB (git would compress that down to 450 kB or so), plus 10 MB of sha512sums (compressible to about 5 MB, since hex encoding spends two characters on every byte of hash); granted, the kernel is an extreme example, but it's not the only one.

Also, these tags are not just pushed to Salsa, pulled by the t2u server, and subsequently ignored. Anybody who clones or pulls from an archive is likely to also pull its tags. I suspect (but would need to verify) that git does not do delta encoding when it sends tag objects, and we shouldn't depend on that in order to be reasonably efficient.

If we do decide that a second hash is worth the effort, I *strongly* recommend simply adding an (optional) field with the output of "git ls-files -z | xargs -0 sha512sum | sort | sha512sum" to the tag. This has the exact same security implications as a list of paths and their sha512sums but is many orders of magnitude smaller.
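As a sketch of how that value might be produced, the pipeline can be wrapped in a small shell function that takes the NUL-separated path list on stdin. The name files_digest is made up for illustration, and forcing LC_ALL=C on sort is an addition here (not part of the command quoted above) so that both sides order the lines identically whatever their locale:

```shell
# files_digest: read NUL-separated paths on stdin and print one SHA-512
# digest covering every listed path together with its contents.
# (Illustrative helper, not an existing tool.)
files_digest() {
    xargs -0 sha512sum |   # one "<hash>  <path>" line per file
        LC_ALL=C sort |    # fixed byte order, independent of locale
        sha512sum |        # hash the sorted list itself
        cut -d' ' -f1      # keep only the hex digest
}

# Usage inside a git checkout:
#   git ls-files -z | files_digest
```

The result is 128 hex characters regardless of how many files the repository holds, which is what keeps the tag small.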

One slight disadvantage of this scheme is that you'd need the full list on both sides to figure out exactly what went wrong, but if that ever happens people need to look at the situation on both ends anyway; repeating this command without the trailing pipe to "sha512sum" would be the least of our problems. Alternatively, the originator could keep the output around for a few days.
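A minimal sketch of that debugging step, assuming each side has saved the sorted list to a file: comm can then show the lines that appear on only one side. The function names are hypothetical, and LC_ALL=C is again an assumption made so that sort and comm agree on ordering:

```shell
# files_manifest: the digest pipeline minus the final sha512sum, i.e. the
# sorted "<hash>  <path>" list for NUL-separated paths read from stdin.
files_manifest() {
    xargs -0 sha512sum | LC_ALL=C sort
}

# manifest_diff LEFT RIGHT: print only the lines that differ.
# Lines unique to LEFT are unindented, lines unique to RIGHT follow a tab.
manifest_diff() {
    LC_ALL=C comm -3 "$1" "$2"
}

# Usage, with the two lists produced on either end:
#   git ls-files -z | files_manifest > mine.txt
#   manifest_diff mine.txt theirs.txt
```

A file whose contents differ shows up twice, once per side with its respective hash; a file missing on one side shows up once.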

> doing things this way provides for a rather simple
> verification pathway that does not involve running git or parsing git
> objects.

I agree that we might want a git-the-C-binary-independent way of working with git objects, but gitoxide[0] already exists (written in Rust).

If we really want to do the bare-bones thing, the above "git ls-files -z | …" can easily be replaced with something like "find . -path ./.git -prune -o -type f -printf '%P\0' | …".
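Spelled out in full, the git-free variant might look like the sketch below. It assumes GNU find (for -printf) and GNU coreutils; the function name tree_digest is made up, and LC_ALL=C on sort is an addition to keep the ordering locale-independent:

```shell
# tree_digest [DIR]: digest of every regular file under DIR (default: the
# current directory), skipping the .git directory. No git involved.
# (Illustrative helper, not an existing tool.)
tree_digest() {
    (
        cd "${1:-.}" || exit 1
        # %P prints each path relative to the start point, without "./",
        # which matches the path style of "git ls-files".
        find . -path ./.git -prune -o -type f -printf '%P\0' |
            xargs -0 sha512sum | LC_ALL=C sort | sha512sum | cut -d' ' -f1
    )
}
```

This only agrees with the "git ls-files" variant on a clean checkout: untracked or ignored files in the working tree are picked up by find, and tracked symlinks are listed by git but skipped by "-type f".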

[0] https://github.com/Byron/gitoxide

--
-- regards
--
-- Matthias Urlichs


