kpcyrd <kpcyrd@archlinux.org> writes:

> On 1/8/26 10:04 AM, Simon Josefsson wrote:
>>> Right, we need to keep that in mind, even if git-archive happens to
>>> behave like this right now.
>>
>> It doesn't behave like that right now.
>>
>> If you run 'git archive' on a set of common distribution releases
>> from the past 5 years you will have several different variants:
>>
>> 1) RHEL8/9, Ubuntu 24.04+, Debian 12+, Guix: modern variant.
>>
>> 2) RHEL 10 eco-system: zlib-ng, different compression.  I'm trying
>> to ignore this, but it is becoming harder as RHEL10 spreads.
>>
>> 3) Ubuntu 22.04 eco-system: export-subst has a long git describe
>> substitution.
>>
>> 4) Debian 11 eco-system: no export-subst support.
>>
>> Comparing GitHub, GitLab, Codeberg etc generated archives (which may
>> or may not use 'git archive' internally) over the last 5 years also
>> gives different outputs.
>>
>> I don't think we can view 'git archive' as a stable output format.
>> It is a temporary snapshot mechanism, and the format is a
>> continuously moving target, and documented to be that.
>
> It boils down to two problems:
>
> - compression
> - some niche git feature Debian may not have to support

I disagree that upstream .gitattributes is something Debian will be able
to ignore.  Often this can be ignored, but there are situations where
consideration is needed.

Several upstreams rely on export-subst to embed versioning information
for GitHub/etc git-archive release tarballs, and this is becoming
increasingly common, and seems like a reasonable practice to me.  If
Debian imports an upstream package without any consideration of this,
this may lead to build failures and possibly worse.

The example that was identified with golang-github-smallstep-certificates
is an early indication.  I suspect in that example it would not lead to
a build failure or test failure.  For several packages, like libtasn1
that was just released, I believe ignoring this may lead to both build
failures and self-test failures.
Or even worse things: sometimes versioning information has bearing on
code logic.

The reason we are not seeing any fallout is that Debian's libtasn1
package uses upstream 'make dist' tarballs.  Moving to upstream 'git
archive' tarballs that were generated with export-subst enabled would
work.  Then the Debian libtasn1 package maintainer won't have to read
through and review all automake/autoconf-generated files when doing the
update.

Moving to upstream git import without setting up the tags in the proper
way will lead to a build and self-test failure.  That's because the
build logic either will use the export-subst value, or rely on git to
provide it with correct information about the version, which is
inherited from the tags.  If the Debian git repository is using some
other set of tags than upstream, things will break.

I do not suggest anything involving tag2upload or dgit is changed.  I
believe they are right to ignore .gitattributes and export-subst,
otherwise there is just madness to pristine source changes.  However,
maintainers need to be aware of this problem when packaging upstreams
that make use of it, and work around any problem of tag2upload source
tarballs not respecting .gitattributes when generating *.orig.tar.*.

> This approach has worked well for Arch Linux since 2024. :)

Arch Linux does not import upstream source code into git and build from
those sources, does it?  So this complexity is not relevant for Arch.

I find the source packaging practices of Arch Linux, and of the majority
of other distributions, better than Debian's approach.  I don't think
there is any intrinsic value in importing upstream git sources into a
modified Debian variant of upstream, and building from that.  The
approach leads to all these complexities, for no apparent real practical
gain.  Most other distributions are faster at getting upstream releases
into their archives, and they are doing things differently.
Debian is continuing to do this because we all started down this road
and don't know how to do anything else.  We need more roads to different
places, though, so I think this is fine.

> [1]: I looked up when SHAttered was published (the point in time that
> computer people essentially agreed that SHA-1 is insecure) and it was
> February 2017 apparently, almost 9 years ago. Time flies.

You should consider 2004 for that consensus:

https://www.schneier.com/essays/archives/2004/08/cryptanalysis_of_md5.html
https://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html

I'm surprised people carry a notion that SHA-1 is fine.  It is not.  I
would agree that there are many larger problems to worry about first,
but saying SHA-1 is fine is turning a blind eye to reality.

>> It was reproducible unless the git repository sees further commits.
>> Pruning later commits somehow from the git bundle should be possible,
>> and then things would be reproducible again.  But I don't know how.
>> There is some advice from git people how to do it:
>>
>> https://lore.kernel.org/git/871pv2jx4a.fsf@josefsson.org/t/#md469596b6b95790efe045e408b1d2f19503048cd
>>
>> However it looked so hacky I really didn't want to go down that
>> road, hoping someone else would come up with a better way to do this.
>
> I was about to suggest `-c core.abbrev=no`, then realized it's based
> on git pack files (that I'd rather consider git-internal). If it's
> just for "convert a git tree-ish to some binary blob we can reason
> about" then `git archive` might be our best option still.

Reproducible git archives are a relatively well understood and solved
problem, I think.  We need reproducible git bundles for some packages,
like gnulib, though.

/Simon