Re: Upstreams with "official" tarballs differing from their git
On 2/15/25 12:10 PM, Stéphane Glondu wrote:
I realize my previous email was a bit short: I was wondering whether this
.tbz is still source code, because in the autotools world package sources
come with configure scripts ready to run, but good practice in
Debian is to regenerate those from configure.ac.
Well, we are entering a philosophical debate that is not specific to OCaml
and should probably be discussed elsewhere... Adding debian-devel to get
more opinions.
Summary for other debian-devel readers: we are facing some upstreams that
publish "official" tarballs that differ from what is in their git. The
differences may include: variable substitutions, generated files... I
guess this is pretty common (cf. autotools). Moreover, the build system
behaves differently depending on whether it is run from a git checkout
or from an extracted official tarball.
IMHO, traditionally, "source code" from Debian's point of view is whatever
upstream releases as "official" tarballs (e.g. elpi-2.0.7.tbz), which
may differ from what is in upstream git (e.g. v2.0.7.tar.gz). What makes
me think that is the special care that is taken in keeping upstream
tarballs pristine (with their signatures...).
[...]
What do you think about the topic?
My e-mail is very opinionated, I would really like to hear other opinions.
hello! ✨
disclaimer upfront: I know pretty much nothing about OCaml, this is
based on my experience with C/Rust/Go/etc.
I think the concept of "building the source code into source code" [sic]
that is common with autotools is just the regular build in a trenchcoat
and should happen on Debian build servers. This is to avoid forcing a
gap between the VCS and Reproducible Builds that nobody feels
responsible for. Coincidentally, this topic was also discussed on
#reproducible-builds IRC yesterday.
With regard to signatures, quoting from an email I wrote shortly after
the XZ incident[0]:
> It's from the old mindset of code signing being the only way of
> securely getting code from upstream. Recent events have shown (instead
> of bothering upstream for signatures) it's much more important to have
> clarity and transparency what's in the code that is compiled into
> binaries and executed on our computers, instead of who we got it from.
> The entire reproducible builds effort is based on the idea of the source
> code in Debian being safe and sound to use.
[0]: https://lists.debian.org/debian-devel/2024/04/msg00125.html
I know Debian attempts to regenerate the autotools files, but there is
no way to tell whether this actually worked. I vaguely remember XZ being
specifically one of the cases where it didn't.
In other news, note there's currently a push within Arch Linux to move 
away from upstream custom tarballs towards VCS snapshots:
https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/46
Also because people found this interesting yesterday, Arch Linux and 
Debian disagree on "what's the source code of curl 8.12.1":
Arch Linux: 
https://whatsrc.org/artifact/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130
Debian: 
https://whatsrc.org/artifact/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac
Diff between those two:
https://whatsrc.org/diff-right-trimmed/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac
Even if we got some kind human to review the source code in its entirety
for us, which one should they review?
sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130?
sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac?
Both?
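For anyone who wants to check this kind of disagreement locally, here is a
minimal sketch (Python, standard library only) that compares two source
archives file by file. The function names and the "strip one leading
directory" convention are my own assumptions for illustration; this is not
how whatsrc.org computes its artifacts or diffs.

```python
import hashlib
import io
import tarfile


def member_digests(archive_bytes, strip_components=1):
    """Map each regular file's path to the SHA-256 of its content."""
    digests = {}
    # "r:*" lets tarfile detect gzip/bzip2/xz compression on its own.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumption: each archive wraps its files in one top-level
            # directory ("name-version/"); drop it so paths line up.
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return digests


def compare_archives(left_bytes, right_bytes):
    """Report files present on one side only, and files whose content differs."""
    left = member_digests(left_bytes)
    right = member_digests(right_bytes)
    common = set(left) & set(right)
    return {
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "differ": sorted(p for p in common if left[p] != right[p]),
    }
```

Feeding it the bytes of a VCS snapshot (e.g. v2.0.7.tar.gz) and of the
matching "official" tarball (e.g. elpi-2.0.7.tbz) would list generated files
under "only_right" and substituted files under "differ". It only compares
content, so it says nothing about permissions, symlinks or timestamps.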
cheers,
kpcyrd