
Re: Validating tarballs against git repositories



Colin Watson <cjwatson@debian.org> writes:

> On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
>> Running ./bootstrap in a tarball may lead to different results than the
>> maintainer running ./bootstrap in pristine git.  It is the same problem
>> as running 'autoreconf -fvi' in a tarball does not necessarily lead to
>> the same result as the maintainer running 'autoreconf -fvi' from
>> pristine git.  The different is what is pulled in from the system
>> environment.  Neither tool was designed to be run from within a tarball,
>> so this is just bad practice that never worked reliable and without a
>> lot of complexity it will likely not become reliable either.
>
> The practice of running "autoreconf -fi" or similar via dh-autoreconf
> has worked extremely well at scale in Debian.  I'm sure there are
> complex edge cases where it's caused problems, but it's far from being a
> disaster area.

Agreed.  I'm saying it doesn't fix the problem that some people appear
to believe it does, i.e., that running 'autoreconf -fi' solves the
re-bootstrapping problem.  Only some files get re-generated, such as the
./configure script, which is good, but not all files.  It wouldn't have
solved the xz case: build-to-host.m4 wouldn't have been re-generated.

With a *-src.tar.gz approach [1], the build-to-host.m4 file shouldn't
even be part of the tarball.  That would be easier to detect during an
audit comparing the list of files against the git repository, rather
than waiting for code review of file content (which usually only happens
when debugging some real-world problem).

[1] https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
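The audit described above can be sketched in a few lines of shell.  This
is an illustration, not a real release check: the repository, tarball
name, and the smuggled build-to-host.m4 file are all stand-ins created
here for the demonstration.

```shell
# Hypothetical audit: list files present in a release tarball but absent
# from the git repository.  All names (demo-repo, release.tar) are made up.
set -eu
rm -rf demo-repo release.tar tracked.txt shipped.txt
mkdir demo-repo
git -C demo-repo init -q
echo 'AC_INIT([demo],[1.0])' > demo-repo/configure.ac
git -C demo-repo add configure.ac
git -C demo-repo -c user.email=demo@example.org -c user.name=Demo \
    commit -qm init
# Simulate a tarball that carries one extra file not tracked in git:
echo 'payload' > demo-repo/build-to-host.m4
tar -C demo-repo -cf release.tar configure.ac build-to-host.m4
rm demo-repo/build-to-host.m4
# Anything shipped in the tarball but missing from git stands out:
git -C demo-repo ls-files | sort > tracked.txt
tar -tf release.tar | sort > shipped.txt
comm -13 tracked.txt shipped.txt
```

With a minimal *-src.tar.gz, this comparison should come back empty; any
extra file is an immediate red flag rather than something waiting for a
line-by-line code review.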

> I don't think running ./bootstrap can be generalized as easily as
> running autoreconf can, and it's definitely going to be tough to apply
> to all packages that use gnulib; but I think the blanket statement that
> it's bad practice is painting with too broad a brush.  For the packages
> where I've applied it so far (most of which I'm upstream for,
> admittedly), it's fine.

I'm not saying autoreconf -fi is bad practice; I'm saying it is
incomplete and gives a feeling of having solved the re-bootstrapping
problem that isn't backed by facts.

>> I have suggested before that upstreams (myself included) should publish
>> PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
>> checkout including submodules,
>
> A while back I contributed support to Gnulib's bootstrap script to allow
> pinning particular commits without using submodules.  I would recommend
> this mode; submodules have very strange UI.

I never liked git submodules generally, so I would be happy to work on
getting that supported -- do you have pointers to earlier work here?

What is necessary, I think, is having something like this in
bootstrap.conf:

gnulib_commit_id = 123abc567...

and bootstrap would then use the external git repository pointed to by
--gnulib-refdir, locate that commit, and extract the gnulib files from
it -- refusing to continue if it can't find that particular commit.

This is essentially the same as a git submodule -- encoding the gnulib
commit to use in the project's own git history -- but without the bad
git submodule user experience.
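A rough sketch of what such a check-and-extract step could look like --
this is not the actual gnulib bootstrap code, and the repository created
here is only a stand-in for a real gnulib clone referenced via
--gnulib-refdir:

```shell
# Sketch: refuse to continue unless the pinned commit exists in the
# reference clone, then extract exactly that tree (no submodule needed).
# "refdir" stands in for the --gnulib-refdir clone of gnulib.
set -eu
rm -rf refdir gnulib-out
mkdir refdir
git -C refdir init -q
echo '# stand-in m4 macro' > refdir/build-to-host.m4
git -C refdir add build-to-host.m4
git -C refdir -c user.email=demo@example.org -c user.name=Demo \
    commit -qm import
# In a real setup this hash would come from bootstrap.conf
# (gnulib_commit_id = ...):
gnulib_commit_id=$(git -C refdir rev-parse HEAD)
git -C refdir cat-file -e "${gnulib_commit_id}^{commit}" || {
  echo "error: pinned gnulib commit not found in refdir" >&2
  exit 1
}
# Extract the pinned tree without creating a submodule:
mkdir gnulib-out
git -C refdir archive "$gnulib_commit_id" | tar -x -C gnulib-out
```

The key property is the same as a submodule's: the project's own history
records exactly which gnulib commit is used, but the checkout mechanics
stay out of the user's way.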

I use different approaches to gnulib in different projects.  In OATH
Toolkit I still put all gnulib-generated content in git, because running
./bootstrap otherwise used to take several minutes.  In most projects I
have given up and use git submodules.  In some I rely on running
gnulib-tool from git, and the exact gnulib commit to use is only
whatever I happened to have checked out on my development machine.

>> *.po translations,
>
> As I noted in a comment on your blog, I think there is a case to be made
> for .po files being committed to upstream git, and I'm not fond of the
> practice of pulling them in only at bootstrap time (although I can
> understand why that's come to be popular as a result of limited
> maintainer time).  I have several reasons to believe this:

Those are all good arguments, but it still feels backwards to put these
files into git.  It felt so good to externalize all the translation
churn outside of my git (or, back then, CVS) repositories many years ago.

I would prefer to maintain a po/SHA256SUMS in git and continue to
download translations but have some mechanism to refuse to continue if
the hashes differ.
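The mechanism could be as simple as a committed checksum file verified
after download.  A minimal sketch of the idea, with made-up file names
and the "download" simulated locally:

```shell
# Illustrative sketch of the po/SHA256SUMS idea: the maintainer commits
# hashes of the expected translations, downloads are verified against
# them, and the build refuses to continue on any mismatch.
set -eu
rm -rf po
mkdir po
printf 'msgid ""\nmsgstr ""\n' > po/sv.po   # stands in for a downloaded .po
( cd po && sha256sum sv.po > SHA256SUMS )   # this file would live in git
# Later, after re-downloading the translations, verification must pass:
( cd po && sha256sum -c --quiet SHA256SUMS ) || {
  echo "error: translation hashes differ, refusing to continue" >&2
  exit 1
}
echo "po files verified"
```

This keeps the translation churn out of git while still pinning exactly
which .po contents a release is allowed to ship.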

/Simon
