[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

gzip --rsyncable (was: Re: De-vendoring gnulib in Debian packages)



"Theodore Ts'o" <tytso@mit.edu> writes:

> On Sun, May 12, 2024 at 04:27:06PM +0200, Simon Josefsson wrote:
>> Going into detail, you use 'gzip -9n' but I use git-archive defaults
>> which is the same as -n aka --no-name.  I agree adding -9 aka --best is
>> an improvement.  Gnulib's maint.mk also add --rsyncable, would you agree
>> that this is also an improvement?
>
> I'm not convinced --rsyncable is an improvement.  It makes the
> compressed object slightly larger, and in exchange, if the compressed
> object changes slightly, it's possible that when you rsync the changed
> file, it might be more efficient.  But in the case of PGP signed
> release tarballs, the file is constant; it's never going to change,
> and even if there are slight changes between say, e2fsprogs v1.47.0
> and e2fsprogs v1.47.1, in practice, this is not something --rsyncable
> can take advantage of, unless you manually copy
> e2fsprogs-v1.47.0.tar.gz to e2fsprogs-v1.47.1.tar.bz, and then rsync
> e2fsprogs-v1.471.tar.g.... and I don't think anyone is doing this,
> either automatically or manually.
>
> That being said, --rsyncable is mostly harmless, so I don't have
> strong feelings about changing it to add or remove in someone's
> release workflow.

Your example had me convinced, and I thought some more about why we
really should keep using it as it consumes a small percentage more CPU
and disk space.  I have realized that another common operation is
storing and transfering _several_ different releases of e2fsprogs.  I
would suspect that most releases of software is fairly similar to the
previous release when uncompressed.  With gzip --rsyncable, the tarballs
should then be mostly similar.  Without --rsyncable, they will largely
be different if I understand correctly.  This affects dedup-able storage
and transfer methods, and some anecdotical evidence suggests this
improvement is significant - going from 215GB to 176GB vs 13GB:

https://gitlab.archlinux.org/archlinux/infrastructure/-/merge_requests/429

Maybe someone could do some experiment to see if there is substance to
this argument, its not clear to me that the example is comparable.
Storing/transferring several releases for the same software could add
significant savings for larger set of archives.

As the downside seems fairly small, and the potential upside may be
significant, I will use and recommend --rsyncable for git-archive
release tarballs:

   git archive --format=tar --prefix=$PACKAGE-$VERSION/ HEAD | \
      env GZIP= gzip --no-name --best --rsyncable \
      > $PACKAGE-$VERSION-src.tar.gz

/Simon

Attachment: signature.asc
Description: PGP signature


Reply to: