Re: Idea: rsync-based source format
Guillem Jover writes ("Re: Idea: rsync-based source format"):
> On Fri, 2015-08-21 at 16:32:09 +0100, Ian Jackson wrote:
> > (I spoke to Guillem about this at Debconf and promised to write it up
> > so he could think about it properly at his leisure.)
> [ Checked this over DebConf, but then I could not find you on the
> venue anymore. :) ]
Thanks for taking the time to read it. I left the venue on Saturday
night (at about 4am...) and didn't come back on Sunday.
> I did some preliminary quick pondering and have some concerns, but I think
> there is perhaps a workable alternative that might cover your needs.
> Actually you should be able to represent at least these with git format
> patches, which are already supported by the latest patch program (its only
> current limitation AFAIR is binary file deltas), and which dpkg-dev
> requires in order to handle them properly at extraction time.
I did think about this, but: I absolutely want binary file deltas too.
And the use of patch files with these kinds of features is very new,
so I'm not sure I want to trust it. Also, it would make it hard to
backport support for the new format, which is definitely something we
would want to do.
> > It would be more like a successor to 1.0 with diff, than 3.0 (quilt)
> > is, in that it wouldn't represent a patch stack, merely a tree.
> (From the code PoV, and from the properties you describe it would
> probably be more a successor of 2.0 than 1.0, but sure.)
Heh. I don't really care what we call it ...
> > It also contains an rsync batchfile P_V-R.rsync.Z.
> This is what triggers my concerns. I was not aware of rsync batchfiles!
Many people aren't.
> So I took a quick look at the man page only (I've not dug further),
> and I've got the impression this might not be a good format for long
> term storage, given that it seems to rely on the rsync protocol itself
When building the batchfile, dpkg-source would specify the protocol
version to use. I imagine we would fix it at 28 or 30.
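For illustration only, here is a Python sketch of the command lines dpkg-source might build when writing and replaying such a batch. It only assembles argument lists and runs nothing. The helper names and the exact flag set are assumptions; --only-write-batch, --read-batch and --protocol are existing rsync options, and the rsync manual describes --protocol as a way to produce batch files that an older rsync can read. The new tree is the transfer source and the unpacked orig tree is the destination, so the batch records how to turn the latter into the former.

```python
from typing import List

# Hypothetical flag set: which attributes dpkg-source should preserve is an
# open design question, not something settled in this thread.
COMMON_FLAGS = ["--recursive", "--links", "--perms", "--times", "--delete"]


def _as_dir(path: str) -> str:
    # A trailing slash makes rsync operate on the directory's contents.
    return path.rstrip("/") + "/"


def write_batch_argv(new_tree: str, orig_tree: str, batch_file: str,
                     protocol: int = 28) -> List[str]:
    """Argument list recording, in batch_file, the changes that turn
    orig_tree into new_tree, without modifying orig_tree. The protocol
    version is pinned explicitly."""
    return (["rsync", f"--protocol={protocol}"] + COMMON_FLAGS
            + [f"--only-write-batch={batch_file}",
               _as_dir(new_tree), _as_dir(orig_tree)])


def read_batch_argv(target_tree: str, batch_file: str) -> List[str]:
    """Argument list replaying batch_file onto target_tree, a freshly
    unpacked copy of the orig tree. Only the destination is named; the
    file data comes from the batch itself."""
    return (["rsync"] + COMMON_FLAGS
            + [f"--read-batch={batch_file}", _as_dir(target_tree)])


print(" ".join(write_batch_argv("hello-2.10", "hello-2.10.orig",
                                "hello_2.10-1.rsync")))
```

The read side does not pass --protocol, on the assumption that the batch file itself records which protocol version it was written with.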
> (it is already at version 28; does the program remove support for
> ancient protocol versions for example)?
This is a reasonable question. I think that it would be a good idea
to talk to rsync upstream before using rsync batchfiles as an archival
format for long term (decades) storage.
According to the rsync OLDNEWS file, protocol version 28 was released
in 2.6.1 in April 2004. The minimum version supported by sid's rsync
is protocol version 20, from April 1999.
According to the manual the batchfile format changed in 2.6.3 (Sep
2004), but (according to the OLDNEWS) at that stage batch mode was
still experimental. (The OLDNEWS file doesn't seem to clearly say
when batchmode became non-experimental.)
Looking at OLDNEWS I think we would probably want to require rsync >=
2.6.6 (Jul 2005), because we would need the --only-write-batch option
that was introduced then.
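A build-time guard for that minimum could look roughly like the following Python sketch. It is hypothetical: the function names are invented, and it assumes the first line of `rsync --version` has the usual form "rsync  version X.Y.Z  protocol version N" (some releases put a "v" before the version number, which the pattern tolerates). It checks the 2.6.6 minimum suggested above, and that the reported maximum protocol is at least the pinned one (28 in this sketch).

```python
import re
from typing import Tuple

# Minimum rsync release providing --only-write-batch, per the OLDNEWS reading.
MIN_RSYNC = (2, 6, 6)
# Protocol version batch files would be pinned to (28 first shipped in 2.6.1).
PINNED_PROTOCOL = 28

_VERSION_RE = re.compile(
    r"rsync\s+version\s+v?(\d+)\.(\d+)\.(\d+)\S*\s+protocol version\s+(\d+)")


def parse_rsync_version(first_line: str) -> Tuple[Tuple[int, int, int], int]:
    """Parse the first line of `rsync --version` into
    ((major, minor, patch), maximum_protocol)."""
    m = _VERSION_RE.search(first_line)
    if m is None:
        raise ValueError(f"unrecognised rsync version line: {first_line!r}")
    major, minor, patch, proto = (int(g) for g in m.groups())
    return (major, minor, patch), proto


def rsync_is_usable(first_line: str) -> bool:
    """True if this rsync is new enough to write batches and its maximum
    protocol is at least the pinned one."""
    release, max_proto = parse_rsync_version(first_line)
    return release >= MIN_RSYNC and max_proto >= PINNED_PROTOCOL
```

The version line only reports the maximum protocol, so this sketch cannot confirm the minimum an rsync still accepts; that would need a different check.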
Overall, rsync has an absolutely stellar record for reliability,
stability, and compatibility. Many many people have been using it for
many years. I think it almost inconceivable that rsync would
deprecate an old protocol version on a timescale that would be a
problem for Debian releases. If they did, you would also find that
you couldn't do normal (non-batch) rsync between the relevant
versions of Debian, either.
> It also ties the implementation of the format to the rsync tool,
> because I assume we'd not want to reimplement it ourselves(?), and
> keep in sync with upstream over time. And as such it would require
> pulling rsync into the build-essential set practically forever,
> because once there are such source packages around dpkg-source
> should be able to at least extract them (well it could get demoted
> to Recommends in case we switched to something else).
I don't see that adding rsync to the build-essential set is a problem.
rsync is extremely portable and has very limited build-dependencies.
libacl and libattr are surely already in the needed-for-essential set,
let alone needed-for-build-essential. I'm not sure whether libpopt is
already in the needed-for-bootstrap-to-build-essential set, but its
only build dependencies are debhelper, dh-autoreconf, and gettext.
> I'd recommend looking into git format patches, which should be a
> stable interchange format, are already supported by our dpkg tools
> (although by delegating the work to GNU patch), and should be able
> to represent the changes you mentioned before. Not sure if they would
> take more space, although I'd assume that should not make much of a
> difference once compressed with something like xz.
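The space question can be measured rather than guessed. A minimal Python sketch that reports the xz-compressed size of each candidate encoding (for example a git-style patch and an rsync batch file generated for the same package) might look like this. The function name is hypothetical; it uses the standard-library lzma module with the XZ container at preset 9, which approximates but may not exactly match the compression settings dpkg-source would use.

```python
import lzma
from typing import Dict


def xz_sizes(candidates: Dict[str, bytes]) -> Dict[str, int]:
    """Return the xz-compressed size in bytes of each candidate encoding."""
    return {name: len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=9))
            for name, data in candidates.items()}
```

For a real comparison, the byte strings would be the contents of the two generated files, read in binary mode.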
I think the difference between our perspectives is entirely due to our
different view of rsync. Perhaps you just haven't got as much value
out of rsync as I have.
I find it difficult to say how awesome I have found rsync to be. It
is software of extraordinary quality.
> In case we'd still wanted for whatever reason to distinguish this new
> format from a quilt one, I guess we could always add a new one such as
> «3.0 (delta)» or similar.
> Or would that not work for you for some reason or I've missed something
> very obvious?
Well, I am wary of the new patch features. They aren't widely used.
While patch is a good program with a reasonable history, it does not
have rsync's excellent record.
rsync's ability to reproduce an identical tree, via an rsync
protocol stream, is tested and verified very frequently on a wide range
of trees by people around the world.
Many people rely utterly on rsync for their backups, without even
performing a verification step - and, while not ideal, this is not
even very foolish! I _do_ verify that the backup client tree and the
backed up data are identical and I have found two harmless and
extremely obscure bugs in a decade and a half (FTR, bugs in
--link-dest, which wouldn't affect dpkg-source's use of rsync).
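For illustration, a check of the kind described above (that two trees are identical) could be sketched in Python as follows. This is not the tool used for the backups mentioned; it is a hypothetical helper that compares relative paths, file types, permission bits, file contents (SHA-256) and symlink targets, and deliberately ignores ownership and timestamps, which is roughly what would matter when checking an unpacked source tree.

```python
import hashlib
import os
import stat
from typing import Dict, Tuple

Entry = Tuple[str, str]  # (kind, detail)


def _file_digest(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_manifest(root: str) -> Dict[str, Entry]:
    """Map each path under root (relative to it) to its type plus either
    permission bits, permission bits and content digest, or link target."""
    manifest: Dict[str, Entry] = {}
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.lstat(full)
            mode = oct(stat.S_IMODE(st.st_mode))
            if stat.S_ISLNK(st.st_mode):
                manifest[rel] = ("link", os.readlink(full))
            elif stat.S_ISDIR(st.st_mode):
                manifest[rel] = ("dir", mode)
            elif stat.S_ISREG(st.st_mode):
                manifest[rel] = ("file", f"{mode}:{_file_digest(full)}")
            else:
                manifest[rel] = ("other", mode)
    return manifest


def trees_identical(a: str, b: str) -> bool:
    """True if both trees have the same manifest (the roots themselves
    are not compared)."""
    return tree_manifest(a) == tree_manifest(b)
```

Applied to dpkg-source, the natural use would be comparing a tree reconstructed from the orig tarball plus the batch against the tree the maintainer actually built from.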
rsync batchmode is not tested to the same degree, but it uses the same
protocol stream infrastructure, so the complex code paths we would be
using are the same ones as everyone else is using.
Ultimately, if you're worried about format stability and software
quality, I would suggest that picking a ten-year-old rsync feature is
a better idea than a brand new (or maybe not even implemented yet)