
Re: Rsync and incremental updates on top of Original CDs



On Sun, 25 Feb 2001, Richard Atterer wrote:

> On Sat, Feb 24, 2001 at 01:18:26AM +0100, J.A. Bezemer wrote:
> > 
> > On Fri, 23 Feb 2001, Richard Atterer wrote:
> 
> ["jido-file copy-matching" command]
> > > True, something like that would be useful. I think you could actually
> > > get away with appending to potato.iso the information regarding which
> > > files have already been copied into the image, and remove that
> > > "trailer" when you're finally done. (It might be wise to use a name
> > > "potato.iso.tmp" and only rename it to "potato.iso" when finished.)
> > 
> > Nice idea, but AFAIK shortening a file is impossible (at least C
> > doesn't seem to have a function for it). And besides, it would be
> > very helpful if the "already-done" file was just plain ascii text
> > (md5sum with filename as comment?) that could be changed easily,
> > for example to force a file to be downloaded again (for whatever
> > reason).
> 
> True, shortening of files was (erroneously?) omitted from ANSI C - but
> it *is* present in POSIX. Linux has truncate(2) and ftruncate(2), the
> latter is POSIX.
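A minimal sketch of the "append a trailer, cut it off when done" idea. Python is used only because os.ftruncate() is a thin wrapper around the POSIX call. The trailer layout (one file name per line, followed by an 8-byte length field at the very end) and all function names are made up for illustration, and strip_trailer() assumes a trailer is actually present.

```python
import os
import struct

# Hypothetical trailer layout: newline-separated names of files already
# copied into the image, followed by an 8-byte big-endian field giving the
# total trailer length (names plus the field itself).
FOOTER = struct.Struct(">Q")


def append_trailer(path: str, done: list[str]) -> None:
    """Append the 'already copied' list to the end of the image file."""
    body = "".join(name + "\n" for name in done).encode("utf-8")
    with open(path, "ab") as f:
        f.write(body)
        f.write(FOOTER.pack(len(body) + FOOTER.size))


def _trailer_length(f) -> int:
    """Read the length field from the last 8 bytes of an open binary file."""
    f.seek(-FOOTER.size, os.SEEK_END)
    (total,) = FOOTER.unpack(f.read(FOOTER.size))
    return total


def read_trailer(path: str) -> list[str]:
    """Return the list of names stored in the trailer."""
    with open(path, "rb") as f:
        total = _trailer_length(f)
        f.seek(-total, os.SEEK_END)
        body = f.read(total - FOOTER.size)
    return body.decode("utf-8").splitlines()


def strip_trailer(path: str) -> None:
    """Cut the trailer off with POSIX ftruncate(), leaving the bare image."""
    with open(path, "r+b") as f:
        total = _trailer_length(f)
        size = os.fstat(f.fileno()).st_size
        os.ftruncate(f.fileno(), size - total)
```

Storing the length at the very end means the trailer can be found without knowing the image size in advance; whether every filesystem honours ftruncate() is exactly the open question raised below.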

Ah, indeed! I appear to have a manpage for it, but it isn't mentioned in the
glibc documentation where I searched initially. But why did they call it
_f_truncate when in fact it does _not_ use a FILE* stream?!

However, is this operation supported by all filesystems/OSes? For example 
dosfs on Linux/*BSD and Windows 95 (only NT is POSIX-certified IIRC)?

> I wonder - will it be necessary to, say, download a file again? 
> jido-file will only write it to the image if its MD5 sum matches the
> one in its list - that's a pretty strong check!
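The MD5 gate mentioned in the quoted text could look roughly like this. It is a sketch, not jido-file's actual code: the offset and expected MD5 are assumed to come from the template's file list, the function name is hypothetical, and the whole file is read into memory for simplicity.

```python
import hashlib


def copy_if_matching(image_path: str, offset: int, expected_md5: str,
                     local_path: str) -> bool:
    """Write local_path into the image at offset, but only if its MD5 matches.

    offset and expected_md5 are assumed to come from the template's file
    list; the whole file is read into memory, which is fine for a sketch.
    """
    with open(local_path, "rb") as src:
        data = src.read()
    if hashlib.md5(data).hexdigest() != expected_md5.lower():
        return False  # wrong or corrupt file: leave the image untouched
    with open(image_path, "r+b") as img:
        img.seek(offset)
        img.write(data)
    return True
```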

One example that comes to mind is untrustworthy memory in the disk write
cache. cdimage.d.o had this problem when the 2.2 rev0 images were created,
resulting in three (of 28) CD images with one single-bit error each.

(Since positions and md5sums of individual files in the image are known, I
could imagine some "jido-file check-all-files-in-this-image" tool ;-) 
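A rough sketch of such a checking tool, assuming the template gives (name, offset, length, md5) for every file in the image; that tuple layout and the function name are hypothetical. A single flipped bit like the ones on cdimage.d.o would show up as one failing name.

```python
import hashlib


def check_image(image_path, entries, bufsize=1 << 20):
    """Return names of files whose bytes in the image fail the MD5 check.

    entries is an iterable of (name, offset, length, md5hex) tuples,
    assumed to be taken from the template that describes the image.
    """
    bad = []
    with open(image_path, "rb") as img:
        for name, offset, length, expected in entries:
            img.seek(offset)
            h = hashlib.md5()
            remaining = length
            while remaining > 0:
                chunk = img.read(min(remaining, bufsize))
                if not chunk:
                    break  # image is shorter than the template says
                h.update(chunk)
                remaining -= len(chunk)
            if remaining or h.hexdigest() != expected.lower():
                bad.append(name)
    return bad
```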

In general, people (like me) just want to be able to mess around with things,
which is much easier if files are in plain ASCII format.
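To illustrate why a plain-text "already done" list is easy to mess around with, here is a sketch that assumes md5sum(1)-style lines ("<md5>  <name>") with the name treated as a comment. The format and function names are assumptions, not an agreed file format.

```python
def load_done(path: str) -> dict[str, str]:
    """Parse a plain-text 'already done' list into {md5: filename}.

    Lines are assumed to look like md5sum(1) output, "<md5>  <name>"; the
    name is informational only. Blank lines and '#' comments are skipped,
    so deleting or commenting out a line forces that file to be fetched again.
    """
    done = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            digest, _, name = line.partition(" ")
            done[digest.lower()] = name.lstrip(" *")
    return done


def mark_done(path: str, digest: str, name: str) -> None:
    """Append one entry in the same md5sum-style format."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{digest}  {name}\n")
```

Forcing a re-download is then just a matter of deleting or commenting out a line in any text editor.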

> > The .tmp name for an unfinished image surely would make things a bit
> > clearer to inexperienced users. I can imagine some "jido-finalcheck"
> > tool that checks the md5sum (gotten from...where? 
> > --md5sumfile=MD5SUMS ? image=okay if matches any listed md5sum?) and
> > if okay renames the .tmp to .iso.
> 
> Hm, shouldn't jido-file be doing this itself? I.e., print a message
> "Not finished yet" and return 1, until the last missing file has been
> supplied, then print "Finished", rename the file and return 0.

Printing the message is okay (and desirable), but people shouldn't burn the
image until it's md5-checked, so keeping the .tmp filename for a while
wouldn't hurt.
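A sketch of the check-then-rename step, using the "image is okay if it matches any listed md5sum" rule from the quoted proposal. The function names are hypothetical; the 0/1 return values mirror the "return 1 until finished" suggestion above.

```python
import hashlib
import os


def file_md5(path: str, bufsize: int = 1 << 20) -> str:
    """MD5 of a (possibly 650 MB) file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def final_check(tmp_path: str, md5sums_path: str) -> int:
    """Rename e.g. potato.iso.tmp to potato.iso only if its MD5 is listed.

    Follows the "image is okay if it matches any listed md5sum" rule;
    returns 0 on success and 1 otherwise, like a command-line tool would.
    """
    if not tmp_path.endswith(".tmp"):
        raise ValueError("expected a name ending in .tmp")
    listed = set()
    with open(md5sums_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:
                listed.add(fields[0].lower())
    if file_md5(tmp_path) not in listed:
        print("Checksum mismatch, keeping", tmp_path)
        return 1
    final_path = tmp_path[: -len(".tmp")]
    os.rename(tmp_path, final_path)
    print("Finished:", final_path)
    return 0
```

Because the rename only happens after the full-image MD5 matches, an image still named .tmp is, by construction, one that should not be burned yet.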

[..]
> WRT this whole scheme, there's one further thing we need to think
> about: Unlike the current pseudo-image-kit, it is assumed that all the
> "puzzle pieces" that the CD image is assembled from are in fact
> available. Is this guaranteed with the current way the Debian archive
> is handled? For example, do security updates immediately replace
> packages in the current stable release? They mustn't for this to work!
> 
> I know, you can always use rsync for any remaining missing bits, but
> ideally, all this ought to work without rsync.

The stable packages archive is not supposed to change in any way until the next
revision is released. So there's only a small period of time that jidoing
would be impossible (without rsync), namely until new CD images for that
revision are released. In general, people will understand this and won't mind
waiting a little while for the latest and greatest.

Of course we can keep people even more happy by (manually/automatically) 
copying anything that's not on the FTP sites any longer directly from the (now
outdated) CD images to some special weblocation, and letting users download
from there. But thanks to the package pools this can be solved much more
cleanly by creating an "old-stable" (or "stable-last-cd-images")
distribution/"suite" that keeps referring to everything needed for the old CDs
as long as new images are not yet available. And as long as any suite refers
to a file, that file doesn't get deleted. Of course this requires some
coordination with the FTP maintainers, but that shouldn't be a big problem. 
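The retention rule ("as long as any suite refers to a file, that file doesn't get deleted") boils down to a set difference. This is only an illustration of the rule, not how the archive scripts actually implement it; the file and suite names are made up.

```python
def deletable(pool: set[str], suites: dict[str, set[str]]) -> list[str]:
    """Files in the pool that no suite refers to any longer.

    A file survives as long as at least one suite (e.g. a hypothetical
    "old-stable" kept for the previous CD images) still lists it.
    """
    referenced = set().union(*suites.values())
    return sorted(pool - referenced)
```

For example, with the pool {"a_1.0.deb", "a_1.1.deb", "b_2.0.deb"}, dropping an "old-stable" suite that lists a_1.0.deb is what makes a_1.0.deb eligible for deletion.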

Which reminds me: the doc/ directory on FTP is constantly changing. This isn't
very much (3-4 MB), but it's replicated on many CDs. One option would be to
include it literally in every template that needs it, but a separate .tgz
would probably be wiser (just "download; untar; jido-file copy-matching" 
before anything else).
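Roughly what a "copy-matching" pass would need from such a doc/ tarball is an MD5 for every regular file in it. This sketch computes those digests straight from the .tgz without untarring; the tarball layout and function name are assumptions.

```python
import hashlib
import tarfile


def md5_members(tgz_path: str) -> dict[str, str]:
    """Map MD5 -> member name for every regular file in a .tgz.

    These digests could then be matched against a template's file list;
    directories, symlinks and other non-regular members are skipped.
    """
    digests = {}
    with tarfile.open(tgz_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            f = tar.extractfile(member)
            digests[hashlib.md5(f.read()).hexdigest()] = member.name
    return digests
```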


Regards,
  Anne Bezemer


