
Re: IDEA to SERIOUSLY reduce download times!

On Thu, Jul 08, 1999 at 11:39:04PM -0400, Daniel Burrows wrote:
> On Fri, Jul 09, 1999 at 12:05:12PM +1000, Brian May was heard to say:
> > Sounds like a good idea. Just one point though:
>   [quotage of me snipped :-) ]
> > I would seriously consider adding some sort of check (eg encode the
> > MD5SUM of the original file somewhere) so that an error can be produced
> > if the system operator altered the original file (something I have done
> > from time to time). In this case, the system operator knows to download
> > the full version, and not just the patched version.
> > 
> > -- 
> > Brian May <bam@snoopy.apana.org.au>
>   Excellent thought.  This prompted me to look up something that I
> remembered, and I was correct.  (whew! :) ) This is not (mostly) a problem,
> thanks to xdelta.  From the xdelta man page...
>      Patch
>        The patch subcommand has the following synopsis:
>        xdelta patch [ option...  ] patch [ from [ to ]]
>        Applies patch  to  from.   If  from  was  omitted,  XDelta
>        attempts to use the file with the original from file name.
>        The from file must be identical to the one used to  create
>        the delta.  Its MD5 checksum is used to verify this
>        condition.  The constructed file will be written to  to  unless
>        to  is  named  "-"  or  the original to file name if to is
>        omitted.
>   It looks like xdelta does something like this already, checking before it
> patches that the file is correct.
>   The one thing I'm unsure about is how this interacts with gzipped files (I
> plan to test this :) ) -- xdelta has 'intelligent' handling of gzipped files,
> meaning that it performs a patch on the contents rather than on the file
> itself.  This gives better patches I suppose (I haven't personally tested it
> but it makes sense to me :) ) but could be a slight problem, in that it can
> change the md5sum of the file (due to changes in the compression level).  xdelta
> probably handles this correctly (doing the md5sum on the uncompressed data) but
> it could confuse tripwire and schemes that compare md5sums against the
> 'official' Debian archive.

Forcing a GZIP=-9 environment variable should be OK, since policy already
asks for this for all docs and other items /by extension/. bzip2 compression
would be more problematic, however. I think the best solution would be
to make tripwire and similar schemes more intelligent about these things.
I don't know how they work, so that's just speculation.
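
To make the md5sum concern concrete, here is a small Python sketch. It is
only an illustration: the sample data and the helper names md5_hex and
content_md5 are mine, and it says nothing about how xdelta or tripwire
actually work. It compresses the same data at two gzip levels and compares
the checksum of the compressed bytes with the checksum of the contents.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 of raw bytes, as md5sum would report for a file."""
    return hashlib.md5(data).hexdigest()


def content_md5(gz_bytes: bytes) -> str:
    """MD5 of the uncompressed contents of a gzip stream."""
    return md5_hex(gzip.decompress(gz_bytes))


# Hypothetical sample data standing in for a gzipped doc file.
payload = b"".join(b"changelog line %d for foo-bar\n" % i for i in range(500))

# mtime=0 keeps the gzip header timestamp out of the comparison.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

# The compressed files will usually differ between levels...
print("file md5 equal:   ", md5_hex(fast) == md5_hex(best))
# ...while the checksum of the contents is the same for both.
print("content md5 equal:", content_md5(fast) == content_md5(best))
```

A checker that compared the content checksum instead of the file checksum
would not care which compression level was used to rebuild the file.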

BTW, I think that having a deb-diff should not replace orig-deb packages.
First, it's ridiculous to ask for a complete repacking of all packages for
people who have a full internet connection. It's a waste of disk space.
It also increases download time when you install new packages: you'll
have to download both the orig-deb and the deb-diff.

Finally, about my comment that we can always fall back to downloading the
full packages:

The patching will always be done in the download phase, before
installation. That's important for the prerm and preinst scripts.
If the patching fails, we can optionally decide to go further and
download the full package version instead.
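
The fallback described above could look roughly like this in Python. This
is a sketch only: obtain_package, PatchError and the three callables are
hypothetical names, not part of apt or dpkg. The point is just that
patching happens at download time and a failure leads to the full download.

```python
from typing import Callable


class PatchError(Exception):
    """Raised when a deb-diff cannot be applied to the local base package."""


def obtain_package(
    fetch_diff: Callable[[], bytes],
    apply_diff: Callable[[bytes], bytes],
    fetch_full: Callable[[], bytes],
    allow_fallback: bool = True,
) -> bytes:
    """Return the new package, preferring the diff over the full download."""
    try:
        # Download phase: fetch the diff and rebuild the new package locally.
        return apply_diff(fetch_diff())
    except PatchError:
        if not allow_fallback:
            raise
        # Patching failed (e.g. the base file was altered): get the full deb.
        return fetch_full()


def failing_apply(_diff: bytes) -> bytes:
    # Stand-in for a checksum mismatch on the base file.
    raise PatchError("base package was modified locally")


result = obtain_package(lambda: b"diff", failing_apply, lambda: b"full package")
print(result)
```

Whatever comes out of obtain_package is a complete package, so the
installation phase (prerm, preinst, etc.) never needs to know a diff was
involved.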

[Patches only in diff not in full pack.]
Also, the Patches: field should only appear in the deb-diff. Why?
It's there so apt can check whether it can simply download a deb-diff
instead of the deb file. My English isn't very good, so here is an
example:

The user selects package foo-bar 1.1-4, which is available both as a diff
(with Patches: foo-bar (=1.1-1)) and as a deb (without the Patches: field).
Apt checks whether foo-bar (=1.1-1) is in status. No. However, the user
has foo-bar 1.1-3 with a tag Patches: foo-bar (=1.1-1). So apt knows
that it can remove the patches from foo-bar 1.1-3 to get back to 1.1-1
and then patch it again. However, if the user installed foo-bar 1.1-3
directly, without having foo-bar 1.1-1 installed previously, apt has no
choice but to install the full foo-bar_1.1-4.deb.
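
As a sketch of that decision (the Installed record and choose_download are
names I made up for illustration, and this assumes an installed diff can be
reverted to its base, which is part of the proposal and not something apt
does today):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Installed:
    version: str
    # Base version recorded in status when the package came from a deb-diff;
    # None when it was installed from a full deb.
    patches: Optional[str] = None


def choose_download(installed: Optional[Installed], diff_base: str) -> str:
    """Decide how to fetch a new version whose deb-diff patches diff_base."""
    if installed is None:
        return "full"
    if installed.version == diff_base:
        # The base itself is installed: apply the diff directly.
        return "diff"
    if installed.patches == diff_base:
        # Installed version was built from the same base: revert, then patch.
        return "unpatch-then-diff"
    return "full"


# foo-bar 1.1-4 is offered as a diff with Patches: foo-bar (=1.1-1).
print(choose_download(Installed("1.1-3", patches="1.1-1"), "1.1-1"))
print(choose_download(Installed("1.1-3"), "1.1-1"))
```

The first call is the user who upgraded through diffs; the second is the
user who installed the full 1.1-3 deb and so must get the full 1.1-4 deb.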

That's why I want to version the Patches: field. Look at the
way package versions appear:
First you have a new package with, most of the time, a few small
patches applied to it over a short period of time:
day1: foo-bar 1.0-1
day2: foo-bar 1.0-2 Patches: foo-bar 1.0-1
day5: foo-bar 1.0-3 Patches: foo-bar 1.0-1

Then a somewhat longer time passes in which the Debian package seems
stable enough. During this time, most users have time to upgrade their
packages to foo-bar 1.0-3. Suddenly, a new upstream version appears,
say 1.1. What is the best choice for the package maintainer?
A first thought might lead to:

foo-bar 1.1-1 Patches: foo-bar 1.0-1
foo-bar 1.1-2 Patches: foo-bar 1.1-1

However, chances are that a lot of people have already upgraded to
1.0-3, so a more economical way would be to start with
foo-bar 1.1-1 Patches: foo-bar 1.0-3

But what next? Inexperienced maintainers will tend to think,
overconfidently, that the best thing to do is to patch against
the new version 1.1-1. But that forgets that most people will
still be at 1.0-3 and won't even have had time to upgrade during
the short interval between the first Debian revisions.
So I think a more practical suggestion would be this:

dayX   foo-bar 1.1-1 Patches: foo-bar 1.0-3
dayX+1 foo-bar 1.1-2 Patches: foo-bar 1.0-3
dayX+5 foo-bar 1.1-3 Patches: foo-bar 1.0-3
dayX+50 foo-bar 1.1-4 Patches: foo-bar 1.1-3

where foo-bar 1.1-4 is a late-appearing security patch in the
postrm script (for example).
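
To compare the schedules, here is a tiny Python sketch. It is a
simplification: reachable_by_diff is a hypothetical helper that only
counts a direct match between the installed version and the Patches:
base, and ignores the unpatch-then-repatch path.

```python
from typing import List, Tuple

# (new version, version named in its Patches: field)
Schedule = List[Tuple[str, str]]

suggested: Schedule = [
    ("1.1-1", "1.0-3"),
    ("1.1-2", "1.0-3"),
    ("1.1-3", "1.0-3"),
    ("1.1-4", "1.1-3"),
]

first_thought: Schedule = [
    ("1.1-1", "1.0-1"),
    ("1.1-2", "1.1-1"),
]


def reachable_by_diff(installed: str, schedule: Schedule) -> List[str]:
    """Versions a user can reach with one deb-diff from what is installed."""
    return [new for new, base in schedule if base == installed]


print(reachable_by_diff("1.0-3", suggested))
print(reachable_by_diff("1.0-3", first_thought))
print(reachable_by_diff("1.1-3", suggested))
```

With the suggested schedule, a user sitting at 1.0-3 can take a diff to
any of the first three 1.1 revisions; with the first-thought schedule the
same user has no direct diff at all.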

>   Daniel
> -- 
>   Using a metaphor in front of Ridcully was like a red rag to a--was like
> using something very annoying in the presence of someone who was very annoyed
> by it.
>               -- Terry Pratchett, _Lords and Ladies_

I will try to write a summary of all this and of what needs to be done
(proposal for policy, changes needed to policy, apt, etc.).

I think I will keep the idea of a separate directory/site because it is
the most scalable one.



PS: I just ran into your last mail (with Python). Looks great. Maybe we can
try to go further and make a C version, so we can have both a Perl and
a Python one, and ultimately incorporate it into libdpkg/libapt... :-)

Have a good day.

Fabien Ninoles                                             GULUS founder
aka Corbeau aka le Veneur Gris               Debian GNU/Linux maintainer
E-mail:                                                    fab@tzone.org
WebPage:                      http://www.callisto.si.usherb.ca/~94246757
RSA PGP KEY [E3723845]: 1C C1 4F A6 EE E5 4D 99  4F 80 2D 2D 1F 85 C1 70
