
Re: New method for Packages/Sources file updates



Goswin von Brederlow wrote:
[snip]
> >> - performance penalty for repeated patching of the same package
> >>   (e.g. the zsh-beta upload every odd day)
> >>
> >> - compression penalty due to lots of small files instead of one big
> >>   one from gzip, even worse with bzip2
> >>
> >> - performance penalty due to lots of small files instead of one big
> >>   one from apt-method, forking gunzip, forking patch
> >
> > Client-side performance is mostly irrelevant. Also, this particular
> > set of problems can be solved by using cumulative diffs instead of
> > several incremental ones.
> 
> The number of ftp connections needed is highly relevant. For http the
> penalty isn't that big but still adds up.
> 
> With cumulative patches you run into the problem that you need a new
> cumulative patch for every day that contains most of what the
> previous one did. That really quickly becomes a space issue.

Errm, no, it doesn't need _one_ new cumulative patch. All the
previously made cumulative diffs need to be updated.

If we assume that we hold 14 update cycles, have a cutoff once the size
of the cumulative diff exceeds the size of the Packages file, and have
linear growth of the diffs, then the additional space used is at most
seven times the size of the Packages file. Normally it will be much
less, because large archives don't tend to change that quickly.
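
As a rough check of that estimate, here is a minimal sketch of the
arithmetic. It assumes that the diff covering the last k cycles has size
k times a fixed per-cycle growth, and that a diff is dropped as soon as
it exceeds the Packages size. The function name and the numbers are made
up for illustration. When the oldest diff just reaches the cutoff, the
sum is 7.5 times the Packages size if that diff is kept and 6.5 times if
it is dropped, so roughly seven either way.

```python
def cumulative_diff_space(packages_size, growth_per_cycle, cycles=14):
    """Total size of all cumulative diffs kept on the server.

    Assumption: the diff covering the last k cycles has size
    k * growth_per_cycle (linear growth) and is no longer provided
    once it exceeds packages_size (the cutoff).
    """
    total = 0
    for k in range(1, cycles + 1):
        size = k * growth_per_cycle
        if size > packages_size:
            break  # cutoff: clients this far behind fetch the full file
        total += size
    return total


# Hypothetical numbers: a Packages file of 14 units, 1 unit of change per cycle.
worst = cumulative_diff_space(packages_size=14, growth_per_cycle=1)
print(worst / 14)  # overhead relative to the Packages size

# A slowly changing archive stays far below that.
print(cumulative_diff_space(packages_size=14, growth_per_cycle=0.1) / 14)
```

Faster growth does not make things worse, because the cutoff then
removes the older diffs earlier.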

[snip]
> >> - extra space needed for the diff files
> >
> > Which is minimal in comparison to the archive size.
> 
> Not for something like snapshots.debian.net. They do have a tad more
> Packages files than debian has. And why waste even a byte if it is
> absolutely not needed to achieve the same?

Again, snapshots shouldn't have any need for updating a snapshot.

> > Rather a heuristic based on patch sizes << Packages size and the
> > number of update cycles. The absolute timespan isn't a good measure,
> > just think about the typical update cycles for unstable, stable and
> > security/stable.
> 
> Think about unstable main. That is where most of the updating (user
> and archive) happens and most of the benefit will come from.
> 
> The amount of new packages for October is 691Kb as gzip. That is still
> less than 20% of the full file. Providing update intervals of over a
> month for unstable is still worth it. That is over 30 diff files in
> your case and then multiple updates of the same packages will
> accumulate in the diffs.

No, they won't if cumulative diffs are used.
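
To illustrate the point, a small sketch using Python's difflib: the same
package is uploaded on two consecutive days, and the changed lines of
the two incremental diffs are compared against the single cumulative
diff. The stanzas are invented and much shorter than real Packages
entries.

```python
import difflib


def changed_lines(old, new):
    """Count the lines that differ between two Packages texts."""
    matcher = difflib.SequenceMatcher(
        None, old.splitlines(), new.splitlines(), autojunk=False
    )
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Hypothetical Packages contents on three consecutive days.
template = "Package: zsh-beta\nVersion: {}\n\nPackage: foo\nVersion: 1.0\n"
day0, day1, day2 = (template.format(v) for v in (1, 2, 3))

# Two incremental diffs carry the zsh-beta change twice ...
incremental = changed_lines(day0, day1) + changed_lines(day1, day2)
# ... while one cumulative diff from day0 carries it once.
cumulative = changed_lines(day0, day2)
print(incremental, cumulative)
```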

> For stable and especially security the amount of change will be even
> less and even more diff files would still be worth it. The size would
> be smaller but the number of files higher.

I can't follow you. stable would have three additional diffs by now.

For stable-security I assume it's either tracked closely or very
infrequently. Providing a slightly faster update in the latter case
doesn't seem to be worthwhile.

> >> - not applicable (due to number of files) to archives with hourly
> >>   updates (like amd64, and we might even do 15m updates to prevent
> >>   Build-Depends stalls)
> >
> > This suggests interested parties do frequent updates anyway. This
> > eventually allows shortening the timespan covered, which means the
> > number of files won't increase much.
> 
> Not really. The buildd will do an update before each package build
> (usually just getting a HIT). That does not mean that users will do any
> more frequent updates than now.

Then a "newly built" archive should probably be used for the buildd,
sparing users/mirrors from the inconvenience of an archive which is
almost always update_in_progress.

I think the official buildds use incoming.d.o for that.

> >> - probably unusable on snapshots.debian.net like archives with tons of
> >>   Packages files due to too many tiny files
> >
> > Which is a good thing, since archived Packages files aren't supposed
> > to get updated. :-)
> 
> I meant those files:
> 
> | apt-get specific package(s)
> | 
> |  deb http://snapshot.debian.net/archive pool package ...
> |  deb-src http://snapshot.debian.net/archive pool package ...
> | 
> | where package is source package name as debian/pool directories.
> 
> They have quite a lot of those. :)

Any reason why those tiny files should ever get any download
optimization?

> >> Need any more? :)
> >
> > Yes.
> >
> >
> > Thiemo
> 
> Do you have any benefits for diffs apart from them being simpler
> to apply?

They keep a backward compatible Packages file which is proven to
work with old tools. Furthermore, updating on the server side
can be done by a simple script which invokes diff a few times.

The latter is especially interesting for partial mirror scripts
which usually fail to implement a decent parser for Packages files.
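
The server-side update mentioned above could look roughly like the
following sketch. It uses Python's difflib instead of calling diff(1),
and the function name, the list-based history and the "Packages.N"
labels are assumptions for illustration only, not an existing archive
layout. The point is that no Packages parser is needed: every retained
old version is diffed line by line against the new file.

```python
import difflib


def regenerate_cumulative_diffs(history, current):
    """Return one unified diff per retained old Packages version.

    history: old Packages contents, oldest first (hypothetical layout).
    current: contents of the newly generated Packages file.
    Each diff goes straight from an old version to `current`, so a
    client needs exactly one patch however far behind it is.
    """
    new_lines = current.splitlines(keepends=True)
    diffs = []
    # age 1 is the most recent old version, age 2 the one before, ...
    for age, old in enumerate(reversed(history), start=1):
        diff = difflib.unified_diff(
            old.splitlines(keepends=True),
            new_lines,
            fromfile=f"Packages.{age}",
            tofile="Packages",
        )
        diffs.append("".join(diff))
    return diffs


# Invented example: two retained versions plus today's file.
old_versions = [
    "Package: foo\nVersion: 1\n",
    "Package: foo\nVersion: 2\n",
]
today = "Package: foo\nVersion: 3\n"
for patch in regenerate_cumulative_diffs(old_versions, today):
    print(patch)
```

All retained diffs are regenerated on every archive update, which is
the "invokes diff a few times" part.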


Thiemo


