
Re: New method for Packages/Sources file updates



Goswin von Brederlow wrote:
[snip]
> >> Following Steve's idea of marking removals in the Packages file itself
> >> I've come up with the following:
> >
> > I think the whole approach is needlessly complicated. Could you
> > read the idea covered in the thread starting at
> > http://lists.debian.org/debian-devel/2004/07/msg00128.html
> > and explain what's wrong with it in your opinion?
> >
> >
> > Thiemo
> 
> + no performance difference for daily updates (+- a few bytes)

+ Full backward compatibility. Changing the Packages format is likely
  to break some tools.

> - performance penalty for repeated patching of the same package
>   (e.g. the zsh-beta upload every odd day)
>
> - compression penalty due to lots of small files instead of one big
>   one from gzip, even worse with bzip2
>
> - performance penalty due to lots of small files instead of one big
>   one from apt-method, forking gunzip, forking patch

Client-side performance is mostly irrelevant. Also, this particular
set of problems can be solved by using cumulative diffs instead of
several incremental ones.

> - multi pass method where a failure in any one of them is fatal

Failure isn't fatal, it just triggers fallback to the full
Packages file. Btw, this can also be solved by cumulative diffs.
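
A minimal sketch of that fallback behaviour, in Python. All names here
(update_packages, fetch_diff, apply_diff, fetch_full) are hypothetical
placeholders for illustration, not existing apt interfaces:

```python
from typing import Callable, Tuple


def update_packages(
    local: str,
    fetch_diff: Callable[[], str],
    apply_diff: Callable[[str, str], str],
    fetch_full: Callable[[], str],
) -> Tuple[str, str]:
    """Return (new_contents, method_used) for one index update.

    The callables are assumptions standing in for the real download
    and patch steps.
    """
    try:
        # Preferred path: fetch the diff and patch the local copy.
        return apply_diff(local, fetch_diff()), "diff"
    except Exception:
        # Any failure on the diff path (missing file, bad patch) is
        # non-fatal: fall back to downloading the full Packages file.
        return fetch_full(), "full"


def missing_diff() -> str:
    """Simulates a mirror that does not carry the diff files."""
    raise IOError("diff not available on this mirror")
```

With missing_diff as the fetch step, the caller still ends up with a
current Packages file; only the download size differs.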

> - timestamp on package is timezone/clock dependent but the index
>   should protect that

You haven't read the complete thread. The timestamp problem can be
avoided.

> - extra space needed for the diff files

Which is minimal in comparison to the archive size.

> - limited update interval to stop the extra space from exploding,
>   2 weeks suggested

Rather, a heuristic based on patch sizes (which should stay much smaller
than the Packages file) and the number of update cycles. The absolute
timespan isn't a good measure; just think about the typical update cycles
for unstable, stable and security/stable.
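
One possible shape for such a heuristic, sketched in Python. The
function name and the two thresholds (max_fraction, max_cycles) are
assumptions picked for illustration; nothing in this thread fixes
actual values:

```python
from typing import List


def diffs_to_keep(
    diff_sizes: List[int],
    packages_size: int,
    max_fraction: float = 0.25,  # assumed: diffs may total 25% of Packages
    max_cycles: int = 30,        # assumed: cover at most 30 update cycles
) -> int:
    """Return how many of the newest diffs an archive would keep.

    diff_sizes lists diff sizes in bytes, newest first.
    """
    budget = packages_size * max_fraction
    total = 0
    kept = 0
    for size in diff_sizes[:max_cycles]:
        if total + size > budget:
            # Keeping this diff would no longer be << Packages size.
            break
        total += size
        kept += 1
    return kept
```

Because the limit counts update cycles rather than days, the same rule
covers a short period for a fast-moving archive and a long one for
stable.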

>   while my method can cover full releases for a few
>   K extra

True. OTOH, covering many update cycles isn't that useful for typical
use.

> - new files that mirrors won't pick up for a long time,
>   can only be used on mirrors that are reconfigured to mirror diffs too

I think full mirrors already cover the whole directory contents, so this
is only a problem for partial mirrors. It's also non-fatal, as it falls
back to the current method (failing index file accesses are the only
difference).

Reduced server load provides an incentive to those mirror admins to
change their scripts.

> - no benefit for rsync or zsync

True. OTOH, low-bandwidth users are unlikely to update their machines
via rsync.

> - not applicable (due to number of files) to archives with hourly
>   updates (like amd64, and we might even do 15m updates to prevent
>   Build-Depends stalls)

This suggests interested parties do frequent updates anyway. That
eventually allows shortening the timespan covered, which means the
number of files won't increase much.

> - probably unusable on snapshots.debian.net like archives with tons of
>   Packages files due to too many tiny files

Which is a good thing, since archived Packages files aren't supposed
to get updated. :-)

> Need any more? :)

Yes.


Thiemo


