
Re: New method for Packages/Sources file updates



Goswin von Brederlow wrote:
[snip]
> >> With cumulative patches you run into the problem that you need a new
> >> cumulative patch for every day that contains most of what the
> >> previous one did. That really quickly becomes a space issue.
> >
> > Errm, no, it doesn't need _one_ new cumulative patch. All the
> > previously made cumulative diffs need to be updated.
> 
> I was thinking of a
> 
> -1day.diff
> -2day.diff
> -3day.diff
> ...
> 
> So every day a new file appears at the end and contains most of what
> all the others already contain.
> 
> Updating those cumulative diffs is also either inefficient (cat the
> daily diffs together),

That wouldn't be a cumulative diff.

> very hard (figure out how to make a minimal diff
> from the dailies) or you need every day's Packages file (apt-dupdate
> does that).

It is not "very hard" to re-diff a few files to incorporate the changes
between old and new Packages file.

> Having to store and diff every past day's Packages file is a huge
> resource drain and can't be done for more than a couple of days, maybe
> up to 2 weeks.

You don't need to store it.
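
A minimal sketch of one way to update a cumulative diff without keeping
old Packages files. It is an illustration under assumptions, not a
description of any existing tool: it works on per-package stanzas
instead of diff(1) output, a diff is a dict mapping a package name to an
(old stanza, new stanza) pair, and the names stanza_diff and compose are
made up for this example. Because each entry carries the old stanza,
folding today's daily diff into yesterday's cumulative diff needs only
those two diffs.

```python
def stanza_diff(old, new):
    """Diff two Packages files given as {package_name: stanza_text} dicts.

    Returns {package_name: (old_stanza_or_None, new_stanza_or_None)}.
    """
    changes = {}
    for name in old.keys() | new.keys():
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


def compose(cumulative, daily):
    """Fold a daily diff into an existing cumulative diff.

    Only the two diffs are needed; the old Packages file is not.
    Entries that end up unchanged (changed and later reverted) are dropped.
    """
    merged = {}
    for name in cumulative.keys() | daily.keys():
        # Oldest known state comes from the cumulative diff if it has one.
        before = cumulative[name][0] if name in cumulative else daily[name][0]
        # Newest known state comes from the daily diff if it has one.
        after = daily[name][1] if name in daily else cumulative[name][1]
        if before != after:
            merged[name] = (before, after)
    return merged


# Three consecutive archive states (toy data).
day0 = {"foo": "Version: 1", "bar": "Version: 1"}
day1 = {"foo": "Version: 2", "bar": "Version: 1", "baz": "Version: 1"}
day2 = {"foo": "Version: 1", "bar": "Version: 2"}

updated = compose(stanza_diff(day0, day1), stanza_diff(day1, day2))
```

In this toy data the composed result equals the direct diff between
day0 and day2, and the reverted change to foo drops out.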

> Ask the apt-dupdate author for how long it takes every night and how
> much disk space it uses.

If that's true, then apt-dupdate is an example of how not to do it.

> > If we assume to hold 14 update cycles, have a cutoff if the size of
> > the cumulative diff exceeds the size of the Packages file, and have
> > linear growth of the diffs, then the additional space used is at most
> > seven times the size of the Packages file. Normally it will be much
> > less, because large archives don't tend to change that quickly.
> 
> 14 update cycles is a limitation on the process and isn't needed with
> sorted Packages files.

It is not a hard limit, and speeding up exorbitant numbers of update
cycles isn't needed except in pathological cases.

> Also how do you get 'seven times'?

... linear growth of the diffs ...

> Say every day one package changes
> but on the last nearly every package changes. That means all 14
> cumulative diffs will be the size of the Packages file (change as
> many packages as possible but so that all stay below the cutoff).
> 
> That would be nearly 14 times the space.

Which is the (unlikely) worst case.
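
The two estimates can be checked with a short calculation. This is only
a sketch: the Packages size of 1400 units and the name extra_space are
arbitrary choices for illustration. With linear growth the average diff
is about half the Packages file, so 14 diffs cost roughly 14/2 = 7 times
its size (summing the discrete steps gives 7.5 if the oldest diff just
reaches the cutoff). With every diff just below the cutoff the total
approaches 14.

```python
def extra_space(diff_sizes, packages_size):
    """Space used by the kept diffs, in multiples of the Packages file size.

    A diff larger than the Packages file is dropped (the cutoff).
    """
    kept = [size for size in diff_sizes if size <= packages_size]
    return sum(kept) / packages_size


CYCLES = 14
SIZE = 1400  # assumed Packages file size, arbitrary units

# Linear growth: the diff covering k cycles is k/14 of the Packages file.
linear = [SIZE * k // CYCLES for k in range(1, CYCLES + 1)]

# Worst case from the quoted objection: every diff just below the cutoff.
worst = [SIZE - 1] * CYCLES

linear_cost = extra_space(linear, SIZE)
worst_cost = extra_space(worst, SIZE)
```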

> > [snip]
> >> >> - extra space needed for the diff files
> >> >
> >> > Which is minimal in comparison to the archive size.
> >> 
> >> Not for something like snapshots.debian.net. They do have a tad more
> >> Packages files than debian has. And why waste even a byte if it is
> >> absolutely not needed to achieve the same?
> >
> > Again, snapshots shouldn't have any need for updating a snapshot.
> 
> Yes they do. Every time a new version of a Package is released the
> Packages file updates. And it never gets smaller. Those would be
> perfect for date sorted.

This makes no sense at all for a single package (which was at least the
example you cited).

[snip]
> > No, they won't if cumulative diffs are used.
> 
> Tell me how you plan to create the 30 cumulative diffs each
> day. Storing the Packages files as plain text wastes too much
> space. bunzip2ing them every night takes too long. Just diffing them
> is also not that fast.

Sorry, but if your mirror server is that slow, then you can't afford
to do anything.

> Or for 60 days, which would still be <50% the size.
> 
> >> For stable and especially security the amount of change will be even
> >> less and even more diff files would still be worth it. The size would
> >> be smaller but the number of files higher.
> >
> > I can't follow you. stable would have three additional diffs by now.
> 
> stable-proposed-updates
> 
> What I mean is that each change is very small. So the diff files don't
> grow much and a large amount of diffs is still below the size of the
> Packages file.

So you would have the normal Packages.gz and probably 30 small diffs.
That's quite ok for tracking stable-proposed-updates.
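
On the client side the choice between a diff and the full file could
look like the sketch below. The function name choose_download, the size
table, and the reuse of the "-Nday.diff" names quoted earlier are
assumptions for illustration; no existing client is described here. All
sizes are taken to be in the same units.

```python
def choose_download(age_days, diff_sizes, packages_size):
    """Pick what a client should fetch to bring its Packages file up to date.

    diff_sizes maps an age in days to the size of the cumulative diff
    covering that many days. Returns None if nothing needs fetching.
    """
    if age_days <= 0:
        return None  # local copy is already current
    size = diff_sizes.get(age_days)
    if size is None or size >= packages_size:
        # No diff reaches back that far, or it would not save anything.
        return "Packages.gz"
    return f"-{age_days}day.diff"


# 30 small diffs, as for stable-proposed-updates (made-up sizes).
sizes = {days: 10 * days for days in range(1, 31)}
recent = choose_download(3, sizes, 1000)
stale = choose_download(45, sizes, 1000)
```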

> It is not like sid where you have 100+ package changes every day.
> 
> > For stable-security I assume it's either tracked closely or very
> > infrequently. Providing a slightly faster update in the latter case
> > doesn't seem to be worthwhile.
> 
> The date sorted method gets it for free.

No, munging archive state in the Packages file isn't "for free".

[snip]
> >> Do you have any benefits for diffs apart from them being simpler
> >> to apply?
> >
> > They keep a backward compatible Packages file which is proven to
> > work with old tools. Furthermore, updating on the server side
> > can be done by a simple script which invokes diff a few times.
> >
> > The latter is especially interesting for partial mirror scripts
> > which usually fail to implement a decent parser for Packages files.
> 
> How would a diff be better for a mirror script that doesn't parse
> Packages files? You still need a Packages file parser. You lost me
> there.

... fail to implement a _decent_ parser ...


Thiemo


