
Re: New mirror scripts for Debian mirrors



>  >Now, I would *love* to have hourly runs. Yes, I realize that *currently*
>  >this is no option, thanks to the amount of files rsync has to check.
>  >We need something better here first. I don't know what yet. Maybe the
>  >"batch-mode" from rsync (never really tried it), maybe something totally
>  >different. If someone has good ideas, I'm happy to hear them. (Even if
>  >they go as far as changing the archive structure).
> What an enthusiastic mirror/archive boss! :-) Count me in for hourly
> updates. I just think that the probability of you convincing the other
> bosses to radically change the archive structure for the sake of so
> short updates is of the order of the inverse of Avogadro's number :-)

Don't underestimate the power of those that do the work. :)

I don't see too big a problem in changing the archive structure, if that
is done early in the release process... (and if the changes make sense).

If we find a nice and clearly better way to lay out pool/ and stuff,
fine, we sure will consider it.
(I don't think the old way of having it sorted by release was better than
what we have now.)

> Seriously, the structure of the archive is very bad for updates
> because all files from all releases are mixed. This forces the useless
> stat of *many* files that never change. Besides, Debian is the largest
> distro. The combination of these two factors makes it by far the
> heaviest distro to update.

Yes. What we need most is a more efficient "Counting files" phase in
rsync, or a way to avoid it altogether.

> A method that is efficient enough for hourly updates is to use a
> change file provided by the master. The mirrors pull this file, parse
> it and create a --files-from that rsync uses to pull just new stuff,
> without having to do any scavenging. This doesn't work for hardlinks
> but can be used for pool, doc and project dirs, because they don't
> have hard links. Then the mirrors do a standard sync of dists and
> indices, which is fast because they only have about 10,000 files.

Why doesn't it work for hardlinks? It would be a changed file compared
to the old list. rsync using -H should then figure out itself "Ah, well,
hardlink, fine, keep it a hardlink".
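
To make the proposal concrete, here is a minimal sketch of the mirror side,
not a finished tool. The change file format ("A <path>" for new or changed,
"D <path>" for removed) and the function names are assumptions made up for
illustration; nothing like this exists on the master today. Only the rsync
options themselves (-a, -H, --files-from) are real.

```python
def parse_change_file(text):
    """Split a change file into (changed, removed) path lists.

    Assumed, hypothetical format: one entry per line, "A <path>" for a new
    or changed file, "D <path>" for a removed one. Blank lines and lines
    starting with "#" are skipped.
    """
    changed, removed = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        action, _, path = line.partition(" ")
        path = path.strip()
        # Refuse absolute paths and ".." so a bad list cannot escape the mirror root.
        if not path or path.startswith("/") or ".." in path.split("/"):
            raise ValueError(f"suspicious path in change file: {raw!r}")
        if action == "A":
            changed.append(path)
        elif action == "D":
            removed.append(path)
        else:
            raise ValueError(f"unknown action in change file: {raw!r}")
    return changed, removed


def build_rsync_command(files_from, source, dest):
    """Return the rsync argv that pulls only the paths listed in files_from.

    -H asks rsync to keep hardlinks among the transferred files. With
    --files-from, -a does not imply recursion, which suits a flat file list.
    """
    return ["rsync", "-a", "-H", f"--files-from={files_from}", source, dest]
```

The "changed" list would be written to a file and handed to rsync via
--files-from; the "removed" list has to be handled separately, since rsync
will not delete anything without scanning directories.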

> The change file has to include info on removed files as well. They
> have to be removed "by hand" because rsync doesn't remove files
> without a directory scan, which is exactly what we want to avoid. Of
> course the master could provide the ready --files-from, which rsync
> can pull automatically, and the list of deletions to be done after the
> sync of dists and indices. The more centralized the process the more
> reliable it is.

This process only works if every mirror always updates when we push. If
you miss one push you have to do a "normal" sync run, unless we keep
several of those "change files" around. And then you have to ignore errors
when you parse an old one (missing files, for example), but have to bail
out if the latest still gives you errors. Seems a little bit error-prone.
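
Roughly, the bookkeeping a mirror would need looks like the sketch below.
It assumes each change file carries an increasing serial number and that the
mirror remembers the last serial it applied; both are assumptions for
illustration, as are the function names. sync_one stands for whatever
actually runs rsync for one change file.

```python
def plan_update(last_applied, available):
    """Decide between replaying change files and a normal full sync.

    last_applied: serial of the last change file this mirror applied.
    available: serials of the change files the master still offers.
    Returns ("incremental", [serials, oldest first]) or ("full", []).
    """
    pending = sorted({s for s in available if s > last_applied})
    if not pending:
        return ("incremental", [])
    # Every serial from last_applied + 1 up to the newest must be present,
    # otherwise a push was missed and only a normal rsync run is safe.
    expected = list(range(last_applied + 1, pending[-1] + 1))
    if pending != expected:
        return ("full", [])
    return ("incremental", pending)


def replay(pending, sync_one):
    """Replay change files oldest first; return the serials whose errors were ignored.

    sync_one(serial) is a caller-supplied callable returning True on success.
    Failures on older serials are tolerated (a listed file may already be
    gone from the master); a failure on the newest serial aborts.
    """
    ignored = []
    for serial in pending:
        if sync_one(serial):
            continue
        if serial == pending[-1]:
            raise RuntimeError(f"change file {serial} failed; do a full sync")
        ignored.append(serial)
    return ignored
```

Even in this small form there are two fallback paths to a full sync, which
is the error-prone part.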


> This method would also significantly reduce the load on all mirrors,
> so it'd be very welcome even without hourly updates (hint, hint) :-)

Feel free to provide code :)

-- 
bye, Joerg
[Talking about Social Contract]:
We will not discriminate noone[...]
[So we discriminate anyone?]

