
Re: apt-get wrapper for maintaining Partial Mirrors



On Saturday 20 June 2009 03:16:33 Goswin von Brederlow wrote:
> Joseph Rawson <umeboshi3@gmail.com> writes:
> > On Friday 19 June 2009 12:57:25 Goswin von Brederlow wrote:
> >> Or have a proxy that adds packages that are requested.
> >
> > When I woke up this morning, I was thinking that it might be interesting
> > to have an apt method that talks directly to reprepro.  It's just a vague
> > idea now, but I'll give it some more thought later.
>
> Way too much latency to mirror a deb when requested and you need to
> run apt-get update for it to show up.
>
> The best you can do is add the package to the filter list and then
> fetch it directly. Then the next night the mirror will pick it up for
> future updates.
>
What I had in mind would eliminate a large part of the latency and also avoid 
downloading the deb twice.

Use a server application (I'll call it repserve for now) on the machine that 
hosts the reprepro repository.  

apt-get update
The apt method talks to repserve, then repserve tells reprepro to run either 
update or checkupdate, and repserve then feeds the appropriate files from the 
reprepro lists/ directory (or directories) back to the apt-get process on the 
local machine.  This would probably use a bit more bandwidth (at least for 
the first update), since apt-get downloads .pdiff files whereas reprepro just 
grabs the whole Packages.gz files.
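The update flow could be sketched roughly as below.  This is a minimal sketch, 
not a working apt method: repserve itself is hypothetical, and the exact names 
of the files reprepro leaves under lists/ are assumed rather than taken from 
its documentation.

```python
import glob
import os
import subprocess


def collect_lists(basedir):
    """Return the index files under lists/ that repserve would feed back
    to the apt-get process on the client."""
    return sorted(glob.glob(os.path.join(basedir, "lists", "*")))


def refresh_lists(basedir, distribution):
    """Hypothetical repserve step for 'apt-get update': have reprepro
    refresh its upstream indices, then hand the resulting lists back."""
    subprocess.run(["reprepro", "-b", basedir, "checkupdate", distribution],
                   check=True)
    return collect_lists(basedir)
```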

apt-get install, upgrade, build-dep
The apt method determines which source in its apt lists to retrieve the 
package from, then sends that info to repserve.  Repserve looks in its 
repositories to determine where those packages are (or whether they aren't 
yet mirrored), probably by scanning the filter lists.  Repserve then tells 
reprepro to update the appropriate repositories (if necessary).  Then 
repserve signals the local client (or the local client polls repserve), and 
the debs are transferred from the reprepro repos to the local client.  After 
that, the repserve process could instruct reprepro to retrieve the sources, 
if it's configured to do that.  It could also try to determine the build-deps 
for those packages and retrieve them and their sources, again if configured 
to.  With build-dep retrieval enabled, there might be a problem in having to 
explicitly list preferred alternatives, but this mainly affects packages that 
are drop-in replacements for libfoo-dev, as when libgamin-dev provides 
libfam-dev.
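The repserve side of an install request might look something like the sketch 
below.  It assumes reprepro's FilterList format (one "package install" line 
per package, in dpkg --get-selections style, which the reprepro documentation 
describes); everything else, including the function names, is invented for 
illustration.

```python
import subprocess


def ensure_in_filter_list(filter_path, package):
    """Add a package to a reprepro FilterList if it is not already there.
    Returns True when the list changed, i.e. when an update is needed.
    Entries use the dpkg selection format: "package install"."""
    try:
        with open(filter_path) as f:
            wanted = {line.split()[0] for line in f if line.strip()}
    except FileNotFoundError:
        wanted = set()
    if package in wanted:
        return False
    with open(filter_path, "a") as f:
        f.write("%s install\n" % package)
    return True


def mirror_package(basedir, distribution, filter_path, package):
    """Hypothetical repserve step for 'apt-get install': extend the
    filter list, then let reprepro pull the new package in."""
    if ensure_in_filter_list(filter_path, package):
        subprocess.run(["reprepro", "-b", basedir, "update", distribution],
                       check=True)
```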

This is still just a rough idea.  One interesting thing about this approach 
is that it still allows reprepro to be used in the normal way: a couple of 
machines can instruct repserve to help maintain the repository, while other 
machines on the network use reprepro directly through apache, ftp, etc.  The 
"controlling" machines would have a sources.list line like:

deb repserve://myhost/debrepos/debian lenny main contrib non-free

The repserve method on the client would send that line to the repserve server.  
The server would parse the line and match it to the appropriate repository 
from its configuration.
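The server's parsing step is simple enough to sketch.  The repserve:// scheme 
is the hypothetical one proposed above, and the field names in the result are 
arbitrary:

```python
def parse_repserve_line(line):
    """Split a sources.list line such as
    'deb repserve://myhost/debrepos/debian lenny main contrib non-free'
    into the pieces the server matches against its configuration."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "deb" \
            or not parts[1].startswith("repserve://"):
        raise ValueError("not a repserve source line: %r" % line)
    rest = parts[1][len("repserve://"):]
    host, _, path = rest.partition("/")
    return {
        "host": host,
        "repository": "/" + path,
        "suite": parts[2],
        "components": parts[3:],
    }
```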

The other hosts would just have this in sources.list:

deb http://myhost/debrepos/debian lenny main contrib non-free

The hosts using repserve could be the only ones with filter lists in 
reprepro, but it may be desirable to have filter lists from the other 
machines as well; this would help keep packages from disappearing from the 
pool while they are still needed.  It might also be nice to use reprepro's 
snapshotting each time a repserve method updates a repository, although that 
may require pointing the hosts that aren't using repserve at those snapshot 
urls.
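Taking a snapshot after each repserve-triggered update could be as simple as 
the helper below.  It assumes reprepro's gensnapshot command; the date-based 
snapshot name is an arbitrary choice, and a real repserve would pass the 
returned argv to subprocess.run:

```python
from datetime import date


def snapshot_command(basedir, distribution, when=None):
    """Build the reprepro invocation that records a snapshot of the
    distribution after an update (sketch; gensnapshot per the manpage)."""
    name = (when or date.today()).strftime("%Y%m%d")
    return ["reprepro", "-b", basedir, "gensnapshot", distribution, name]
```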


>
> But now you made me think about this too. So here is what I think:
>
> - My bandwidth at home is fast enough to fetch packages directly. No
>   need to mirror at all.
>
> - I don't want to download a package multiple times (once per host) so
>   some shared proxy would be good.
>
My idea would keep that from happening, at the expense of some latency.  The 
latency would be minimal, as it only depends on reprepro retrieving the 
package(s) and signalling the client that the package is ready.  Using 
reprepro to add extra packages to the repository from upstream without doing 
a full update may not be possible, but if it were, both the latency and the 
bandwidth to the internet would be minimal.  I just looked at the manpage 
again, and this may be possible by using the --nolistsdownload option with 
the update/checkupdate command.
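That low-latency path might boil down to an invocation like the one built 
below.  The --nolistsdownload option is the one from the reprepro manpage; 
whether it actually behaves as hoped here is untested, and a real repserve 
would run the returned argv rather than just build it:

```python
def quick_update_command(basedir, distribution, lists_are_fresh=True):
    """Build a reprepro update invocation that skips re-downloading the
    upstream Packages files when the lists/ directory is already
    populated from an earlier update (sketch)."""
    cmd = ["reprepro", "-b", basedir]
    if lists_are_fresh:
        cmd.append("--nolistsdownload")
    cmd += ["update", distribution]
    return cmd
```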


> - Bootstrapping a chroot still benefits from local packages but a
>   shared proxy would do there too.
>
> - When I'm not at home I might not have network access or only a slow
>   one so then I need a mirror. And my parents computer has a Linux that
>   only I use and that needs a major update every time I visit.
>
> So the ideal setup would be an apt proxy that stores the packages in
> the normal pool structure and has a simple command to create
> Packages.gz, Sources.gz, Release and Release.gpg files so the cache
> directory can be copied onto a USB disk and used as a repository of
> its own.
>
Getting reprepro to do this would save a lot of the hassle, but getting 
reprepro to act as an apt proxy is also tricky.  The existing cache and proxy 
methods in the apt-proxy and apt-cache packages don't do as good a job of 
producing a proper repository as reprepro does.
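The index-generation step Goswin describes could be covered by apt-ftparchive 
rather than by reimplementing it.  The subcommands (packages, sources, 
release) are real apt-ftparchive subcommands that write to stdout; the pool 
layout and the idea of driving them from repserve are assumptions:

```python
def index_commands(pooldir):
    """Commands that would turn a plain pool of cached debs into a
    standalone repository: each writes its index to stdout, which the
    caller redirects into Packages, Sources, and Release respectively."""
    return [
        ["apt-ftparchive", "packages", pooldir],  # stdout -> Packages
        ["apt-ftparchive", "sources", pooldir],   # stdout -> Sources
        ["apt-ftparchive", "release", "."],       # stdout -> Release
    ]
```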

The Release could be signed using an rsign method with the machine(s) that 
manage the repository, or it could be done locally on the server using 
gpg-agent or an unencrypted private key, depending on how the administrator 
prefers to manage it.
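For the local-signing case, the classic detached armored signature 
(Release.gpg next to Release) would do; the helper below only builds the gpg 
argv, and the optional key id parameter is an illustrative assumption:

```python
def sign_release_command(release_path, keyid=None):
    """Build the gpg invocation that detach-signs Release into
    Release.gpg (armored, as apt expects for Release.gpg files)."""
    cmd = ["gpg"]
    if keyid:
        cmd += ["--local-user", keyid]
    cmd += ["--armor", "--detach-sign",
            "--output", release_path + ".gpg", release_path]
    return cmd
```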

> Optional the apt proxy could prefetch package versions but for me that
> wouldn't be a high priority.
>
> Nice would be that it fetches sources along with binaries. When I find
> a bug in some software while traveling I would hate to not have the
> source available to fix it. But then it also needs to fetch
> Build-depends and their depends. So that would complicate matters a
> lot.
I mentioned that part above.
>
> MfG
>         Goswin

Overall, I think that reprepro does a good job of maintaining a local 
repository, and we shouldn't reimplement what it does.  Reprepro also seems 
flexible enough to implement most of the backend with simple commands and 
options.  I've never tried to implement a new apt method before, so that 
part would take a bit more research on my part.

My uses:

- I have an automated installer that I test and improve frequently.  Using a 
local mirror is a requirement for this.  A partial mirror would reduce the 
space used and keep me from downloading packages I'll never use.

- I've been using full mirrors, but I need a partial mirror that I can carry 
with me, so I can use the installer on location, instead of having to bring a 
machine back with me.

- I have a mirror of lenny-backports (source only).  When I need to backport 
a package, I install a builder machine (using the automated installer) with 
virtualbox, send a .dsc from that mirror to the builder machine using 
cowpoke, then send the built package to the local repository (in this case, 
separate from the source mirror, with the packages set for auto-install so I 
don't have to use apt's -t option).  It's also kept separate because there 
are a few packages from sid in there that aren't on backports.org.





-- 
Thanks:
Joseph Rawson
