
Re: APT: Packages file too large to download every time.



On Sat, 9 Jun 2001, Steven Hanley wrote:

> On Sat, Jun 09, 2001 at 10:05:21AM +0100, Edward Betts wrote:
> > How about putting the Packages file in an SQL database like MySQL or Postgres?
> > It would be harder to mirror, but nobody would be downloading complete Packages
> > files, so it would not be a problem. Include a `Last Updated' field in the
> > package table, and run:
> > 
> >  select * from packages where updated > $last
> > 
> > Where $last is the date + time of the last update.
> > 
> > With the data from the server and data already on the machine it would be
> > possible to build the package file.
> > 
> > Or the same might be possible using LDAP, but I do not have much experience
> > with LDAP.
> > 
> > Of course, this would require a client of some kind on every Debian machine to
> > access it.
> 
> I must say this sounds like a pretty neat idea (well except for the one major
> flaw).

I agree, but not on the flaw. See below.

> Have a Packages.gz still hanging around, but get the majority of people to use
> the db-based interface, and just have it send the server the last UTC time at
> which it checked the archive. The server can then send back the Packages file
> entries that have been added since then, and the local machine can merge them
> into the Packages file it has locally.

Indeed.
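A minimal sketch of that query-and-merge cycle, assuming a hypothetical server-side table `packages(name, stanza, updated)` where `updated` is a UTC Unix timestamp and `stanza` is the package's Packages file entry. The helper names `fetch_changes`, `merge` and `render` are made up for illustration, the data is invented, and an in-memory SQLite database stands in for MySQL/Postgres. Note that `$last` is passed as a bound parameter rather than pasted into the SQL.

```python
import sqlite3

def fetch_changes(conn, last):
    """Server side: return (name, stanza) pairs updated after `last`."""
    # `last` is bound as a query parameter rather than pasted into the SQL.
    cur = conn.execute(
        "SELECT name, stanza FROM packages WHERE updated > ? ORDER BY name",
        (last,),
    )
    return cur.fetchall()

def merge(local, changes):
    """Client side: add new stanzas and replace changed ones, keyed by name."""
    merged = dict(local)
    for name, stanza in changes:
        merged[name] = stanza
    return merged

def render(packages):
    """Rebuild Packages-style text: one stanza per package, blank-line separated."""
    return "\n\n".join(packages[name] for name in sorted(packages)) + "\n"

if __name__ == "__main__":
    # Hypothetical schema and made-up data; `updated` is a UTC Unix timestamp.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages (name TEXT PRIMARY KEY, stanza TEXT, updated INTEGER)"
    )
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?, ?)",
        [
            ("foo", "Package: foo\nVersion: 1.0", 100),
            ("bar", "Package: bar\nVersion: 2.1", 300),
            ("baz", "Package: baz\nVersion: 0.1", 300),
        ],
    )
    local = {"foo": "Package: foo\nVersion: 1.0", "bar": "Package: bar\nVersion: 2.0"}
    print(render(merge(local, fetch_changes(conn, last=200))))
```

This sketch only handles added and changed packages; removals would need the server to record deletions somehow (for example, a tombstone row) so that the client can drop them.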

> This, I assume, would put a lot less load on the server than trying to rsync a
> Packages file (assuming, of course, that the Packages file was compressed with
> the rsyncable gzip).
> 
> The major flaw I see is that you are not going to be able to run a database of
> some kind, and the stuff required to do all this, on most Debian mirrors. That
> would restrict the usage a lot, and people would continue to just grab the
> latest Packages file from their local mirror.

If you use LDAP for this, I don't think this would be much of a
problem: LDAP is optimized for a "many reads, few writes" situation,
and when set up well, it can serve millions of clients easily (at
least, that's the intent ;-)

One could, for instance, add a "lastChanged" attribute to the LDAP schema
and then let clients search for entries where lastChanged is later than
the client's last update, so that only changed entries are returned.
Then a few LDAP "packages" servers could be set up, scattered over the
world so that people can connect to a nearby mirror of the LDAP directory,
all containing information about the available packages. The directory
would contain everything that can now be found in the Packages file, but
none of the packages themselves. For those, you'd go to the FTP mirrors.
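A sketch of the search a client might send, assuming `lastChanged` holds a GeneralizedTime value and the schema gives it an ordering matching rule. The `debianPackage` object class and the helper names are hypothetical. The search filter syntax (RFC 4515) has `>=` but no strict `>`, so an entry stamped exactly at the last update time is returned again; re-merging it is harmless. Python's standard library has no LDAP client, so `matches` only mimics locally what the server would evaluate.

```python
from datetime import datetime, timezone

def generalized_time(dt):
    """Format a datetime as LDAP GeneralizedTime in UTC, e.g. 20010609100521Z."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")

def changed_since_filter(last):
    """Build a search filter for entries changed at or after `last`.

    'debianPackage' and 'lastChanged' are hypothetical schema names.
    """
    return f"(&(objectClass=debianPackage)(lastChanged>={generalized_time(last)}))"

def matches(entry, last):
    """Local stand-in for the server's evaluation of that filter."""
    # Fixed-width UTC GeneralizedTime strings sort correctly as plain text.
    return entry["lastChanged"] >= generalized_time(last)

if __name__ == "__main__":
    last = datetime(2001, 6, 9, 10, 5, 21, tzinfo=timezone.utc)
    print(changed_since_filter(last))
    entries = [
        {"cn": "foo", "lastChanged": "20010601000000Z"},
        {"cn": "bar", "lastChanged": "20010609120000Z"},
    ]
    print([e["cn"] for e in entries if matches(e, last)])
```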

There's one caveat: the LDAP directory will then most likely be out
of sync with the FTP mirrors. That wouldn't change much, though: AFAICT,
the Packages files are built on a single FTP server and mirrored along
with the rest, so if you download a new one while the mirror is being
updated, it is almost certainly out of sync itself.
Perhaps an extra field like "lastVersion" would be appropriate, so that a
client that gets an error from its FTP mirror saying a certain file is
not available can fall back to the previous version if necessary.
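The "lastVersion" fallback could look roughly like this. The entry uses the Packages file's `Package` and `Version` fields plus the proposed `lastVersion`; `fetch` is a hypothetical stand-in for whatever FTP download routine the client uses, assumed to raise `FileNotFoundError` when the mirror lacks a file.

```python
def download_with_fallback(entry, fetch):
    """Try the advertised Version first, then the entry's lastVersion.

    Returns (version, data) for the first version the mirror can serve;
    re-raises the last FileNotFoundError if none is available.
    """
    candidates = [entry["Version"]]
    if entry.get("lastVersion"):
        candidates.append(entry["lastVersion"])
    last_error = None
    for version in candidates:
        try:
            return version, fetch(entry["Package"], version)
        except FileNotFoundError as err:
            last_error = err  # mirror not synced yet; try the older version
    raise last_error

if __name__ == "__main__":
    # A made-up mirror that has only caught up to the previous upload.
    mirror = {("foo", "1.0"): b"old .deb bytes"}

    def fetch(name, version):
        try:
            return mirror[(name, version)]
        except KeyError:
            raise FileNotFoundError(f"{name} {version}") from None

    entry = {"Package": "foo", "Version": "1.1", "lastVersion": "1.0"}
    print(download_with_fallback(entry, fetch))
```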

Of course, Packages & Packages.gz can still be provided, both for
convenience and backwards compatibility.

This is all hypothetical, but I don't think it's impossible. After all,
there's no real need to mirror a database, right?

-- 
wouter dot verhelst at advalvas in belgium

Try does not exist. Believe that you will do it, else you will fail.

       -- Luke Skywalker,
       in the trilogy "The Jedi Academy", Kevin J. Anderson


