Re: RFC: implementation of package pools
Jason Gunthorpe wrote:
> > * Ideally, all packages would lie flat in a single directory. example
> > /package-pool
> No. This is not ideal; it impedes the ability of the ftp team to
> manipulate the archive. Packaging into subdirectories by source restores
> some of this ability - in fact, organizing by source may be a big win
> for them, but it is too early to tell for sure.
Yes, but isn't the idea to provide automated tools that require
minimal manual intervention? Tools can check for consistency and
correctness more reliably than people can, and the admins could simply
use a set of tools that performs all the necessary tasks.
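For instance (purely a hypothetical sketch, not an existing archive
tool; the index format here is my assumption), a consistency checker
could walk the pool and verify that every file an index references
actually exists and matches its recorded MD5 sum:

    # Hypothetical pool consistency check.
    # "index" is assumed to be an iterable of (relative_path, md5)
    # pairs, e.g. parsed out of a Packages file; that is an assumption.
    import hashlib, os

    def check_pool(index, pool_root):
        problems = []
        for relpath, expected_md5 in index:
            path = os.path.join(pool_root, relpath)
            if not os.path.exists(path):
                problems.append((relpath, "missing"))
                continue
            with open(path, "rb") as f:
                actual_md5 = hashlib.md5(f.read()).hexdigest()
            if actual_md5 != expected_md5:
                problems.append((relpath, "checksum mismatch"))
        return problems

A cron job running something like this would catch breakage far more
reliably than periodic manual inspection would.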
> > * You have to split the pools directory in order to preserve
> > file-system performance for a large number of packages. That is,
> > _performance_ is the only reason for doing this. The directory
> > names ultimately don't have to be human readable, since such _simple minded_
> > prefixing doesn't at all ease browsing the ftp archive.
> It is absolutely critical that someone who knows what they are looking
> for and is well versed in the archive structure (say, the FTP admins) can
> go directly to a package's directory without having to run a hashing
> function. Any scheme which does not allow this must be rejected.
Well, a tool can run a hashing function. Once the tools have
stabilized, nobody will need to compute pool paths by hand.
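To illustrate (a minimal sketch; pool_path and the exact prefixing
rule are my assumptions, not dinstall's actual code), the "hash" under
discussion is simple enough that a tool can apply it trivially:

    # Sketch of the first-letter prefixing scheme discussed here:
    # ordinary packages go under their first letter, and the many
    # "lib*" packages get their own "lib?" buckets.
    def pool_prefix(source):
        if source.startswith("lib") and len(source) > 3:
            return source[:4]           # e.g. "libpng" -> "libp"
        return source[0]                # e.g. "apt" -> "a"

    def pool_path(source):
        return "pool/%s/%s" % (pool_prefix(source), source)

    print(pool_path("apt"))             # pool/a/apt
    print(pool_path("libpng"))          # pool/libp/libpng

Whether a human runs this in their head or a script runs it on the
server makes no difference to the result.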
A supporting argument: I already use apt-cache search or apt-get
rather than manipulating the apt and dpkg databases manually, because
I have reliable tools that do it for me.
If writing these tools requires extra developer work, I'm willing to
contribute. I'm a CS BSc, btw, and know a lot about interpreted and
symbolic languages, and I have experience with Python. It would take
just a few days to get the code base started.
> > * Then, you must use a REAL static hash function for determining how
> > this split is going to happen. If you don't know hash functions well,
> > someone else surely does. Feel free to ask for advice!
> Actually the hashing results from this function are well within the
> limits for good ext2 performance. A more even distribution is not
> important for this application. See past discussions on this list for
> some details.
For the current set of packages, this may be true. What I'm questioning
is whether it will hold for 20,000 or 50,000 packages. What's the
asymptotic time complexity? ;) If you say we can patch the scheme when
it grinds to a halt, okay. But if we want to do things right the first
time, then we should determine this now.
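To make the question concrete, here is a rough experiment (synthetic
names, and the fraction of "lib*" packages is a guess, so only the
shape of the result matters): compare how evenly plain first-letter
prefixing (ignoring the "lib?" refinement, for simplicity) and a real
hash spread 50,000 names over 26 buckets:

    # Rough experiment: bucket 50,000 synthetic package names by
    # (a) first letter and (b) md5 modulo the same bucket count.
    # The 30% "lib" share is a guess at real-archive skew.
    import hashlib, random, string
    from collections import Counter

    random.seed(0)
    names = []
    for _ in range(50000):
        base = "".join(random.choice(string.ascii_lowercase)
                       for _ in range(random.randint(3, 10)))
        if random.random() < 0.3:
            base = "lib" + base
        names.append(base)

    by_letter = Counter(n[0] for n in names)
    by_md5 = Counter(int(hashlib.md5(n.encode()).hexdigest(), 16) % 26
                     for n in names)

    print("first letter, largest bucket:", max(by_letter.values()))
    print("md5 mod 26,   largest bucket:", max(by_md5.values()))

The first-letter scheme dumps everything starting with "lib" into one
directory, while the real hash stays near 50000/26 per bucket. That
kind of skew is exactly what I'd want measured against the projected
archive size before calling the scheme good enough.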
BTW, I really want to pack together:
* current dinstall
* and _all_ archive-related "server-side" tools
and make these available in a package or packages.
I know that the current dinstall is in public CVS, but can you help me
find the rest?
Thanks for your attention!
Eray (exa) Ozkural
Comp. Sci. Dept., Bilkent University, Ankara