Re: RFC: implementation of package pools
Jason Gunthorpe wrote:
>
> > * Ideally, all packages would lie flat in a single directory, for
> > example /package-pool.
>
> No. This is not ideal; it impedes the ability of the ftp team to
> manipulate the archive. Packaging into subdirectories by source restores
> some of this ability - in fact, organizing by source may be a big win
> for them, though it is too early to tell for sure.
>
Yes, but isn't the idea to provide automated tools that require
minimal manual intervention? Tools can check for consistency and
correctness more reliably than people, and the admins could simply use a
set of tools that performs all the necessary tasks.
> > * You have to split the pools directory in order to preserve
> > file-system performance for a large number of packages. That is,
> > _performance_ is the only reason for doing this. The directory
> > names ultimately don't have to be human-readable, since such
> > _simple-minded_ prefixing doesn't ease browsing the ftp archive at all.
>
> Wrong.
>
> It is absolutely critical that someone who knows what they are looking
> for and is well-versed in the archive structure (say, the FTP admins) can
> go directly to a package's directory without having to run a hashing
> function. Any scheme which does not allow this must be rejected.
>
Well, a tool can run a hashing function. Once the tools are stabilized,
they will have no problems.
A supporting argument: I use apt-cache search or apt-get rather than
manipulating the apt or dpkg databases by hand, precisely because I have
reliable automated tools.
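To illustrate the point, here is a minimal sketch in Python of what such
a lookup tool could do. I'm assuming a first-character prefixing scheme
with a longer prefix for lib* packages; the function name and the exact
scheme are just my illustration, not the archive's actual layout:

    def pool_path(source, component="main"):
        """Map a source package name to its pool directory under a
        simple static prefixing scheme (illustrative, not a spec)."""
        # lib* packages are numerous enough that a one-character prefix
        # would overload the "l" directory, so give them four characters.
        prefix = source[:4] if source.startswith("lib") else source[:1]
        return "pool/%s/%s/%s" % (component, prefix, source)

    print(pool_path("apt"))     # -> pool/main/a/apt
    print(pool_path("libpng"))  # -> pool/main/libp/libpng

With something like this in the server-side tools, nobody has to compute
a prefix by hand.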
If writing these tools requires extra developer work, I'm willing to
contribute. I have a B.Sc. in computer science, btw, know a good deal
about interpreted/symbolic languages, and have experience with Python.
It would take me only a few days to get the code base started.
> > * Then, you must use a REAL static hash function for determining how
> > this split is going to happen. If you don't know hash functions well,
> > someone else surely does. Feel free to ask for advice!
>
> Actually the hashing results from this function are well within the
> limits for good ext2 performance. A more even distribution is not
> important for this application. See past discussions on this list for some
> numbers.
For the current set of packages, this may be true. What I'm questioning
is whether it will hold for 20,000 or 50,000 packages. What's the
asymptotic time complexity? ;) If you say we can patch that when it
grinds to a halt, okay. But if we want to do things right the first
time, then we should determine this now.
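To move this beyond hand-waving: ext2 directory lookup is a linear scan,
so the number that matters is the size of the fullest prefix directory.
Here is a quick sketch (same assumed first-character scheme as above; in
practice you would feed it the names from a real Sources file) for
measuring how the worst bucket grows as the archive does:

    from collections import Counter

    def bucket_sizes(package_names):
        """Count how many packages land in each prefix directory
        under the assumed first-character scheme."""
        buckets = Counter()
        for name in package_names:
            buckets[name[:4] if name.startswith("lib") else name[:1]] += 1
        return buckets

    # Toy input; on ext2 the lookup time in a directory grows linearly
    # with its entry count, so the maximum bucket is the number to watch.
    sizes = bucket_sizes(["apt", "dpkg", "debhelper", "libc6", "libpng"])
    print("fullest directory holds", max(sizes.values()), "packages")

Run that over the projected package lists and we would have actual
numbers instead of guesses.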
BTW, I would really like to pack together:
* the current dinstall
* and _all_ archive-related "server-side" tools
and make them available as a package or packages.
I know the current dinstall is on public CVS, but can you help
me find the rest?
Thanks for your attention!
--
Eray (exa) Ozkural
Comp. Sci. Dept., Bilkent University, Ankara
e-mail: erayo@cs.bilkent.edu.tr
www: http://www.cs.bilkent.edu.tr/~erayo