Re: Package Pool Proposal
Raul Miller wrote:
> How about first three characters of package name?
Well, let's see what distribution this gives us:
joey@gumdrop:~>grep ^Package: /var/lib/dpkg/available | cut -d ' ' -f 2 | \
	perl -pe 'sub hash { substr(shift, 0, 3) } chomp; $_=hash($_)."\n"' | \
	sort | uniq -c |sort -rn | head
    746 lib
     59 ker
     56 gno
     55 pyt
     54 net
     41 lg-
     32 doc
     31 com
     29 xfo
     28 php
(With 1591 hash buckets being used in all.)
Clearly not a good idea. In fact, I'd say this is a worse distribution than
you get by just hashing with the first letter of the package's name. Feel
free to replace hash() with a subroutine that implements another hashing
method. For example, using the last 2 letters of the package name is a superior
hash because it gives:
    401 ev
    172 oc
    149 rl
    140 er
     76 ls
     75 es
     72 on
     58 in
     56 nt
     47 ck
(With 597 hash buckets being used.)
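If you'd rather experiment without perl, here is a rough sketch of the same measurement in plain shell. The function names bucket_key and distribution are mine, made up for illustration; swap the body of bucket_key for whatever hashing method you want to try.

```shell
# bucket_key: read package names on stdin, print one bucket key per line.
# This version keeps the last two characters of each name; a name shorter
# than two characters passes through unchanged.
bucket_key() {
    sed 's/^.*\(..\)$/\1/'
}

# distribution: count how many names land in each bucket, largest first.
distribution() {
    bucket_key | sort | uniq -c | sort -rn
}

# Small made-up sample, just to show the shape of the output.
printf '%s\n' libc6-dev libgtk-dev perl dpkg-dev | distribution
```

Against the real package list it would be fed the same way as the pipeline above: grep ^Package: /var/lib/dpkg/available | cut -d ' ' -f 2 | distribution | head. The number of buckets in use could be counted with bucket_key | sort -u | wc -l on the same input.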
> Also, I like the implicit idea having a separate, distinct directory
> for each package, underneath the "hash bucket" directories.
So do I.
> Finally, I'll note that we've not really discussed source packages, and
> whether they should be stored under the same directory structure as the
> binary packages, or a different one.  So far, I'd say that implicitly
> we're talking about putting them in the same hierarchy, but people's
> comments about numbers seems to indicate otherwise.
If we use the package name as a directory, the majority of source packages
will go in the same subdirectory as the binary package, so it doesn't change
the numbers much.
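To make that concrete, here is a small sketch of how a file name could map to a per-package directory under a hash bucket. Everything in it is an assumption for illustration: the helper name pool_dir is hypothetical, the first-letter bucket is only a placeholder for whichever hash ends up being chosen, and the "hello" file names are made up. It relies only on the fact that the package name is the part of the file name before the first underscore.

```shell
# pool_dir: map an archive file name to "<bucket>/<package>".
# Hypothetical helper; the first-letter bucket is a placeholder hash.
pool_dir() {
    pkg=${1%%_*}                                  # text before the first "_"
    bucket=$(printf '%s\n' "$pkg" | cut -c1)      # placeholder: first letter
    printf '%s/%s\n' "$bucket" "$pkg"
}

# A binary package and its same-named source package share a directory.
pool_dir hello_1.0-1_i386.deb
pool_dir hello_1.0-1.dsc
```

Both calls print the same directory, which is why same-named source and binary packages don't add to the directory counts.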
-- 
see shy jo