Re: Package Pool Proposal
Raul Miller wrote:
> How about first three characters of package name?
Well, let's see what distribution this gives us:
joey@gumdrop:~>grep ^Package: /var/lib/dpkg/available | cut -d ' ' -f 2 | \
perl -pe 'sub hash { substr(shift, 0, 3) } chomp; $_=hash($_)."\n"' | \
sort | uniq -c |sort -rn | head
746 lib
59 ker
56 gno
55 pyt
54 net
41 lg-
32 doc
31 com
29 xfo
28 php
(With 1591 hash buckets being used in all.)
Clearly not a good idea. In fact, I'd say this is a worse distribution than
you get by just hashing with the first letter of the package's name. Feel
free to replace hash() with a subroutine that implements another hashing
method. For example, using the last 2 letters of the package name is a
superior hash, since the biggest bucket shrinks from 746 to 401 packages:
401 ev
172 oc
149 rl
140 er
76 ls
75 es
72 on
58 in
56 nt
47 ck
(With 597 hash buckets being used.)
> Also, I like the implicit idea having a separate, distinct directory
> for each package, underneath the "hash bucket" directories.
So do I.
> Finally, I'll note that we've not really discussed source packages, and
> whether they should be stored under the same directory structure as the
> binary packages, or a different one. So far, I'd say that implicitly
> we're talking about putting them in the same hierarchy, but people's
> comments about numbers seems to indicate otherwise.
If we use the package name as a directory, the majority of source packages
will go in the same subdirectory as their binary packages, so it doesn't
change the numbers much.
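As a rough illustration of why the numbers barely move, here is a sketch of
how a package name might map to a directory. The pool/ root, the pool_dir
name, and the first-letter bucket are all assumptions for the example, not
anything settled in this thread.

```shell
# Hypothetical layout: <root>/<hash bucket>/<package name>.
# The bucket here is just the first character of the name; any other
# hash could be dropped in instead.
pool_dir() {
    pkg=$1
    bucket=$(printf '%s' "$pkg" | cut -c1)
    printf 'pool/%s/%s\n' "$bucket" "$pkg"
}

# The directory depends only on the name, so a source package and a
# binary package that share a name land in the same place.
pool_dir perl
```

Under this scheme only the binary packages whose names differ from their
source package's name would add directories of their own.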
--
see shy jo