
Re: Package Pool Proposal



Raul Miller wrote:
> How about first three characters of package name?

Well, let's see what distribution this gives us:

joey@gumdrop:~>grep ^Package: /var/lib/dpkg/available | cut -d ' ' -f 2 | \
	perl -pe 'sub hash { substr(shift, 0, 3) } chomp; $_=hash($_)."\n"' | \
	sort | uniq -c |sort -rn | head
    746 lib
     59 ker
     56 gno
     55 pyt
     54 net
     41 lg-
     32 doc
     31 com
     29 xfo
     28 php

(With 1591 hash buckets being used in all.)

Clearly not a good idea: 746 packages land in the single "lib" bucket. In fact,
I'd say this is a worse distribution than you get by just hashing on the first
letter of the package's name. Feel free to replace hash() with a subroutine
that implements another hashing method. For example, using the last 2 letters
of the package name is a better hash, because it gives:

    401 ev
    172 oc
    149 rl
    140 er
     76 ls
     75 es
     72 on
     58 in
     56 nt
     47 ck

(With 597 hash buckets being used.)

> Also, I like the implicit idea having a separate, distinct directory
> for each package, underneath the "hash bucket" directories.

So do I.

> Finally, I'll note that we've not really discussed source packages, and
> whether they should be stored under the same directory structure as the
> binary packages, or a different one.  So far, I'd say that implicitly
> we're talking about putting them in the same hierarchy, but people's
> comments about numbers seems to indicate otherwise.

If we use the package name as a directory, the majority of source packages
will go in the same subdirectory as their binary packages, so it doesn't
change the numbers much.

-- 
see shy jo

