# Re: Too Many Packages [was Re: Splitting locales...]

```> On Mon, Oct 09, 2000 at 08:35:07AM -0500, David Starner wrote:
> > Which changes nothing that I said. Directories are still going to be O(n)
> > in total size, dselect is still going to be O(n) in memory, O(n log n) in
> > sorting stuff. It's not a solution here.
>
> length is O(n).

I have seen that claim many times, but I always feel it's a bit
disengenuous...

Sorting N distict items by radix sort is O(N log N).  Period.  Saying
that is O(N) is lying.

How do I come to this conclusion?

Because a proper analysis of radix sort will say that sorting n strings
is O(n*m), where m is the length of the strings (you have to do m
passes of distribution and collection of the n items to sort, so you
pass through all the data 2*m times).  But if you a) have a total of b
buckets, and b) have n distinct strings, then the string length m > log
n (base b).  So for arbitrary length strings, you have O(n log n).

That does -not- mean that for this problem space radix sort is
inappropriate.  It may be.  Since we are dealing with a large number of
buckets (96? 256?), we are gonna have to have a pretty big n before the
log n portion starts to exceed any feasible choice for n.

> c.f.
> 	Knuth, "The Art of Computer Programming", Vol 3: "Sorting & Searching"
> 	Cormen, Leiserson, & Rivest, "Introduction to Algorithms"
> 	Baase, "Computer Algorithms: Introduction to Design & Analysis"

My Knuth is at work, but I do have CLR in front of me.  Note that the
bucket sort described assumes that the distribution of items is
uniform, which Debian package names aren't.  It also mentiones that for
large amounts of data, the ability to sort in place can become very
important and radix sort fails there -- leaving traditional O(n lg n)
solutions as the best.

I'd be sort of interested in knowing what we are sorting, and why.  We
have under 10,000 packages, and my /var/lib/dpkg/available is under 5MB
in size.  Sorting time doesn't seem like it should be our biggest
concern.

>
> Look for "radix sort" and "bucket sort" in the indices in any of these
> three books, and please, bear that information in mind when you post to
> the list about the running time of sorting.
>
> Bill
>
>
> --
> To UNSUBSCRIBE, email to debian-devel-request@lists.debian.org
> with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
>

--
Buddha Buck                             bmbuck@14850.com
"Just as the strength of the Internet is chaos, so the strength of our
liberty depends upon the chaos and cacophony of the unfettered speech
the First Amendment protects."  -- A.L.A. v. U.S. Dept. of Justice

```