*To*: William Lee Irwin III <wli@holomorphy.com>, debian-devel@lists.debian.org*Subject*: Re: Too Many Packages [was Re: Splitting locales...]*From*: Buddha Buck <bmbuck@14850.com>*Date*: Mon, 09 Oct 2000 18:05:52 -0400*Message-id*: <20001009220553.0BBD715713@zaphod>*In-reply-to*: Your message of "Mon, 09 Oct 2000 09:25:22 PDT." <20001009092522.C7718@holomorphy.com>

> On Mon, Oct 09, 2000 at 08:35:07AM -0500, David Starner wrote: > > Which changes nothing that I said. Directories are still going to be O(n) > > in total size, dselect is still going to be O(n) in memory, O(n log n) in > > sorting stuff. It's not a solution here. > > Please, please, please read an algorithms book. Sorting strings of bounded > length is O(n). I have seen that claim many times, but I always feel it's a bit disengenuous... Sorting N distict items by radix sort is O(N log N). Period. Saying that is O(N) is lying. How do I come to this conclusion? Because a proper analysis of radix sort will say that sorting n strings is O(n*m), where m is the length of the strings (you have to do m passes of distribution and collection of the n items to sort, so you pass through all the data 2*m times). But if you a) have a total of b buckets, and b) have n distinct strings, then the string length m > log n (base b). So for arbitrary length strings, you have O(n log n). That does -not- mean that for this problem space radix sort is inappropriate. It may be. Since we are dealing with a large number of buckets (96? 256?), we are gonna have to have a pretty big n before the log n portion starts to exceed any feasible choice for n. > c.f. > Knuth, "The Art of Computer Programming", Vol 3: "Sorting & Searching" > Cormen, Leiserson, & Rivest, "Introduction to Algorithms" > Baase, "Computer Algorithms: Introduction to Design & Analysis" My Knuth is at work, but I do have CLR in front of me. Note that the bucket sort described assumes that the distribution of items is uniform, which Debian package names aren't. It also mentiones that for large amounts of data, the ability to sort in place can become very important and radix sort fails there -- leaving traditional O(n lg n) solutions as the best. I'd be sort of interested in knowing what we are sorting, and why. We have under 10,000 packages, and my /var/lib/dpkg/available is under 5MB in size. Sorting time doesn't seem like it should be our biggest concern. > > Look for "radix sort" and "bucket sort" in the indices in any of these > three books, and please, bear that information in mind when you post to > the list about the running time of sorting. > > Bill > > > -- > To UNSUBSCRIBE, email to debian-devel-request@lists.debian.org > with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org > -- Buddha Buck bmbuck@14850.com "Just as the strength of the Internet is chaos, so the strength of our liberty depends upon the chaos and cacophony of the unfettered speech the First Amendment protects." -- A.L.A. v. U.S. Dept. of Justice

**Follow-Ups**:**Re: Too Many Packages [was Re: Splitting locales...]***From:*Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>

**Re: Too Many Packages [was Re: Splitting locales...]***From:*Anthony Towns <aj@azure.humbug.org.au>

**Re: Too Many Packages [was Re: Splitting locales...]***From:*Jason Gunthorpe <jgg@ualberta.ca>

**Re: Too Many Packages [was Re: Splitting locales...]***From:*tb@becket.net (Thomas Bushnell, BSG)

**References**:**Re: Too Many Packages [was Re: Splitting locales...]***From:*William Lee Irwin III <wli@holomorphy.com>

- Prev by Date:
**Re: Intent to Sponsor: Galeon** - Next by Date:
**Re: Too Many Packages [was Re: Splitting locales...]** - Previous by thread:
**Re: Too Many Packages [was Re: Splitting locales...]** - Next by thread:
**Re: Too Many Packages [was Re: Splitting locales...]** - Index(es):