On Mon, Oct 09, 2000 at 06:05:52PM -0400, Buddha Buck wrote: > > On Mon, Oct 09, 2000 at 08:35:07AM -0500, David Starner wrote: > > > Which changes nothing that I said. Directories are still going to be O(n) > > > in total size, dselect is still going to be O(n) in memory, O(n log n) in > > > sorting stuff. It's not a solution here. > > Please, please, please read an algorithms book. Sorting strings of bounded > > length is O(n). Eh? Comparison sorts are O(n lg(n)) in the limit. A comparison sort is one where your only base operator is a strcmp() like function that takes a constant amount of time. If you have unbounded strings, you should include the length of the string in your analysis, and you'll end up with O(n lg(n) k) as an upper bound, where k is the length of your longest string. If, on the other hand, you can do cleverer things than just a simple comparison (if you can create a set of O(n) buckets that will have an upper bound of O(1) (ie a constant number) of collisions), and filling those buckets takes O(1) (or O(k), k the maximum length of a string) time, then you can trivially do the sort in O(n) (or O(nk)) time: create buckets (O(n) time, there are O(n) buckets) for each string: (n * O(1)) add to bucket x (O(1)) for each bucket: (n * O(1) / n * O(k)) sort bucket (at most a constant number of elements, by assumption, so at most a constant amount of time for a comparison sort, so either O(1) or O(k)) print bucket contents (also O(1) or O(k)) However ensuring ou have at most a constant number of elements in each bucket is difficult and depends crucially on how much you know about your initial data set. > I'd be sort of interested in knowing what we are sorting, and why. We > have under 10,000 packages, and my /var/lib/dpkg/available is under 5MB > in size. Sorting time doesn't seem like it should be our biggest > concern. Definitely. (I don't have a clue what this thread is about, mind :) We're working with large, but not inordinately huge, datasets, and we have more than enough computing power to cope (albeit imperfectly) with even downright bad algoirthmic choices wrt it (cf bugs.debian.org :). And using qsort() from libc isn't a bad algorithmic choice. Cheers, aj -- Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/> I don't speak for anyone save myself. GPG signed mail preferred. ``We reject: kings, presidents, and voting. We believe in: rough consensus and working code.'' -- Dave Clark
Attachment:
pgpbCSzIyI1oC.pgp
Description: PGP signature