*To*: dbisnath@rave-tt.net, debian-user@lists.debian.org*Subject*: Re: Re : Re: Sorting data points*From*: DSC Siltec <dscpubl@siltec.lt>*Date*: Mon, 06 May 2002 09:13:00 +0200*Message-id*: <[🔎] 3CD62CFC.C72C156D@siltec.lt>*References*: <200205060017.g460HXF05746@rave-tt.net>

dbisnath@rave-tt.net wrote: > > Thanks DSC. Great site. > To the first person who replied, sorry the message was deleted and i > lost your name. No, this was not a class assignment - not that anything > was wrong if it were so! We were and are still are students, always > learning. I simply asked what I wanted to know without wasting my and > other people's time with a larger than required message. After researching > on bubble sort, quicksort, vector etc etc > I wanted to get second opinions on the best sort algorihtm. > Thanks for you reply. Okay, one other comment on identifying the best of the best in a lot of areas. The N*Log2(N) factor shows up in the maximum compression of random data, in the conversion of accessible states to temperature, and in general shows up in the 2nd law of thermodynamics. What this means is that (1) the 2nd law of thermo is not only a physical identity, it is a mathematical identity and (2) You are going to run into it a lot for data reduction/analysis/transformation. Thus, for a lot of things N*Log2(N) may be the mathematical optimum. Now, that being said, N*Log2(N) is optimum for the truly random case. A lot of things are not random, and so you can get better than that for the non-random case. Consider the JPG image vs. the Run-length-encoded TIFF for the case of a blank page. The Run-length-encoded TIFF of any size could be concievably compressed down to 2 bits. The first bit says "paint the entire page white." The second bit says "That's all there is." In reality, you will have headers, a specific RLE format, and such, so there is a minimum size for the TIFF. But you get the idea. So if there is anything you *do* know about the image, you compress that first. Then you incorporate into that the best random compression you are willing to pay for (in terms of time... just as the only truly reversible thermodynamic reaction takes forever, the only truly optimum compression of random data takes forever. Any increases in speed meet with loses in compression.). In the same way, if there is anything you *do* know about the order in a sort algorithm, you incorporate that into your sorting method. For that reason, for certain data sets, other sorts may be optimum. That includes the quick sort, or -- ideally for a mathematically predefined data set -- a mathematical algorithm that will yield the order instantly. If, for example, you had 12 different simple mathematical equations that covered 12 different cases, I would combine the heap sort with the 12 different equations. But to do that optimally, you still want to run each equation a minimum number of times. There your NLog2N will work its way back in again... or had better, or you'll be wasting a lot of unnecessary computation time. Then, too, if you want to find the Nth item quickly, there might be a whole different approach. Perhaps the quicksort might be better there. I think nr.com talks about that a bit. ---> Just a whole bunch of thoughts about numerical optimization. - Michael More on my own history behind these statements. One thing that I think was great for my understanding of this area was to try to program an "infinite compression routine." For arguments' sake, say I was using first of all variations of the LZW method, as described in GIF definition. So I ran the same program 20 times on the same data. I very carefully designed it so that the data would never grow, but could transform. My compression the first time was 30%. 2nd pass, I got 9% additional compression. 3rd pass, I got 2%. Fourth pass, about 1/2%. I wondered why it wasn't working, so I took a look at the data that was incoming. I then changed the routine to try to optimize based upon the data that was incoming -- especially all those 10..001 bit runs that came out of the LZW compression. I still didn't get much improvement. Oh, the second pass improved, but when I tracked the improvements due to my new algorithm, they went away just as quickly. I then wrote a program to put up a histogram of the data in many different forms, and took a look at the data. What I saw was something that (1) looked random (2) on each pass pushed closer and closer to what I would call a normalized blackbody radiation curve (or perhaps a bell curve, but there are many different bell curves possible). And the better my data approximated that curve, the less compression I could get out of the data on the next pass. More than that, no matter what form I was measuring the data [distribution of values, distribution of 0 runs, distribution of difference from previous byte, etc...], a compression run would push *all forms* closer to that blackbody curve. It was then that I realized I was seeing a numerical version of the 2nd law of thermo. The more compression you get, the lower the state of energy and the higher the state of entropy. And no spontaneous decrease in entropy can be expected. After all that, I have to say: there's nothing like actually *building* a perpetual motion machine to make a true believer in the 2nd law of thermo. The fact that I built mine in the world of data compression simply let me not waste a lifetime of effort, since with my computer I made hundreds of attempts in just a few days. --- Now, if you want a truly impressive algorithm, how about an algorithm that you can often do on paper--it's so conceptually simple -- and will give you a Taylor series for all variables for almost any set of equations, linear, nonlinear, partial-differential, or otherwise (initial value only... boundary conditions are not yet figured out.)? Go look at the Parker - Souchacki solution to the Picard iteration. You should be able to find stuff on the web. It's that powerful, it's new, and -- in my opinion -- it warrants an algorithm library in Debian. - Michael -- To UNSUBSCRIBE, email to debian-user-request@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

- Prev by Date:
**kernel recompile problems** - Next by Date:
**Sending keystrokes to a window from shell.** - Previous by thread:
**kernel recompile problems** - Next by thread:
**Sending keystrokes to a window from shell.** - Index(es):