From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
Cc: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Jerry Sievers" <jerry(at)jerrysievers(dot)com> |
Subject: | Re: qsort, once again |
Date: | 2006-03-21 18:52:17 |
Message-ID: | 18732.1142967137@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> Well, my point was that it is a snap to implement and test.
Well, having done this, I have to eat my words: it does seem to be a
pretty good idea.
The following test numbers are using Bentley & McIlroy's test framework,
but modified to test only the case N=10000 rather than the four smaller
N values they originally used. I did that because it exposes quadratic
behavior more obviously, and the variance in N made it harder to compare
comparison ratios for different cases. I also added a "NEARSORT" test
method, which sorts the input distribution and then exchanges two
elements chosen at random. I did that because I was concerned that
nearly sorted input would be the worst case for the presorted-input
check, as it would waste the most cycles before failing on such input.
With our existing qsort code, the results look like
distribution SAWTOOTH: max cratio 94.17, min 0.08, average 1.56 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.51 over 105 tests
distribution STAGGER: max cratio 6.08, min 0.23, average 1.01 over 105 tests
distribution PLATEAU: max cratio 94.17, min 0.08, average 2.12 over 105 tests
distribution SHUFFLE: max cratio 94.17, min 0.23, average 1.92 over 105 tests
method COPY: max cratio 6.08, min 0.08, average 0.72 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.69 over 75 tests
method FREVERSE: max cratio 94.17, min 0.08, average 5.71 over 75 tests
method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests
method SORT: max cratio 0.82, min 0.08, average 0.31 over 75 tests
method NEARSORT: max cratio 0.82, min 0.08, average 0.36 over 75 tests
method DITHER: max cratio 5.52, min 0.18, average 0.77 over 75 tests
Overall: average cratio 1.42 over 525 tests
("cratio" is the ratio of the actual number of comparison function calls
to the theoretical expectation, N log2(N))
That's pretty awful: there are several test cases that make it use
nearly 100 times the expected number of comparisons.
Removing the swap_cnt test to bring it close to B&M's original
recommendations, we get
distribution SAWTOOTH: max cratio 3.85, min 0.08, average 0.70 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.52 over 105 tests
distribution STAGGER: max cratio 6.08, min 0.58, average 1.12 over 105 tests
distribution PLATEAU: max cratio 3.70, min 0.08, average 0.34 over 105 tests
distribution SHUFFLE: max cratio 3.86, min 0.86, average 1.24 over 105 tests
method COPY: max cratio 6.08, min 0.08, average 0.76 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.75 over 75 tests
method FREVERSE: max cratio 4.56, min 0.08, average 0.73 over 75 tests
method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests
method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests
Overall: average cratio 0.78 over 525 tests
which is a whole lot better as to both average and worst cases.
I then added some code to check for presorted input (just after the
n<7 insertion sort code):
#ifdef CHECK_SORTED
presorted = 1;
for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es)
{
if (cmp(pm - es, pm) > 0)
{
presorted = 0;
break;
}
}
if (presorted)
return;
#endif
This gives
distribution SAWTOOTH: max cratio 3.88, min 0.08, average 0.62 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.46 over 105 tests
distribution STAGGER: max cratio 6.15, min 0.08, average 0.98 over 105 tests
distribution PLATEAU: max cratio 3.79, min 0.08, average 0.31 over 105 tests
distribution SHUFFLE: max cratio 3.91, min 0.08, average 1.09 over 105 tests
method COPY: max cratio 6.15, min 0.08, average 0.72 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.76 over 75 tests
method FREVERSE: max cratio 4.58, min 0.08, average 0.73 over 75 tests
method BREVERSE: max cratio 3.91, min 0.08, average 1.44 over 75 tests
method SORT: max cratio 0.08, min 0.08, average 0.08 over 75 tests
method NEARSORT: max cratio 0.89, min 0.08, average 0.39 over 75 tests
method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests
Overall: average cratio 0.69 over 525 tests
So the worst case seems only very marginally worse, and there is a
definite improvement in the average case, even for inputs that aren't
entirely sorted. Importantly, the "near sorted" case that I thought
might send it into quadratic behavior doesn't seem to do that.
So, unless anyone wants to do further testing, I'll go ahead and commit
these changes.
regards, tom lane
PS: Just as a comparison point, here are the results when testing HPUX's
library qsort:
distribution SAWTOOTH: max cratio 7.00, min 0.08, average 0.76 over 105 tests
distribution RAND: max cratio 1.11, min 0.08, average 0.53 over 105 tests
distribution STAGGER: max cratio 7.05, min 0.58, average 1.24 over 105 tests
distribution PLATEAU: max cratio 7.00, min 0.08, average 0.43 over 105 tests
distribution SHUFFLE: max cratio 7.00, min 0.86, average 1.54 over 105 tests
method COPY: max cratio 6.70, min 0.08, average 0.79 over 75 tests
method REVERSE: max cratio 7.05, min 0.08, average 0.78 over 75 tests
method FREVERSE: max cratio 7.00, min 0.08, average 0.77 over 75 tests
method BREVERSE: max cratio 7.00, min 0.08, average 2.11 over 75 tests
method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method DITHER: max cratio 4.06, min 0.16, average 0.74 over 75 tests
Overall: average cratio 0.90 over 525 tests
and here are the results using glibc's qsort, which of course isn't
quicksort at all but some kind of merge sort:
distribution SAWTOOTH: max cratio 0.90, min 0.49, average 0.65 over 105 tests
distribution RAND: max cratio 0.91, min 0.49, average 0.76 over 105 tests
distribution STAGGER: max cratio 0.92, min 0.49, average 0.70 over 105 tests
distribution PLATEAU: max cratio 0.84, min 0.49, average 0.54 over 105 tests
distribution SHUFFLE: max cratio 0.64, min 0.49, average 0.52 over 105 tests
method COPY: max cratio 0.92, min 0.49, average 0.66 over 75 tests
method REVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests
method FREVERSE: max cratio 0.92, min 0.49, average 0.67 over 75 tests
method BREVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests
method SORT: max cratio 0.49, min 0.49, average 0.49 over 75 tests
method NEARSORT: max cratio 0.55, min 0.49, average 0.51 over 75 tests
method DITHER: max cratio 0.92, min 0.50, average 0.74 over 75 tests
Overall: average cratio 0.63 over 525 tests
PPS: final version of test framework attached for the archives.
Attachment | Content-Type | Size |
---|---|---|
sorttester.c | application/octet-stream | 6.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Hallgren | 2006-03-21 18:52:47 | Re: Question about MemoryContexts and functions that returns |
Previous Message | Jim C. Nasby | 2006-03-21 18:05:54 | Re: [GENERAL] A real currency type |