From: | Dann Corbit <DCorbit(at)connx(dot)com> |
---|---|
To: | 'Bruce Momjian' <bruce(at)momjian(dot)us>, Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Why do we still perform a check for pre-sorted input within qsort variants? |
Date: | 2013-03-08 19:43:10 |
Message-ID: | 87F42982BF2B434F831FCEF4C45FC33E5BD365B6@EXCHANGE.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>-----Original Message-----
>From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Bruce Momjian
>Sent: Friday, March 08, 2013 11:22 AM
>To: Peter Geoghegan
>Cc: Robert Haas; Tom Lane; PG Hackers
>Subject: Re: [HACKERS] Why do we still perform a check for pre-sorted input within qsort variants?
>
>On Mon, Feb 25, 2013 at 02:31:21PM +0000, Peter Geoghegan wrote:
>> On 25 February 2013 11:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> > I did attempt to do some tinkering with this while I was playing
>> > with it, but I didn't come up with anything really compelling. You
>> > can reduce the number of comparisons on particular workloads by
>> > tinkering with the algorithm, but then somebody else ends up doing
>> > more comparisons, so it's hard to say whether you've really made
>> > things better. Or at least I found it so.
>>
>> Right.
>>
>> To be honest, the real reason that it bothers me is that everything
>> else that our qsort routine does that differs from classic quicksort
>> (mostly quadratic insurance, like the median-of-medians pivot
>> selection, but also the fallback to insertion sort when n < 7) is very
>> well supported by peer reviewed research. Like Tom, I find it
>> implausible that Sedgewick and others missed a trick, where we did
>> not, particularly with something so simple.
>
>Perhaps we are more likely to be fed sorted data than a typical qsort usecase.
>
Checking for pre-sorted input will not make the routine faster on average.
However, it prevents a common perverse case where the runtime goes quadratic, so sorting 10^6 elements will take K*10^12th operations when the bad condition does occur.
Checking for pre-sorted data can be thought of as an insurance policy. It's kind of a pain to pay the premiums, but you sure are glad you have it when an accident occurs.
Because the check is linear, and the algorithm is O(n*log(n)), the cost is not terribly significant.
Switching to insertion sort for small partitions is almost always a good idea. This idea was invented by Richard Singleton as ACM algorithm 347.
A different sort of safeguard, as compared to checking for pre-ordered data, is to watch recursion depth. If we find that we have already gone past a depth of log2(n) levels and we are not ready to shift to insertion sort, then there is some sort of perverse data set. A traditional fix is to switch to heap sort at this point. Since heap sort is also O(n*log(n)) {though the constant multiplier is larger than for quicksort on average} you never get O(n*n) behavior. This method is called introspective sort.
There are, of course, other clever things that can be tried. The relaxed weak heapsort of Edelkamp and Steigler, for instance, or using tries as in "Cache-Efficient String Sorting Using Copying" by Ranjan Sinha. There is also possibly significant benefit to using radix sort for fixed width data that can be easily binned.
I seem to recall that a year or two back some study was done on quicksort methodology as used in PostgreSQL. As I recall, the algorithm used in PostgreSQL fared well in the tests.
From | Date | Subject | |
---|---|---|---|
Next Message | 'Bruce Momjian' | 2013-03-08 19:48:15 | Re: Why do we still perform a check for pre-sorted input within qsort variants? |
Previous Message | Bruce Momjian | 2013-03-08 19:21:58 | Re: Why do we still perform a check for pre-sorted input within qsort variants? |