Re: Why do we still perform a check for pre-sorted input within qsort variants?

From: Dann Corbit <DCorbit(at)connx(dot)com>
To: 'Greg Stark' <stark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?
Date: 2013-03-09 22:22:50
Message-ID: 87F42982BF2B434F831FCEF4C45FC33E5BD36A4A@EXCHANGE.corporate.connx.com
Lists: pgsql-hackers

Yes, you are right. I knew of a median-of-medians technique for pivot selection, and I mistook it for the median of medians selection algorithm (which it definitely isn't).
I was not aware of a true linear-time median selection algorithm (which is what median of medians accomplishes). The fastest median selection algorithm I knew of was quickselect, which is only linear on average.
I think that your analysis is correct, at any rate.

I also think I will enjoy learning and experimenting with the median of medians algorithm.
I found a link about it here:
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
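
For reference, here is a minimal Python sketch of the median-of-medians (BFPRT) selection algorithm described at the link above. The function name `select` and the 0-based rank `k` are illustrative choices, not anything from PostgreSQL. The only difference from quickselect is the pivot rule: instead of a random or fixed element, the pivot is the exact median of the group-of-five medians, which guarantees that each step discards a constant fraction of the input and so gives linear time in the worst case rather than only on average.

```python
def select(items, k):
    """Return the k-th smallest element (0-based) of items.

    Median-of-medians (BFPRT) selection: worst-case O(n) comparisons.
    """
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    a = list(items)
    while True:
        if len(a) <= 5:
            # Small inputs: sorting five or fewer elements is O(1).
            return sorted(a)[k]
        # Split into groups of five and take each group's median
        # (the last group may have fewer than five elements).
        groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Recursively select the exact median of the medians as the pivot.
        pivot = select(medians, len(medians) // 2)
        # Three-way partition so runs of equal keys cannot stall progress.
        lows = [x for x in a if x < pivot]
        highs = [x for x in a if x > pivot]
        n_equal = len(a) - len(lows) - len(highs)
        if k < len(lows):
            a = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            a = highs


print(select([9, 1, 8, 2, 7, 3], 2))  # → 3
```

The three-way partition is a design choice for this sketch: with many duplicate keys, a two-way split around a pivot equal to most elements would not shrink the problem.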

-----Original Message-----
From: gsstark(at)gmail(dot)com [mailto:gsstark(at)gmail(dot)com] On Behalf Of Greg Stark
Sent: Saturday, March 09, 2013 1:21 PM
To: Dann Corbit
Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?

On Sat, Mar 9, 2013 at 8:52 PM, Dann Corbit <DCorbit(at)connx(dot)com> wrote:
>> Median of medians selection of the pivot gives you O(n*log(n)).
>
> No. It does make O(n*n) far less probable, but it does not eliminate it. If it were possible, then introspective sort would be totally without purpose.

No, really: quicksort with median of medians pivot selection is most definitely O(n*log(n)) in the worst case. This is textbook stuff. In fact, even the introspective sort paper mentions it as one of the options to fail over to if the partition size isn't decreasing rapidly enough.

The problem is that median of medians costs O(n) per partitioning step rather than O(1). That doesn't change the O() growth rate, since partitioning is already O(n) per step and there will be only O(log(n)) levels of recursion. But it does contribute to the constant factor, and the end result is slower by a constant factor than heap sort or merge sort. That also explains why your reference on the quicksort adversary doesn't apply: it works by ordering elements you haven't compared yet, and assumes that all but O(1) elements will still be eligible for reordering.
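
The cost argument above can be written out as a recurrence. This is a sketch assuming the standard median-of-medians guarantee that the pivot has at least roughly 3n/10 elements on each side, ignoring O(1) rounding terms; c is an assumed constant covering both the linear-time pivot selection and the linear-time partition.

```latex
% Quicksort on n elements with a median-of-medians pivot.
% The c n term covers pivot selection (O(n)) plus partitioning (O(n)).
T(n) \le T(m) + T(n - m) + c\,n, \qquad \tfrac{3}{10}n \le m \le \tfrac{7}{10}n

% Every subproblem at recursion depth d has size at most (7/10)^d n,
% so the depth is at most \log_{10/7} n. Subproblems at the same depth
% are disjoint, so each level costs at most c n in total:
T(n) \le c\,n \log_{10/7} n = O(n \log n)
```

With an O(1) pivot rule the c n term is still there from partitioning, but the split can be as lopsided as m = 1, so the depth bound, and with it the O(n log n) guarantee, is lost.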

In any case, I think you're basically right. What we have is a broken introspective sort that does more work than necessary and then handles fewer cases than it should. Implementing a better introspection that detects all perverse cases, and does so with lower overhead than the current check, is a fine idea.

--
greg
