Re: Why do we still perform a check for pre-sorted input within qsort variants?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?
Date: 2013-02-25 11:49:37
Message-ID: CA+TgmoY9QuVV5m90U0+VuhJais3yyq885te=5aipojXaXToafQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 25, 2013 at 5:43 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com> writes:
>> In the past, Robert and I have criticised the fact that our qsort
>> implementation (and the various specialisations thereof) each perform
>> a check for pre-sorted input. This check does not appear in the
>> original NetBSD qsort that we lifted our implementation from, and
>> presumably isn't described by the document 'Qsort routine from Bentley
>> & McIlroy's "Engineering a Sort Function"' that that implementation is
>> based on.
>
> FWIW, I've been suspicious of that pre-sorted check since the day it
> went in. Bentley was my faculty adviser for awhile in grad school,
> and I know him to be *way* too smart to have missed anything as simple
> as that. But I didn't have hard evidence on which to object to it
> at the time, and indeed testing seemed to say it was a good idea:
> http://www.postgresql.org/message-id/18732.1142967137@sss.pgh.pa.us
>
> If you can provide a refutation I will gladly see it out of there.

The thing that bothers me about that check (which may be different
than the thing that bothers Peter) is that we do it recursively at
every level. I have a sneaking suspicion that it only makes sense to
do that check for subpartitions above a certain size. I'm perfectly
willing to grant that presorted data is a sufficiently important
special case that we ought to check for it once, and I also suspect
it's a good idea to check large partitions so as to speed up the
sorting of almost-ordered data. But I'm skeptical that it makes sense
to check it at every recursive level.

I did attempt to do some tinkering with this while I was playing with
it, but I didn't come up with anything really compelling. You can
reduce the number of comparisons on particular workloads by tinkering
with the algorithm, but then somebody else ends up doing more
comparisons, so it's hard to say whether you've really made things
better. Or at least I found it so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-02-25 12:56:40 Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
Previous Message Tom Lane 2013-02-25 11:44:16 Re: Why do we still perform a check for pre-sorted input within qsort variants?