From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
Cc: | "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Re: Which qsort is used |
Date: | 2005-12-17 05:03:25 |
Message-ID: | 3148.1134795805@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> Both of them are modified versions of Bentley's sort algorithm.
Jon Bentley of Bell Labs? Small world ... he was my thesis adviser
for awhile when he was at CMU. He's a good algorithms man, for sure.
> I just added the in-order check to both of them, and the reversed
> partition check for the second method (in the case of the subfiles
> because insertion sort is allergic to reversed partitions).
I've still got a problem with these checks; I think they are a net
waste of cycles on average. They would only be a win if you expected a
nontrivial percentage of perfectly-in-order or perfectly-reverse-order
inputs. What I think is much more probable in the Postgres environment
is almost-but-not-quite-ordered inputs --- eg, a table that was
perfectly ordered by key when filled, but some of the tuples have since
been moved by UPDATEs. This is the worst possible case for the in-order
checks, because they can then grovel over large percentages of the file
before failing ... and when they fail, those cycles are entirely wasted;
you have not advanced the state of the sort at all.
For the "usual" case of randomly ordered input, of course it doesn't
matter much at all because the in-order checks will fail after examining
not too many items. But to argue that the checks are worth making, you
have to assume that perfectly-ordered inputs are more common than
almost-ordered inputs, and I think that is exactly the wrong assumption
for the Postgres environment. I sure haven't seen any evidence that
it's a good assumption.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Dann Corbit | 2005-12-17 05:23:07 | Re: Re: Which qsort is used |
Previous Message | Bruce Momjian | 2005-12-17 04:10:43 | Re: Automatic function replanning |