| From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Re: Which qsort is used |
| Date: | 2005-12-17 06:43:58 |
| Message-ID: | D425483C2C5C9F49B5B7A41F8944154757D38D@postal.corporate.connx.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Friday, December 16, 2005 10:41 PM
> To: Dann Corbit
> Cc: Qingqing Zhou; Bruce Momjian; Luke Lonergan; Neil Conway; pgsql-
> hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Re: Which qsort is used
>
> "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> >> I've still got a problem with these checks; I think they are a net
> >> waste of cycles on average.
>
> > The benchmarks say that they (order checks) are a good idea on
average
> > for ordered data, random data, and partly ordered data.
>
> There are lies, damn lies, and benchmarks ;-)
>
> The problem with citing a benchmark for this discussion is that a
> benchmark can't tell you anything about real-world probabilities;
> it only tells you about the probabilities occuring in the benchmark
> case. You need to make the case that the benchmark reflects the
> real world, which you didn't.
>
> > If you trace the algorithms in a debugger you will be surprised at
how
> > often the partitions are ordered, even with random sets as input.
>
> Well, I do agree that checking for orderedness on small partitions
would
> succeed more often than on larger partitions or the whole file --- but
> the code-as-given checks all the way down. Moreover, the argument
given
> for spending these cycles is that insertion sort sucks on
reverse-order
> input ... where "sucks" means that it spends O(N^2) time. But it
spends
> O(N^2) in the average case, too.
I agree that in general these checks are not important and they
complicate what is a simple and elegant algorithm.
The cases where the checks are important (highly ordered data sets) are
rare and so the value added is minimal.
I am actually quite impressed with the excellence of Bentley's sort out
of the box. It's definitely the best library implementation of a sort I
have seen.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Volkan YAZICI | 2005-12-17 11:50:49 | Re: number of loaded/unloaded COPY rows |
| Previous Message | Tom Lane | 2005-12-17 06:40:59 | Re: Re: Which qsort is used |