Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ron <rjpeace(at)earthlink(dot)net>, pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Date: 2006-03-02 18:50:24
Message-ID: 36e682920603021050s4d8a6a02j3c0acf54f19e4881@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

My introsort is almost complete and its the fastest variant of quicksort I
can find, I'll submit it to -patches in the next couple days as-well.

On 3/2/06, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
>
> Added to TODO:
>
> * Improve port/qsort() to handle sorts with 50% unique and 50%
> duplicate
> value [qsort]
>
> This involves choosing better pivot points for the quicksort.
>
>
>
> ---------------------------------------------------------------------------
>
> Dann Corbit wrote:
> >
> >
> > > -----Original Message-----
> > > From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> > > owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> > > Sent: Wednesday, February 15, 2006 5:22 PM
> > > To: Ron
> > > Cc: pgsql-performance(at)postgresql(dot)org; pgsql-hackers(at)postgresql(dot)org
> > > Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
> > Index
> > > behaviour)
> > >
> > > Ron <rjpeace(at)earthlink(dot)net> writes:
> > > > How are we choosing our pivots?
> > >
> > > See qsort.c: it looks like median of nine equally spaced inputs (ie,
> > > the 1/8th points of the initial input array, plus the end points),
> > > implemented as two rounds of median-of-three choices. With half of
> > the
> > > data inputs zero, it's not too improbable for two out of the three
> > > samples to be zeroes in which case I think the med3 result will be
> > zero
> > > --- so choosing a pivot of zero is much more probable than one would
> > > like, and doing so in many levels of recursion causes the problem.
> >
> > Adding some randomness to the selection of the pivot is a known
> > technique to fix the oddball partitions problem. However, Bentley and
> > Sedgewick proved that every quick sort algorithm has some input set that
> > makes it go quadratic (hence the recent popularity of introspective
> > sort, which switches to heapsort if quadratic behavior is detected. The
> > C++ template I submitted was an example of introspective sort, but
> > PostgreSQL does not use C++ so it was not helpful).
> >
> > > I think. I'm not too sure if the code isn't just being sloppy about
> > the
> > > case where many data values are equal to the pivot --- there's a
> > special
> > > case there to switch to insertion sort, and maybe that's getting
> > invoked
> > > too soon.
> >
> > Here are some cases known to make qsort go quadratic:
> > 1. Data already sorted
> > 2. Data reverse sorted
> > 3. Data organ-pipe sorted or ramp
> > 4. Almost all data of the same value
> >
> > There are probably other cases. Randomizing the pivot helps some, as
> > does check for in-order or reverse order partitions.
> >
> > Imagine if 1/3 of the partitions fall into a category that causes
> > quadratic behavior (have one of the above formats and have more than
> > CUTOFF elements in them).
> >
> > It is doubtful that the switch to insertion sort is causing any sort of
> > problems. It is only going to be invoked on tiny sets, for which it has
> > a fixed cost that is probably less that qsort() function calls on sets
> > of the same size.
> >
> > >It'd be useful to get a line-level profile of the behavior of
> > > this code in the slow cases...
> >
> > I guess that my in-order or presorted tests [which often arise when
> > there are very few distinct values] may solve the bad partition
> > problems. Don't forget that the algorithm is called recursively.
> >
> > > regards, tom lane
> > >
> > > ---------------------------(end of
> > broadcast)---------------------------
> > > TIP 3: Have you checked our extensive FAQ?
> > >
> > > http://www.postgresql.org/docs/faq
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
> >
>
> --
> Bruce Momjian http://candle.pha.pa.us
> SRA OSS, Inc. http://www.sraoss.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zeugswetter Andreas DCP SD 2006-03-02 19:07:53 Re: Automatic free space map filling
Previous Message Martijn van Oosterhout 2006-03-02 18:43:46 Re: [SQL] Interval subtracting

Browse pgsql-performance by date

  From Date Subject
Next Message Jozsef Szalay 2006-03-03 00:15:36 Like 'name%' is not using index
Previous Message Bruce Momjian 2006-03-02 18:17:49 Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)