From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Gregory Stark <stark(at)enterprisedb(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: LIMIT/SORT optimization |
Date: | 2007-03-28 09:00:15 |
Message-ID: | 460A2E9F.7020508@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
Some comments on the patch below.
Gregory Stark wrote:
> + /* tuplesort_set_bound - External API to set a bound on a tuplesort
> + *
> + * Must be called before inserting any tuples.
> +
> + * Sets a maximum number of tuples the caller is interested in. The first
> + * <bound> tuples are maintained using a simple insertion sort and returned
> + * normally. Any tuples that lie after those in the sorted result set are
> + * simply thrown out
> + */
The "Must be called before inserting any tuples" is in contradiction
with the comment in the header file:
> + /* This can be called at any time before performsort to advise tuplesort that
> + * only this many tuples are interesting. If that many tuples fit in memory and
> + * we haven't already overflowed to disk then tuplesort will switch to a simple
> + * insertion sort or heap sort and throw away the uninteresting tuples.
> + */
The latter seems to be correct.
> ! /*
> ! * Convert the existing unordered list of sorttuples to a heap in either order.
> ! * This used to be inline but now there are three separate places we heap sort
> ! * (initializing the tapes, if we have a bounded output, and any time the user
> ! * says he doesn't want to use glibc's qsort).
> ! *
> ! * NOTE heapify passes false for checkIndex (and stores a constant tupindex
> ! * passed as a parameter) even though we use heaps for multi-run sources
> ! * because we only heapify when we're doing in-memory sorts or in inittapes
> ! * before there's any point in comparing tupindexes.
> ! */
> !
> ! static void
> ! tuplesort_heapify(Tuplesortstate *state, int tupindex, HeapOrder heaporder)
> ! {
The comment claims that we use heap sort when the user says he doesn't
want to use glibc's qsort. I recall that we always use our own qsort
implementation nowadays. And we never used the heap sort as a qsort
replacement, did we?
In performsort, you convert the in-memory heap to a sorted list in one
go. I wonder if it would be better to switch to a new TSS_ALLINHEAP
state that means "all tuples are now in the in-memory heap", and call
tuplesort_heap_siftup in gettuple. It probably doesn't make much
difference in most cases, but if there's another limit node in the plan
with a smaller limit or the client only fetches a few top rows with a
cursor you'd avoid unheapifying tuples that are just thrown away later.
There's a few blocks of code surrounded with "#if 0 - #endif". Are those
just leftovers that should be removed, or are things that still need to
finished and enabled?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2007-03-28 09:07:27 | Re: [PATCHES] Full page writes improvement, code update |
Previous Message | Magnus Hagander | 2007-03-28 08:36:54 | Re: O_DIRECT support for Windows |