Quick Links

Re: Progress on fast path sorting, btree index creation time

From:	Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, "Jim Decibel! Nasby" <decibel(at)decibel(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Progress on fast path sorting, btree index creation time
Date:	2012-02-09 14:37:33
Message-ID:	CAEYLb_Wg3qHH2zM0gpSU9MLMeN-VcLqch1YDk7efJg0TqNXwog@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 9 February 2012 13:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm also more than slightly concerned that we are losing sight of the
> forest for the trees. I have heard reports that sorting large amounts
> of data is many TIMES slower for us than for a certain other database
> product. I frankly wonder if this entire patch is a distraction.
> When inlining is the tool of choice for further performance
> improvement, it suggests that we're pretty much out of ideas, and that
> whatever we get out of inlining - be it 6% or 30% - will be the end.
> I don't like that idea very much.

If you're talking about the product I think you're talking about,
well, their contracts forbid any actual benchmarks, so we won't have
any hard figures, but I find it intuitively difficult to believe that
the difference is that large, mostly because I've demonstrated that
the performance of qsort is a pretty good proxy for the performance of
a query like "select * from foo order by bar", and qsort can't be too
far from optimal. Besides, didn't you yourself say "I've been
skeptical of this patch for a number of reasons, the major one of
which is that most workloads spend only a very small amount of time in
doing qucksorts"?

I have a new plan. I think you'll like it:

* drop the type specific specialisations entirely.

* drop qsort_arg (the original, unspecialised version). We now have
exactly the same number of source code copies of qsort as before: 2.

* gut the comparison of the leading sort key only from
comparetup_heap, comparetup_index_btree, etc (I collectively refer to
these functions as tuple-class encapsulating comparators, distinct
from their comparator proper).

* Instantiate two specialisations of qsort_arg: single key and
multiple key (exactly the same number as has already been agreed to,
since we lost the generic version). However, there is one key
difference now: we call one of the tuple class encapsulating functions
through a function pointer if the first comparison gives equality - .
Early indications are that this is almost as much of a win as the
single-key case. So the code effectively does this (but through a
function pointer):

#define MULTI_ADDITIONAL_CODE(COMPAR) \
{ \
return comparetup_heap(aT, bT, state); \
}

This works because most comparisons won't actually need to look at the
second key, and because the cut-down tuple-class encapsulating
comparator that is used in the last revision of the patch doesn't
actually make any assumption about the particular tuple-class
encapsulating comparator in use.

It may not even be worth-while having a separate copy of the qsort
function for the multiple key case, so the question of binary bloat
becomes entirely irrelevant, as there would be a net gain of zero
copies of qsort_arg.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Re: Progress on fast path sorting, btree index creation time at 2012-02-09 13:50:22 from Robert Haas

Responses

Re: Progress on fast path sorting, btree index creation time at 2012-02-09 14:51:05 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2012-02-09 14:42:21	Re: Vacuum rate limit in KBps
Previous Message	Robert Haas	2012-02-09 13:50:22	Re: Progress on fast path sorting, btree index creation time