Quick Links

Re: Parallel tuplesort (for parallel B-Tree index creation)

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject:	Re: Parallel tuplesort (for parallel B-Tree index creation)
Date:	2016-09-06 07:08:52
Message-ID:	286647c2-761a-0e8b-7db9-ea508c2316ba@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 08/16/2016 03:33 AM, Peter Geoghegan wrote:
> I attach a patch that changes how we maintain the heap invariant
> during tuplesort merging. I already mentioned this over on the
> "Parallel tuplesort, partitioning, merging, and the future" thread. As
> noted already on that thread, this patch makes merging clustered
> numeric input about 2.1x faster overall in one case, which is
> particularly useful in the context of a serial final/leader merge
> during a parallel CREATE INDEX. Even *random* non-C-collated text
> input is made significantly faster. This work is totally orthogonal to
> parallelism, though; it's just very timely, given our discussion of
> the merge bottleneck on this thread.

Nice!

> The patch makes tuplesort merging shift down and displace the root
> tuple with the tape's next preread tuple, rather than compacting and
> then inserting into the heap anew. This approach to maintaining the
> heap as tuples are returned to caller will always produce fewer
> comparisons overall. The new approach is also simpler. We were already
> shifting down to compact the heap within the misleadingly named [2]
> function tuplesort_heap_siftup() -- why not instead just use the
> caller tuple (the tuple that we currently go on to insert) when
> initially shifting down (not the heap's preexisting last tuple, which
> is guaranteed to go straight to the leaf level again)? That way, we
> don't need to enlarge the heap at all through insertion, shifting up,
> etc. We're done, and are *guaranteed* to have performed less work
> (fewer comparisons and swaps) than with the existing approach (this is
> the reason for my optimism about getting this stuff out of the way
> early).

Makes sense.

> This new approach is more or less the *conventional* way to maintain
> the heap invariant when returning elements from a heap during k-way
> merging. Our existing approach is convoluted; merging was presumably
> only coded that way because the generic functions
> tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be
> available. Perhaps the problem was masked by unrelated bottlenecks
> that existed at the time, too.

Yeah, this seems like a very obvious optimization. Is there a standard
name for this technique in the literature? I'm OK with "displace", or
perhaps just "replace" or "siftup+insert", but if there's a standard
name for this, let's use that.

- Heikki

In response to

Re: Parallel tuplesort (for parallel B-Tree index creation) at 2016-08-16 00:33:38 from Peter Geoghegan

Responses

Re: Parallel tuplesort (for parallel B-Tree index creation) at 2016-09-06 19:39:39 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2016-09-06 07:11:52	Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013
Previous Message	Abhijit Menon-Sen	2016-09-06 07:06:55	Re: Proposal for changes to recovery.conf API