From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2016-09-06 19:39:39 |
Message-ID: | CAM3SWZRMhJqFarRyFD0tAXcnZ0xf5ESuK6_QLZC70QxPBBP0Xw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 6, 2016 at 12:08 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> I attach a patch that changes how we maintain the heap invariant
>> during tuplesort merging.
> Nice!
Thanks!
>> This new approach is more or less the *conventional* way to maintain
>> the heap invariant when returning elements from a heap during k-way
>> merging. Our existing approach is convoluted; merging was presumably
>> only coded that way because the generic functions
>> tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be
>> available. Perhaps the problem was masked by unrelated bottlenecks
>> that existed at the time, too.
>
>
> Yeah, this seems like a very obvious optimization. Is there a standard name
> for this technique in the literature? I'm OK with "displace", or perhaps
> just "replace" or "siftup+insert", but if there's a standard name for this,
> let's use that.
I used the term "displace" specifically because it wasn't a term with
a well-defined meaning in the context of the analysis of algorithms.
Just like "insert" isn't for tuplesort_heap_insert(). I'm not
particularly attached to the name tuplesort_heap_root_displace(), but
I do think that whatever it ends up being called should at least not
be named after an implementation detail. For example,
tuplesort_heap_root_replace() also seems fine.
I think that tuplesort_heap_siftup() should be called something like
tuplesort_heap_compact instead [1], since what it actually does
(shifting down -- the existing name is completely backwards!) is just
an implementation detail involved in compacting the heap (notice that
it decrements memtupcount, which, by now, means the k-way merge heap
gets one element smaller). I can write a patch to do this renaming, if
you're interested. Someone should fix it, because independent of all
this, it's just wrong.
[1] https://www.postgresql.org/message-id/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2016-09-06 19:42:42 | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Previous Message | Tomas Vondra | 2016-09-06 19:38:35 | Re: Speed up Clog Access by increasing CLOG buffers |