From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2016-09-07 00:50:42 |
Message-ID: | CAGTBQpY9esgo4+qC=v3coMd_-uG1VCGQ27h5aLh+L6Uq0GfvfQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> I noticed, but here n = state->memtupcount
>>
>> + Assert(memtuples[0].tupindex == newtup->tupindex);
>> +
>> + CHECK_FOR_INTERRUPTS();
>> +
>> + n = state->memtupcount; /* n is heap's size,
>> including old root */
>> + imin = 0; /*
>> start with caller's "hole" in root */
>> + i = imin;
>
> I'm fine with using "n" in the later assertion you mentioned, if
> that's clearer to you. memtupcount is broken out as "n" simply because
> that's less verbose, in a place where that makes things far clearer.
>
>> In fact, the assert on the patch would allow writing memtuples outside
>> the heap, as in calling tuplesort_heap_root_displace if
>> memtupcount==0, but I don't think that should be legal (memtuples[0]
>> == memtuples[imin] would be outside the heap).
>
> You have to have a valid heap (i.e. there must be at least one
> element) to call tuplesort_heap_root_displace(), and it doesn't
> directly compact the heap, so it must remain valid on return. The
> assertion exists to make sure that everything is okay with a
> one-element heap, a case which is quite possible.
More than using "n" or "memtupcount" what I'm saying is to assert that
memtuples[imin] is inside the heap, which would catch the same errors
the original assert would, and more.
Assert(imin < state->memtupcount)
If you prefer.
The original asserts allows any value of imin for memtupcount>1, and
that's my main concern. It shouldn't.
On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Sure, that's a weird enough case (that assert up there already reads
>> memtuples[0] which would be equally illegal if memtupcount==0), but it
>> goes on to show that the assert expression just seems odd for its
>> intent.
>>
>> BTW, I know it's not the scope of the patch, but shouldn't
>> root_displace be usable on the TSS_BOUNDED phase?
>
> I don't think it should be, no. With a top-n heap sort, the
> expectation is that after a little while, we can immediately determine
> that most tuples do not belong in the heap (this will require more
> than one comparison per tuple when the tuple that may be entered into
> the heap will in fact go in the heap, which should be fairly rare
> after a time). That's why that general strategy can be so much faster,
> of course.
I wasn't proposing getting rid of that optimization, but just
replacing the siftup+insert step with root_displace...
> Note that that heap is "reversed" -- the sort order is inverted, so
> that we can use a minheap. The top of the heap is the most marginal
> tuple in the top-n heap so far, and so is the next to be removed from
> consideration entirely (not the next to be returned to caller, when
> merging).
...but I didn't pause to consider that point.
It still looks like a valid optimization, instead rearranging the heap
twice (siftup + insert), do it once (replace + relocate).
However, I agree that it's not worth the risk conflating the two
optimizations. That one can be done later as a separate patch.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2016-09-07 00:52:27 | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Previous Message | Peter Geoghegan | 2016-09-07 00:19:05 | Re: Parallel tuplesort (for parallel B-Tree index creation) |