From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: pgsql: Optimize btree insertions for common case of increasing values |
Date: | 2018-03-29 18:05:01 |
Message-ID: | CAH2-WznpGDHev3QTeFA2XO-51czgLt=61C+fWvHAe+V7K97NGA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
On Thu, Mar 29, 2018 at 7:59 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Another issue that I have with the main test of the
>> RelationSetTargetBlock() page is that nobody explains *why* we check
>> that we have enough space on the page to proceed. Why would it not be
>> okay if there was a page split?
>
> ...because we need the Stack to bubble up the parent insert.
As I said, strictly speaking we do not.
>> Although it's subtle, I'm pretty confident that it actually would be
>> okay from a correctness point of view to allow an insertion to go
>> ahead that would result in a page split -- it's possible to pass a
>> NULL stack to _bt_findinsertloc() and _bt_insertonpg() while still
>> getting correct behavior.
> It's a linear scan, so quite clearly not going to perform well on big indexes.
>
> Why would that be worth pursuing?
I must have been unclear. I agree that it isn't worth pursuing. My
point is that the fact that that actually works (e.g. it won't
segfault) could mask bugs where the "we must avoid a pagesplit" logic
breaks in the future. Relying on that working from a distance is
something that we can and should verify works out, with an assertion.
>> Suggested next steps to deal with this:
>>
>> * A minor point: I don't think you should call
>> RelationSetTargetBlock() when the page P_ISROOT(), which, as I
>> mentioned, is a condition that can coexist with P_ISLEAF() with very
>> small B-Trees. There can be no point in doing so. No need to add
>> P_ISROOT() to the main "Is cached page stale?" test within
>> _bt_doinsert(), though.
>
> True. A better test would be to not use the optimization at all on
> smaller btrees by checking the level is >= 3.
That might be a better idea, though I would probably go with >= 2. You
can have a reasonably big B-Tree with only one layer of internal pages
between the root and the leaf level. Though that's not worth quibbling
about.
>> * An assertion would make me feel a lot better about depending on not
>> having a page split from a significant distance.
>>
>> Your optimization should continue to not be used when it would result
>> in a page split, but only because that would be slow. The comments
>> should say so, IMV. Also, _bt_insertonpg() should have an assertion
>> against a page split actually occurring when the optimization was
>> used, just in case. When !BufferIsValid(cbuf), we know that we're
>> being called from _bt_doinsert() (see existing assertion at the top of
>> _bt_insertonpg() as an example of this), so at the point where it's
>> clear a page split is needed, we should assert that there is no target
>> block that we must have been passed as the target page.
>
> This is the same issues as the NULL stack. The page split would need
> to insert into parent.
But, to repeat myself, it actually can do that. Try removing the space
check in the main _bt_doinsert() test for yourself -- the regression
tests continue to pass. I would like there to be an assertion failure
instead.
> I don't see any need to change any of these things. This is a tuning
> patch and none of the above affects correctness of what has been
> committed.
I think that there has been a misunderstanding. I'm only asking for a
new assertion when I talk about the stack. That's all.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-03-29 18:16:52 | Re: pgsql: Add documentation for the JIT feature. |
Previous Message | Andres Freund | 2018-03-29 18:01:16 | Re: pgsql: Add documentation for the JIT feature. |