Re: Failure while inserting parent tuple to B-tree is not fun

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure while inserting parent tuple to B-tree is not fun
Date: 2014-03-18 18:54:52
Message-ID: 5328967C.5040706@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/06/2014 06:42 AM, Peter Geoghegan wrote:
> I'm not sure about this:
>
>> *************** _bt_findinsertloc(Relation rel,
>> *** 675,680 ****
>> --- 701,707 ----
>> static void
>> _bt_insertonpg(Relation rel,
>> Buffer buf,
>> + Buffer cbuf,
>> BTStack stack,
>> IndexTuple itup,
>> OffsetNumber newitemoff,
>
> Wouldn't lcbuf be a better name for the new argument? After all, in
> relevant contexts 'buf' is the parent of both of the pages post-split,
> but there are two children in play here. You say:
>
>> + * When inserting to a non-leaf page, 'cbuf' is the left-sibling of the
>> + * page we're inserting the downlink for. This function will clear the
>> + * INCOMPLETE_SPLIT flag on it, and release the buffer.
>
> I suggest also noting in comments at this point that cbuf/the
> left-sibling is the "old half" from the split.
>
> Even having vented about cbuf's name, I can kind of see why you didn't
> call it lcbuf: we already have an "BTPageOpaque lpageop" variable for
> the special 'buf' metadata within the _bt_insertonpg() function; so
> there might be an ambiguity between the possibly-soon-to-be-left page
> (the target page) and the *child* left page that needs to have its
> flag cleared. Although I still think you should change the name of
> "lpageop" (further details follow).
>
> Consider this:
>
>> /* release buffers */
>> + if (!P_ISLEAF(lpageop))
>> + _bt_relbuf(rel, cbuf);
>
> (Rebased, so this looks a bit different to your original version IIRC).
>
> This is checking that the _bt_insertonpg() target page (whose special
> data lpageop points to) is not a leaf page, and releasing the *child*
> left page if it isn't. This is pretty confusing. So I suggest naming
> "lpageop" "tpageop" ('t' for target, a terminology already used in the
> comments above the function).

I don't know, the buf/page/lpageop variable names are used in many
functions in nbtinsert.c. Perhaps they should all be renamed, but I
think it's fine as it is, and not this patch's responsibility anyway.
(The lpageop name makes sense when the page has to be split, and the
page becomes the left page. Otherwise, admittedly, not so much)

> Also, I suggest you change this existing code comment:
>
> * On entry, we must have the right buffer in which to do the
> * insertion, and the buffer must be pinned and write-locked. On return,
> * we will have dropped both the pin and the lock on the buffer.
>
> Change "right" to "correct" here. Minor point, but "right" is a loaded word.

Fixed.

> But why are you doing new things while checking P_ISLEAF(lpageop) in
> _bt_insertonpg() to begin with? Would you not be better off doing
> things when there is a child buffer passed (e.g. "if
> (BufferIsValid(cbuf))..."), and only then asserting
> !P_ISLEAF(lpageop)? That seems to me to more naturally establish the
> context of "_bt_insertonpg() is called to insert downlink after
> split". Although maybe that conflates things within "XLOG stuff". Hmm.

Changed that way for the places where we deal with the incomplete-split
flag.

I committed the patch now. Thanks for the review!

Before committing, I spotted a bug in the way the new page-deletion code
interacts with this: there was a check that the page that's about to be
deleted was not marked with the INCOMPLETE_SPLIT flag, it was possible
that the page was the right half of an incomplete split, and so didn't
have a downlink. Vacuuming that failed with an "failed to re-find parent
key" error. I fixed that by checking that the left sibling is also not
marked with the flag.

It would be fairly easy to delete a page that is the right half of an
incomplete split. Such a page doesn't have a downlink pointing to it, so
just skip removing it, and instead clear the INCOMPLETE_SPLIT flag in
the left sibling. But deleting incompletely split pages isn't important
from a disk-space point of view, they should be extremely rare, so I
decided to just punt.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-03-18 18:55:40 Re: pg_archivecleanup bug
Previous Message Simon Riggs 2014-03-18 18:47:25 Re: pg_archivecleanup bug