From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Failure while inserting parent tuple to B-tree is not fun |
Date: | 2014-01-31 17:09:00 |
Message-ID: | 52EBD8AC.8070609@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 01/30/2014 12:46 AM, Peter Geoghegan wrote:
> On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>>> I think I see some bugs in _bt_moveright(). If you examine
>>>> _bt_finish_split() in detail, you'll see that it doesn't just drop the
>>>> write buffer lock that the caller will have provided (per its
>>>> comments) - it also drops the buffer pin. It calls _bt_insert_parent()
>>>> last, which was previously only called last thing by _bt_insertonpg()
>>>> (some of the time), and _bt_insertonpg() is indeed documented to drop
>>>> both the lock and the pin. And if you look at _bt_moveright() again,
>>>> you'll see that there is a tacit assumption that the buffer pin isn't
>>>> dropped, or at least that "opaque" (the BTPageOpaque BT special page
>>>> area pointer) continues to point to the same page held in the same
>>>> buffer, even though the code doesn't set the "opaque' pointer again
>>>> and it may not point to a pinned buffer or even the appropriate
>>>> buffer. Ditto "page". So "opaque" may point to the wrong thing on
>>>> subsequent iterations - you don't do anything with the return value of
>>>> _bt_getbuf(), unlike all of the existing call sites. I believe you
>>>> should store its return value, and get the held page and the special
>>>> pointer on the page, and assign that to the "opaque" pointer before
>>>> the next iteration (an iteration in which you very probably really do
>>>> make progress not limited to completing a page split, the occurrence
>>>> of which the _bt_moveright() loop gets "stuck on"). You know, do what
>>>> is done in the non-split-handling case. It may not always be the same
>>>> buffer as before, obviously.
>>>
>>> Yep, fixed.
>>
>> Can you explain what the fix was, please?
>
> Ping? I would like to hear some details here.
I refactored the loop in _bt_moveright to, well, not have that bug
anymore. The 'page' and 'opaque' pointers are now fetched at the
beginning of the loop. Did I miss something?
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Vik Fearing | 2014-01-31 17:16:18 | Re: CREATE FOREIGN TABLE ( ... LIKE ... ) |
Previous Message | Robert Haas | 2014-01-31 17:08:41 | Re: pg_basebackup and pg_stat_tmp directory |