From: | Daniel Wood <hexexpert(at)comcast(dot)net> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BTP_DELETED leaf still in tree |
Date: | 2019-10-10 23:44:46 |
Message-ID: | 1430933023.970983.1570751086396@connect.xfinity.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On October 10, 2019 at 1:18 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>
> On Thu, Oct 10, 2019 at 12:48 PM Daniel Wood <hexexpert(at)comcast(dot)net> wrote:
> > Update query stuck in a loop. Looping in _bt_moveright().
>
> You didn't say which PostgreSQL versions were involved, and if the
> database was ever upgraded using pg_upgrade. Those details could
> matter.
PG_VERSION says 10. I suspect we are running 10.9. I have no idea if pg_upgrade was ever done.
> > ExecInsertIndexTuples->btinsert->_bt_doinsert->_bt_search->_bt_moveright
> >
> > Mid Tree Node downlink path taken by _bt_search points to a BTP_DELETED Leaf.
>
> This should hardly ever happen -- it is barely possible for an index
> scan to land on a BTP_DELETED leaf page (or a half-dead page) when
> following a downlink in its parent. Recall that nbtree uses Lehman &
> Yao's design, so _bt_search() does not "couple" buffer locks on the
> way down. It would probably be impossible to observe this happening
> without carefully setting breakpoints in multiple sessions.
>
> If this happens reliably for you, which it sounds like, then you can
> already assume that the index is corrupt.
>
> > btpo_next is also DELETED but not in the tree.
> >
> > btpo_next->btpo_next is NOT deleted but in the mid tree as a lesser key value.
> >
> > Thus creating an endless loop in moveright.
>
> Offhand, these other details sound normal. The side links are still
> needed in fully deleted (BTP_DELETED) pages. And, moving right and
> finding lesser key values (not greater key values) is normal with
> deleted pages, since page deletion makes the keyspace move right, not
> left (moving the keyspace left is how the source Lanin & Shasha paper
> does it, though).
>
> Actually, I take it back -- the looping part is not normal. The
> btpo_next->btpo_next page has no business linking back to the
> original/first deleted page you mentioned. That's just odd.
btpo_next->btpo_next does NOT link directly back to the 1st deleted page. It simply links to some in-use page which is 50 or so leaf pages back in the tree. Eventually we do reach the two deleted pages again. Only the first one is in the 'tree'.
> Can you provide me with a dump of the page images? The easiest way of
> getting a page dump is described here:
Customer data. Looks like meaningless customer data (5 digit key values). But too much paperwork. :-)
The hard part for me to understand isn't just why the DELETED leaf node is still referenced in the mid tree node.
It is that the step which sets BTP_DELETED should have also linked its leaf and right siblings together. But this hasn't been done.
Could the page have already have been dirty, but because of "target != leafblkno", we didn't stamp a new LSN on it. Could this allow us to write the DELETED dirty page without the XLOG_BTREE_MARK_PAGE_HALFDEAD and XLOG_BTREE_UNLINK_PAGE being flushed? Of course, I don't understand the "target != leafblkno".
In any case, thanks.
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
>
> If I had to guess, I'd guess that this was due to a generic storage problem.
>
> --
> Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2019-10-11 00:22:07 | Re: BRIN index which is much faster never chosen by planner |
Previous Message | Tomas Vondra | 2019-10-10 23:13:09 | Re: BRIN index which is much faster never chosen by planner |