Re: BUG #16582: Logical index corruption leading to apparent index scan infinite loop

From: James Lucas <jlucasdba(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16582: Logical index corruption leading to apparent index scan infinite loop
Date: 2020-08-17 16:21:42
Message-ID: CAAFmbbOnCtds-Q5vOAmTMBm5sAvBpQhc474zq+LMCidSjgt11A@mail.gmail.com
Lists: pgsql-bugs

Forgot to say, I don't think I can run bt_index_parent_check() right
now due to the broader locks required. I will try to get a run in if
I get an opportunity.

Thanks,
James

On Mon, Aug 17, 2020 at 10:51 AM James Lucas <jlucasdba(at)gmail(dot)com> wrote:
>
> Hi Peter,
>
> I re-ran with DEBUG2 messages enabled. Got a bunch of output, but the
> last few lines are like this for each index:
>
> DEBUG: level 965868789 leftmost page of index "xxxxx" was found deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> DEBUG: level 966240004 leftmost page of index "xxxxx" was found deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
> DEBUG: level 967745369 leftmost page of index "xxxxx" was found deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> DEBUG: level 967746918 leftmost page of index "xxxxx" was found deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
>
> Not sure if pageinspect might be able to tell anything else useful?
> I'd like to find the root cause of the corruption if possible, so this
> doesn't happen in other databases.
>
> Also wanted to see if it might be a good idea to add a
> CHECK_FOR_INTERRUPTS call to _bt_moveright() so if this does happen
> again, at least the session would be killable. I don't have enough
> background in the code to know where it's safe to add, or I'd submit a
> patch.
>
> Thanks,
> James
>
> On Fri, Aug 14, 2020 at 4:33 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Fri, Aug 14, 2020 at 2:03 PM PG Bug reporting form
> > <noreply(at)postgresql(dot)org> wrote:
> > > The table has two indexes, so I decided to scan both indexes on all
> > > partitions with the bt_index_check function from the amcheck extension. I
> > > identified one partition where both indexes throw the following error:
> > > ERROR: cross page item order invariant violated for index "xxxxx"
> > > DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
> >
> > This sounds very much like an index with sibling pages that are in the
> > wrong order relative to each other. That's totally consistent with
> > what you describe with _bt_moveright() -- circular sibling links can
> > cause it to just keep going.
> >
> > It's possible that you'll get a better error with
> > bt_index_parent_check(), which might be worth trying. But it probably
> > won't give you any additional information.
> >
> > Note that there is DEBUG1 and DEBUG2 output from amcheck, which might
> > give you a few more details. You can "set client_min_messages =
> > 'debug2'" in an interactive session that runs bt_index_check() to see
> > some additional context. Again, this is unlikely to make all that much
> > difference.
> >
> > --
> > Peter Geoghegan
