Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tender Wang <tndrwang(at)gmail(dot)com>
Cc: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index
Date: 2025-04-04 11:04:54
Message-ID: a1701e0f-935a-4560-9a25-a40fd45cb35e@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 15/11/2024 11:10, Tender Wang wrote:
> Heikki Linnakangas <hlinnaka(at)iki(dot)fi <mailto:hlinnaka(at)iki(dot)fi>> 于2024年3
> 月19日周二 22:26写道:
>
> On 19/03/2024 14:07, Tender Wang wrote:
> > Thanks for your report. I can reproduce this issue.
> > I try to delete the Assert, no coredump anymore.
> > I need some time to learn GiST to find the root cause.
> At first glance, I think the assertion is too strict. In
> gistFindCorrectParent(), if we walk right, we update the parent's block
> number and reset its memorized downlinkoffnum to
> InvalidOffsetNumber. If
> we later call gistFindCorrectParent() again with the same stack,
> because
> the parent also needs to be split, we hit that assertion. But it's
> OK in
> that case, we don't know the downlink because we had moved right.
> Attached patch relaxes that.
>
>
> I picked up this issue again, and I took some time to dig into it.
> The patch works for me.
> Can we update the comments above the Assert together?  If we go to the
> Assert() code,
> The comments say that the page has changed or we are in index build process.
> But in the issue, the page didn't change(because we created temp table,
> no concurrent), but we were not in
> index build process, too. So the comments cannot reflect this case.
>
>
> But now I'm having some second thoughts. gistFindCorrectParent() looks
> like this:
>
> >       /*
> >        * Scan the page to re-find the downlink. If the page was
> split, it might
> >        * have moved to a different page, so follow the right
> links until we find
> >        * it.
> >        */
> >       while (true)
> >       {
> >               OffsetNumber i;
> >
> >               maxoff = PageGetMaxOffsetNumber(parent->page);
> >               for (i = FirstOffsetNumber; i <= maxoff; i =
> OffsetNumberNext(i))
> >               {
> >                       iid = PageGetItemId(parent->page, i);
> >                       idxtuple = (IndexTuple) PageGetItem(parent-
> >page, iid);
> >                       if (ItemPointerGetBlockNumber(&(idxtuple-
> >t_tid)) == child->blkno)
> >                       {
> >                               /* yes!!, found */
> >                               child->downlinkoffnum = i;
> >                               return;
> >                       }
> >               }
> >
> >               parent->blkno = GistPageGetOpaque(parent->page)-
> >rightlink;
> >               parent->downlinkoffnum = InvalidOffsetNumber;
> >               UnlockReleaseBuffer(parent->buffer);
> >               if (parent->blkno == InvalidBlockNumber)
> >               {
> >                       /*
> >                        * End of chain and still didn't find
> parent. It's a very-very
> >                        * rare situation when the root was split.
> >                        */
> >                       break;
> >               }
> >               parent->buffer = ReadBuffer(r, parent->blkno);
> >               LockBuffer(parent->buffer, GIST_EXCLUSIVE);
> >               gistcheckpage(r, parent->buffer);
> >               parent->page = (Page) BufferGetPage(parent->buffer);
> >       }
>
> When we step right and update parent->blkno, shouldn't we also update
> parent->lsn? Otherwise, we might fail to detect a concurrent page split
> with the LSN-NSN interlock checks. Or maybe it's correct, and we should
> indeed not update the memorized LSN. Or maybe it doesn't matter because
> we retry from the parent anyway, thanks to the 'retry_from_parent'
> flag?
> I'm not sure..
>
> I agree that it doesn't matter. If internal page split due to adjust key
> consistent with the key we're inserting.
> The 'retry_from_parent' flag can help us get corrent downlinkoffnum.

Ok, thanks for checking! Committed.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Manika Singhal 2025-04-04 12:30:13 Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows
Previous Message PG Bug reporting form 2025-04-04 03:05:12 BUG #18876: HINT messages for mxid wrap-around say "drop stale slots", but that may not be appropriate