Re: collect_corrupt_items_vacuum.patch

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Dmitry Koval <d(dot)koval(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: collect_corrupt_items_vacuum.patch
Date: 2024-08-14 21:40:07
Message-ID: CAPpHfduFB-hsCYiMCHxdU-L7z15FZXd-RNgaMWvAw1sLy_SX9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 14, 2024 at 10:20 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 14/08/2024 04:51, Alexander Korotkov wrote:
> > On Tue, Aug 13, 2024 at 10:15 PM Alexander Korotkov
> > <aekorotkov(at)gmail(dot)com> wrote:
> >> On Tue, Aug 13, 2024 at 9:39 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >>>
> >>> This causes an assertion failure when executed in a hot standby server:
> >>>
> >>> select * from pg_check_visible('pg_database');
> >>>
> >>> TRAP: failed Assert("!RecoveryInProgress()"), File:
> >>> "../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572
> >>>
> >>> GetStrictOldestNonRemovableTransactionId does this:
> >>>
> >>>> if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
> >>>> {
> >>>> /* Shared relation: take into account all running xids */
> >>>> runningTransactions = GetRunningTransactionData();
> >>>> LWLockRelease(ProcArrayLock);
> >>>> LWLockRelease(XidGenLock);
> >>>> return runningTransactions->oldestRunningXid;
> >>>> }
> >>>
> >>> And GetRunningTransactionData() has this:
> >>>
> >>>> Assert(!RecoveryInProgress());
> >>>
> >>> So it's easy to see that you will hit that assertion.
> >>
> >> Oh, thank you!
> >> I'll fix this and add a test for recovery!
> >
> > Attached patch fixes the problem and adds the corresponding test. I
> > would appreciate if you take a look at it.
>
> The code changes seem fine. I think the "Ignore KnownAssignedXids"
> comment above the function could be made more clear. It's not wrong, but
> I think it doesn't explain the reasoning very well:
>
> * We are now doing no effectively no checking in a standby, because we
> always just use nextXid. It's better than nothing, I suppose it will
> catch very broken cases where an XID is in the future, but that's all.
>
> * We *could* use KnownAssignedXids for shared catalogs, because with
> shared catalogs, the global horizon is used, not a database-aware one.
>
> * Then again, there might be rare corner cases that a transaction has
> crashed in the primary without writing a commit/abort record, and hence
> it looks like it's still running in the standby but has already ended in
> the primary. So I think it's good we ignore KnownAssignedXids for shared
> catalogs anyway.

Thank you for the detailed explanation. I've updated the
GetStrictOldestNonRemovableTransactionId() header comment accordingly.
I'm going to push this if no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v2-0001-Fix-GetStrictOldestNonRemovableTransactionId-on-s.patch application/octet-stream 4.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-08-14 22:40:29 Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
Previous Message Imseih (AWS), Sami 2024-08-14 21:05:59 Re: query_id, pg_stat_activity, extended query protocol