Re: collect_corrupt_items_vacuum.patch

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Dmitry Koval <d(dot)koval(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: collect_corrupt_items_vacuum.patch
Date: 2024-08-14 07:20:52
Message-ID: fefa1a4c-19b2-44c8-b1ac-2bc14a0f97c7@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14/08/2024 04:51, Alexander Korotkov wrote:
> On Tue, Aug 13, 2024 at 10:15 PM Alexander Korotkov
> <aekorotkov(at)gmail(dot)com> wrote:
>> On Tue, Aug 13, 2024 at 9:39 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>>
>>> This causes an assertion failure when executed in a hot standby server:
>>>
>>> select * from pg_check_visible('pg_database');
>>>
>>> TRAP: failed Assert("!RecoveryInProgress()"), File:
>>> "../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572
>>>
>>> GetStrictOldestNonRemovableTransactionId does this:
>>>
>>>> if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
>>>> {
>>>> /* Shared relation: take into account all running xids */
>>>> runningTransactions = GetRunningTransactionData();
>>>> LWLockRelease(ProcArrayLock);
>>>> LWLockRelease(XidGenLock);
>>>> return runningTransactions->oldestRunningXid;
>>>> }
>>>
>>> And GetRunningTransactionData() has this:
>>>
>>>> Assert(!RecoveryInProgress());
>>>
>>> So it's easy to see that you will hit that assertion.
>>
>> Oh, thank you!
>> I'll fix this and add a test for recovery!
>
> Attached patch fixes the problem and adds the corresponding test. I
> would appreciate if you take a look at it.

The code changes seem fine. I think the "Ignore KnownAssignedXids"
comment above the function could be made more clear. It's not wrong, but
I think it doesn't explain the reasoning very well:

* We are now doing no effectively no checking in a standby, because we
always just use nextXid. It's better than nothing, I suppose it will
catch very broken cases where an XID is in the future, but that's all.

* We *could* use KnownAssignedXids for shared catalogs, because with
shared catalogs, the global horizon is used, not a database-aware one.

* Then again, there might be rare corner cases that a transaction has
crashed in the primary without writing a commit/abort record, and hence
it looks like it's still running in the standby but has already ended in
the primary. So I think it's good we ignore KnownAssignedXids for shared
catalogs anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-08-14 07:46:38 Re: AIX support
Previous Message Peter Eisentraut 2024-08-14 07:03:55 macOS prefetching support