Re: Snapshot related assert failure on skink

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Snapshot related assert failure on skink
Date: 2025-03-24 14:56:18
Message-ID: 82a7e769-8189-401c-a666-248e9cef2a04@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/23/25 17:43, Heikki Linnakangas wrote:
> On 21/03/2025 17:16, Andres Freund wrote:
>> Am I right in understanding that the only scenario (when in
>> STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds()
>> would
>> "legally" remove a transaction, rather than the commit / abort records
>> doing
>> so, is if the primary crash-restarted while transactions were ongoing?
>>
>> Those transactions won't have a commit/abort records and thus won't
>> trigger
>> ExpireTreeKnownAssignedTransactionIds(), which otherwise would have
>> updated
>> ->xactCompletionCount?
>
> Correct.
>
>> When writing the snapshot caching patch, I tried to make sure that all
>> the
>> places that maintain ->latestCompletedXid also update
>> ->xactCompletionCount. Afaict that's still the case. Which, I think,
>> means
>> that we're also missing calls to MaintainLatestCompletedXidRecovery()?
>
> Yep, I was just looking at that too.
>
>> If latestCompletedXid is incorrect visibility determinations end up
>> wrong...
>
> I think it happens to work, because the transactions are effectively
> aborted. latestCompletedXid is used to initialize xmax in
> GetSnapshotData. If, for example, latestCompletedXid is incorrectly set
> to 1000 even though XID 1001 already aborted, a snapshot with xmax=1000
> still correctly considers XID 1001 as "not visible". As soon as a
> transaction commits, latestCompletedXid is fixed.
>
> AFAICS we could skip updating latestCompletedXid on aborts altogether
> and rename it to latestCommittedXid. But it's hardly worth optimizing
> for aborts.
>
> For the same reason, I believe the assertion failure we're discussing
> here is also harmless. Even though the reused snapshot has a smaller
> xmin than expected, all those transactions aborted and are thus not
> visible anyway.
>
> In any case, let's be tidy and fix both latestCompletedXid and
> xactCompletionCount.
>

Thanks for looking into this and pushing the fix.

Would it make sense to add a comment documenting this reasoning about
not handling aborts? Otherwise someone will get to rediscover this in
the future ...

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Shaplov 2025-03-24 15:11:32 Re: vacuum_truncate configuration parameter and isset_offset
Previous Message Robert Haas 2025-03-24 14:48:25 Re: vacuum_truncate configuration parameter and isset_offset