Re: Snapshot related assert failure on skink

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Snapshot related assert failure on skink
Date: 2025-03-24 15:25:54
Message-ID: 76bcbe27-4408-4f01-8623-4a6eff782312@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24/03/2025 16:56, Tomas Vondra wrote:
>
>
> On 3/23/25 17:43, Heikki Linnakangas wrote:
>> On 21/03/2025 17:16, Andres Freund wrote:
>>> Am I right in understanding that the only scenario (when in
>>> STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds()
>>> would
>>> "legally" remove a transaction, rather than the commit / abort records
>>> doing
>>> so, is if the primary crash-restarted while transactions were ongoing?
>>>
>>> Those transactions won't have a commit/abort records and thus won't
>>> trigger
>>> ExpireTreeKnownAssignedTransactionIds(), which otherwise would have
>>> updated
>>> ->xactCompletionCount?
>>
>> Correct.
>>
>>> When writing the snapshot caching patch, I tried to make sure that all
>>> the
>>> places that maintain ->latestCompletedXid also update
>>> ->xactCompletionCount. Afaict that's still the case. Which, I think,
>>> means
>>> that we're also missing calls to MaintainLatestCompletedXidRecovery()?
>>
>> Yep, I was just looking at that too.
>>
>>> If latestCompletedXid is incorrect visibility determinations end up
>>> wrong...
>>
>> I think it happens to work, because the transactions are effectively
>> aborted. latestCompletedXid is used to initialize xmax in
>> GetSnapshotData. If, for example, latestCompletedXid is incorrectly set
>> to 1000 even though XID 1001 already aborted, a snapshot with xmax=1000
>> still correctly considers XID 1001 as "not visible". As soon as a
>> transaction commits, latestCompletedXid is fixed.
>>
>> AFAICS we could skip updating latestCompletedXid on aborts altogether
>> and rename it to latestCommittedXid. But it's hardly worth optimizing
>> for aborts.
>>
>> For the same reason, I believe the assertion failure we're discussing
>> here is also harmless. Even though the reused snapshot has a smaller
>> xmin than expected, all those transactions aborted and are thus not
>> visible anyway.
>>
>> In any case, let's be tidy and fix both latestCompletedXid and
>> xactCompletionCount.
>>
>
> Thanks for looking into this and pushing the fix.
>
> Would it make sense to add a comment documenting this reasoning about
> not handling aborts? Otherwise someone will get to rediscover this in
> the future ...

Well, we're currently _not_ relying on the reasoning about aborts not
changing visibility. So it's not like someone will forget the reasoning
and have a bug as a result. I guess someone could rediscover that
reasoning and think that they can skip advancing those counters on
aborts as an optimization, re-introducing the assertion failure. But I
believe that was not why we had this bug, it was just a simple omission.

So the comment would look like this:

"In principle, we would not need to advance xactCompletionCount and
latestCompletedXid because aborting transactions don't change
visibility. But that could make xmin within a transaction go backwards,
if a snapshot with an older xmin but same xactCompletionCount is reused,
triggering the assertion in GetSnapshotDataReuse()."

If we add that, I guess it should go into ProcArrayEndTransaction()
which currently just notes that it's used for commits and aborts
interchangeably. Do you think it's worth it? Want to propose a patch?

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2025-03-24 15:30:47 Re: vacuum_truncate configuration parameter and isset_offset
Previous Message Robert Haas 2025-03-24 15:25:44 Re: vacuum_truncate configuration parameter and isset_offset