From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: Snapshot related assert failure on skink |
Date: | 2025-03-19 07:17:23 |
Message-ID: | 605d6217-1050-43c8-83f5-7c52598c54cc@iki.fi |
Lists: | pgsql-hackers |
On 19/03/2025 04:22, Tomas Vondra wrote:
> I kept stress-testing this, and while the frequency massively increased
> on PG18, I managed to reproduce this all the way back to PG14. I see
> ~100x more corefiles on PG18.
>
> That is not a proof the issue was introduced in PG14, maybe it's just
> the assert that was added there or something. Or maybe there's another
> bug in PG18, making the impact worse.
>
> But I'd suspect this is a bug in
>
> commit 623a9ba79bbdd11c5eccb30b8bd5c446130e521c
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: Mon Aug 17 21:07:10 2020 -0700
>
> snapshot scalability: cache snapshots using a xact completion counter.
>
> Previous commits made it faster/more scalable to compute snapshots.
> But not
> building a snapshot is still faster. Now that GetSnapshotData() does not
> maintain RecentGlobal* anymore, that is actually not too hard:
>
> ...
Looking at the code, shouldn't ExpireAllKnownAssignedTransactionIds()
and ExpireOldKnownAssignedTransactionIds() update xactCompletionCount?
This can happen during hot standby:
1. Backend acquires snapshot A with xmin 1000
2. Startup process calls ExpireOldKnownAssignedTransactionIds(), which
does not update xactCompletionCount
3. Backend acquires snapshot B with xmin 1050
4. Backend releases snapshot A, updating TransactionXmin to 1050
5. Backend acquires new snapshot, calls GetSnapshotDataReuse(), reusing
snapshot A's data.
Because xactCompletionCount is not updated in step 2, the
GetSnapshotDataReuse() call in step 5 reuses snapshot A. But snapshot A
has a lower xmin (1000) than TransactionXmin (1050).
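
To make the sequence above concrete, here is a minimal toy model in C. It
is only a sketch, not PostgreSQL code: the Toy* types, the toy_* functions
and the bump_counter flag are all hypothetical names invented for
illustration, and the model assumes that each snapshot slot remembers the
completion counter value it was built under and is reused whenever that
value still matches the shared counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these types are hypothetical, not PostgreSQL structs. */
typedef struct ToyShared
{
    uint32_t oldest_running_xid;    /* what a freshly built snapshot gets as xmin */
    uint64_t xact_completion_count; /* stand-in for xactCompletionCount */
} ToyShared;

typedef struct ToySnapshot
{
    bool     built;
    uint32_t xmin;
    uint64_t count_at_build;        /* counter value seen when this slot was built */
} ToySnapshot;

/*
 * Stand-in for ExpireOldKnownAssignedTransactionIds().  bump_counter = true
 * models the change asked about in the mail; false models the behaviour
 * described in step 2.
 */
static void
toy_expire_old_xids(ToyShared *sh, uint32_t new_oldest, bool bump_counter)
{
    sh->oldest_running_xid = new_oldest;
    if (bump_counter)
        sh->xact_completion_count++;
}

/* Stand-in for taking a snapshot, with a reuse fast path keyed on the counter. */
static uint32_t
toy_get_snapshot(const ToyShared *sh, ToySnapshot *snap)
{
    if (snap->built && snap->count_at_build == sh->xact_completion_count)
        return snap->xmin;          /* reuse the cached contents unchanged */

    snap->built = true;
    snap->xmin = sh->oldest_running_xid;
    snap->count_at_build = sh->xact_completion_count;
    return snap->xmin;
}

/*
 * Replays steps 1-5.  Returns the xmin handed out in step 5 and stores the
 * backend's TransactionXmin after step 4 in *transaction_xmin.
 */
static uint32_t
toy_run_scenario(bool bump_counter, uint32_t *transaction_xmin)
{
    ToyShared   sh = {.oldest_running_xid = 1000, .xact_completion_count = 1};
    ToySnapshot a = {0};
    ToySnapshot b = {0};

    uint32_t a_xmin = toy_get_snapshot(&sh, &a);    /* step 1: xmin 1000 */
    toy_expire_old_xids(&sh, 1050, bump_counter);   /* step 2 */
    uint32_t b_xmin = toy_get_snapshot(&sh, &b);    /* step 3: xmin 1050 */

    assert(b_xmin >= a_xmin);
    *transaction_xmin = b_xmin;     /* step 4: A released, only B remains */

    return toy_get_snapshot(&sh, &a);               /* step 5 */
}
```

In this model, toy_run_scenario(false, &txmin) hands out xmin 1000 in step 5
while txmin is already 1050, which is the inconsistency described above.
With bump_counter set to true, the slot for snapshot A is rebuilt and step 5
yields 1050. Whether bumping the counter is the right fix in PostgreSQL
itself is not something this sketch establishes.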
--
Heikki Linnakangas
Neon (https://neon.tech)