Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));
Date: 2025-03-15 17:49:32
Message-ID: 3aee3535-184d-423e-a5eb-05161e8b5617@vondra.me
Lists: pgsql-hackers

Hi,

while stress-testing physical replication to test another patch, I've
repeatedly hit this assert in GetSnapshotDataReuse:

Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));
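
For readers not familiar with the check: the assertion uses a
wraparound-aware "less than or equal" on 32-bit transaction IDs. Below is
a minimal standalone sketch of that comparison, modeled on the logic in
PostgreSQL's transam.c as I understand it. The function name
xid_precedes_or_equals and the local typedef/macros are stand-ins
introduced for illustration, not the server's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the server's 32-bit transaction ID type (assumption). */
typedef uint32_t TransactionId;

/* XIDs below 3 are special (invalid, bootstrap, frozen) and never wrap. */
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical re-implementation of a wraparound-aware "id1 <= id2".
 * Normal XIDs are compared modulo 2^32: id1 precedes id2 when the
 * signed 32-bit difference (id1 - id2) is negative.
 */
static bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* If either XID is special, fall back to a plain unsigned comparison. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 <= id2;

    /* Unsigned subtraction wraps; the cast reinterprets it as signed
     * (two's complement, as on all mainstream compilers). */
    diff = (int32_t) (id1 - id2);
    return diff <= 0;
}
```

Read this way, the failing assert means that TransactionXmin was
logically newer than RecentXmin at the time of the check, i.e. the
comparison above returned false for (TransactionXmin, RecentXmin).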

AFAICS the backtrace always looks exactly the same - a couple frames
from the top:

#4 0x0000000000b5af36 in ExceptionalCondition (conditionName=0xe2a470 "TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", fileName=0xe29768 "procarray.c", lineNumber=2132) at assert.c:66
#5 0x000000000094c2c3 in GetSnapshotDataReuse (snapshot=0x1079380 <CatalogSnapshotData>) at procarray.c:2132
#6 0x000000000094c4b0 in GetSnapshotData (snapshot=0x1079380 <CatalogSnapshotData>) at procarray.c:2233
#7 0x0000000000bb31f8 in GetNonHistoricCatalogSnapshot (relid=2615) at snapmgr.c:412
#8 0x0000000000bb31a2 in GetCatalogSnapshot (relid=2615) at snapmgr.c:385
#9 0x0000000000493073 in systable_beginscan (heapRelation=0x76cee3dc01d8, indexId=2684, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffcaddd2f30) at genam.c:414
#10 0x0000000000b383c0 in SearchCatCacheMiss (cache=0xa99a880, nkeys=1, hashValue=2798280003, hashIndex=3, v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1533
#11 0x0000000000b38290 in SearchCatCacheInternal (cache=0xa99a880, nkeys=1, v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1464
#12 0x0000000000b37f69 in SearchCatCache (cache=0xa99a880, v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1318
#13 0x0000000000b53df5 in SearchSysCache (cacheId=37, key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:217
#14 0x0000000000b54318 in GetSysCacheOid (cacheId=37, oidcol=1, key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:459

At first I assumed it's something my patch broke, but then I tried the
stress test on master, and I hit that assert again, so it's a
pre-existing issue.

The stress test is fairly simple - it creates a primary/standby cluster,
and then does pgbench on both nodes (read-write on primary, read-only on
standby), and also randomly restarts both nodes (in fast or immediate
modes). The script I use is attached.

It takes a while to hit it - on my laptop it takes an hour or so, but I
guess that's mostly down to the random sleeps in the script.

I've only ever seen this on the standby, never on the primary.

regards

--
Tomas Vondra

Attachment      Content-Type               Size
backtrace.txt   text/plain                 5.3 KB
test2.sh        application/x-shellscript  6.7 KB
