Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

From: Bowen Shi <zxwsbg12138(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date: 2024-05-17 03:35:53
Message-ID: CAM_vCuex_ideb_CQmi-gzkW7o3WaMx-eWLA-2GaGvug4Wew1pg@mail.gmail.com
Lists: pgsql-bugs

Hi,

On Fri, May 17, 2024 at 12:49 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, May 16, 2024 at 12:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I'm wondering if there was index processing, due to the number of
> tuples. And
> > if so, what type of indexes. There'd need to be something that could
> lead to
> > new snapshots being acquired...
>
> Did you ever see this theory of mine, about B-Tree page deletion +
> recycling? See:
>
>
> https://www.postgresql.org/message-id/flat/CAH2-Wz%3DzLcnZO8MqPXQLqOLY%3DCAwQhdvs5Ncg6qMb5nMAam0EA%40mail.gmail.com#d058a6d4b8c8fa7d1ff14349b3a50c3c
>
> (And related nearby emails from me.)
>
> It looked very much like index vacuuming was involved in some way when
> I actually had the opportunity to use gdb against an affected
> production instance that ran into the problem.
>

In my case, the call stack is as follows:

#0  0x000000000050d50e in heap_page_prune (relation=relation@entry=0x2403028,
    buffer=buffer@entry=67521, vistest=<optimized out>,
    old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0,
    nnewlpdead=nnewlpdead@entry=0x7fff8be98ecc, off_loc=off_loc@entry=0x234e3cc)
#1  0x0000000000510678 in lazy_scan_prune (vacrel=vacrel@entry=0x234e348,
    buf=buf@entry=67521, blkno=blkno@entry=349565,
    page=page@entry=0x7fa25b728000 "'", prunestate=prunestate@entry=0x7fff8be9a0d0)
#2  0x0000000000511a70 in lazy_scan_heap (vacrel=0x234e348)
#3  heap_vacuum_rel (rel=0x2403028, params=0x2358064, bstrategy=<optimized out>)
#4  0x00000000006767e7 in table_relation_vacuum (bstrategy=0x2368e28,
    params=0x2358064, rel=0x2403028)
#5  vacuum_rel (relid=18930, relation=<optimized out>,
    params=params@entry=0x2358064, bstrategy=bstrategy@entry=0x2368e28)
#6  0x0000000000677be0 in vacuum (relations=0x23652f8, relations@entry=0x2363310,
    params=params@entry=0x2358064, bstrategy=bstrategy@entry=0x2368e28,
    vac_context=vac_context@entry=0x23651a0, isTopLevel=isTopLevel@entry=true)
#7  0x0000000000778080 in autovacuum_do_vac_analyze (bstrategy=0x2368e28, tab=0x2358060)
#8  do_autovacuum ()
#9  0x0000000000778510 in AutoVacWorkerMain (argv=0x0, argc=0)
#10 0x00000000007785eb in StartAutoVacWorker ()
#11 0x000000000077efe1 in StartAutovacuumWorker ()
#12 process_pm_pmsignal ()
#13 ServerLoop ()
#14 0x0000000000780328 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x22583a0)
#15 0x00000000004bc368 in main (argc=3, argv=0x22583a0)

(gdb) p *MyProc
$4 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7fa34153ecb8,
  sem = 0x7fa237311138, waitStatus = PROC_WAIT_STATUS_OK,
  procLatch = {is_set = 1, maybe_sleeping = 0, is_shared = true, owner_pid = 2303},
  xid = 0, xmin = 1079, lxid = 237689, pid = 2303, pgxactoff = 13,
  pgprocno = 2050, backendId = 13, databaseId = 16425, roleId = 0,
  tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false,
  lwWaiting = 0 '\000', lwWaitMode = 0 '\000', lwWaitLink = {next = 0, prev = 0},
  cvWaitLink = {next = 0, prev = 0}, waitLock = 0x0, waitProcLock = 0x0,
  waitLockMode = 0, heldLocks = 0, waitStart = {value = 0}, delayChkptFlags = 0,
  statusFlags = 3 '\003', waitLSN = 0, syncRepState = 0,
  syncRepLinks = {prev = 0x0, next = 0x0},
  myProcLocks = {{head = {prev = 0x7fa3416f7518, next = 0x7fa3416f7518}},
    {head = {prev = 0x7fa33e401d60, next = 0x7fa33e401d60}},
    {head = {prev = 0x7fa33e453db0, next = 0x7fa33e453db0}},
    {head = {prev = 0x7fa3416f7548, next = 0x7fa3416f7548}},
    {head = {prev = 0x7fa33ea10060, next = 0x7fa33e4f7220}},
    {head = {prev = 0x7fa33e548cd0, next = 0x7fa33e548cd0}},
    {head = {prev = 0x7fa33eab22b0, next = 0x7fa33eab2350}},
    {head = {prev = 0x7fa33e5f9020, next = 0x7fa33eb11960}},
    {head = {prev = 0x7fa3416f7598, next = 0x7fa3416f7598}},
    {head = {prev = 0x7fa33eba74f0, next = 0x7fa33eba74f0}},
    {head = {prev = 0x7fa3416f75b8, next = 0x7fa3416f75b8}},
    {head = {prev = 0x7fa3416f75c8, next = 0x7fa3416f75c8}},
    {head = {prev = 0x7fa33eca8c60, next = 0x7fa33eca89e0}},
    {head = {prev = 0x7fa3416f75e8, next = 0x7fa3416f75e8}},
    {head = {prev = 0x7fa3416f75f8, next = 0x7fa3416f75f8}},
    {head = {prev = 0x7fa33e884de0, next = 0x7fa33e884de0}}},
  subxidStatus = {count = 0 '\000', overflowed = false},
  subxids = {xids = {0 <repeats 64 times>}},
  procArrayGroupMember = false, procArrayGroupNext = {value = 2147483647},
  procArrayGroupMemberXid = 0, wait_event_info = 0, clogGroupMember = false,
  clogGroupNext = {value = 2147483647}, clogGroupMemberXid = 0,
  clogGroupMemberXidStatus = 0, clogGroupMemberPage = -1, clogGroupMemberLsn = 0,
  fpInfoLock = {tranche = 81, state = {value = 536870912},
    waiters = {head = 2147483647, tail = 2147483647}}, fpLockBits = 0,
  fpRelId = {2840, 2662, 2659, 3379, 2841, 2840, 2662, 27770, 26889, 26008,
    24246, 23365, 2659, 1249, 2690, 53019},
  fpVXIDLock = true, fpLocalTransactionId = 237689, lockGroupLeader = 0x0,
  lockGroupMembers = {head = {prev = 0x7fa3416f77b0, next = 0x7fa3416f77b0}},
  lockGroupLink = {prev = 0x0, next = 0x0}}

From the current situation, my scenario seems to be different from yours: the
block being pruned (blkno=349565 in frame #1) is not the first page of the
VACUUM scan.
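
If it helps, here are a few further gdb probes one could run against such a
stuck/affected worker; these are only a sketch, and the LVRelState field names
assume a PG 14-era build, so they may differ in other versions:

(gdb) frame 1
(gdb) p vacrel->blkno        <- block currently being processed; should match
                                blkno=349565 in frame #1
(gdb) p vacrel->OldestXmin   <- VACUUM's pruning cutoff, computed once at the
                                start of the scan
(gdb) p MyProc->xmin         <- this backend's snapshot xmin (1079 above)

Comparing vacrel->OldestXmin against MyProc->xmin across the scan might show
whether the visibility horizon moved mid-scan, per the theory that index
processing acquires new snapshots.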

--
Regards
Bowen Shi
