From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, robertmhaas(at)gmail(dot)com, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date: | 2024-04-15 17:39:13 |
Message-ID: | 20240415173913.4zyyrwaftujxthf2@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi,
I've tried a couple times to catch up with this thread. But always kinda felt
I must be missing something. It might be that this is one part of the
confusion:
On 2024-01-06 12:24:13 -0800, Noah Misch wrote:
> Fair enough. While I agree there's a decent chance back-patching would be
> okay, I think there's also a decent chance that 1ccc1e05ae creates the problem
> Matthias theorized. Something like: we update relfrozenxid based on
> OldestXmin, even though GlobalVisState caused us to retain a tuple older than
> OldestXmin. Then relfrozenxid disagrees with table contents.
Looking at the state as of 1ccc1e05ae, I don't see how - in lazy_scan_prune(),
if heap_page_prune() spuriously didn't prune a tuple, because the horizon went
backwards, we'd encounter the tuple in the loop below and call
heap_prepare_freeze_tuple(), which would error out with one of
/*
* Process xmin, while keeping track of whether it's already frozen, or
* will become frozen iff our freeze plan is executed by caller (could be
* neither).
*/
xid = HeapTupleHeaderGetXmin(tuple);
if (!TransactionIdIsNormal(xid))
xmin_already_frozen = true;
else
{
if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
xid, cutoffs->relfrozenxid)));
or
if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
multi, update_xact,
cutoffs->relfrozenxid)));
or
/* Raw xmax is normal XID */
if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
xid, cutoffs->relfrozenxid)));
I'm not saying that spuriously erroring out would be ok. But I guess I just
don't understand the data corruption theory in this subthread, because we'd
error out if we encountered a tuple that should have been frozen but wasn't?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2024-04-15 18:04:44 | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae |
Previous Message | Robert Haas | 2024-04-15 16:35:59 | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae |