From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE |
Date: | 2023-01-09 22:07:51 |
Message-ID: | 20230109220751.7quf2kfzfdvd76hw@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2023-01-09 13:55:02 -0800, Mark Dilger wrote:
> > On Jan 9, 2023, at 11:34 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> > update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
> > subsequent calls) to determine whether epoch needs to be reduced.
>
> Can you say a bit more about your analysis here, preferably with pointers to
> the lines of code you are analyzing? Does the problem exist in amcheck as
> currently committed, or are you thinking about a problem that arises only
> after applying your patch? I'm a bit fuzzy on where xid 0 gets used.
The problems exist in the code as currently committed. I'm not sure what
exactly the consequences are, the result is that oldest_fxid will be, at least
temporarily, bogus.
Consider the first call to update_cached_xid_range():
/*
* Update our cached range of valid transaction IDs.
*/
static void
update_cached_xid_range(HeapCheckContext *ctx)
{
/* Make cached copies */
LWLockAcquire(XidGenLock, LW_SHARED);
ctx->next_fxid = ShmemVariableCache->nextXid;
ctx->oldest_xid = ShmemVariableCache->oldestXid;
LWLockRelease(XidGenLock);
/* And compute alternate versions of the same */
ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
}
The problem is that the call to FullTransactionIdFromXidAndCtx() happens
before ctx->next_xid is assigned, even though FullTransactionIdFromXidAndCtx()
uses ctx->next_xid.
static FullTransactionId
FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
{
uint32 epoch;
if (!TransactionIdIsNormal(xid))
return FullTransactionIdFromEpochAndXid(0, xid);
epoch = EpochFromFullTransactionId(ctx->next_fxid);
if (xid > ctx->next_xid)
epoch--;
return FullTransactionIdFromEpochAndXid(epoch, xid);
}
Because ctx->next_xid is 0, due to not having been set yet, "xid > ctx->next_xid"
will always be true, leading to epoch being reduced by one.
In the common case of there never having been an xid wraparound, we'll thus
underflow epoch, generating an xid far into the future.
The tests encounter the issue today. If you add
Assert(TransactionIdIsValid(ctx->next_xid));
Assert(FullTransactionIdIsValid(ctx->next_fxid));
early in FullTransactionIdFromXidAndCtx() it'll be hit in the
amcheck/pg_amcheck tests.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2023-01-09 22:18:21 | Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans |
Previous Message | Andres Freund | 2023-01-09 21:58:42 | Show various offset arrays for heap WAL records |