Re: using an end-of-recovery record in all cases

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc:	Julien Rouhaud <rjuju123(at)gmail(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: using an end-of-recovery record in all cases
Date:	2022-04-20 13:26:07
Message-ID:	CA+TgmoZZDL_2E_zuahqpJ-WmkuxmUi8+g7=dLEny=18r-+c-iQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 19, 2022 at 4:38 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Shouldn't latestCompletedXid be set to MaxTransactionId in this case? Or
> is this related to the logic in FullTransactionIdRetreat() that avoids
> skipping over the "actual" special transaction IDs?

The problem here is this code:

/* also initialize latestCompletedXid, to nextXid - 1 */
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
LWLockRelease(ProcArrayLock);

If nextXid is 3, then latestCompletedXid gets 2, which is
FrozenTransactionId rather than a normal XID. That trips this
assertion in GetRunningTransactionData:

Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
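
The arithmetic can be checked outside the server with a minimal sketch.
The types and helpers below (xid_is_normal, full_xid_retreat,
latest_completed_for) are simplified, hypothetical stand-ins written for
illustration, not the real definitions from access/transam.h. The sketch
assumes that FirstNormalTransactionId is 3 and that
FullTransactionIdRetreat() does not step over the special XIDs in the
very first epoch, as the quoted question suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's XID types (illustration only). */
typedef uint32_t TransactionId;
typedef struct { uint64_t value; } FullTransactionId;

/* XIDs 0, 1 and 2 are reserved (invalid, bootstrap, frozen). */
#define FirstNormalTransactionId ((TransactionId) 3)

/* Stand-in for TransactionIdIsNormal(). */
static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Low 32 bits of a 64-bit XID, like XidFromFullTransactionId(). */
static TransactionId
xid_from_full(FullTransactionId fxid)
{
    return (TransactionId) fxid.value;
}

/*
 * Assumed behavior of FullTransactionIdRetreat(): step back by one, and
 * skip values that only look special in 32 bits, but do not skip the
 * actual special XIDs at the very start of the 64-bit space.
 */
static void
full_xid_retreat(FullTransactionId *dest)
{
    dest->value--;

    /* Actual special XIDs: leave them alone. */
    if (dest->value < FirstNormalTransactionId)
        return;

    /* Later epochs: step over XIDs that appear special as 32-bit values. */
    while (xid_from_full(*dest) < FirstNormalTransactionId)
        dest->value--;
}

/* Mirrors the quoted startup code: latestCompletedXid = nextXid - 1. */
static TransactionId
latest_completed_for(uint64_t next_xid)
{
    FullTransactionId fxid = { next_xid };

    full_xid_retreat(&fxid);
    return xid_from_full(fxid);
}
```

Under these assumptions, latest_completed_for(3) yields 2, and
xid_is_normal(2) is false, which is the condition the Assert rejects.
Any nextXid of 4 or more in the first epoch yields a normal XID.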

> Your reasoning seems sound to me.

I was talking with Thomas Munro yesterday and he thinks there is a
problem with relfilenode reuse here. In normal running, when a
relation is dropped, we leave behind a 0-length file until the next
checkpoint; this keeps that relfilenode from being used even if the
OID counter wraps around. Suppose we didn't do that, and that
while running with wal_level=minimal we drop an existing relation,
create a new relation with the same OID, load some data into it, and
crash, all within the same checkpoint cycle. Recovery would then
replay the drop, but it could not restore the new relation's
contents afterward, because at wal_level=minimal they are not logged.
Apparently, we don't create tombstone files during recovery because we
know that there will be a checkpoint at the end.

With the existing use of the end-of-recovery record, we always know
that wal_level>minimal, because we're only using it on standbys. But
with this use that wouldn't be true any more. So I guess we need to
start creating tombstone files even during recovery, or else do
something like what Dilip coded up in
http://postgr.es/m/CAFiTN-u=r8UTCSzu6_pnihYAtwR1=esq5sRegTEZ2tLa92fovA@mail.gmail.com
which I think would be a better solution at least in the long term.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: using an end-of-recovery record in all cases at 2022-04-19 20:37:59 from Nathan Bossart

Responses

Re: using an end-of-recovery record in all cases at 2022-04-20 17:02:24 from Nathan Bossart