Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-28 06:25:33
Message-ID: CAD21AoDq0fPE=DGtmLoXXu1c+jDo+Qu_ceh2HViSUvRss7s93Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

() an

On Thu, Jul 28, 2022 at 12:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 27, 2022 at 8:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > > I have changed accordingly in the attached
> > > and apart from that slightly modified the comments and commit message.
> > > Do let me know what you think of the attached?
> >
> > It would be better to remember the initial running xacts after
> > SnapBuildRestore() returns true? Because otherwise, we could end up
> > allocating InitialRunningXacts multiple times while leaking the old
> > ones if there are no serialized snapshots that we are interested in.
> >
>
> Right, this makes sense. But note that you can no longer have a check
> (builder->state == SNAPBUILD_START) which I believe is not required.
> We need to do this after restore, in whichever state snapshot was as
> any state other than SNAPBUILD_CONSISTENT can have commits without all
> their changes.

Right.

>
> Accordingly, I think the comment: "Remember the transactions and
> subtransactions that were running when xl_running_xacts record that we
> decoded first was written." needs to be slightly modified to something
> like: "Remember the transactions and subtransactions that were running
> when xl_running_xacts record that we decoded was written.". Change
> this if it is used at any other place in the patch.

Agreed.

>
> > ---
> > + if (builder->state == SNAPBUILD_START)
> > + {
> > + int nxacts =
> > running->subxcnt + running->xcnt;
> > + Size sz = sizeof(TransactionId) * nxacts;
> > +
> > + NInitialRunningXacts = nxacts;
> > + InitialRunningXacts =
> > MemoryContextAlloc(builder->context, sz);
> > + memcpy(InitialRunningXacts, running->xids, sz);
> > + qsort(InitialRunningXacts, nxacts,
> > sizeof(TransactionId), xidComparator);
> > + }
> >
> > We should allocate the memory for InitialRunningXacts only when
> > (running->subxcnt + running->xcnt) > 0.
> >
>
d > There is no harm in doing that but ideally, that case would have been
> covered by an earlier check "if (running->oldestRunningXid ==
> running->nextXid)" which suggests "No transactions were running, so we
> can jump to consistent."

You're right.

While editing back branch patches, I realized that the following
(parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
equivalent:

+ /*
+ * If the COMMIT record has invalidation messages, it could have catalog
+ * changes. It is possible that we didn't mark this transaction as
+ * containing catalog changes when the decoding starts from a commit
+ * record without decoding the transaction's other changes. So, we ensure
+ * to mark such transactions as containing catalog change.
+ *
+ * This must be done before SnapBuildCommitTxn() so that we can include
+ * these transactions in the historic snapshot.
+ */
+ if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
+ SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
+ parsed->nsubxacts, parsed->subxacts,
+ buf->origptr);
+
/*
* Process invalidation messages, even if we're not interested in the
* transaction's contents, since the various caches need to always be
* consistent.
*/
if (parsed->nmsgs > 0)
{
if (!ctx->fast_forward)
ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
parsed->nmsgs, parsed->msgs);
ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
}

If that's right, I think we can merge these if branches. We can call
ReorderBufferXidSetCatalogChanges() for top-txn and in
SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
is in the list. What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-07-28 06:26:42 Re: Refactoring postgres_fdw/connection.c
Previous Message Kyotaro Horiguchi 2022-07-28 06:24:54 Re: Improve description of XLOG_RUNNING_XACTS