RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>
Cc: "bdrouvot(at)amazon(dot)com" <bdrouvot(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "minsoo(at)amazon(dot)com" <minsoo(at)amazon(dot)com>
Subject: RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2021-10-19 02:45:24
Message-ID: OSBPR01MB48886D77AC8AD25B16C28B5CEDBD9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, October 14, 2021 11:21 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Mon, 11 Oct 2021 15:27:41 +0900, Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote in
> >
> > On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > I came up with the third way. SnapBuildCommitTxn already properly
> > > handles the case where a ReorderBufferTXN with
> > > RBTXN_HAS_CATALOG_CHANGES. So this issue can be resolved by
> create
> > > such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
> >
> > Thank you for the idea and patch!
> >
> > It's much simpler than mine. I think that creating an entry of a
> > catalog-changed transaction in the reorder buffer before
> > SunapBuildCommitTxn() is the right direction.
> >
> > After more thought, given DDLs are not likely to happen than DML in
> > practice, probably we can always mark both the top transaction and its
> > subtransactions as containing catalog changes if the commit record has
> > XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to
> > overhead in practice. That way, the patch could be more simple and
> > doesn't need the change of AssertTXNLsnOrder().
> >
> > I've attached another PoC patch. Also, I've added the tests for this
> > issue in test_decoding.
>
> Thanks for the test script. (I did that with TAP framework but isolation tester
> version is simpler.)
>
> It adds a call to ReorderBufferAssignChild but usually subtransactions are
> assigned to top level elsewherae. Addition to that
> ReorderBufferCommitChild() called just later does the same thing. We are
> adding the third call to the same function, which looks a bit odd.
It can be odd. However, we
have a check at the top of ReorderBufferAssignChild
to judge if the sub transaction is already associated or not
and skip the processings if it is.

> And I'm not sure it is wise to mark all subtransactions as "catalog changed"
> always when the top transaction is XACT_XINFO_HAS_INVALS.
In order to avoid this,
can't we have a new flag (for example, in reorderbuffer struct) to check
if we start decoding from RUNNING_XACTS, which is similar to the first patch of [1]
and use it at DecodeCommit ? This still leads to some extra specific codes added
to DecodeCommit and this solution becomes a bit similar to other previous patches though.

[1] - https://www.postgresql.org/message-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-10-19 02:47:13 RE: Data is copied twice when specifying both child and parent table in publication
Previous Message Michael Paquier 2021-10-19 02:08:06 Re: BUG #17220: ALTER INDEX ALTER COLUMN SET (..) with an optionless opclass makes index and table unusable