Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-05-24 02:27:32
Message-ID: CAD21AoAn9T6mS8sMF5=tFV66Wru92s933ihRVunLPEU_+0=Xiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 23, 2022 at 2:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > > I think if we don't have any better ideas then we should go with
> > > either this or one of the other proposals in this thread. The other
> > > idea that occurred to me is whether we can somehow update the snapshot
> > > we have serialized on disk about this information. On each
> > > running_xact record when we serialize the snapshot, we also try to
> > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > that we can check if there are committed xacts to be purged and if we
> > > have previously serialized the snapshot for the prior running xact
> > > record, if so, we can update it with the list of xacts that have
> > > catalog changes. If this is feasible then I think we need to somehow
> > > remember the point where we last serialized the snapshot (maybe by
> > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > may not be able to do this in back-branches because of the disk-format
> > > change required for this.
> > >
> > > Thoughts?

It seems to work, could you draft the patch?

> >
> > I didn't look it closer, but it seems to work. I'm not sure how much
> > spurious invalidations at replication start impacts on performance,
> > but it is promising if the impact is significant.
> >
>
> It seems Sawada-San's patch is doing at each commit not at the start
> of replication and I think that is required because we need this each
> time for replication restart. So, I feel this will be an ongoing
> overhead for spurious cases with the current approach.
>
> > That being said I'm
> > a bit negative for doing that in post-beta1 stage.
> >
>
> Fair point. We can use the do it early in PG-16 if the approach is
> feasible, and backpatch something on lines of what Sawada-San or you
> proposed.

+1.

I proposed two approaches: [1] and [2,] and I prefer [1].
Horiguchi-san's idea[3] also looks good but I think it's better to
somehow deal with the problem he mentioned:

> One problem with this is that change creates the case where multiple
> ReorderBufferTXNs share the same first_lsn. I haven't come up with a
> clean idea to avoid relaxing the restriction of AssertTXNLsnOrder..

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAn-k6OpZ6HSAH_G91tpTXR6KYvkf663kg6EqW-f6sz1w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoD00wV4gt-53ze%2BZB8n4bqJrdH8J_UnDHddy8S2A%2Ba25g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20211008.165055.1621145185927268721.horikyota.ntt%40gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-05-24 02:33:36 Re: Patch: Don't set LoadedSSL unless secure_initialize succeeds
Previous Message Michael Paquier 2022-05-24 02:17:05 Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs