Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-26 05:18:19
Message-ID: CAA4eK1LefopSMb6XQ7b6hw+GmZw-wDAOWhq+YXM9iC4Y6Mdtjw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 26, 2022 at 7:00 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jul 25, 2022 at 7:57 PM shiy(dot)fnst(at)fujitsu(dot)com
> <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Hi,
> >
> > I did some performance tests for the master branch patch (based on the v6
> > patch) to see if the bsearch() added by this patch causes any overhead.
>
> Thank you for doing performance tests!
>
> >
> > I ran each test three times and took the average.
> >
> > The results are as follows (decoding time in seconds); a bar chart is attached.
> >
> > case 1
> > ---------
> > No catalog modifying transaction.
> > Decode 800k pgbench transactions. (8 clients, 100k transactions per client)
> >
> > master    7.5417
> > patched   7.4107
> >
> > case 2
> > ---------
> > There's one catalog modifying transaction.
> > Decode 100k/500k/1M transactions.
> >
> >            100k      500k      1M
> > master     0.0576    0.1491    0.4346
> > patched    0.0586    0.1500    0.4344
> >
> > case 3
> > ---------
> > There are 64 catalog modifying transactions.
> > Decode 100k/500k/1M transactions.
> >
> >            100k      500k      1M
> > master     0.0600    0.1666    0.4876
> > patched    0.0620    0.1653    0.4795
> >
> > (Because the result of case 3 shows an overhead of about 3% when decoding
> > 100k transactions with 64 catalog modifying transactions, I also tested
> > decoding the next run of 100k xacts, with and without catalog modifying
> > transactions, to see if it affects subsequent decoding.)
> >
> > case 4.1
> > ---------
> > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > transactions), run 100k xacts and then decode.
> >
> > master    0.3699
> > patched   0.3701
> >
> > case 4.2
> > ---------
> > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.
> >
> > master    0.3687
> > patched   0.3696
> >
> > Summary of the tests:
> > After applying this patch, there is an overhead of about 3% when decoding
> > 100k transactions with 64 catalog modifying transactions. This is an
> > extreme case, so maybe it's okay.
>
> Yes. If we're worried about the overhead and bsearch() is the cause, we
> could probably try simplehash instead of the array.
>
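
For reference, the simplehash route would mean instantiating the
lib/simplehash.h template keyed by TransactionId. A hypothetical sketch,
with invented type and prefix names rather than anything from a posted
patch, could look like this:

#include "postgres.h"
#include "common/hashfn.h"		/* murmurhash32() */

/* Element type; simplehash.h requires a 'status' member it manages. */
typedef struct CatChangeEntry
{
	TransactionId xid;		/* hash key */
	char		status;		/* entry state, used by simplehash.h */
} CatChangeEntry;

#define SH_PREFIX catchangehash
#define SH_ELEMENT_TYPE CatChangeEntry
#define SH_KEY_TYPE TransactionId
#define SH_KEY xid
#define SH_HASH_KEY(tb, key) murmurhash32(key)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

/*
 * Built once from the initial running xacts, lookups then become O(1):
 *
 *     catchangehash_hash *ht = catchangehash_create(ctx, nxacts, NULL);
 *     bool found;
 *     catchangehash_insert(ht, xid, &found);
 *     ...
 *     if (catchangehash_lookup(ht, xid) != NULL)
 *         ...
 */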

I am not sure we need to go that far for this extreme corner case. Let's
first try your idea below.

> An improvement idea is to pass parsed->xinfo down to
> SnapBuildXidHasCatalogChanges(), and then return from that function
> before doing the bsearch() if parsed->xinfo doesn't have
> XACT_XINFO_HAS_INVALS set. That would save calling bsearch() for
> non-catalog-modifying transactions. Is it worth trying?
>

I think this is worth trying, and it might also reduce some of the
overhead in the case presented by Shi-san.
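
For concreteness, here is a minimal sketch of what the patched
SnapBuildXidHasCatalogChanges() could look like with that early exit; the
exact signature and the catchange field names are assumptions here, not
the final patch:

static bool
SnapBuildXidHasCatalogChanges(SnapBuild *builder, TransactionId xid,
							  uint32 xinfo)
{
	/* Transactions the reorder buffer is tracking are answered directly. */
	if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
		return true;

	/*
	 * A transaction that modified catalogs must carry invalidation
	 * messages, so without XACT_XINFO_HAS_INVALS there is no need to
	 * search the initial running-xacts array at all.
	 */
	if ((xinfo & XACT_XINFO_HAS_INVALS) == 0)
		return false;

	/* Fall back to the sorted-array lookup. */
	return (builder->catchange.xcnt > 0 &&
			bsearch(&xid, builder->catchange.xip, builder->catchange.xcnt,
					sizeof(TransactionId), xidComparator) != NULL);
}

The commit-decoding caller would then pass parsed->xinfo through when
invoking this function.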

--
With Regards,
Amit Kapila.
