Re: [BUG] "FailedAssertion" reported when streaming in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] "FailedAssertion" reported when streaming in logical replication
Date: 2021-04-26 13:29:16
Message-ID: CAA4eK1KzrsMYFH4H_1ZGpvNdQMrTsPJsYJwsuC0t8ixDQD=pnQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 26, 2021 at 5:55 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Apr 26, 2021 at 1:26 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, 26 Apr 2021 at 12:45 PM, tanghy(dot)fnst(at)fujitsu(dot)com <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> >>
> >> Hi
> >>
> > >> I think I may have found a bug when using streaming in logical replication. Could anyone please take a look at this?
> >>
> > >> Here's what I did to reproduce the problem.
> > >> I set logical_decoding_work_mem and created multiple publications on the publisher, and created multiple subscriptions with "streaming = on" on the subscriber.
> > >> However, an assertion failed on the publisher when I ran COMMIT and ROLLBACK on multiple transactions at the same time.
> >>
> >> The log reported a FailedAssertion:
> >> TRAP: FailedAssertion("txn->size == 0", File: "reorderbuffer.c", Line: 3465, PID: 911730)
> >>
> > >> The problem happens in both synchronous and asynchronous mode. When there are only one or two publications, it doesn't seem to happen (in my case, there are 8 publications and the failure always happened).
> >>
> >> The scripts and the log are attached. It took me about 4 minutes to run the script on my machine.
> > >> Please contact me if you need more specific information about the problem.
> >
> >
> >
> > Thanks for reporting. I will look into it.
>
> I am able to reproduce this and I think I have done the initial investigation.
>
> The cause of the issue is that this transaction has only one change,
> and that change is XLOG_HEAP2_NEW_CID, which is added through
> SnapBuildProcessNewCid. When we add a change through
> SnapBuildProcessChange, we set the base snapshot, but when we add a
> change through SnapBuildProcessNewCid, we don't set the base snapshot,
> because there is nothing to be done for this change. This transaction
> is then identified as the biggest transaction with non-partial changes,
> and ReorderBufferStreamTXN returns immediately for it because its
> base_snapshot is NULL, so nothing is streamed, txn->size stays
> non-zero, and the "txn->size == 0" assertion fails.
>

Your analysis sounds correct to me.
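To make the failure mode concrete, here is a minimal C sketch of it. Everything in it is a hypothetical toy model: ToyTXN, toy_add_regular_change, toy_add_new_cid_change, toy_largest_txn, toy_stream_txn and toy_stream_largest are illustrative names, not the reorderbuffer.c API. It drops subtransactions, partial-change tracking and spilling to disk, and keeps only the facts from the analysis: a regular change sets the base snapshot, an XLOG_HEAP2_NEW_CID change does not, the largest transaction is chosen without regard to the base snapshot, and streaming returns early when there is none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a reorder-buffer transaction. */
typedef struct ToyTXN
{
    int     xid;
    size_t  size;              /* bytes of changes held in memory */
    bool    has_base_snapshot; /* models "txn->base_snapshot != NULL" */
} ToyTXN;

/* Models SnapBuildProcessChange: queuing a regular change also sets the base snapshot. */
static void toy_add_regular_change(ToyTXN *txn, size_t bytes)
{
    txn->has_base_snapshot = true;
    txn->size += bytes;
}

/* Models SnapBuildProcessNewCid: the NEW_CID change is queued, but no base snapshot is set. */
static void toy_add_new_cid_change(ToyTXN *txn, size_t bytes)
{
    txn->size += bytes;
}

/* Picks the transaction with the most in-memory bytes; the base snapshot is not considered. */
static ToyTXN *toy_largest_txn(ToyTXN *txns, size_t n)
{
    ToyTXN *largest = NULL;

    for (size_t i = 0; i < n; i++)
        if (largest == NULL || txns[i].size > largest->size)
            largest = &txns[i];
    return largest;
}

/* Models the early return in ReorderBufferStreamTXN when there is no base snapshot. */
static void toy_stream_txn(ToyTXN *txn)
{
    assert(txn != NULL);
    if (!txn->has_base_snapshot)
        return;                /* nothing is streamed; txn->size is left unchanged */
    txn->size = 0;             /* streamed changes are released from memory */
}

/* Streams the largest transaction and reports whether "txn->size == 0" would hold. */
static bool toy_stream_largest(ToyTXN *txns, size_t n)
{
    ToyTXN *txn = toy_largest_txn(txns, n);

    toy_stream_txn(txn);
    return txn->size == 0;
}
```

A driver that gives the larger transaction a regular change sees toy_stream_largest return true. If the largest transaction instead holds a single NEW_CID change, the function returns false and that transaction keeps its full size, which is the state the real assertion trips on.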

--
With Regards,
Amit Kapila.
