From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Subject: | Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop |
Date: | 2020-11-18 04:18:18 |
Message-ID: | CAA4eK1JmDGpr+Ouvj8x7u9RWVSLWyfsDiz2TRbrMNmdAqDpJ0g@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, Nov 18, 2020 at 8:18 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > > > * STREAM COMMIT bug?
> > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > > apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime. Now, I might be missing something here
> > > but if you can explain it a bit more or share some test to show how we
> > > can reach here via tablesync worker then we can discuss the possible
> > > solution.
> >
> > Hmm, okay, that sounds like there would be no bug then. Maybe what we
> > need is just an assert in apply_handle_stream_commit that
> > !am_tablesync_worker(), as in the attached patch. Passes tests.
>
> Hi.
>
> Using the same debugging technique described in a previous mail [1], I
> have tested again but this time using a SUBSCRIPTION capable of
> streaming.
>
> While paused in the debugger (to force an unusual timing situation) I
> can publish INSERTs en masse and cause streaming replication to occur.
>
> To cut a long story short, a tablesync worker CAN in fact end up
> processing (e.g. apply_dispatch) streaming messages.
> So the tablesync worker CAN get into the apply_handle_stream_commit.
> And this scenario, albeit rare, will crash.
>
Thank you for reproducing this issue. Dilip, Peter, would either of you
be interested in writing a fix for this?
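
To make the failure condition concrete for anyone following the thread,
here is a toy model of it. This is only a sketch, not PostgreSQL source
and not a proposed patch: every toy_* name, the enums, and the return
codes are hypothetical stand-ins invented for illustration. It shows
that a stream-commit message reaching a tablesync worker is exactly the
case the suggested !am_tablesync_worker() assertion would trip on, so
the dispatch path needs to recognise it instead of assuming it cannot
happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
typedef enum { TOY_APPLY_WORKER, TOY_TABLESYNC_WORKER } ToyWorkerKind;
typedef enum { TOY_MSG_INSERT, TOY_MSG_STREAM_COMMIT } ToyMsgKind;
typedef enum { TOY_OK, TOY_UNEXPECTED_STREAM } ToyResult;

/* Toy counterpart of am_tablesync_worker(): true for a tablesync worker. */
bool
toy_am_tablesync_worker(ToyWorkerKind kind)
{
    return kind == TOY_TABLESYNC_WORKER;
}

/*
 * Toy counterpart of the dispatch step. Instead of asserting that a
 * tablesync worker never sees a stream commit, it reports the case so
 * the caller can decide how to handle it.
 */
ToyResult
toy_apply_dispatch(ToyWorkerKind kind, ToyMsgKind msg)
{
    if (msg == TOY_MSG_STREAM_COMMIT && toy_am_tablesync_worker(kind))
        return TOY_UNEXPECTED_STREAM;

    return TOY_OK;
}

/* Self-check covering the two cases discussed in this thread. */
void
toy_self_check(void)
{
    /* Apply worker handling a stream commit: the expected path. */
    assert(toy_apply_dispatch(TOY_APPLY_WORKER, TOY_MSG_STREAM_COMMIT) == TOY_OK);

    /* Tablesync worker handling a stream commit: the rare case Peter hit. */
    assert(toy_apply_dispatch(TOY_TABLESYNC_WORKER, TOY_MSG_STREAM_COMMIT)
           == TOY_UNEXPECTED_STREAM);
}
```

Calling toy_self_check() from a main() exercises both paths. What the
real code should do in the second case is the open question for the
fix.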
--
With Regards,
Amit Kapila.