Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-11-07 03:54:25
Message-ID: CAA4eK1L6_MUmOTBKdkhfuSykj4Sx2-_fTeT_NaR0pDyzaCdb+A@mail.gmail.com
Lists: pgsql-bugs

On Sat, Nov 7, 2020 at 5:31 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2020-Nov-05, Amit Kapila wrote:
>
> > On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > >
> > > On 2020-Nov-04, Amit Kapila wrote:
> > >
> > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > >
> > > > > * STREAM COMMIT bug?
> > > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > > > apparently in a tablesync worker we shouldn't do it.
> > > >
> > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > > where we disable streaming for the init phase. As far as I understand,
> > > > for tablesync we create the initial slot during which streaming will
> > > > be disabled then we will copy the table (here logical decoding won't
> > > > be used) and then allow the apply worker to get any other data which
> > > > is inserted in the meantime. Now, I might be missing something here
> > > > but if you can explain it a bit more or share some test to show how we
> > > > can reach here via tablesync worker then we can discuss the possible
> > > > solution.
> > >
> > > Hmm, okay, that sounds like there would be no bug then. Maybe what we
> > > need is just an assert in apply_handle_stream_commit that
> > > !am_tablesync_worker(), as in the attached patch. Passes tests.
> > >
> >
> > +1. But do we want to have this Assert only in stream_commit API or
> > all stream APIs as well?
>
> Well, the only reason I care about this is that apply_handle_commit
> contains a comment that we must not do CommitTransactionCommand in the
> syncworker case; so if you look at apply_handle_stream_commit and note
> that it doesn't concern itself with that, you become concerned that it
> might be broken. I don't think the other routines handling the "stream"
> thing have that issue.
>

Fair enough. As mentioned in my previous email, I think we still need to
confirm how decoding happens on the upstream, after the copy, for
transactions that arrive while the tablesync worker is moving from
SUBREL_STATE_CATCHUP to SUBREL_STATE_SYNCDONE. I'll try to come up with a
test case in the next few days to debug and test this particular scenario,
and will share my findings.
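
For readers following along, the assert discussed above can be illustrated
in isolation. This is only a standalone sketch and not the attached patch:
every name ending in _sketch, and the two static variables, are made up for
illustration. They stand in for am_tablesync_worker() and
CommitTransactionCommand() in the real worker code, whose surrounding logic
is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for worker state; not PostgreSQL's real variables. */
static bool sketch_is_tablesync = false;   /* true would mean "tablesync worker" */
static int  sketch_commit_calls = 0;       /* counts simulated commits */

/* Stand-in for am_tablesync_worker(). */
static bool
am_tablesync_worker_sketch(void)
{
    return sketch_is_tablesync;
}

/* Stand-in for CommitTransactionCommand(); it only records the call. */
static void
commit_transaction_command_sketch(void)
{
    sketch_commit_calls++;
}

/*
 * Sketch of the proposed guard: streaming is disabled during the tablesync
 * phase, so the stream-commit handler is expected to run only in the apply
 * worker. The assert documents and enforces that assumption.
 */
static void
apply_handle_stream_commit_sketch(void)
{
    assert(!am_tablesync_worker_sketch());
    commit_transaction_command_sketch();
}
```

With sketch_is_tablesync left false, the handler commits normally. Setting it
to true makes the assert fire in a build with assertions enabled, which is the
kind of early failure the proposed Assert is meant to give.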

--
With Regards,
Amit Kapila.
