Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-10-02 01:19:49
Message-ID: 1118374.1601601589@sss.pgh.pa.us
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On 2020-Oct-01, Peter Eisentraut wrote:
>> What's the difference between this case and what the test suite is testing?
>> Is it that it replicates between two databases on the same instance?

> I don't know why the tests pass, but the message
> ERROR: error reading result of streaming command:
> does appear in the logs after running src/test/subscription many times
> (I see it in tests 001, 002, 013 and 014, apart from the new one in
> 100). It's indeed surprising that these tests all pass!

> I turned Henry's reproducer into the attached TAP test, and it does
> reproduce the problem; but if I reduce the number of rows from 5000 to
> 1000, then it no longer does. I don't quite see why this would be a
> problem with a larger table only. Do you?

I think we really need to figure that out before concluding that this
problem is solved. Now that we've seen this, I'm wondering urgently
what other coverage gaps we've got there.

> The fix is the commented-out line in walsender.c; the test reliably
> passes for me if I uncomment that, and the error message disappears from
> the server logs in all the other tests.

I agree that this is what we need to do code-wise; we can't let the
protocol break stand, or we'll break all sorts of cross-version
replication scenarios. However, we'd better also change the protocol
spec to say that this is what is supposed to happen.

regards, tom lane
