Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Henry Hinze <henry(dot)hinze(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-09-30 21:32:04
Message-ID: 911656.1601501524@sss.pgh.pa.us
Lists: pgsql-bugs

Henry Hinze <henry(dot)hinze(at)gmail(dot)com> writes:
> I've made an important observation!
> Since I had the impression this setup was already working with RC1 of PG
> 13, I re-installed RC1 and did the same test. And it's working fine!

Ugh. So that points the finger at commits 07082b08c/bfb12cd2b,
which are the only nearby changes between rc1 and 13.0. A quick
comparison of before-and-after checkouts confirms it.

After some digging around, I realize that that commit actually
resulted in a protocol break. libpqwalreceiver is expecting to
get an additional CommandComplete message after COPY OUT finishes,
per libpqrcv_endstreaming(), and it's no longer getting one.

(I have not read the protocol document to see if this is per spec;
but spec or no, that's what libpqwalreceiver is expecting.)
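
To illustrate the mismatch described above, here is a toy model of a
client that, like libpqwalreceiver, insists on a CommandComplete message
once COPY OUT has finished. It is only a sketch and not libpq or
walreceiver code: check_copy_end() and the two message lists are
hypothetical, and the "new_server" sequence is an assumed illustration of
a server that omits CommandComplete, not a captured trace. The
single-letter codes are the protocol's message type bytes.

```python
# Toy model of the message sequence a client expects after COPY OUT ends.
# Not libpq code: check_copy_end() and the message lists are hypothetical.

COPY_DATA = "d"         # CopyData
COPY_DONE = "c"         # CopyDone
COMMAND_COMPLETE = "C"  # CommandComplete
READY_FOR_QUERY = "Z"   # ReadyForQuery


def check_copy_end(messages):
    """Return "ok" if CopyDone is followed by CommandComplete,
    otherwise a short description of the problem."""
    it = iter(messages)
    for msg in it:
        if msg == COPY_DATA:
            continue            # still streaming data
        if msg == COPY_DONE:
            break               # COPY OUT has finished
        return f"unexpected message {msg!r} during COPY OUT"
    else:
        # The loop ran out of messages without ever seeing CopyDone.
        return "stream ended before CopyDone"

    nxt = next(it, None)
    if nxt != COMMAND_COMPLETE:
        return f"expected CommandComplete after CopyDone, got {nxt!r}"
    return "ok"


# Assumed sequences, for illustration only.
old_server = [COPY_DATA, COPY_DATA, COPY_DONE, COMMAND_COMPLETE, READY_FOR_QUERY]
new_server = [COPY_DATA, COPY_DATA, COPY_DONE, READY_FOR_QUERY]

print(check_copy_end(old_server))  # ok
print(check_copy_end(new_server))  # expected CommandComplete after CopyDone, got 'Z'
```

A client written this way treats the second sequence as an error whether
or not the protocol document requires the extra message, which is the
point made in the parenthetical above.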

The question that this raises is how the heck did that get past
our test suites? It seems like the error should have been obvious
to even the most minimal testing.

regards, tom lane
