From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop |
Date: | 2020-09-30 21:52:38 |
Message-ID: | 912614.1601502758@sss.pgh.pa.us |
Lists: | pgsql-bugs |
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On 2020-Sep-30, Tom Lane wrote:
>> The question that this raises is how the heck did that get past
>> our test suites? It seems like the error should have been obvious
>> to even the most minimal testing.
> ... yeah, that's indeed an important question. I'm going to guess that
> the TAP suites are too forgiving :-(
One thing I noticed while trying to trace this down is that while the
initial table sync is happening, we have *both* a regular
walsender/walreceiver pair and a "sync" pair, e.g.:

    postgres  905650  0.0  0.0 186052  11888 ?  Ss  17:12  0:00 postgres: logical replication worker for subscription 16398
    postgres  905651 50.1  0.0 173704  13496 ?  Ss  17:12  0:09 postgres: walsender postgres [local] idle
    postgres  905652  104  0.4 186832 148608 ?  Rs  17:12  0:19 postgres: logical replication worker for subscription 16398 sync 16393
    postgres  905653 12.2  0.0 174380  15524 ?  Ss  17:12  0:02 postgres: walsender postgres [local] COPY

Is it supposed to be like that? Notice also that the regular walsender
has consumed significant CPU time; it's not pinning a CPU like the sync
walreceiver is, but it's eating maybe 20% of a CPU according to "top".
I wonder whether in cases with only small tables (which is likely all
that our tests test), the regular walreceiver manages to complete the
table sync despite repeated(?) failures of the sync worker.
regards, tom lane