From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Sergey Belyashov <sergey(dot)belyashov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17438: Logical replication hangs on master after huge DB load |
Date: | 2022-03-16 11:45:27 |
Message-ID: | CAA4eK1JO_zijrTqoZdzMn0FtTfV=Nj6Fr++BfdsBkHZqfA_cPw@mail.gmail.com |
Lists: | pgsql-bugs |
On Mon, Mar 14, 2022 at 11:49 PM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 17438
> Logged by: Sergey Belyashov
> Email address: sergey(dot)belyashov(at)gmail(dot)com
> PostgreSQL version: 14.2
> Operating system: Debian 11, GNU/Linux x86_64
> Description:
>
> Master DB has few tables: A (few inserts per second, about 200 updates per
> second, ~100 deletes each 5 minutes), B (~100 inserts each 5 minutes), C
> (~200 inserts and ~200 updates per second). B and C are large partitioned by
> range tables (36 and 12 partitions). A is small table about 10K entries
> (often updates). Table A has publications for inserts and deletes. Table B
> has publication for all operations except truncate via root.
>
> I do some maintenance work. I stop production load on DB and do some high
> load operations with table C (for example: "insert into D select * from C").
> After completion replications for A and B freezes and loads CPU for 50-99%
> without actual data transmission. I try to disable/enable/refresh
> subscription, but no effect. I try to restart master - no result. Only
> drop/create of subscriptions helps me.
>
Is it possible to get a reproducible script or test case for this problem?

> Publisher logs many messages like following:
> 2022-03-14 19:57:02.907 MSK [1771976] user(at)DB ERROR: replication slot
> "A_sub" is active for PID 1766849
> 2022-03-14 19:57:02.907 MSK [1771976] user(at)DB STATEMENT: START_REPLICATION
> SLOT "A_sub" LOGICAL 28C/60150F50 (proto_version '2', publication_names
> '"A_pub"')
> 2022-03-14 19:57:02.909 MSK [1771977] user(at)DB ERROR: replication slot
> "B_sub" is active for PID 1766828
> 2022-03-14 19:57:02.909 MSK [1771977] user(at)DB STATEMENT: START_REPLICATION
> SLOT "B_sub" LOGICAL 28C/AE2B7D8 (proto_version '2',
> publication_names '"B_pub"')
>
> Subscriber logs many messages like following:
> 2022-03-14 19:56:52.709 MSK [3266082] LOG: logical replication apply worker
> for subscription "B_sub" has started
> 2022-03-14 19:56:52.710 MSK [993] LOG: background worker "logical
> replication worker" (PID 3266080) exited with exit code 1
> 2022-03-14 19:56:52.814 MSK [3266081] ERROR: could not start WAL streaming:
> ERROR: replication slot "A_sub" is active for PID 1766849
> 2022-03-14 19:56:52.815 MSK [993] LOG: background worker "logical
> replication worker" (PID 3266081) exited with exit code 1
> 2022-03-14 19:56:52.818 MSK [3266082] ERROR: could not start WAL streaming:
> ERROR: replication slot "B_sub" is active for PID 1766828
> 2022-03-14 19:56:52.819 MSK [993] LOG: background worker "logical
> replication worker" (PID 3266082) exited with exit code 1
>
Judging from these logs alone, the subscriber-side workers seem to be
exiting due to some error while the publisher-side process (the
WALSender) keeps running, which I think is why we see '"A_sub" is
active for PID 1766849'. Do you see any other type of error in the
subscriber-side logs?
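
To make it easier to see which publisher PIDs keep holding the slots, the
"is active for PID" errors can be tallied from a log excerpt. The following is
only a minimal sketch: the helper name busy_slots and the sample text are
illustrative assumptions, and the regular expression is based solely on the
message format quoted above. The same slot/PID pairing should also be visible
on the publisher in the pg_replication_slots view (columns active and
active_pid).

```python
import re
from collections import Counter

# Matches the error text quoted in this thread, e.g.
#   ERROR: replication slot "A_sub" is active for PID 1766849
SLOT_BUSY = re.compile(r'replication slot\s+"([^"]+)"\s+is active for PID\s+(\d+)')


def busy_slots(log_text):
    """Count (slot name, holder PID) pairs for 'slot is active' errors.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    # Log lines may be wrapped and mail-quoted with "> ", so strip the
    # quote prefix from each line and join everything before matching.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return Counter((slot, int(pid)) for slot, pid in SLOT_BUSY.findall(flat))


# Sample taken from the publisher log lines quoted in the report.
sample = """
> 2022-03-14 19:57:02.907 MSK [1771976] user@DB ERROR: replication slot
> "A_sub" is active for PID 1766849
> 2022-03-14 19:57:02.909 MSK [1771977] user@DB ERROR: replication slot
> "B_sub" is active for PID 1766828
"""

for (slot, pid), attempts in sorted(busy_slots(sample).items()):
    print(f"slot {slot}: held by PID {pid}, rejected attempts: {attempts}")
```

If the same PID shows up for a slot across many retries, that would be
consistent with an old WALSender still running while new apply workers keep
failing to attach.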
--
With Regards,
Amit Kapila.