Re: BUG #18415: Logical replication errors.

From: vignesh C <vignesh21(at)gmail(dot)com>
To: koderan <koderan(at)yandex(dot)ru>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18415: Logical replication errors.
Date: 2024-03-29 12:19:13
Message-ID: CALDaNm27H_7qGAc=gFk+0PfpwEcQgL6aQV+5fVUXAOZ6bEtc7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, 29 Mar 2024 at 05:50, koderan <koderan(at)yandex(dot)ru> wrote:
>
> There is publisher and subscriber logs, but the logs on both nodes did not give me clarity.
> Master (time when errors start to appear):
>
> 2024-03-28 22:26:08.459 +07 [1004642] ERROR: could not start WAL streaming: ERROR: replication slot "pg_975006_sync_972467_7293549148661379912" does not exist
> 2024-03-28 22:26:08.462 +07 [1597281] LOG: background worker "logical replication worker" (PID 1004642) exited with exit code 1
> 2024-03-28 22:26:08.466 +07 [1048261] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "scan_type_i18n" has started
> 2024-03-28 22:26:08.542 +07 [1048261] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "scan_type_i18n" has finished
> 2024-03-28 22:26:08.548 +07 [1048263] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "phone_type_i18n" has started
> 2024-03-28 22:26:08.641 +07 [1048263] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "phone_type_i18n" has finished
> 2024-03-28 22:26:08.644 +07 [1048264] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "reply_status_type_i18n" has started
> 2024-03-28 22:26:08.690 +07 [1048264] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "reply_status_type_i18n" has finished
> 2024-03-28 22:26:08.693 +07 [1048265] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "user" has started
> 2024-03-28 22:26:09.054 +07 [1048265] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "user" has finished
> 2024-03-28 22:26:09.059 +07 [1048267] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "queue_event" has started
> 2024-03-28 22:26:09.141 +07 [1048267] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "queue_event" has finished
> 2024-03-28 22:26:09.146 +07 [1048274] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "credit" has started
> 2024-03-28 22:26:09.163 +07 [1048274] ERROR: could not start WAL streaming: ERROR: replication slot "pg_975006_sync_972243_7293549148661379912" does not exist
> 2024-03-28 22:26:09.165 +07 [1597281] LOG: background worker "logical replication worker" (PID 1048274) exited with exit code 1
> 2024-03-28 22:26:09.167 +07 [1048281] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "requisite_value_contragent" has started
> 2024-03-28 22:26:09.184 +07 [1048281] ERROR: could not start WAL streaming: ERROR: replication slot "pg_975006_sync_972563_7293549148661379912" does not exist
> 2024-03-28 22:26:09.186 +07 [1597281] LOG: background worker "logical replication worker" (PID 1048281) exited with exit code 1
> 2024-03-28 22:26:09.189 +07 [1048282] LOG: logical replication table synchronization worker for subscription "log_rep_sub", table "order_registry_np" has started
>
> The same time on replica:
>
> 2024-03-28 22:26:07.967 +07 [25424] postgres(at)claims LOG: duration: 2406278.201 ms statement: COPY public.orders (id, order_number, order_date, order_amount, payment_info_code, payer_code, payer_name, receiver_code, receiver_name, sender_code, sender_name, cargo_description, cargo_volume_weight, cargo_weight, declared_price, currency_code, internet_store_number, subdivision_code, cfo_code, active, date_created, date_updated, delay) TO STDOUT
> 2024-03-28 22:26:08.426 +07 [25424] postgres(at)claims ERROR: replication slot "pg_975006_sync_972467_7293549148661379912" does not exist
> 2024-03-28 22:26:08.466 +07 [1170] postgres(at)claims LOG: logical decoding found initial starting point at B1C5/BFBA3410
> 2024-03-28 22:26:08.466 +07 [1170] postgres(at)claims DETAIL: Waiting for transactions (approximately 1) older than 2998665818 to end.
> 2024-03-28 22:26:08.469 +07 [1165] [unknown](at)[unknown] LOG: could not accept SSL connection: EOF detected
> 2024-03-28 22:26:08.496 +07 [1170] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBA4580
> 2024-03-28 22:26:08.496 +07 [1170] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:08.507 +07 [1170] postgres(at)claims LOG: starting logical decoding for slot "pg_975006_sync_971138_7293549148661379912"
> 2024-03-28 22:26:08.507 +07 [1170] postgres(at)claims DETAIL: Streaming transactions committing after B1C5/BFBA45B8, reading WAL from B1C5/BFBA3290.
> 2024-03-28 22:26:08.507 +07 [1170] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBA3410
> 2024-03-28 22:26:08.507 +07 [1170] postgres(at)claims DETAIL: Logical decoding will begin using saved snapshot.
> 2024-03-28 22:26:08.546 +07 [1172] postgres(at)claims LOG: logical decoding found initial starting point at B1C5/BFBAA200
> 2024-03-28 22:26:08.546 +07 [1172] postgres(at)claims DETAIL: Waiting for transactions (approximately 2) older than 2998665822 to end.
> 2024-03-28 22:26:08.594 +07 [1172] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBB5E80
> 2024-03-28 22:26:08.594 +07 [1172] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:08.602 +07 [1172] postgres(at)claims LOG: starting logical decoding for slot "pg_975006_sync_970776_7293549148661379912"
> 2024-03-28 22:26:08.602 +07 [1172] postgres(at)claims DETAIL: Streaming transactions committing after B1C5/BFBB5EB8, reading WAL from B1C5/BFBA5290.
> 2024-03-28 22:26:08.602 +07 [1172] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBAA200
> 2024-03-28 22:26:08.602 +07 [1172] postgres(at)claims DETAIL: Logical decoding will begin using saved snapshot.
> 2024-03-28 22:26:08.645 +07 [1173] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBB8E28
> 2024-03-28 22:26:08.645 +07 [1173] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:08.652 +07 [1173] postgres(at)claims LOG: starting logical decoding for slot "pg_975006_sync_971808_7293549148661379912"
> 2024-03-28 22:26:08.652 +07 [1173] postgres(at)claims DETAIL: Streaming transactions committing after B1C5/BFBB8E60, reading WAL from B1C5/BFBB8E28.
> 2024-03-28 22:26:08.652 +07 [1173] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBB8E28
> 2024-03-28 22:26:08.652 +07 [1173] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:08.697 +07 [1174] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBBE588
> 2024-03-28 22:26:08.697 +07 [1174] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:09.019 +07 [1174] postgres(at)claims LOG: starting logical decoding for slot "pg_975006_sync_971995_7293549148661379912"
> 2024-03-28 22:26:09.019 +07 [1174] postgres(at)claims DETAIL: Streaming transactions committing after B1C5/BFBBE5C0, reading WAL from B1C5/BFBBE588.
> 2024-03-28 22:26:09.019 +07 [1174] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBBE588
> 2024-03-28 22:26:09.019 +07 [1174] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:09.056 +07 [1180] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBE0C88
> 2024-03-28 22:26:09.056 +07 [1180] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:09.105 +07 [1180] postgres(at)claims LOG: starting logical decoding for slot "pg_975006_sync_972492_7293549148661379912"
> 2024-03-28 22:26:09.105 +07 [1180] postgres(at)claims DETAIL: Streaming transactions committing after B1C5/BFBE0CC0, reading WAL from B1C5/BFBE0C88.
> 2024-03-28 22:26:09.105 +07 [1180] postgres(at)claims LOG: logical decoding found consistent point at B1C5/BFBE0C88
> 2024-03-28 22:26:09.105 +07 [1180] postgres(at)claims DETAIL: There are no running transactions.
> 2024-03-28 22:26:09.129 +07 [1181] postgres(at)claims ERROR: replication slot "pg_975006_sync_972243_7293549148661379912" does not exist
> 2024-03-28 22:26:09.151 +07 [1182] postgres(at)claims ERROR: replication slot "pg_975006_sync_972563_7293549148661379912" does not exist
> 2024-03-28 22:26:09.172 +07 [1184] postgres(at)claims ERROR: replication slot "pg_975006_sync_972454_7293549148661379912" does not exist
> 2024-03-28 22:26:09.196 +07 [1185] postgres(at)claims ERROR: replication slot "pg_975006_sync_972467_7293549148661379912" does not exist
>
>
> 28.03.2024, 20:03, "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>:
>
> On Thu, Mar 28, 2024 at 9:59 AM PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 18415
> Logged by: Andy Kosheen
> Email address: koderan(at)yandex(dot)ru
> PostgreSQL version: 15.4
> Operating system: Oracle Linux Server 8.4
> Description:
>
> In the process of logical replication from version 13.3 (on Ubuntu 16.04.6
> LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has
> already been transferred, the following errors begin to occur, appearing
> endlessly until you stop the replication process (this situation occurs on
> databases larger than at least 50GB, even if there is a small number of
> transactions on the source database):
> 2024-03-20 03:19:41.410 +07 [3433858] LOG: logical replication apply worker
> for subscription "log_rep_sub" has started
> 2024-03-20 03:19:41.425 +07 [3433858] ERROR: could not start WAL streaming:
> ERROR: replication slot "log_rep_sub" does not exist

By any chance do you have a reproducible test case for this. Also are
there any other abnormalities in the log. What was the
max_replication_slots configuration set in the publisher and what are
the replication slots that were being used in the publisher at that
point(select * from pg_replication_slots).

Regards,
Vignesh

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Robert Haas 2024-03-29 16:05:31 Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Previous Message Akshat Jaimini 2024-03-29 10:31:01 Re: BUG: deadlock between autovacuum worker and client backend during removal of orphan temp tables with sequences