From: | Kaushik Iska <kaushik(at)peerdb(dot)io> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | Sai Krishna Srirampur <sai(at)peerdb(dot)io>, Philip Dubé <philip(at)peerdb(dot)io> |
Subject: | Re: Intermittent Issue with WAL Segment Removal in Logical Replication |
Date: | 2023-12-27 15:31:14 |
Message-ID: | CAHYLuV=M2YTxecoc1MH=TbChei7pAyk2gNLHnCM_eGSnGhjeOQ@mail.gmail.com |
Lists: | pgsql-general |
Hi all,
I'm including additional details, as I am able to reproduce this issue a
little more reliably.
Postgres Version: POSTGRES_14_9.R20230830.01_07
Vendor: Google Cloud SQL
Logical Replication Protocol version 1
Here are the logs of one attempt failing and the next attempt, a few seconds later, succeeding:
2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
6/5AE67D79 (proto_version '1', publication_names
'peerflow_pub_wal_testing_2') <- FAILS
2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres ERROR:
requested WAL segment 000000010000000600000059 has already been removed
2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
6/5AE67D79 (proto_version '1', publication_names
'peerflow_pub_wal_testing_2') <- SUCCEEDS
2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres LOG:
logical decoding found consistent point at 6/5A31F050
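
As a cross-check on the logs above, here is a minimal sketch that maps an LSN to the WAL segment file name that contains it. It assumes the default 16 MiB wal_segment_size and timeline 1; the helper names parse_lsn and wal_segment_name are illustrative, not part of any PostgreSQL client library. The naming rule itself mirrors how PostgreSQL builds segment file names (timeline, high part, segment within the high part, each as 8 hex digits).

```python
# Sketch only: assumes 16 MiB WAL segments (the PostgreSQL default) and
# timeline 1. Both are assumptions about this instance, not confirmed facts.

DEFAULT_SEGMENT_SIZE = 16 * 1024 * 1024  # bytes


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '6/5AE67D79' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1,
                     segment_size: int = DEFAULT_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    segment_number = parse_lsn(lsn) // segment_size
    segments_per_xlog_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


if __name__ == "__main__":
    # LSNs taken from the log lines above.
    print(wal_segment_name("6/5AE67D79"))  # requested start position
    print(wal_segment_name("6/5A31F050"))  # consistent point on success
```

Under those assumptions, both the requested start position and the consistent point fall in segment 00000001000000060000005A, while the error names 000000010000000600000059, the segment immediately before it. That would be consistent with decoding starting from a position earlier than the LSN passed to START_REPLICATION (for example the slot's restart_lsn), although I have not confirmed that this is what happens here.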
Happy to include any additional details of my setup.
Thanks,
Kaushik
On Tue, Dec 26, 2023 at 10:36 AM Kaushik Iska <kaushik(at)peerdb(dot)io> wrote:
> Dear PostgreSQL Community,
>
> I am seeking guidance regarding a recurring issue we've encountered with
> WAL segment removal during logical replication using the pgoutput plugin. We
> sporadically encounter an error indicating that a requested WAL segment has
> already been removed. This issue arises intermittently when executing
> START_REPLICATION. An example error message is as follows:
>
>
> requested WAL segment 000000010000146000000AE has already been removed
>
>
> Please note that this error is not specific to the segment mentioned
> above; it serves as an example of the type of error we are experiencing.
>
> Additional Context:
>
> - max_slot_wal_keep_size is -1, logical_decoding_work_mem is 4 GB.
> - The error seems to appear randomly and is not consistent.
> - After a couple of retries, the replication process eventually succeeds.
> - For one of the users it seems to be happening every 16 hours or so.
>
>
> Our approach involves starting with START_REPLICATION 0, replicating data
> in batches, and then restarting at the last LSN of the previous batch. We
> are trying to understand the root cause behind the intermittent removal of
> WAL segments during logical replication. Specifically, we are looking for
> insights into:
>
> - The potential reasons for the WAL segments being reported as removed.
> - Why this error occurs intermittently and why replication succeeds
>   after several retries.
> - Any advice on troubleshooting and resolving this issue, or insights
>   into whether it might be related to our specific replication setup or a
>   characteristic of pgoutput, would be highly valuable.
>
>
> Related Posts
>
> - https://issues.redhat.com/browse/DBZ-590
> - Troubleshooting Postgres Sources | Airbyte Documentation
>   <https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs>
> - https://fivetran.com/docs/databases/postgresql/troubleshooting/last-tracked-lsn-error
>
>
>
> Thank you very much for your time and assistance.
>
> Thanks,
>
> Kaushik Iska
>
>