Intermittent Issue with WAL Segment Removal in Logical Replication

From: Kaushik Iska <kaushik(at)peerdb(dot)io>
To: pgsql-general(at)postgresql(dot)org
Cc: Sai Krishna Srirampur <sai(at)peerdb(dot)io>, Philip Dubé <philip(at)peerdb(dot)io>
Subject: Intermittent Issue with WAL Segment Removal in Logical Replication
Date: 2023-12-26 15:36:35
Message-ID: CAHYLuVkH4nWohHyMBam2wQi0XT0NN5FJdTWjb+sS--YCkDWofA@mail.gmail.com

Dear PostgreSQL Community,

I am seeking guidance on a recurring issue with WAL segment removal during
logical replication using the pgoutput plugin. We sporadically get an error
indicating that a requested WAL segment has already been removed. The error
arises intermittently when executing START_REPLICATION. An example error
message is as follows:

requested WAL segment 000000010000146000000AE has already been removed

Please note that this error is not specific to the segment mentioned above;
it serves as an example of the type of error we are experiencing.
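
To relate a segment named in such an error to an LSN, it can help to compute the segment file name an LSN falls in. The following is a minimal sketch, assuming the default 16 MB WAL segment size and timeline 1; the helper name `lsn_to_wal_segment` is hypothetical, and on a live server the built-in `pg_walfile_name()` gives the same answer.

```python
def lsn_to_wal_segment(lsn: str, timeline: int = 1,
                       wal_segment_size: int = 16 * 1024 * 1024) -> str:
    """Return the WAL segment file name that contains the given LSN.

    Assumptions (not taken from the original message): timeline 1 and the
    default 16 MB segment size. Adjust both for a different cluster.
    """
    # An LSN is printed as two hex numbers: high 32 bits / low 32 bits.
    hi, lo = lsn.split("/")
    pos = (int(hi, 16) << 32) | int(lo, 16)

    # Segment number counted from the start of the WAL stream.
    segno = pos // wal_segment_size

    # Number of segments per 4 GB "xlog id" (256 for 16 MB segments).
    segs_per_xlogid = 0x100000000 // wal_segment_size

    # File name = timeline, xlog id, segment within that id; 8 hex digits each.
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")


if __name__ == "__main__":
    print(lsn_to_wal_segment("16/B374D848"))
```

Comparing the result for the slot's restart_lsn (from pg_replication_slots) with the segment in the error shows whether the server was asked for WAL older than what the slot is holding.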

Additional Context:

- max_slot_wal_keep_size is -1, logical_decoding_work_mem is 4 GB.
- The error seems to appear randomly and is not consistent.
- After a couple of retries, the replication process eventually succeeds.
- For one of the users it seems to be happening every 16 hours or so.

Our approach involves starting with START_REPLICATION 0, replicating data
in batches, and then restarting at the last LSN of the previous batch. We
are trying to understand the root cause behind the intermittent removal of
WAL segments during logical replication. Specifically, we are looking for
insights into:

- The potential reasons for the WAL segments being reported as removed.
- Why this error occurs intermittently and why replication succeeds after
  several retries.
- Any advice on troubleshooting and resolving this issue, or insights into
  whether it might be related to our specific replication setup or a
  characteristic of pgoutput, would be highly valuable.
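
For concreteness, the restart pattern described above (begin at 0, then resume from the last LSN of the previous batch) amounts to issuing commands of roughly the following shape. This is only a sketch: the slot name `example_slot`, the publication name `example_pub`, and both helper functions are hypothetical, and the names are assumed to be plain identifiers that need no quoting.

```python
def format_lsn(pos: int) -> str:
    """Format a 64-bit WAL position in the usual X/X hexadecimal notation."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"


def start_replication_command(slot: str, publication: str,
                              start_pos: int = 0) -> str:
    """Build a logical START_REPLICATION command for the pgoutput plugin.

    start_pos=0 yields 0/0, i.e. no explicit position requested by the
    client. Slot and publication names are illustrative assumptions.
    """
    return (f"START_REPLICATION SLOT {slot} LOGICAL {format_lsn(start_pos)} "
            f"(proto_version '1', publication_names '{publication}')")


if __name__ == "__main__":
    # First batch: start from 0.
    print(start_replication_command("example_slot", "example_pub"))

    # Later batches: resume from the last LSN seen in the previous batch
    # (the value here is made up for illustration).
    last_lsn = 0x16B374D848
    print(start_replication_command("example_slot", "example_pub", last_lsn))
```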

Related Posts

- https://issues.redhat.com/browse/DBZ-590
- Troubleshooting Postgres Sources | Airbyte Documentation
  <https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs>
- https://fivetran.com/docs/databases/postgresql/troubleshooting/last-tracked-lsn-error

Thank you very much for your time and assistance.

Thanks,

Kaushik Iska
