Re: Intermittent Issue with WAL Segment Removal in Logical Replication

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: Kaushik Iska <kaushik(at)peerdb(dot)io>
Cc: pgsql-general(at)postgresql(dot)org, Sai Krishna Srirampur <sai(at)peerdb(dot)io>, Philip Dubé <philip(at)peerdb(dot)io>
Subject: Re: Intermittent Issue with WAL Segment Removal in Logical Replication
Date: 2023-12-28 22:24:25
Message-ID: CANzqJaCkTp3itT_r5nWpV-tOT8JZ27HuVUYEhcA3HmraN5gfKw@mail.gmail.com
Lists: pgsql-general

On Thu, Dec 28, 2023 at 4:54 PM Kaushik Iska <kaushik(at)peerdb(dot)io> wrote:

> Hi all,
>
> I'm including additional details, as I am able to reproduce this issue a
> little more reliably.
>
> Postgres Version: POSTGRES_14_9.R20230830.01_07
> Vendor: Google Cloud SQL
> Logical Replication Protocol version 1
>
> Here are the logs of an attempt succeeding right after one fails:
>
> 2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
> STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2') <- FAILS
> 2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres
> ERROR: requested WAL segment 000000010000000600000059 has already been
> removed
> 2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
> STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2') <- SUCCEEDS
> 2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres LOG:
> logical decoding found consistent point at 6/5A31F050
>
> Happy to include any additional details of my setup.
>
> Thanks,
> Kaushik
>
>
> On Tue, Dec 26, 2023 at 10:36 AM Kaushik Iska <kaushik(at)peerdb(dot)io> wrote:
>
>> Dear PostgreSQL Community,
>>
>> I am seeking guidance regarding a recurring issue we've encountered with
>> WAL segment removal during logical replication using the pgoutput plugin.
>> We sporadically encounter an error indicating that a requested WAL segment
>> has already been removed. This issue arises intermittently when executing
>> START_REPLICATION. An example error message is as follows:
>>
>>
>> requested WAL segment 000000010000146000000AE has already been removed
>>
>>
>> Please note that this error is not specific to the segment mentioned
>> above; it serves as an example of the type of error we are experiencing.
>>
>> Additional Context:
>>
>>
>> - max_slot_wal_keep_size is -1, logical_decoding_work_mem is 4 GB.
>> - The error seems to appear randomly and is not consistent.
>> - After a couple of retries, the replication process eventually succeeds.
>> - For one of the users it seems to be happening every 16 hours or so.
>>
>>
>> Our approach involves starting with START_REPLICATION 0, replicating data
>> in batches, and then restarting at the last LSN of the previous batch. We
>> are trying to understand the root cause behind the intermittent removal of
>> WAL segments during logical replication. Specifically, we are looking for
>> insights into:
>>
>>
>> - The potential reasons for the WAL segments being reported as removed.
>> - Why this error occurs intermittently and why replication succeeds
>>   after several retries.
>> - Any advice on troubleshooting and resolving this issue, or insights
>>   into whether it might be related to our specific replication setup or a
>>   characteristic of pgoutput, would be highly valuable.
>>
>>
>> Related Posts
>>
>>
>> - https://issues.redhat.com/browse/DBZ-590
>> - Troubleshooting Postgres Sources | Airbyte Documentation
>>   <https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs>
>> - https://fivetran.com/docs/databases/postgresql/troubleshooting/last-tracked-lsn-error
>>
>>
>>
>> Thank you very much for your time and assistance.
>>
>> Thanks,
>>
>> Kaushik Iska
>>
>>
It might be interesting to see the contents of pg_replication_slots.
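
When looking at that output, it may help to map each LSN to the WAL file that holds it, so the slot's restart_lsn can be compared with the segment named in the error. Below is a minimal sketch of that arithmetic, assuming the default 16 MB wal_segment_size and timeline 1 (neither is confirmed for this instance). The helper names and the SLOT_QUERY string are illustrative only; the query relies on the pg_replication_slots view and the built-in pg_walfile_name() function.

```python
# Sketch only: maps an LSN such as "6/5AE67D79" to a WAL segment file name.
# Assumptions (not confirmed by the thread): 16 MB segments, timeline 1.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert 'hi/lo' hexadecimal LSN text into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(lsn: str, timeline: int = 1,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-character name of the segment containing the LSN.

    This follows the usual naming arithmetic; treat it as an approximation
    of what the server-side pg_walfile_name() reports.
    """
    segno = parse_lsn(lsn) // segment_size
    segments_per_logical_file = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_logical_file,
        segno % segments_per_logical_file,
    )


# Hypothetical query to run on the publisher; the column list can be adjusted.
SLOT_QUERY = """
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn, wal_status,
       pg_walfile_name(restart_lsn) AS restart_segment
FROM pg_replication_slots;
"""

if __name__ == "__main__":
    # LSNs taken from the log excerpt quoted above.
    for lsn in ("6/5AE67D79", "6/5A31F050"):
        print(lsn, "->", wal_segment_name(lsn))
```

Under those assumptions, both LSNs in the quoted log fall in segment 00000001000000060000005A, while the error names 000000010000000600000059, so the slot's restart_lsn at the time of the failure would be the interesting value to compare.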
