Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error

From: Mohan NBSPS <mohan(dot)nbs(dot)ont(at)gmail(dot)com>
To: Johannes Truschnigg <johannes(at)truschnigg(dot)info>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error
Date: 2024-05-28 19:00:23
Message-ID: CAPCvfWdaymCYkLz-AseQLwGj8qh6pT1Kx73LoS-wkNNrukEX_Q@mail.gmail.com
Lists: pgsql-admin

On Tue, May 28, 2024 at 2:47 PM Johannes Truschnigg <johannes(at)truschnigg(dot)info> wrote:

> Hi Mohan,
>
> On Tue, May 28, 2024 at 02:26:41PM -0400, Mohan NBSPS wrote:
> > Dear Community,
> > [...]
> > ```
> > FATAL: could not receive data from WAL stream: ERROR: requested WAL
> > segment 000000010000004100000049 has already been removed
> > FATAL: the database system is starting up
> > ```
> >
> > From my understanding, the WAL is streamed over the network (the secondary
> > pulls from the primary) and written to a WAL file on the secondary.
> > A separate process then replays the copied WAL file.
> >
> > For the local WAL file to go out of sync, one of the following must
> > have happened:
> >
> > 1. the primary removed the WAL file that the secondary was streaming
> > 2. the WAL file on the secondary got corrupted
> > 3. ...
> >
> > Questions
> >
> > - what do those error messages mean ?
> > - how can I prevent this from happening ?
>
> It means that, unless you have archived the required WAL segments somewhere
> and can recover them from there, your replica is now broken, and you will
> have to re-create it anew.
>
> You can prevent this by correctly configuring streaming replication either
> by using replication slots (not sure if that's already implemented in 9.5,
> actually - you should prioritize upgrading to a supported release while you
> are working this problem!), or by introducing a WAL archive[0] for replicas
> to retrieve WAL from that the primary has already evicted from its kept
> segments.
>
> Hth!
>

Thank you Johannes for the advice.

We are looking at moving to PostgreSQL 16.
We did not implement replication slots, to avoid disk space issues on the
primary (with a slot, a network disconnect could fill up the primary's
`pg_xlog`).

We have changed the WAL settings to retain more WAL files.
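
As a rough sketch of what retaining more WAL costs and buys, the Python below estimates the disk space and the catch-up window for a given `wal_keep_segments` value. It assumes the default 16 MB segment size. The value 256 and the WAL write rate are illustrative placeholders, not the actual settings or measurements from this cluster.

```python
# Assumption: default WAL segment size of 16 MB (not changed at build time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments: int) -> int:
    """Minimum amount of past WAL kept in pg_xlog for standbys, in bytes."""
    if wal_keep_segments < 0:
        raise ValueError("wal_keep_segments must be non-negative")
    return wal_keep_segments * WAL_SEGMENT_BYTES


def retention_window_seconds(wal_keep_segments: int,
                             wal_bytes_per_second: float) -> float:
    """Roughly how long a standby may stall before the primary recycles
    a segment it still needs, at a steady WAL generation rate."""
    if wal_bytes_per_second <= 0:
        raise ValueError("wal_bytes_per_second must be positive")
    return retained_wal_bytes(wal_keep_segments) / wal_bytes_per_second


if __name__ == "__main__":
    keep = 256                      # hypothetical wal_keep_segments value
    rate = 2 * 1024 * 1024          # hypothetical 2 MiB/s of WAL
    print(retained_wal_bytes(keep) / 1024**3, "GiB retained")
    print(retention_window_seconds(keep, rate), "seconds of headroom")
```

With these placeholder numbers, 256 segments come to 4 GiB on the primary and about 2048 seconds of headroom. A burst of writes shortens that window even when the network is healthy.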

We have been running PostgreSQL for over 10 years and have not seen this
issue before. What kind of scenario would trigger it?

- We do not see any network latency or outages.
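
To judge how far behind a secondary was, the segment name in the error message can be decoded and compared with the primary's current segment. This is a minimal sketch, assuming the default 16 MB segment size (256 segments per 4 GB log file). The function names are made up for illustration, and the second segment name in the usage note is hypothetical, not taken from any log.

```python
# Assumption: default 16 MB segments, so each 4 GB "log" holds 256 segments.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024
SEGMENTS_PER_LOG = 0x100000000 // WAL_SEGMENT_BYTES


def parse_wal_segment_name(name: str) -> tuple:
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("WAL segment names are 24 hex digits")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


def segment_number(name: str) -> int:
    """Absolute segment number within the timeline."""
    _, log, segment = parse_wal_segment_name(name)
    return log * SEGMENTS_PER_LOG + segment


def segments_between(older: str, newer: str) -> int:
    """How many segments `newer` is ahead of `older` (same timeline assumed)."""
    return segment_number(newer) - segment_number(older)


if __name__ == "__main__":
    requested = "000000010000004100000049"   # from the error message
    print(parse_wal_segment_name(requested))
```

The segment from the error decodes to timeline 1, log 0x41, segment 0x49. If the primary were writing a hypothetical `000000010000004200000049` at that moment, `segments_between` would report a gap of 256 segments, which would only be safe with `wal_keep_segments` set to at least that much.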

Thank you again.

>
> [0]:
>
> https://www.postgresql.org/docs/9.5/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING
>
> --
> with best regards:
> - Johannes Truschnigg ( johannes(at)truschnigg(dot)info )
>
> www: https://johannes.truschnigg.info/
>
