Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error

From: Mohan NBSPS <mohan(dot)nbs(dot)ont(at)gmail(dot)com>
To: Johannes Truschnigg <johannes(at)truschnigg(dot)info>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error
Date: 2024-05-28 19:24:33
Message-ID: CAPCvfWed_69rS6CbBhNQd8uTCXmK7YJPsgi5BSQ-CZdfFkwXeA@mail.gmail.com
Lists: pgsql-admin

On Tue, May 28, 2024 at 3:11 PM Johannes Truschnigg <
johannes(at)truschnigg(dot)info> wrote:

> On Tue, May 28, 2024 at 03:00:23PM -0400, Mohan NBSPS wrote:
> > [...]
> > Thank you Johannes for the advice.
> >
> > We are looking at moving to 16.
> > We did not implement slots to avoid disk space issues on primary (possible
> > network disconnect may fill up primary `pg_xlog`).
>
> Yes, replication slots can interrupt your primary. Relying on
> wal_keep_segments alone can kill your replicas. Having a WAL archive can be
> the best of both worlds, but also needs careful monitoring and management.
>
>
> > We have changed the WAL settings to retain more WAL files.
> >
> > Since we have not seen this issue before, (have been running postgresql for
> > over 10 years), what kind of scenario would trigger this?
>
> Every time you interrupt the replication stream (such as when a replica
> reboots, or its postgres master process is stopped), you enter a race
> condition between WAL segments accumulating on the primary, and the
> replication stream to pick up again once the replica is up once more. So if,
> during your replica restart, enough WAL was produced to exceed
> wal_keep_segments, the lineage is broken, and the replica cannot ever catch up
> again.
>
> Also, the "invalid resource manager" log line you reported *might* hint at
> data corruption in your WAL segments. I think that data checksums and WAL
> compression could both make detection of such conditions more reliable.
>
>
So there was enough disruption or time lag for the current WAL file to go
out of sync.
The reason I say this is that we have done a lot of O/S patching and reboots
over all these years, disrupting the primary/secondary servers.
Each time, however, postgres showed its robustness by recovering and
working without issue.
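
To put rough numbers on the race described above, here is a minimal sketch in
Python. It is only a back-of-the-envelope estimate, not how postgres itself
decides which segments to recycle: the function names are hypothetical, the
16 MB segment size is the default and is assumed here, and the WAL rate and
outage length are made-up illustration values. It also ignores the extra
segments the primary may retain for checkpoints, so it errs on the
pessimistic side.

```python
import math

# Default WAL segment size (16 MB); assumed, adjust if built differently.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def segments_written(wal_bytes_per_sec, outage_seconds,
                     segment_bytes=WAL_SEGMENT_BYTES):
    """Estimate how many WAL segments the primary fills during an outage."""
    return math.ceil(wal_bytes_per_sec * outage_seconds / segment_bytes)


def replica_can_resume(wal_bytes_per_sec, outage_seconds, wal_keep_segments):
    """True if the segments produced during the outage still fit within
    wal_keep_segments, i.e. the replica should find the WAL it needs."""
    produced = segments_written(wal_bytes_per_sec, outage_seconds)
    return produced <= wal_keep_segments


if __name__ == "__main__":
    rate = 2 * 1024 * 1024   # hypothetical: 2 MB of WAL per second
    outage = 600             # hypothetical: replica down for 10 minutes
    for keep in (64, 128):
        ok = replica_can_resume(rate, outage, keep)
        print(f"wal_keep_segments={keep}: "
              f"{segments_written(rate, outage)} segments produced, "
              f"replica can resume: {ok}")
```

With these example figures the primary writes 75 segments during the outage,
so a wal_keep_segments of 64 would leave the replica stranded while 128 would
not. A quiet primary during our earlier reboots would explain why we never
hit this before.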

Thank you again.
I will take those recommendations into consideration for implementation.

I might update this thread and probably ask more questions.

--
> with best regards:
> - Johannes Truschnigg ( johannes(at)truschnigg(dot)info )
>
> www: https://johannes.truschnigg.info/
>
