From: | Mohan NBSPS <mohan(dot)nbs(dot)ont(at)gmail(dot)com> |
---|---|
To: | Johannes Truschnigg <johannes(at)truschnigg(dot)info> |
Cc: | pgsql-admin(at)lists(dot)postgresql(dot)org |
Subject: | Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error |
Date: | 2024-05-28 19:24:33 |
Message-ID: | CAPCvfWed_69rS6CbBhNQd8uTCXmK7YJPsgi5BSQ-CZdfFkwXeA@mail.gmail.com |
Lists: | pgsql-admin |
On Tue, May 28, 2024 at 3:11 PM Johannes Truschnigg <
johannes(at)truschnigg(dot)info> wrote:
> On Tue, May 28, 2024 at 03:00:23PM -0400, Mohan NBSPS wrote:
> > [...]
> > Thank you Johannes for the advice.
> >
> > We are looking at moving to 16.
> > We did not implement slots to avoid disk space issues on primary
> > (possible network disconnect may fill up primary `pg_xlog`).
>
> Yes, replication slots can interrupt your primary. Relying on
> wal_keep_segments alone can kill your replicas. Having a WAL archive can be
> the best of both worlds, but also needs careful monitoring and management.
>
>
> > We have changed the WAL settings to retain more WAL files.
> >
> > Since we have not seen this issue before (have been running postgresql
> > for over 10 years), what kind of scenario would trigger this?
>
> Every time you interrupt the replication stream (such as when a replica
> reboots, or its postgres master process is stopped), you enter a race
> condition between WAL segments accumulating on the primary, and the
> replication stream to pick up again once the replica is up once more. So
> if, during your replica restart, enough WAL was produced to exceed
> wal_keep_segments, the lineage is broken, and the replica cannot ever
> catch up again.
>
> Also, the "invalid resource manager" log line you reported *might* hint at
> data corruption in your WAL segments. I think that data checksums and WAL
> compression could both make detection of such conditions more reliable.
>
>
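The race described above can be put into rough numbers. What follows is
only a minimal Python sketch and is not taken from the thread: it assumes
the default 16 MB WAL segment size and a steady WAL generation rate, and
the function names and example figures are made up for illustration.
wal_keep_segments is a minimum, so the primary may hold more WAL than
this estimate suggests.

```python
# Assumption: default WAL segment size of 16 MB (it can be changed at build time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments, segment_bytes=WAL_SEGMENT_BYTES):
    """Minimum amount of past WAL the primary keeps for standbys."""
    return wal_keep_segments * segment_bytes


def max_outage_seconds(wal_keep_segments, wal_bytes_per_second):
    """Longest replica outage that the retained WAL still covers."""
    if wal_bytes_per_second <= 0:
        return float("inf")  # no WAL is written, so nothing gets recycled
    return retained_wal_bytes(wal_keep_segments) / wal_bytes_per_second


def replica_can_resume(wal_keep_segments, wal_bytes_per_second, outage_seconds):
    """True if the WAL written during the outage fits in the retained window."""
    written = wal_bytes_per_second * outage_seconds
    return written <= retained_wal_bytes(wal_keep_segments)


if __name__ == "__main__":
    rate = 1024 * 1024  # hypothetical: 1 MB of WAL per second
    for keep in (32, 64):
        print(keep, max_outage_seconds(keep, rate), replica_can_resume(keep, rate, 600))
```

With these made-up figures, 32 segments cover roughly 512 seconds of
downtime, so a 600-second replica restart would break the stream, while
64 segments would still allow the replica to catch up.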
So there was enough disruption or time lag for the current WAL file to go
out of sync.
I say this because we have applied many O/S patches and reboots over all
these years, disrupting both the primary and the secondary servers.
Until now, however, postgres has proven robust, recovering and working
without issue each time.
Thank you again.
I will take those recommendations into consideration for implementation.
I might update this thread and will probably ask more questions.
--
> with best regards:
> - Johannes Truschnigg ( johannes(at)truschnigg(dot)info )
>
> www: https://johannes.truschnigg.info/
>