Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error

From: Mohan NBSPS <mohan(dot)nbs(dot)ont(at)gmail(dot)com>
To: Johannes Truschnigg <johannes(at)truschnigg(dot)info>
Cc: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>, Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error
Date: 2024-05-31 14:28:58
Message-ID: CAPCvfWevVWABj+Vroo8EFZYm4=JDvMzb3AAh9G0qdSrq-5O3Aw@mail.gmail.com
Lists: pgsql-admin

Update

Regarding the failing secondaries, with the error:

```
LOG: invalid resource manager ID
```

We found that the server hosting the secondaries had been down for a
considerable amount of time (2-3 hours).
The errors started around that date.

My guess is that the primaries (several of them) kept writing WAL, each at a
rate depending on how busy it was, and the secondaries fell out of sync with
the WAL.
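
To put a number on "out of sync", one option is to compare WAL positions (LSNs)
from a primary and its secondary. The sketch below only does the arithmetic on
the textual LSN format (two hexadecimal numbers separated by a slash, upper and
lower 32 bits). The function names and sample values are made up for
illustration; on 9.5 the inputs could presumably come from
pg_current_xlog_location() on the primary and pg_last_xlog_replay_location() on
the secondary.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after it the lower 32 bits.
    return (int(high, 16) << 32) + int(low, 16)


def wal_gap_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL between the standby's position and the primary's position."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_lsn)


# Hypothetical sample positions, not taken from the affected servers.
print(wal_gap_bytes("2/10", "1/FFFFFFF0"))
```

A gap that keeps growing while a secondary is down is the amount of WAL it
would have to fetch from the primary or an archive once it comes back.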

Thank you for the advice.

On Wed, May 29, 2024 at 1:19 AM Johannes Truschnigg <
johannes(at)truschnigg(dot)info> wrote:

> On Tue, May 28, 2024 at 05:24:56PM -0400, Ron Johnson wrote:
> > On Tue, May 28, 2024 at 3:11 PM Johannes Truschnigg <
> > >[...]
> > > Yes, replication slots can interrupt your primary.
> > >
> >
> > Please define "interrupt". Using a replication slot, I thought files would
> > just accumulate in pg_wal while the replica is down (or the network is
> > slow, or the replica can't keep up with the primary).
> >
> > Disaster, of course, when that disk fills up, but that's always been the
> > case.
>
> And that is exactly the scenario I meant when I said "interrupt". If you use
> replication slots, your monitoring/alerting isn't set up correctly, and you're
> accumulating a lot of WAL, chances are ENOSPC on the primary is around the
> corner for you.
>
> That's why I generally prefer a WAL archive on a separate file system for
> replicas to source segments from, because filling that up won't break the
> primary (unless the archive_command misbehaves). That also needs proper
> monitoring/alerting, of course (and a contingency plan for what to do when/if
> the archive runs over) - but everyone whose workload is important enough for a
> replication setup to make sense is required to have that in my book.
>
> --
> with best regards:
> - Johannes Truschnigg ( johannes(at)truschnigg(dot)info )
>
> www: https://johannes.truschnigg.info/
>
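
As a rough illustration of the monitoring/alerting mentioned above, a check on
the file system holding pg_wal or the WAL archive could look like the sketch
below. The function names, the 80% threshold, and the path are assumptions made
for this example and not something recommended in the thread; a real setup
would feed the result into whatever alerting system is already in place.

```python
import shutil


def usage_exceeds(total: int, used: int, threshold: float) -> bool:
    """Return True when the used fraction of a file system reaches the threshold."""
    return total > 0 and used / total >= threshold


def wal_volume_needs_attention(path: str, threshold: float = 0.8) -> bool:
    """Check the file system that contains `path` (e.g. pg_wal or a WAL archive)."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    return usage_exceeds(usage.total, usage.used, threshold)


# "." is a placeholder; point this at the WAL or archive directory instead.
if wal_volume_needs_attention("."):
    print("WARNING: WAL volume is filling up")
else:
    print("WAL volume usage is below the threshold")
```

Running such a check against the primary's WAL directory and the archive
separately keeps the two failure modes discussed above (ENOSPC on the primary
versus a full archive) distinguishable.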
