Re: Failover Testing Failures: invalid resource manager ID in primary checkpoint record

From: Don Seiler <don(at)seiler(dot)us>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Failover Testing Failures: invalid resource manager ID in primary checkpoint record
Date: 2023-01-19 17:23:23
Message-ID: CAHJZqBAOo+TCLwCxf8-_0h4r=kYFpDb8UC5hxLv_mDfUHJ8vkg@mail.gmail.com
Lists: pgsql-admin

On Thu, Jan 19, 2023 at 9:57 AM Don Seiler <don(at)seiler(dot)us> wrote:

> On Thu, Jan 19, 2023 at 9:50 AM Don Seiler <don(at)seiler(dot)us> wrote:
>
>> I'm going to have to review what Chef might have done. I was relying on
>> Chef to deploy the configs before attempting to restart, but it may have
>> tried to start the service early.
>>
>
> Reviewing the Chef recipe, this does seem to be the case. The code that
> determines whether to place the standby.signal file runs after the recipe
> has already attempted to start the PG service. Another self-inflicted wound,
> apparently.
>

Yes, this was exactly the problem. After fixing the order of operations in
the Chef recipe, the old primary transitioned cleanly into the new replica
without any rewind/restore operation. This obviously requires cleanly
shutting down the old primary first, which should be the case for planned DR
exercises. In a true DR emergency, reviving the old primary would require a
rewind or restore.
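The fix is purely about ordering: settle the node's role on disk first, and
only then start PostgreSQL. The actual Chef recipe is not shown in this
thread, so the following is a minimal Python sketch of that ordering under
stated assumptions. `converge_postgres_node` and the `start_service` callback
are hypothetical names, and the sketch assumes PostgreSQL 12 or later, where
an empty `standby.signal` file in the data directory makes the server start
as a standby.

```python
from pathlib import Path
from typing import Callable


def converge_postgres_node(data_dir: Path, is_replica: bool,
                           start_service: Callable[[], None]) -> None:
    """Hypothetical deploy step: settle role files first, then start PostgreSQL.

    Illustrative only; not the recipe discussed in this thread.
    """
    signal = data_dir / "standby.signal"

    # Step 1: decide the role and place (or remove) standby.signal BEFORE
    # anything tries to start the server.
    if is_replica:
        signal.touch()
    elif signal.exists():
        signal.unlink()

    # Step 2: only now start the service, so PostgreSQL sees the intended
    # role on its very first startup instead of booting as a primary.
    start_service()
```

Passing the start action in as a callback keeps the ordering explicit and
easy to test: a test double can record whether `standby.signal` already
existed at the moment the service would have started.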

Don.

--
Don Seiler
www.seiler.us
