Failover Testing Failures: invalid resource manager ID in primary checkpoint record

From: Don Seiler <don(at)seiler(dot)us>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Failover Testing Failures: invalid resource manager ID in primary checkpoint record
Date: 2023-01-18 23:47:37
Message-ID: CAHJZqBAEx_rxuApJaBX7g9i9yrz8vVjvZAPG+P9=XgYuzrrgAA@mail.gmail.com
Lists: pgsql-admin

PostgreSQL 12.13 (PGDG packages) in a streaming replication configuration.
pgBackRest 2.43 is used for WAL archiving and DB backups to cloud storage.

I'm testing and documenting a DR exercise process where I:

1. Cleanly shut down PG on the primary
2. Promote the PG DR replica
3. Place the standby.signal file on the old primary and start it up
(this presumes no other configuration needs changing; primary_conninfo etc.
were already set).
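The three steps above can be written down as a dry-run plan, which is handy when documenting a DR runbook. This is a minimal sketch, not a tested procedure. It assumes `pg_ctl` is on the PATH for the postgres OS user and that the data directories are passed in. The host labels `old-primary` and `dr-replica`, the `Step` class and the `dr_plan` function are hypothetical names for illustration. "Fast" mode is used here as one clean shutdown option (it still writes a shutdown checkpoint); "smart" mode is also clean.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str              # hypothetical host label; where the command would run
    argv: tuple[str, ...]  # command line, never executed by this sketch


def dr_plan(old_primary_pgdata: str, dr_replica_pgdata: str,
            old_primary: str = "old-primary",
            dr_replica: str = "dr-replica") -> list[Step]:
    """Return the DR steps in order, as commands to review or run by hand."""
    return [
        # 1. Clean shutdown of the current primary (shutdown checkpoint is written).
        Step(old_primary, ("pg_ctl", "stop", "-D", old_primary_pgdata, "-m", "fast")),
        # 2. Promote the DR replica; it switches to a new timeline.
        Step(dr_replica, ("pg_ctl", "promote", "-D", dr_replica_pgdata)),
        # 3a. Mark the old primary as a standby (PostgreSQL 12+ uses standby.signal).
        Step(old_primary, ("touch", f"{old_primary_pgdata.rstrip('/')}/standby.signal")),
        # 3b. Start it; primary_conninfo etc. are assumed to be configured already.
        Step(old_primary, ("pg_ctl", "start", "-D", old_primary_pgdata)),
    ]


if __name__ == "__main__":
    # Hypothetical data directory; adjust for the actual packaging layout.
    for step in dr_plan("/var/lib/pgsql/12/data", "/var/lib/pgsql/12/data"):
        print(f"[{step.host}] {shlex.join(step.argv)}")
```

Keeping the plan as data rather than running it directly makes it easy to review before a DR exercise and to paste into documentation.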

My hope is that I could just start the old primary / new replica if it was
cleanly shut down prior to promoting the replica. However, when I try to
start up that new replica, I'm met with:

LOG: restored log file "00000002000000B70000005A" from archive
LOG: invalid resource manager ID in primary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 17660) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system is shut down

It doesn't appear that any WAL files are missing, as it finds every file
it asks for. Am I missing a piece here?
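The segment name in the first log line encodes a timeline ID, a log number and a segment number, each as 8 hex digits. Decoding it shows which timeline the restored file belongs to. This is a minimal sketch that assumes the default 16 MB WAL segment size (set at initdb time and possibly different on other clusters). `parse_wal_name` is a hypothetical helper, not part of PostgreSQL or pgBackRest.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments
SEGS_PER_LOG_ID = 0x100000000 // WAL_SEG_SIZE  # 256 segments per log number


def parse_wal_name(name: str) -> dict:
    """Decode a 24-hex-digit WAL segment file name (no .history/.partial suffix)."""
    if len(name) != 24 or any(c not in "0123456789ABCDEF" for c in name):
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    if seg >= SEGS_PER_LOG_ID:
        raise ValueError("segment field out of range for this segment size")
    segno = log_id * SEGS_PER_LOG_ID + seg
    start = segno * WAL_SEG_SIZE
    # PostgreSQL prints LSNs as two uppercase hex halves separated by '/'.
    start_lsn = f"{start >> 32:X}/{start & 0xFFFFFFFF:X}"
    return {"timeline": timeline, "segno": segno, "start_lsn": start_lsn}


if __name__ == "__main__":
    print(parse_wal_name("00000002000000B70000005A"))
```

For the file in the log above this reports timeline 2 and a segment starting at LSN B7/5A000000, i.e. a segment from the timeline created when the DR replica was promoted.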

My hope is to avoid having to do a restore to rebuild the new replica.

An aside for those who may be asking: most of these databases do not have
data checksums enabled, so pg_rewind isn't in the picture. However, I'm now
reading that we could enable the wal_log_hints parameter as an
alternative. I'm leery of the overhead, but if it's the same overhead that
data checksums would incur, then I guess nothing would be lost when we
eventually enable them.
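The pg_rewind documentation states that the target cluster must have either data checksums or wal_log_hints enabled, and that full_page_writes must be on (it is by default). A minimal sketch of that check, assuming the three values were collected beforehand with `SHOW data_checksums;`, `SHOW wal_log_hints;` and `SHOW full_page_writes;`. The function name `pg_rewind_prereqs_met` is hypothetical. Note that wal_log_hints can only be changed with a server restart.

```python
def pg_rewind_prereqs_met(settings: dict[str, str]) -> bool:
    """Return True if the SHOW values satisfy pg_rewind's documented requirements.

    `settings` maps parameter names to the text returned by SHOW, e.g. "on"/"off".
    A missing entry is treated as "off", so pass all three values explicitly.
    """
    def is_on(name: str) -> bool:
        return settings.get(name, "off").strip().lower() == "on"

    # Either data checksums (fixed at initdb) or wal_log_hints must be on ...
    hint_bits_logged = is_on("data_checksums") or is_on("wal_log_hints")
    # ... and full_page_writes must be on as well.
    return hint_bits_logged and is_on("full_page_writes")


if __name__ == "__main__":
    # Example: no checksums, but wal_log_hints enabled.
    print(pg_rewind_prereqs_met(
        {"data_checksums": "off", "wal_log_hints": "on", "full_page_writes": "on"}))
```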

--
Don Seiler
www.seiler.us
