Re: Primary and standby setting cross-checks

From: Noah Misch <noah(at)leadboat(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Primary and standby setting cross-checks
Date: 2024-09-25 03:03:45
Message-ID: 20240925030345.67.nmisch@google.com
Lists: pgsql-hackers

On Thu, Aug 29, 2024 at 09:52:06PM +0300, Heikki Linnakangas wrote:
> Currently, if you configure a hot standby server with a smaller
> max_connections setting than the primary, the server refuses to start up:
>
> LOG: entering standby mode
> FATAL: recovery aborted because of insufficient parameter settings
> DETAIL: max_connections = 10 is a lower setting than on the primary server,
> where its value was 100.

> happen anyway:
>
> 2024-08-29 21:44:32.634 EEST [668327] FATAL: out of shared memory
> 2024-08-29 21:44:32.634 EEST [668327] HINT: You might need to increase
> "max_locks_per_transaction".
> 2024-08-29 21:44:32.634 EEST [668327] CONTEXT: WAL redo at 2/FD40FCC8 for
> Standby/LOCK: xid 996 db 5 rel 154045
> 2024-08-29 21:44:32.634 EEST [668327] WARNING: you don't own a lock of type
> AccessExclusiveLock
> 2024-08-29 21:44:32.634 EEST [668327] LOG: RecoveryLockHash contains entry
> for lock no longer recorded by lock manager: xid 996 database 5 relation
> 154045
> TRAP: failed Assert("false"), File: "../src/backend/storage/ipc/standby.c",

> Granted, if you restart the server, it will probably succeed because
> restarting the server will kill all the other queries that were holding
> locks. But yuck.

Agreed.

> So how to improve this? I see a few options:
>
> a) Downgrade the error at startup to a warning, and allow starting the
> standby with smaller settings in standby. At least with a smaller
> max_locks_per_transaction. The other settings also affect the size of
> known-assigned XIDs array, but if the CSN snapshots get committed, that will
> get fixed. In most cases there is enough lock memory anyway, and it will be
> fine. Just fix the assertion failure so that the error message is a little
> nicer.
>
> b) If you run out of lock space, kill running queries, and prevent new ones
> from starting. Track the locks in startup process' private memory until
> there is enough space in the lock manager, and then re-open for queries. In
> essence, go from hot standby mode to warm standby, until it's possible to go
> back to hot standby mode again.

Either seems fine. Having never encountered actual lock exhaustion from this,
I'd lean toward (a) for simplicity.
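
For illustration only, option (a) can be modeled as the same cross-check with the failure downgraded to a warning. This is a minimal Python sketch, not PostgreSQL code: cross_check and CHECKED_PARAMS are hypothetical names, the parameter list is an assumption about which settings the startup check covers, and the message wording simply mirrors the DETAIL line quoted above.

```python
# Hypothetical model of option (a): compare the standby's settings with the
# values recorded by the primary and report shortfalls as warnings instead of
# aborting recovery. Names and structure are illustrative assumptions.
CHECKED_PARAMS = (
    "max_connections",
    "max_worker_processes",
    "max_wal_senders",
    "max_prepared_transactions",
    "max_locks_per_transaction",
)


def cross_check(primary, standby):
    """Return one warning string per parameter that is lower on the standby."""
    warnings = []
    for name in CHECKED_PARAMS:
        if standby[name] < primary[name]:
            warnings.append(
                f"WARNING: {name} = {standby[name]} is a lower setting than "
                f"on the primary server, where its value was {primary[name]}"
            )
    return warnings


primary = {
    "max_connections": 100,
    "max_worker_processes": 8,
    "max_wal_senders": 10,
    "max_prepared_transactions": 0,
    "max_locks_per_transaction": 64,
}
# Same as the primary except for a smaller max_connections.
standby = dict(primary, max_connections=10)

messages = cross_check(primary, standby)
for message in messages:
    print(message)
```

The sketch only covers the reporting side; it says nothing about what the standby does if lock space actually runs out later.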

> Thoughts, better ideas?

I worry about future code assuming a MaxBackends-sized array suffices for
something. That could work almost all the time, breaking only when a standby
replays WAL from a server having a larger array. What could we do now to
catch that future mistake promptly? As a start, 027_stream_regress.pl could
use low settings on its standby.
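
To make the concern concrete, here is a toy Python model, not PostgreSQL code. SlotArray and replay_all are hypothetical names, and the "records" are just (slot number, payload) pairs standing in for WAL that refers to per-backend slots on the primary. The array is sized from the standby's own setting, so replay succeeds whenever the standby is at least as large as the primary and fails only when it is smaller.

```python
# Toy model of the hazard: a structure sized from the standby's own
# MaxBackends-like setting, fed records generated under the primary's setting.
class SlotArray:
    def __init__(self, max_backends):
        # Sized from the local (standby) setting only.
        self.slots = [None] * max_backends

    def replay(self, slot_no, payload):
        # An explicit bounds check turns silent corruption into a clear error.
        if slot_no >= len(self.slots):
            raise OverflowError(
                f"record refers to slot {slot_no}, "
                f"but only {len(self.slots)} slots exist"
            )
        self.slots[slot_no] = payload


def replay_all(standby_max_backends, records):
    """Replay every record into an array sized for the standby."""
    array = SlotArray(standby_max_backends)
    for slot_no, payload in records:
        array.replay(slot_no, payload)
    return array


# Records produced by a primary that uses 20 slots.
records = [(i, f"xid {i}") for i in range(20)]

# Standby at least as large as the primary: the bug stays hidden.
replay_all(100, records)

# Standby smaller than the primary: the bug surfaces immediately.
try:
    replay_all(10, records)
    overflowed = False
except OverflowError:
    overflowed = True
```

A test that always runs its standby with the same or larger settings exercises only the first case, which is why low standby settings in a regression test would expose this class of mistake early.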
