BUG #16879: Delayed standby does not connect to primary on startup

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: mahadevan(at)rapidloop(dot)com
Subject: BUG #16879: Delayed standby does not connect to primary on startup
Date: 2021-02-21 09:39:26
Message-ID: 16879-3673aa2e1562a38f@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16879
Logged by: Mahadevan Ramachandran
Email address: mahadevan(at)rapidloop(dot)com
PostgreSQL version: 13.2
Operating system: Linux, Debian 10
Description:

Hi.

Below is a situation reproducible with versions 13.1 and 13.2 (at least). At
the end of it, the streaming replication standby does not connect to the
primary on startup. It is unclear whether this is a bug, or whether the
standby will connect later on somehow and resume replication.

We have a customer who reported pg_last_wal_receive_lsn() returning NULL on
a delayed standby, and this is what the investigation led to. In their case,
though, the standby was pulling in WAL via restore_command and not via
replication slots. The customer reports that changes show up in tables as
expected, with the appropriate delay. Primary and standby have been running
for well beyond recovery_min_apply_delay.

Here are the steps to reproduce:

Step 1: Have a primary streaming-replicating to a standby via a replication
slot. Both primary and standby have default configurations from initdb,
except for:

@primary:
port = 7000

@standby:
port = 7001
hot_standby = on
primary_conninfo = 'port=7000'
primary_slot_name = 'slot1'

Step 2: Ensure the standby is fully caught up and there are no ongoing
changes on the primary.

@primary:
postgres=# select slot_name, restart_lsn, active, active_pid from pg_replication_slots ;
 slot_name | restart_lsn | active | active_pid
-----------+-------------+--------+------------
 slot1     | 0/2EDFF8C8  | t      |      28061
(1 row)

@standby:
postgres=# select pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();
 pg_last_wal_replay_lsn | pg_last_wal_receive_lsn
------------------------+-------------------------
 0/2EDFF8C8             | 0/2EDFF8C8
(1 row)

Step 3: Set recovery_min_apply_delay = 1h in the standby's configuration.
Restart the standby.

@primary:
postgres=# select slot_name, restart_lsn, active, active_pid from pg_replication_slots ;
 slot_name | restart_lsn | active | active_pid
-----------+-------------+--------+------------
 slot1     | 0/2EDFF8C8  | t      |      28180
(1 row)

@standby:
postgres=# select pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();
 pg_last_wal_replay_lsn | pg_last_wal_receive_lsn
------------------------+-------------------------
 0/2EDFF8C8             | 0/2E000000
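
As a side note on the output above: the receive LSN reported after the restart (0/2E000000) is lower than the replay LSN (0/2EDFF8C8). A minimal Python sketch, assuming the default 16 MB WAL segment size from initdb, shows that 0/2E000000 is the start of the WAL segment containing 0/2EDFF8C8. The helper names (parse_lsn, format_lsn, segment_start) are made up for this illustration, and this is only an observation about the numbers, not a confirmed explanation of the behaviour.

```python
# Assumption: default WAL segment size (16 MB), as produced by a plain initdb.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' hex notation to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit byte position back to 'X/Y' hex notation."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_start(lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the LSN at which the WAL segment containing `lsn` begins."""
    value = parse_lsn(lsn)
    return format_lsn(value - value % segment_size)


# Replay LSN from Step 3; prints the start of its WAL segment.
print(segment_start("0/2EDFF8C8"))  # 0/2E000000
```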

Step 4: Make some updates in the primary: "pgbench -T5" should do it. Wait
for changes to finish.

Step 5: Restart the standby again.

@primary:
postgres=# select slot_name, restart_lsn, active, active_pid from pg_replication_slots ;
 slot_name | restart_lsn | active | active_pid
-----------+-------------+--------+------------
 slot1     | 0/2FDE97A8  | f      |          ~

@standby:
postgres=# select pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();
 pg_last_wal_replay_lsn | pg_last_wal_receive_lsn
------------------------+-------------------------
 0/2EE05DB8             | ~
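
For reference, the distance between the slot's restart_lsn on the primary (0/2FDE97A8) and the standby's replay position (0/2EE05DB8) after Step 5 can be computed from the two values above. The following is a minimal Python sketch; parse_lsn and lsn_diff are hypothetical helper names, and on the server side the built-in pg_wal_lsn_diff() function should give the same number. It only quantifies the gap and does not by itself show why the standby has not connected.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' hex notation to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Number of WAL bytes between two LSNs (positive if a is ahead of b)."""
    return parse_lsn(a) - parse_lsn(b)


# Values taken from the Step 5 output.
slot_restart_lsn = "0/2FDE97A8"    # restart_lsn of slot1 on the primary
standby_replay_lsn = "0/2EE05DB8"  # pg_last_wal_replay_lsn() on the standby

gap = lsn_diff(slot_restart_lsn, standby_replay_lsn)
print(f"{gap} bytes ({gap / (1024 * 1024):.1f} MiB) not yet replayed")
```

With these values the gap is a little under one 16 MB WAL segment, which is WAL generated in Step 4 that the standby has not yet replayed.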
