Broken replication in 3 node setup, master halt, standby 1 promoted, standby 2 - replication not working

From: Pritam Barhate <pritambarhate(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Broken replication in 3 node setup, master halt, standby 1 promoted, standby 2 - replication not working
Date: 2018-01-10 19:06:54
Message-ID: CALpo98VPj7oFhxUgiXBj+MGpq-M8nt0btHnu2P7R=UsL6zZZaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi,

I am trying to test the failover scenario on a brand new 3 node setup.

pg1 is master
pg2 and pg3 are standby hosts in streaming replication, both following pg1.

When I halt pg1, promote pg2 and update recovery.conf on pg3 to point to
pg2, the replication is broken. Can you please help me identify the correct
procedure to achieve this failover?

Here are the details:

Software Versions:

PostgreSQL: 9.6
Ubuntu 16.04
PgBackRest 1.25

Here is the scenario:

Set up 3 nodes in streaming replication:

master (pg1), standby 1 (pg2), standby 2 (pg3)

Also, I have set up pgbackrest, to back up from standby. It's working
correctly.

Run the following on pg1:

sudo -u postgres psql -x -c "select * from pg_stat_replication;"

-[ RECORD 1 ]----+------------------------------
pid | 20714
usesysid | 16384
usename | replicator
application_name | walreceiver
client_addr | 10.0.15.11
client_hostname |
client_port | 48640
backend_start | 2018-01-10 18:17:46.105694+00
backend_xmin |
state | streaming
sent_location | 0/6000060
write_location | 0/6000060
flush_location | 0/6000060
replay_location | 0/6000060
sync_priority | 0
sync_state | async
-[ RECORD 2 ]----+------------------------------
pid | 20715
usesysid | 16384
usename | replicator
application_name | walreceiver
client_addr | 10.0.15.12
client_hostname |
client_port | 42198
backend_start | 2018-01-10 18:17:48.712575+00
backend_xmin |
state | streaming
sent_location | 0/6000060
write_location | 0/6000060
flush_location | 0/6000060
replay_location | 0/6000060
sync_priority | 0
sync_state | async

It reports both the standbys working correctly.

Halt pg1.

On pg2:

sudo -u postgres pg_ctlcluster 9.6 main promote

On pg3:

sudo service postgresql stop

sudo -u postgres vim /var/lib/postgresql/9.6/main/recovery.conf

modify it's contents are as follows:

```
primary_conninfo = 'host=pg2 port=5432 user=replicator'
standby_mode = 'on'
restore_command = '/usr/bin/pgbackrest --stanza=main archive-get %f "%p"'
```

sudo service postgresql start

On pg2:

sudo -u postgres psql -x -c "select * from pg_stat_replication;"

-[ RECORD 1 ]----+------------------------------
pid | 21238
usesysid | 16384
usename | replicator
application_name | walreceiver
client_addr | 10.0.15.12
client_hostname |
client_port | 44852
backend_start | 2018-01-10 18:22:20.980701+00
backend_xmin |
state | startup
sent_location |
write_location |
flush_location |
replay_location |
sync_priority | 0
sync_state | async

As you can see sent_location, write_location, flush_location all are empty.

On pg3:

tail -f /var/log/postgresql/postgresql-9.6-main.log
2018-01-10 18:22:19.490 UTC [21229] LOG: restored log file
"000000010000000000000005" from archive
2018-01-10 18:22:19.497 UTC [21229] LOG: redo starts at 0/5000028
2018-01-10 18:22:19.757 UTC [21243] postgres(at)postgres FATAL: the database
system is starting up
2018-01-10 18:22:20.188 UTC [21229] LOG: restored log file
"000000010000000000000006" from archive
2018-01-10 18:22:20.195 UTC [21229] LOG: consistent recovery state reached
at 0/7000000
2018-01-10 18:22:20.195 UTC [21228] LOG: database system is ready to
accept read only connections
2018-01-10 18:22:20.812 UTC [21229] LOG: invalid magic number 0000 in log
segment 000000010000000000000007, offset 0
2018-01-10 18:22:20.824 UTC [21255] LOG: fetching timeline history file
for timeline 2 from primary server
2018-01-10 18:22:20.828 UTC [21255] LOG: primary server contains no more
WAL on requested timeline 1
2018-01-10 18:22:21.417 UTC [21255] LOG: primary server contains no more
WAL on requested timeline 1

At this point is I insert data in pg2, it doesn't reflect in pg3.

Please help.

Regards,

Pritam.

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Azimuddin Mohammed 2018-01-10 20:24:55 Replication mode
Previous Message Francis Santiago 2018-01-10 17:32:38 Re: pgAdmin Import/export does not show a progress dialog on mac