pg_rewind - restore new slave failed to startup during recovery

From: Dylan Luong <Dylan(dot)Luong(at)unisa(dot)edu(dot)au>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: pg_rewind - restore new slave failed to startup during recovery
Date: 2017-08-22 00:52:58
Message-ID: 7c8e0fb3f88e4904b9026465b093f78e@ITUPW-EXMBOX2B.UniNet.unisa.edu.au
Lists: pgsql-general

Hi
I have a WAL streaming replication setup with one master and one slave. The application connects through a load balancer (LTM), which redirects all connections to the master member (master db).

We have archive_mode enabled.

I am testing the use of pg_rewind to restore the new slave (old master) after a failover while the system is under load.

Here are the steps I take to test:

1. Disable the master LTM member (all connections are redirected to the slave member)

2. Promote slave (touch promote.me)

3. Stop the master db (old master)

4. Do pg_rewind on the new slave (old master)

5. Start the new slave.
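
Steps 3 to 5 can be sketched roughly as follows. This is only an illustration that builds the command lines without running them: the data directory path and the `postgres` user in the connection string are assumptions, not values from my setup, and the helper name `rewind_commands` is hypothetical. The `--dry-run` option of pg_rewind lets you check what would happen without modifying anything.

```python
import shlex


def rewind_commands(target_pgdata, source_conninfo, dry_run=False):
    """Build the commands for steps 3-5 as argument lists (nothing is executed)."""
    rewind = [
        "pg_rewind",
        f"--target-pgdata={target_pgdata}",      # data directory of the old master
        f"--source-server={source_conninfo}",    # libpq connection string to the new master
    ]
    if dry_run:
        rewind.append("--dry-run")               # report only, change nothing
    return [
        ["pg_ctl", "-D", target_pgdata, "stop", "-m", "fast"],  # step 3
        rewind,                                                 # step 4
        ["pg_ctl", "-D", target_pgdata, "start"],               # step 5
    ]


if __name__ == "__main__":
    # Hypothetical path and user; only the host is taken from recovery.conf.
    for cmd in rewind_commands("/path/to/old_master/data",
                               "host=10.69.19.18 user=postgres"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```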

Please see attached psql.jpg for the result from the pg_rewind.

However, when I try to start the new slave, it reports that it cannot find the archived WAL files and cannot receive data from the WAL stream.
Please see attached logs.jpg.

Checking on the new master, I see that the checkpoint it is trying to restore is in the file 000000040000009C0000006F, but that file does not exist anywhere on the new master: neither in pg_xlog nor in the archive folder (as specified in postgresql.conf).

Please see attached psql.jpg.
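
For reference, a WAL segment file name is 24 hex digits: 8 for the timeline ID, 8 for the log number and 8 for the segment number. The sketch below decodes the name above and looks for files at the same WAL position under a different timeline, which may be worth checking after a promotion. The function names are hypothetical, and the pg_xlog path in the example is a placeholder; only the archive directory comes from my restore_command.

```python
import os


def parse_wal_segment(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def find_same_position(name, directories):
    """List files in the given directories that share the log/segment part
    of the name, whatever their timeline ID is."""
    matches = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # skip directories that do not exist on this host
        for entry in sorted(os.listdir(directory)):
            if len(entry) == 24 and entry[8:] == name[8:]:
                matches.append(os.path.join(directory, entry))
    return matches


if __name__ == "__main__":
    wanted = "000000040000009C0000006F"
    print(parse_wal_segment(wanted))  # (4, 156, 111)
    # "/path/to/data/pg_xlog" is a placeholder for the real data directory.
    print(find_same_position(wanted, ["/path/to/data/pg_xlog",
                                      "/pg_backup/backup/archive_sync"]))
```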

Here is my recovery.conf :

standby_mode = 'on'
primary_conninfo = 'host=10.69.19.18 user=replicant'
trigger_file = '/var/run/promote_me'
restore_command = 'cp /pg_backup/backup/archive_sync/%f "%p"'
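
To make clear what this restore_command will try to run for the missing segment: the server replaces `%f` with the name of the file it wants and `%p` with the path to copy it to (`%%` stands for a literal `%`). A minimal sketch of that substitution, with a hypothetical helper name and an illustrative destination path:

```python
import re


def expand_restore_command(template, wal_file, dest_path):
    """Substitute %f, %p and %% in a restore_command template."""
    mapping = {"f": wal_file, "p": dest_path, "%": "%"}
    return re.sub(r"%([fp%])", lambda m: mapping[m.group(1)], template)


if __name__ == "__main__":
    # The destination path is illustrative only.
    print(expand_restore_command(
        'cp /pg_backup/backup/archive_sync/%f "%p"',
        "000000040000009C0000006F",
        "pg_xlog/RECOVERYXLOG",
    ))
```

If the expanded source path does not exist, `cp` exits with a non-zero status, and the standby then falls back to streaming from the master.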

Does anyone know why?

Under what conditions will pg_rewind not work?

Thanks.
Regards
Dylan

Attachment Content-Type Size
image/jpeg 30.8 KB
logs.jpg image/jpeg 82.5 KB
image/jpeg 30.8 KB
