how to switch old replication Master to new Standby after promoting old Standby

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: how to switch old replication Master to new Standby after promoting old Standby
Date: 2016-03-14 17:28:16
Message-ID: COL131-W552E6E64C1C5202B7C3847A3880@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


From: johnlumby(at)hotmail(dot)com
To: pgsql-general-owner(at)postgresql(dot)org
Subject: how to switch old replication Master to new Standby after promoting old Standby
Date: Mon, 14 Mar 2016 13:23:29 -0400

In a scenario involving replication where no failure occurs but I want to interchange Master <->Standby,
*and* want to avoid making another full base backup and rsyncing it across,
I have found it easy to promote old Standby to new Master with pg_ctl promote,
but very difficult to restart the old Primary as a new Standby *without* performing new base backup.

Assume current 9.5.1 and using streaming replication with a named replication slot if relevant

Second assumption - I am able to temporarily prevent any relational updates to the database before I start the switchover
e.g. the
default_transaction_read_only = on
setting together with being able to control what transactions do if anything

So I reach a point where both systems have postgresql running without any replication,
both have identical content in all databases (that I can control),
and I am willing to tolerate short restarts if need be and also to scp/rsync the contents of pg_xlog
and other small files but *not* the entire cluster directory or any database base directories.

What do I do next?

Here is what I have found seems to work but I am not sure it is robust:

1. shut down both new Master and intended-to-be-new-Standby
2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and the global/pg_control
3. from new Master , tar + scp the entire content of pg_xlog and the global/pg_control to intended-to-be-new-Standby
4. create intended-to-be-new-Standby's recovery.conf,
specifying recovery_target_timeline = 'latest'
(but I think it works with this setting omitted)
5. start new Master
6. start new Standby and up it comes
in its postgres log(**) I see
LOG: 00000: database system was shut down at 2016-03-14 16:41:24 GMT
LOCATION: StartupXLOG, xlog.c:5936
LOG: 00000: entering standby mode
LOCATION: StartupXLOG, xlog.c:6016
LOG: 00000: consistent recovery state reached at C/480000D0
LOCATION: CheckRecoveryConsistency, xlog.c:7493
LOG: 00000: invalid record length at C/480000D0
LOCATION: ReadRecord, xlog.c:3960
LOG: 00000: started streaming WAL from primary at C/48000000 on timeline 3
LOCATION: WalReceiverMain, walreceiver.c:358
LOG: 00000: redo starts at C/480000D0
LOCATION: StartupXLOG, xlog.c:6700

Is the invalid record length msg anything to worry about?

But this method is purely empirical. Is it robust? Anyone have any better recommendations?

Cheers, John Lumby

(**Note this log was from a pre-release 9.5, 9.5alpha2
I don't have 9.5.1 to hand at present

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Shulgin, Oleksandr 2016-03-14 17:54:11 Re: how to switch old replication Master to new Standby after promoting old Standby
Previous Message Adrian Klaver 2016-03-14 17:19:38 Re: Unexpected result using floor() function