From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Richard Schmidt <Richard(dot)Schmidt(at)metservice(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Pg_rewind cannot load history wal
Date: 2018-08-03 20:59:22
Message-ID: 20180803205922.GC20967@paquier.xyz
Lists: pgsql-general
On Wed, Aug 01, 2018 at 09:09:30PM +0000, Richard Schmidt wrote:
> Our procedure that runs on machine A and B is as follows:
>
> 1. Build new databases on A and B, and configure A as Primary and B
> as Standby databases.
> 2. Make some changes to A (the primary) and check that they are
> replicated to B (the standby)
> 3. Promote B to be the new primary
> 4. Switch off A (the original primary)
> 5. Add the replication slot to B (the new primary) for A (soon to
> be standby)
> 6. Add a recovery.conf to A (soon to be standby). File contains
> recovery_target_timeline = 'latest' and restore_command = 'cp
> /ice-dev/wal_archive/%f "%p"'
> 7. Run pg_rewind on A - this appears to work as it returns the
> message 'source and target cluster are on the same timeline no
> rewind required';
> 8. Start up server A (now a slave)
Step 7 is incorrect here: after the promotion of B, you should see
pg_rewind actually do its work. The problem is that your flow is missing
a piece: a checkpoint on the promoted standby, run after step 3 and
before step 7. The checkpoint makes the promoted standby update its
timeline number in the on-disk control file, which pg_rewind reads to
decide whether a rewind needs to happen.
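A minimal sketch of the corrected sequence, to run between steps 3 and 7
(hostnames, ports, and data directory paths below are placeholders;
adjust them for your setup):

```shell
# On B (the promoted standby, now primary): force a checkpoint so the
# new timeline ID is flushed to the on-disk control file.
psql -h hostB -U postgres -c "CHECKPOINT;"

# Optional sanity check on B: the control file should now show the
# post-promotion timeline.
pg_controldata /var/lib/postgresql/data | grep TimeLineID

# On A (the old primary, cleanly shut down): pg_rewind now sees the
# timeline divergence and performs a real rewind instead of reporting
# "source and target cluster are on the same timeline".
pg_rewind --target-pgdata=/var/lib/postgresql/data \
    --source-server="host=hostB port=5432 user=postgres dbname=postgres"
```

These commands need a live primary on hostB and a stopped cluster on A,
so they are a recipe to adapt rather than something to paste verbatim.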
We see too many reports of such mistakes; I am going to propose a patch
on the -hackers mailing list to mention that in the documentation...
--
Michael