Re: Pg_rewind cannot load history wal

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Richard Schmidt <Richard(dot)Schmidt(at)metservice(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Pg_rewind cannot load history wal
Date: 2018-08-03 20:59:22
Message-ID: 20180803205922.GC20967@paquier.xyz
Lists: pgsql-general

On Wed, Aug 01, 2018 at 09:09:30PM +0000, Richard Schmidt wrote:
> Our procedure that runs on machine A and B is as follows:
>
> 1. Build new databases on A and B, and configure A as Primary and B
> as Standby databases.
> 2. Make some changes to A (the primary) and check that they are
> replicated to B (the standby)
> 3. Promote B to be the new primary
> 4. Switch off A (the original primary)
> 5. Add the replication slot to B (the new primary) for A (soon to
> be standby)
> 6. Add a recovery.conf to A (soon to be standby). File contains
> recovery_target_timeline = 'latest' and restore_command = 'cp
> /ice-dev/wal_archive/%f "%p"'
> 7. Run pg_rewind on A - this appears to work as it returns the
> message 'source and target cluster are on the same timeline no
> rewind required';
> 8. Start up server A (now a slave)

The result in step 7 is not what you should see: after B is promoted,
pg_rewind should actually have work to do. Your flow is missing a
checkpoint on the promoted standby, run after step 3 and before step 7.
The checkpoint makes the promoted standby update its timeline number in
the on-disk control file, which pg_rewind reads to decide whether a
rewind is needed.
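
As a rough illustration of where the checkpoint fits, here is a minimal
Python sketch that builds the two commands in order: a CHECKPOINT issued
on the promoted standby, then pg_rewind on the old primary. The host name
"machine-b", the data directory path, the "postgres" database name and
the helper names are all assumptions made for the example, not taken from
your setup. It also assumes A has already been shut down cleanly, as in
your step 4. By default it only prints the commands.

```python
import shlex
import subprocess


def post_promotion_commands(new_primary_host, old_primary_datadir, dbname="postgres"):
    """Return the commands to run, in order, once B has been promoted.

    All argument values are supplied by the caller; nothing here is
    specific to a real installation.
    """
    source = f"host={new_primary_host} dbname={dbname}"
    return [
        # Force a checkpoint on the promoted standby (B) so that its
        # on-disk control file records the new timeline.
        ["psql", "-h", new_primary_host, "-d", dbname, "-c", "CHECKPOINT;"],
        # Only then rewind the old primary (A), which must be stopped.
        [
            "pg_rewind",
            f"--target-pgdata={old_primary_datadir}",
            f"--source-server={source}",
        ],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical host and path, shown as a dry run.
    run(post_promotion_commands("machine-b", "/path/to/A/data"))
```

The ordering is the only point being made here: the checkpoint comes
first, so that pg_rewind sees the new timeline in B's control file.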

We see too many reports of this mistake, so I am going to propose a patch
on the -hackers mailing list to mention it in the documentation.
--
Michael
