BUG #14109: pg_rewind fails to update target control file in one scenario

From: johnlumby(at)hotmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14109: pg_rewind fails to update target control file in one scenario
Date: 2016-04-24 19:25:49
Message-ID: 20160424192549.2725.71787@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14109
Logged by: John Lumby
Email address: johnlumby(at)hotmail(dot)com
PostgreSQL version: 9.5.1
Operating system: linux 64-bit
Description:

Scenario:
Two systems in a working streaming replication relationship:
Primary: SystemA, Standby: SystemB
with no WAL queued and no inserts/updates/deletes being performed on
SystemA.

Then, in chronological sequence:
. shut down SystemA
. pg_ctl promote SystemB
  and verify SystemB is running correctly stand-alone
. pg_rewind SystemA
  output is something like
    connected to server
    fetched file "global/pg_control", length 8192
    fetched file "pg_xlog/0000000D.history", length 388
    servers diverged at WAL position 9/A90002A8 on timeline 12
    no rewind required

. set up correct recovery.conf on SystemA
. start SystemA postgres server
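
The divergence point in the pg_rewind output above, 9/A90002A8, is written in the usual WAL position notation: two hexadecimal numbers giving the high and low 32 bits of a 64-bit position. As a minimal sketch of how such a string maps to a single comparable value (parse_wal_position is a hypothetical helper written for this illustration, not a function from pg_rewind):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of pg_rewind: convert a WAL position
 * printed as "hi/lo" (both hexadecimal) into one 64-bit value.
 * Returns false if the text does not contain two hex fields.
 */
static bool
parse_wal_position(const char *text, uint64_t *pos)
{
	unsigned int hi;
	unsigned int lo;

	assert(text != NULL && pos != NULL);

	if (sscanf(text, "%X/%X", &hi, &lo) != 2)
		return false;

	/* High half goes in the upper 32 bits, low half in the lower 32. */
	*pos = ((uint64_t) hi << 32) | lo;
	return true;
}
```

With this representation, "9/A90002A8" becomes 0x9A90002A8, and two positions can be compared with plain integer equality.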

At this point, both SystemB and SystemA appear to be running correctly,
but inserts, updates and deletes performed on SystemB are not replicated to
SystemA.
Also, the pg_stat_replication view on SystemB shows state 'startup', not
'streaming'.

I believe there is a bug in pg_rewind for this scenario, where it finds
that the following conditions are both true:
1 - source and target cluster are not on the same timeline
2 - the histories diverged exactly at the end of the
    shutdown checkpoint record on the target,
    so there are no WAL records in the target
    that don't belong in the source's history

The code then concludes that no rewind is needed.

That is true. However, what I believe *is* still needed is to update the
target control file with the new timeline and other information from the
source.
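
The reasoning above can be sketched as a small standalone C model. This is only an illustration under stated assumptions: WalPos, RewindPlan and plan_for_diverged_timelines are hypothetical names invented here, and the model covers only the case where the source and target timelines already differ. The names chkptendrec, divergerec and rewind_needed are taken from the patch below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position. */
typedef uint64_t WalPos;

/* Hypothetical summary of what the tool would have to do. */
typedef struct
{
	bool	rewind_needed;				/* copy changed blocks from source? */
	bool	control_file_update_needed;	/* rewrite target control file? */
} RewindPlan;

/*
 * Model of the proposed behaviour when source and target are on
 * different timelines: the data rewind is skipped if the histories
 * diverged exactly at the end of the target's shutdown checkpoint
 * record, but the control file is updated either way because the
 * timeline has changed.
 */
static RewindPlan
plan_for_diverged_timelines(WalPos chkptendrec, WalPos divergerec)
{
	RewindPlan	plan;

	plan.rewind_needed = (chkptendrec != divergerec);
	plan.control_file_update_needed = true;
	return plan;
}
```

In the reported scenario both positions are equal, so the model yields rewind_needed = false with control_file_update_needed = true, which is the combination the unpatched code does not handle.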

This patch seems to fix the problem on my system :

--- src/bin/pg_rewind/pg_rewind.c.orig	2016-02-08 16:12:28.000000000 -0500
+++ src/bin/pg_rewind/pg_rewind.c	2016-04-24 14:50:52.646737233 -0400
@@ -247,7 +247,14 @@ main(int argc, char **argv)
 		 * needed.
 		 */
 		if (chkptendrec == divergerec)
+		{
 			rewind_needed = false;
+			/* however we must still copy the control file from source to target
+			 * because of the timeline change.
+			 */
+			printf(_("no rewind required but will update global control file from source for increase in timeline.\n"));
+			goto updateControlFile;
+		}
 		else
 			rewind_needed = true;
 	}
@@ -318,6 +325,7 @@ main(int argc, char **argv)
 	pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
 	createBackupLabel(chkptredo, chkpttli, chkptrec);
 
+ updateControlFile:
 	/*
 	 * Update control file of target. Make it ready to perform archive
 	 * recovery when restarting.
