From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Use pg_rewind when target timeline was switched |
Date: | 2015-08-20 06:57:20 |
Message-ID: | CAB7nPqSqOgXOhp-k4HB_3iMwfTk6Ypq0aqDBF4At3zwx1e-Sdw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 22, 2015 at 4:28 PM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Wed, Jul 22, 2015 at 8:48 AM, Michael Paquier <
> michael(dot)paquier(at)gmail(dot)com> wrote:
>
>> On Mon, Jul 20, 2015 at 9:18 PM, Alexander Korotkov
>> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> > The attached patch allows pg_rewind to work when the target timeline was
>> > switched. Actually, this patch fixes a TODO from the pg_rewind comments.
>> >
>> > /*
>> > * Trace the history backwards, until we hit the target timeline.
>> > *
>> > * TODO: This assumes that there are no timeline switches on the target
>> > * cluster after the fork.
>> > */
>> >
>> > This patch allows pg_rewind to handle data directory synchronization in a
>> > much more general way. For instance, a user can return a promoted standby
>> > to the old master.
>
>
+ /*
+ * Since incomplete segments are copied into next timelines, find the
+ * latest timeline holding required segment.
+ */
+ while (private->tliIndex < targetNentries - 1 &&
+        targetHistory[private->tliIndex].end < targetSegEnd)
+ {
+     private->tliIndex++;
+     tli_index++;
+ }
It seems to me that the patch handles timeline switches when scanning
forwards, but not backwards, and the backward case is what is required to
return a promoted standby (switched to, say, timeline 2 at promotion) to an
old master that is still on timeline 1. This code actually fails when
scanning for the last checkpoint before WAL forked, as that checkpoint is on
timeline 1 of the old master. Imagine for example that WAL has forked at
0/30XXXXX, which is stored in segment 000000020000000000000003 (say 2/0/3)
on the promoted standby, and that the last checkpoint record is at
0/20XXXXX, which is part of 000000010000000000000002 (1/0/2). I think that
we should scan 2/0/3 (not the partial segment 1/0/3), and then 1/0/2 when
looking for the last checkpoint record. Hence the starting TLI index should
be set to the highest timeline and decremented depending on what is in the
history file.
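
To make the suggestion concrete, here is a minimal sketch of picking the
timeline for a segment when scanning backwards. It is not pg_rewind code:
HistoryEntry, tli_index_for_segment() and wal_file_name() are hypothetical
names, the 16 MB segment size is the default and assumed here, and the
history is reduced to "timeline ID plus start LSN".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16 MB WAL segments. */
#define WAL_SEG_SIZE    ((uint64_t) 16 * 1024 * 1024)
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/* Hypothetical, simplified history entry (not pg_rewind's actual struct). */
typedef struct
{
    uint32_t tli;    /* timeline ID */
    uint64_t begin;  /* LSN at which this timeline starts */
} HistoryEntry;

/*
 * Start from the highest timeline and step down while that timeline
 * begins at or after the end of the segment containing lsn. The segment
 * holding the fork point is therefore read from the newer timeline.
 */
static int
tli_index_for_segment(const HistoryEntry *history, int nentries, uint64_t lsn)
{
    uint64_t seg_end = (lsn / WAL_SEG_SIZE + 1) * WAL_SEG_SIZE;
    int      idx = nentries - 1;

    assert(nentries > 0);
    while (idx > 0 && history[idx].begin >= seg_end)
        idx--;
    return idx;
}

/* Build the 24-character WAL segment file name for a timeline and LSN. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t lsn)
{
    uint64_t segno = lsn / WAL_SEG_SIZE;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / SEGS_PER_XLOGID),
             (unsigned) (segno % SEGS_PER_XLOGID));
}
```

With a history of {1, 0} and {2, 0x3000060} (illustrative LSNs only), an
LSN in segment 3 resolves to timeline 2 and 000000020000000000000003, while
an LSN in segment 2 resolves to timeline 1 and 000000010000000000000002.
Because the index is recomputed from the top for every segment, it can move
down as well as up.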
The code above looks correct to me when scanning the WAL history forwards,
which is what is done when extracting the page map, but not backwards, when
we try to find the last common checkpoint record. In my test case it fails
trying to open 2/0/2, which does not exist in the promoted standby's
pg_xlog.
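
The failure can be traced on paper with a small stand-alone model of the
quoted loop. This is only a sketch: TimelineEnd, advance_only() and
seg_file_name() are hypothetical names, the fork LSN is an illustrative
value, and 16 MB segments (256 per xlog ID) are assumed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified history entry: timeline ID and its end LSN. */
typedef struct
{
    uint32_t tli;
    uint64_t end;   /* LSN where the timeline ends; UINT64_MAX if current */
} TimelineEnd;

/* Same shape as the quoted loop: the index can only move forward. */
static int
advance_only(const TimelineEnd *h, int n, int idx, uint64_t seg_end)
{
    while (idx < n - 1 && h[idx].end < seg_end)
        idx++;
    return idx;
}

/* WAL segment file name, assuming 256 segments of 16 MB per xlog ID. */
static void
seg_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / 256),
             (unsigned) (segno % 256));
}
```

Take a history of {1, 0x3000060} and {2, UINT64_MAX}. Reading segment 3
(segment end 0x4000000) moves the index from 0 to 1, which is right. Reading
segment 2 next (segment end 0x3000000) leaves the index at 1, so the name
built is 000000020000000000000002, a file that was never written on
timeline 2.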
Attached is a small script I have used to reproduce the failure.
I think that the documentation needs a brush-up as well, to point out that
pg_rewind would also be able to put a standby back into a cluster, for
example after an operator mistakenly promotes a node that should not have
been promoted.
Thoughts?
--
Michael
Attachment | Content-Type | Size |
---|---|---|
rewind_test.bash | application/octet-stream | 1.3 KB |