From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | hlinnaka(at)iki(dot)fi |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Cascading replication and recovery_target_timeline='latest' |
Date: | 2012-09-03 23:25:02 |
Message-ID: | CAHGQGwGrLAvWvV23VLJez4qjovPGaJNtJJ2o=9g_UTwjSQy8dg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 03.09.2012 10:43, Fujii Masao wrote:
>>
>> On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi>
>>> wrote:
>>>>
>>>> Aside from the missing locking, I wonder what that does to a cascaded
>>>> standby. If there is an active walsender running while RecoveryTargetTLI
>>>> is changed, I think what will happen is that the walsender will continue
>>>> to stream WAL from the old timeline, but because the startup process is
>>>> now actually replaying from a different timeline, the walsender will send
>>>> bogus WAL to the standby.
>>>
>>>
>>> Good catch! That's really a problem. To address that, should we terminate
>>> all cascading walsenders when the timeline history file is read and
>>> the recovery target timeline is changed?
>>
>>
>> This is not the right fix. After terminating cascading walsenders, it
>> might take them some time to come to an end, and during that time they
>> might send bogus WAL from the old timeline. Currently there is no
>> safeguard against sending bogus WAL from the old timeline. To implement
>> such a safeguard, a cascading walsender needs to know when the timeline
>> is updated and which is the last valid WAL file of the timeline, as the
>> startup process does. IOW, we need to change cascading walsenders so that
>> they also read and understand the timeline history files. This is not an
>> easy fix at this stage (9.2.0 is about to be released...).
>>
>> So, as one idea, I'm thinking of just forbidding cascading replication
>> when recovery_target_timeline is set to 'latest'. Thoughts?
>
>
> Hmm, I was thinking that when walsender gets the position it can send the
> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
> recovery timeline. If it has changed, refuse to send the new WAL and
> terminate. That would be a fairly small change; it would just close the
> window between requesting walsenders to terminate and them actually
> terminating.
Yeah, sounds good. Could you implement the patch? If you don't have time,
I will....
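
To make sure we mean the same thing, here is a rough standalone sketch of
the check as I understand it. It is not PostgreSQL code: the struct, the
atomic_flag spinlock and every function name below are made up for
illustration, and the real change would presumably live in or around
GetStandbyFlushRecPtr() and use the existing shared-memory lock. The only
point is that the send position and the recovery timeline are read under
the same lock, so the walsender can never pick up a position that belongs
to a timeline it is not streaming.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real types. */
typedef uint64_t WalPosition;
typedef uint32_t Timeline;

typedef struct SharedRecoveryState
{
    atomic_flag lock;          /* toy spinlock protecting the fields below */
    WalPosition flushed_upto;  /* WAL position that is safe to send up to */
    Timeline    recovery_tli;  /* timeline the startup process is replaying */
} SharedRecoveryState;

static SharedRecoveryState shared_state = { ATOMIC_FLAG_INIT, 0, 1 };

static void
state_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&shared_state.lock,
                                             memory_order_acquire))
        ;                       /* spin until the flag is released */
}

static void
state_unlock(void)
{
    atomic_flag_clear_explicit(&shared_state.lock, memory_order_release);
}

/* Startup-process side: publish position and timeline as one update. */
static void
publish_recovery_state(WalPosition pos, Timeline tli)
{
    state_lock();
    shared_state.flushed_upto = pos;
    shared_state.recovery_tli = tli;
    state_unlock();
}

/* Walsender side: read position and timeline under the same lock. */
static WalPosition
get_flush_ptr_and_timeline(Timeline *tli)
{
    WalPosition pos;

    assert(tli != NULL);
    state_lock();
    pos = shared_state.flushed_upto;
    *tli = shared_state.recovery_tli;
    state_unlock();
    return pos;
}

/*
 * Returns true and sets *send_upto if the walsender may keep streaming.
 * Returns false, leaving *send_upto untouched, if the recovery timeline
 * no longer matches the one being streamed; the caller should then
 * terminate instead of sending anything further.
 */
static bool
walsender_may_send(Timeline streaming_tli, WalPosition *send_upto)
{
    Timeline    current_tli;
    WalPosition pos = get_flush_ptr_and_timeline(&current_tli);

    if (current_tli != streaming_tli)
        return false;
    *send_upto = pos;
    return true;
}
```

In this sketch, a walsender streaming timeline 1 keeps getting new send
positions until the startup side publishes timeline 2; from then on
walsender_may_send() returns false and the old send position is left
alone, which is the window we want to close.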
Regards,
--
Fujii Masao