Re: Switching timeline over streaming replication

From: "md(at)rpzdesign(dot)com" <md(at)rpzdesign(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: 'Pg Hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Switching timeline over streaming replication
Date: 2012-09-25 18:01:27
Message-ID: 5061F177.5050703@rpzdesign.com
Lists: pgsql-hackers

Amit:

At some point, everyone running master-slave replication gets to the
point where they need to start thinking about master-master replication.

Many developers do not start out planning for master-master, and they
get stuck in the weeds before finally realizing that master-master is
the ONLY way to go. They should plan for it from the start, out of habit.

You can save yourself a lot of grief just by starting with a
master-master architecture.

But you don't have to USE it: you can simply not send WRITE traffic to
the servers that you do not want to WRITE to, while all of them remain
WRITE servers. That way, the only timeline you ever need is your
decision about where to send WRITE traffic, and there is nothing that
prevents you from running MASTER-MASTER all the time and skipping the
whole slave thing entirely.

At this point, I think synchronous replication is only for immediate
local replication needs, and async is for all the master-master stuff.

cheers,

marco

On 9/24/2012 9:44 PM, Amit Kapila wrote:
>> On Monday, September 24, 2012 9:08 PM md(at)rpzdesign(dot)com wrote:
>> What a disaster waiting to happen. Maybe the only replication should be
>> master-master replication
>> so there is no need to sequence timelines or anything, all servers are
>> ready masters, no backups or failovers.
>> If you really do not want a master serving, then it should only be
>> handled in the routing
>> of traffic to that server and not the replication logic itself. The
>> only thing that ever came about
>> from failovers was the failure to turn over. The above is opinion
>> only.
> This feature is for users who want to use master-standby configurations.
>
> What do you mean by :
> "then it should only be handled in the routing of traffic to that server
> and not the replication logic itself."
>
> Do you have any idea other than the proposed implementation, or do you
> see any problem in the currently proposed solution?
>
>
>> On 9/24/2012 7:33 AM, Amit Kapila wrote:
>>> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>>>> I've been working on the often-requested feature to handle timeline
>>>> changes over streaming replication. At the moment, if you kill the
>>>> master and promote a standby server, and you have another standby
>>>> server that you'd like to keep following the new master server, you
>>>> need a WAL archive in addition to streaming replication to make it
>>>> cross the timeline change. Streaming replication will just error
>>>> out.
>>>> Having a WAL archive is usually a good idea in complex replication
>>>> scenarios anyway, but it would be good to not require it.
>>> Confirm my understanding of this feature:
>>>
>>> This feature is for the case where standby-1, which is going to be
>>> promoted to master, has archive mode 'on'.
>>> As in that case only its timeline will change.
>>>
>>> If the above is right, then there can be other similar scenarios where
>>> it can be used:
>>>
>>> Scenario-1 (1 Master, 1 Stand-by)
>>> 1. Master (archive_mode=on) goes down.
>>> 2. Master again comes up
>>> 3. Stand-by tries to follow it
>>>
>>> Now in the above scenario it also gives an error due to the timeline
>>> mismatch, but your patch should fix it.
>>>
>>>
>>>> Some parts of this patch are just refactoring that probably make sense
>>>> regardless of the new functionality. For example, I split off the
>>>> timeline history file related functions to a new file, timeline.c.
>>>> That's not very much code, but it's fairly isolated, and xlog.c is
>>>> massive, so I feel that anything that we can move off from xlog.c is a
>>>> good thing. I also moved off the two functions RestoreArchivedFile()
>>>> and ExecuteRecoveryCommand(), to a separate file. Those are also not
>>>> much code, but are fairly isolated. If no-one objects to those changes,
>>>> and the general direction this work is going to, I'm going to split off
>>>> those refactorings to separate patches and commit them separately.
>>>>
>>>> I also made the timeline history file a bit more detailed: instead of
>>>> recording just the WAL segment where the timeline was changed, it now
>>>> records the exact XLogRecPtr. That was required for the walsender to
>>>> know the switchpoint, without having to parse the XLOG records (it
>>>> reads and parses the history file, instead).
>>> IMO separating the timeline history file related functions to a new
>>> file is good.
>>> However I am not sure about splitting RestoreArchivedFile() and
>>> ExecuteRecoveryCommand() into a separate file.
>>> How about splitting off all archive-related functions:
>>> static void XLogArchiveNotify(const char *xlog);
>>> static void XLogArchiveNotifySeg(XLogSegNo segno);
>>> static bool XLogArchiveCheckDone(const char *xlog);
>>> static bool XLogArchiveIsBusy(const char *xlog);
>>> static void XLogArchiveCleanup(const char *xlog);
>>> ..
>>> ..
>>>
>>> In any case, it will be better if you can split it into multiple
>>> patches:
>>> 1. Having new functionality of "Switching timeline over streaming
>>> replication"
>>> 2. Refactoring related changes.
>>>
>>> It can make my testing and review of the new feature patch a little
>>> easier.
>>>
>>> With Regards,
>>> Amit Kapila.
>>>
>>>
>>>
>
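
For readers following the quoted description of the more detailed timeline
history file (recording the exact XLogRecPtr of the switch rather than just
the WAL segment), here is a rough sketch in C of how a reader of that file
could recover the switchpoint without parsing XLOG records. It is not taken
from the patch. The line layout (parent timeline ID, then the position as two
hex halves separated by a slash, then a free-text reason) is an assumption,
and HistoryEntry, parse_history_line() and reached_switchpoint() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for one line of a timeline history file. */
typedef struct HistoryEntry
{
    uint32_t parent_tli;    /* timeline that was left at the switch */
    uint64_t switchpoint;   /* WAL position where the switch happened */
} HistoryEntry;

/*
 * Parse one line of the ASSUMED form "<tli> <hi>/<lo> <reason>", where
 * <hi> and <lo> are the upper and lower 32 bits of the position in hex.
 * Returns 1 on success, 0 for comments, blank lines and malformed input.
 */
static int
parse_history_line(const char *line, HistoryEntry *entry)
{
    unsigned int tli, hi, lo;

    assert(line != NULL && entry != NULL);

    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;

    /* A space in the format matches any run of whitespace, tabs included. */
    if (sscanf(line, "%u %X/%X", &tli, &hi, &lo) != 3)
        return 0;

    entry->parent_tli = tli;
    entry->switchpoint = ((uint64_t) hi << 32) | lo;
    return 1;
}

/*
 * Has a sender streaming the parent timeline reached (or passed) the
 * point where it should stop and continue on the next timeline?
 */
static int
reached_switchpoint(const HistoryEntry *entry, uint64_t pos)
{
    return pos >= entry->switchpoint;
}
```

With this layout, a line such as "1", a tab, "0/3000090", a tab and a reason
would yield parent timeline 1 and switchpoint 0x3000090, and a sender could
compare its current send position against that value with
reached_switchpoint().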
