From: | Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Patch for fail-back without fresh backup |
Date: | 2013-07-07 07:19:01 |
Message-ID: | CAD21AoBtMSmusSeWHvprWdzFKE5f8c=Yzmsi9OhC-joBLFLRHw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 17 June 2013 09:03, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>> I agree. We should probably find a better name for this. Any suggestions?
>
> err, I already made one...
>
>>> But that's not the whole story. I can see some utility in a patch that
>>> makes all WAL transfer synchronous, rather than just commits. Some
>>> name like synchronous_transfer might be appropriate. e.g.
>>> synchronous_transfer = all | commit (default).
>
>> Since commits are more foreground in nature and this feature
>> does not require us to wait during common foreground activities, we want a
>> configuration where master can wait for synchronous transfers at other than
>> commits. Maybe we can solve that by having more granular control over the said
>> parameter?
>>
>>>
>>> The idea of another slew of parameters that are very similar to
>>> synchronous replication but yet somehow different seems weird. I can't
>>> see a reason why we'd want a second lot of parameters. Why not just
>>> use the existing ones for sync rep? (I'm surprised the Parameter
>>> Police haven't visited you in the night...) Sure, we might want to
>>> expand the design for how we specify multi-node sync rep, but that is
>>> a different patch.
>>
>>
>> How would we then distinguish between synchronous and the new kind of
>> standby?
>
> That's not the point. The point is "Why would we have a new kind of
> standby?" and therefore why do we need new parameters?
>
>> I am told one of the most popular setups for DR is to have one
>> local sync standby and one async standby (maybe cascaded from the local sync). Since
>> this new feature is more useful for DR, because taking a fresh backup over a
>> slower link is even more challenging, IMHO we should support such setups.
>
> ...which still doesn't make sense to me. Let's look at that in detail.
>
> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
> being safety standby at a distance.
>
> Either A or B is master, except in disaster. So if A is master, then B
> would be the failover target. If A fails, then you want to failover to
> B. Once B is the target, you want to failback to A as the master. C
> needs to follow the new master, whichever it is.
>
> Say you set up sync rep between A and B, and this new mode between A and
> C. When B becomes the master, you need to failback from B to A, but
> you can't because the new mode applied between A and C only, so you
> have to failback from C to A. So having the new mode not match with
> sync rep means you are forcing people to failback using the slow link
> in the common case.
>
> You might observe that having the two modes match causes problems if A
> and B fail, so you are forced to go to C as master and then eventually
> failback to A or B across a slow link. That case is less common and
> could be solved by extending sync transfer to more/multi nodes.
>
> It definitely doesn't make sense to have sync rep on anything other
> than a subset of sync transfer. So while it may be sensible in the
> future to make sync transfer a superset of sync rep nodes, it makes
> sense to make them the same config for now.
I have updated the patch.
We support the following 2 cases:
1. A SYNC server that is also the failback-safe standby server
2. An ASYNC server that is also the failback-safe standby server
1. Changed the name of the parameter
We gave up the 'failback_safe_standby_names' parameter from the first patch,
and changed the name of the parameter from 'failback_safe_mode' to
'synchronous_transfer'.
This parameter accepts 'all', 'data_flush' and 'commit'.
- 'commit'
'commit' means that the master waits for the corresponding WAL to be flushed
to disk on the standby server at commit,
but the master doesn't wait for replicated data pages.
- 'data_flush'
'data_flush' means that the master waits for data pages (e.g. CLOG, pg_control)
to be replicated before flushing them to disk on the master server.
If the user sets this parameter to 'data_flush', the
'synchronous_commit' value is ignored, even if the user has set
'synchronous_commit'.
- 'all'
'all' means that the master waits for both WAL and data pages to be replicated.
2. Put SyncRepWaitForLSN() into XLogFlush()
We have put the SyncRepWaitForLSN() call into the XLogFlush() function,
and changed the arguments of XLogFlush().
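To make the change to XLogFlush() more concrete, here is a minimal C sketch, written under stated assumptions and not taken from the patch. Every name in it (xlog_flush_with_transfer, FlushReason, TransferMode, flush_wal_locally, wait_for_standby) is hypothetical, the real XLogFlush()/SyncRepWaitForLSN() signatures may differ, and synchronous_standby_names is left out for brevity. It only models when the master would wait for the standby under each synchronous_transfer value as described above; the standby's acknowledgement is simulated so the sketch runs on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: the patch's real types, names and signatures may differ. */
typedef uint64_t Lsn;

typedef enum { TRANSFER_COMMIT, TRANSFER_DATA_FLUSH, TRANSFER_ALL } TransferMode;
typedef enum { FLUSH_FOR_COMMIT, FLUSH_FOR_DATA_PAGE } FlushReason;

static Lsn local_flushed_upto = 0;   /* WAL durably written on the master */
static Lsn standby_flushed_upto = 0; /* WAL the standby has confirmed flushing */
static int standby_wait_count = 0;   /* number of times the master had to wait */

/* Stand-in for writing and fsyncing WAL on the master. */
static void
flush_wal_locally(Lsn upto)
{
    if (upto > local_flushed_upto)
        local_flushed_upto = upto;
}

/* Stand-in for SyncRepWaitForLSN(): a real server would sleep until the
 * standby reports that WAL up to 'upto' is flushed; here the ack is instant. */
static void
wait_for_standby(Lsn upto)
{
    standby_wait_count++;
    if (upto > standby_flushed_upto)
        standby_flushed_upto = upto;
}

/* Sketch of an XLogFlush() that performs the synchronous_transfer wait itself.
 * The extra arguments model the changed XLogFlush() signature (assumed). */
static void
xlog_flush_with_transfer(Lsn upto, FlushReason reason,
                         TransferMode mode, bool synchronous_commit)
{
    bool must_wait;

    flush_wal_locally(upto);
    assert(local_flushed_upto >= upto);

    if (reason == FLUSH_FOR_DATA_PAGE)
        /* 'data_flush' and 'all': the page must not reach the master's disk
         * before the standby has the WAL that describes it */
        must_wait = (mode != TRANSFER_COMMIT);
    else
        /* commits follow synchronous_commit, which 'data_flush' ignores */
        must_wait = (mode != TRANSFER_DATA_FLUSH) && synchronous_commit;

    if (must_wait && standby_flushed_upto < upto)
        wait_for_standby(upto);
}
```

The ordering is what matters on the data-page path: WAL is flushed locally and acknowledged by the standby before the function returns, so the caller writes the data page only afterwards.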
The setup cases and the parameters each one needs are as follows.
- SYNC server that is also the failback-safe standby server (case 1)
  synchronous_transfer = all
  synchronous_commit = remote_write/on
  synchronous_standby_names = <ServerName>
- ASYNC server that is also the failback-safe standby server (case 2)
  synchronous_transfer = data_flush
  (the synchronous_commit value is ignored)
- default SYNC replication
  synchronous_transfer = commit
  synchronous_commit = on
  synchronous_standby_names = <ServerName>
- default ASYNC replication
  synchronous_transfer = commit
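As a cross-check of the setups above, this hypothetical C helper maps a combination of the three settings to the case it corresponds to. ReplicationSettings and describe_setup are illustrative names, not part of the patch, and the helper only encodes the combinations listed; anything else is reported as unlisted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: the three settings that distinguish the listed setups. */
typedef struct
{
    const char *synchronous_transfer;      /* "all", "data_flush" or "commit" */
    const char *synchronous_commit;        /* e.g. "on", "remote_write", "off" */
    const char *synchronous_standby_names; /* "" when no sync standby is named */
} ReplicationSettings;

static const char *
describe_setup(const ReplicationSettings *s)
{
    bool named;
    bool sync_commit;

    assert(s != NULL);
    named = s->synchronous_standby_names[0] != '\0';
    sync_commit = strcmp(s->synchronous_commit, "on") == 0 ||
                  strcmp(s->synchronous_commit, "remote_write") == 0;

    /* synchronous_commit is ignored in 'data_flush' mode */
    if (strcmp(s->synchronous_transfer, "data_flush") == 0)
        return "ASYNC server that is also the failback-safe standby (case 2)";
    if (strcmp(s->synchronous_transfer, "all") == 0 && sync_commit && named)
        return "SYNC server that is also the failback-safe standby (case 1)";
    if (strcmp(s->synchronous_transfer, "commit") == 0)
        return (sync_commit && named) ? "default SYNC replication"
                                      : "default ASYNC replication";
    return "not one of the listed setups";
}
```

For example, with a made-up standby name "standby1", the settings {"data_flush", "on", "standby1"} are reported as case 2 whatever synchronous_commit says, matching the note that synchronous_commit is ignored in that mode.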
ToDo
1. Currently this patch applies a single synchronous transfer mode, so we can't set
a different synchronous transfer mode for each server.
We need to improve the patch to support the following cases:
- SYNC standby and a separate ASYNC failback-safe standby
- ASYNC standby and a separate ASYNC failback-safe standby
2. We have not measured performance yet; we need to measure it.
Please give me your feedback.
Regards,
-------
Sawada Masahiko