From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, fazool mein <fazoolmein(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Synchronous replication - patch status inquiry |
Date: | 2010-09-02 10:24:05 |
Message-ID: | AANLkTi=JJSMD0DBiCNP9sWmqWkjmdkwLfdf33RdSgLgY@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Sep 1, 2010 at 7:23 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> That requirement falls out from the handling of disconnected standbys. If a
> standby is not connected, what does the master do with commits? If the
> answer is anything else than acknowledge them to the client immediately, as
> if the standby never existed, the master needs to know what standby servers
> exist. Otherwise it can't know if all the standbys are connected or not.
Thanks. I understand why the registration is required.
> I'd like to keep this as simple as possible, yet flexible so that with
> enough scripting and extensions, you can get all sorts of behavior. I think
> quorum commit falls into the "extension" category; if your setup is
> complex enough, it's going to be impossible to represent that in our config
> files no matter what. But if you write a little proxy, you can implement
> arbitrary rules there.
Agreed.
> I think recv/fsync/replay should be specified in the standby. It has no
> direct effect on the master, the master would just relay the setting to the
> standby when it connects, or the standby would send multiple XLogRecPtrs and
> let the master decide when the WAL is persistent enough.
The latter seems wasteful since the master uses only one XLogRecPtr even if
the standby sends multiple ones. So I prefer the former design, which also
keeps the code and design very simple and makes the proxy easy to write.
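To illustrate the former design, here is a minimal sketch in which the standby, having been told its mode, reports only the one position that matters for that mode. The names StandbyProgress and position_to_report are hypothetical, and plain integers stand in for XLogRecPtr values; this is not code from any patch.

```python
from dataclasses import dataclass


@dataclass
class StandbyProgress:
    """Hypothetical snapshot of how far the standby has got with the WAL."""
    received: int  # WAL received from the master
    fsynced: int   # WAL flushed to disk on the standby
    replayed: int  # WAL applied by recovery


def position_to_report(mode: str, progress: StandbyProgress) -> int:
    """Return the single position the standby would send for the given mode."""
    positions = {
        "recv": progress.received,
        "fsync": progress.fsynced,
        "replay": progress.replayed,
    }
    if mode not in positions:
        raise ValueError(f"unknown replication mode: {mode!r}")
    return positions[mode]
```

With this shape the master only ever compares one reported value against the commit's position, which is the simplification argued for above.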
> "sync vs async" on the other hand should be specified in the master, because
> it has a direct impact on the behavior of commits in the master.
>
> I propose a configuration file standbys.conf, in the master:
>
> # STANDBY NAME SYNCHRONOUS TIMEOUT
> importantreplica yes 100ms
> tempcopy no 10s
Seems good. Instead of yes/no, could async/recv/fsync/replay be specified
in the SYNCHRONOUS field?
OTOH, something like a standby_name parameter should be introduced in
recovery.conf.
Should we allow multiple standbys with the same name? Probably yes.
We might need to add a NUMBER field to standbys.conf in the future.
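As a rough illustration of that file format, the sketch below parses a standbys.conf whose SYNCHRONOUS field takes async/recv/fsync/replay. The file does not exist in PostgreSQL; the parser, the StandbyEntry type and the accepted timeout units (ms and s, taken from the example rows) are all assumptions made for this sketch. Duplicate standby names are accepted, in line with the "probably yes" above.

```python
import re
from typing import List, NamedTuple

MODES = {"async", "recv", "fsync", "replay"}
UNITS_MS = {"ms": 1, "s": 1000}


class StandbyEntry(NamedTuple):
    name: str
    mode: str
    timeout_ms: int


def parse_timeout(text: str) -> int:
    """Convert a value such as '100ms' or '10s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s)", text)
    if match is None:
        raise ValueError(f"bad timeout: {text!r}")
    return int(match.group(1)) * UNITS_MS[match.group(2)]


def parse_standbys_conf(text: str) -> List[StandbyEntry]:
    """Parse lines of the form 'NAME MODE TIMEOUT'; '#' starts a comment."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields")
        name, mode, timeout = fields
        if mode not in MODES:
            raise ValueError(f"line {lineno}: unknown mode {mode!r}")
        entries.append(StandbyEntry(name, mode, parse_timeout(timeout)))
    return entries
```

A future NUMBER field would become a fourth column here; it is left out because its meaning has not been settled.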
> Yeah, though of course you might want to set that per-standby too..
Yep.
> Let's step back a bit and ask what would be the simplest thing that you
> could call "synchronous replication" in good conscience, and also be useful
> at least to some people. Let's leave out the "down" mode, because that
> requires registration. We'll probably have to do registration at some point,
> but let's take as small steps as possible.
Agreed.
> Without the "down" mode in the master, frankly I don't see the point of the
> "recv" and "fsync" levels in the standby. Either way, when the master
> acknowledges a commit to the client, you don't know if it has made it to the
> standby yet because the replication connection might be down for some
> reason.
True. We cannot know whether the standby can be brought up to the master
without any data loss when the master crashes, because the standby might
have been disconnected earlier for some reason and lack some of the latest
data. But the situation would be the same even when 'replay' mode is chosen.
Though we might be able to check whether the latest transaction has been
replicated to the standby by running a read-only query on the standby,
it's actually difficult to do that. How can we know the content of the
latest transaction?
Also, even when 'recv' or 'fsync' is chosen, we might be able to check
that by calling pg_last_xlog_receive_location() on the standby. But a
similar question occurs to me: How can we know the LSN of the latest
transaction?
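If the LSN of the latest commit were known somehow (which is exactly the open question), the check itself would be simple. The sketch below compares two locations in the "hi/lo" hexadecimal text form that pg_last_xlog_receive_location() returns; the helper names are hypothetical, and the commit LSN is assumed to be supplied by the caller.

```python
from typing import Tuple


def parse_lsn(text: str) -> Tuple[int, int]:
    """Split an 'X/Y' WAL location into two integers parsed as hex."""
    hi, sep, lo = text.strip().partition("/")
    if not sep:
        raise ValueError(f"not a WAL location: {text!r}")
    return int(hi, 16), int(lo, 16)


def standby_has_reached(standby_lsn: str, commit_lsn: str) -> bool:
    """True if the standby's location is at or past the commit's location."""
    # Tuples compare field by field, so the high part is compared first.
    return parse_lsn(standby_lsn) >= parse_lsn(commit_lsn)
```

The locations are compared as numbers rather than as strings because the text form is not zero-padded, so "0/A000000" would sort incorrectly against "0/9FFFFFFF" as text.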
I'm thinking of introducing a new parameter specifying a command which
is executed when the standby is disconnected. This command is executed
by walsender before resuming the transaction processing which has
been suspended by the disconnection. For example, if STONITH against
the standby is supplied as the command, we can prevent a standby that
lacks the latest data from becoming the master by forcibly shutting
such a delayed standby down. Thoughts?
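A minimal sketch of that control flow, written outside the server for illustration only: the function name, the timeout, and the policy of resuming commits only when the command exits with status 0 are assumptions, not part of the proposal.

```python
import shlex
import subprocess


def on_standby_disconnect(command: str, timeout_s: float = 30.0) -> bool:
    """Run the configured command; return True if suspended commits may resume.

    Assumed policy: resume only when the command exits with status 0.
    """
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    try:
        result = subprocess.run(argv, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        # The command could not be started or did not finish in time.
        return False
    return result.returncode == 0
```

What the master should do when the command fails (keep waiting, retry, or give up) is a separate decision that this sketch leaves to the caller.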
> That leaves us the 'replay' mode, which *is* useful, because it gives you
> the guarantee that when the master acknowledges a commit, it will appear
> committed in all hot standby servers that are currently connected. With that
> guarantee you can build a reliable cluster with something like pgpool-II where
> all writes go to one node, and reads are distributed to multiple nodes.
I'm concerned that conflicts between read-only queries and recovery might
harm performance on the master in 'replay' mode. If a conflict
occurs, all running transactions on the master have to wait for it to
be resolved, which can take very long. Of course, even without a conflict,
waiting until the standby has received, fsync'd, read and replayed WAL
would take long. So I'd like to support 'recv' and 'fsync' as well.
I believe that those two modes are neither complicated nor difficult
to implement.
> I'm not sure what we should aim for in the first phase. But if you want as
> little code as possible yet have something useful, I think 'replay' mode
> with no standby registration is the way to go.
What about recv/fsync/replay mode with no standby registration?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center