From: | Dimitri Fontaine <dfontaine(at)hi-media(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <heikki(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Configuring synchronous replication |
Date: | 2010-09-17 09:10:40 |
Message-ID: | m2sk1868hb.fsf@hi-media.com |
Lists: | pgsql-committers pgsql-hackers |
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> * Support multiple standbys with various synchronization levels.
>
> * What happens if a synchronous standby isn't connected at the moment?
> Return immediately vs. wait forever.
>
> * Per-transaction control. Some transactions are important, others are not.
>
> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers
> can be seen as important special cases of this.
>
> * async, recv, fsync and replay levels of synchronization.
>
> So what should the user interface be like? Given the 1st and 2nd
> requirement, we need standby registration. If some standbys are important
> and others are not, the master needs to distinguish between them to be able
> to determine that a transaction is safely delivered to the important
> standbys.
Well, the 1st point can be handled in a distributed fashion, where the
sync level is set up at the slave. Ditto for the second point: you can get
exactly the same behavior control attached to the quorum facility.
What I think your description is missing is the implicit feature that
you want to be able to set up the "ignore-or-wait" failure behavior per
standby. I'm not sure we need that, or, more precisely, that we need to
have that level of detail in the master's setup.
Maybe what we need instead is a more detailed quorum facility, but as
you're talking about something similar later in the mail, let's follow
your lead.
> For per-transaction control, ISTM it would be enough to have a simple
> user-settable GUC like synchronous_commit. Let's call it
> "synchronous_replication_commit" for now. For non-critical transactions, you
> can turn it off. That's very simple for developers to understand and use. I
> don't think we need more fine-grained control than that at transaction
> level; in all the use cases I can think of, you have a stream of important
> transactions, mixed with non-important ones like log messages that you want
> to finish fast in a best-effort fashion. I'm actually tempted to tie that to
> the existing synchronous_commit GUC; the use case seems exactly the
> same.
Well, that would be an oversimplification. In my applications I set the
"sessions" transactions to synchronous_commit = off, but the business
transactions to synchronous_commit = on. Now, among the latter, I have
backoffice editing and money transactions. I'm not willing to be forced
to endure the same performance penalty for both when I know their
distributed durability needs aren't the same.
> OTOH, if we do want fine-grained per-transaction control, a simple boolean
> or even an enum GUC doesn't really cut it. For truly fine-grained control
> you want to be able to specify exceptions like "wait until this is replayed
> in slave named 'reporting'" or "don't wait for acknowledgment from slave
> named 'uk-server'". With standby registration, we can invent a syntax for
> specifying overriding rules in the transaction. Something like SET
> replication_exceptions = 'reporting=replay, uk-server=async'.
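To make the override idea concrete, here is a minimal sketch, assuming each registered standby already has a default level and that the per-transaction setting is a comma-separated list of `name=level` pairs overriding those defaults. The syntax is only the bikeshed above; `parse_exceptions`, `effective_levels` and the registry dict are hypothetical names, not anything PostgreSQL provides.

```python
# Synchronization levels discussed in this thread, weakest first.
LEVELS = ("async", "recv", "fsync", "replay")


def parse_exceptions(setting: str) -> dict[str, str]:
    """Parse a hypothetical 'name=level, name=level' setting into a dict."""
    overrides: dict[str, str] = {}
    for item in setting.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty settings and trailing commas
        name, sep, level = item.partition("=")
        name, level = name.strip(), level.strip()
        if not sep or not name or level not in LEVELS:
            raise ValueError(f"bad replication exception: {item!r}")
        overrides[name] = level
    return overrides


def effective_levels(registered: dict[str, str], setting: str) -> dict[str, str]:
    """Apply the per-transaction overrides on top of the registered defaults.

    Overrides may only name standbys the master knows about, which is why
    this scheme depends on standby registration.
    """
    overrides = parse_exceptions(setting)
    unknown = set(overrides) - set(registered)
    if unknown:
        raise ValueError(f"unregistered standby: {sorted(unknown)}")
    return {**registered, **overrides}
```

With `registered = {"reporting": "recv", "uk-server": "fsync", "failover": "fsync"}`, calling `effective_levels(registered, "reporting=replay, uk-server=async")` gives `{'reporting': 'replay', 'uk-server': 'async', 'failover': 'fsync'}`: only the named standbys change for this transaction.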
Then you want to be able to have more than one reporting server and need
only one of them at the "replay" level, without needing to know which one
it is. Or, on the contrary, you have a failover server and you want to be
sure this one is at the replay level whatever happens.
Then you want topology flexibility: you need to be able to replace a
reporting server with another, ditto for the failover one.
Did I tell you my current thinking on how to tackle that yet? :) A
distributed setup, where each slave has a weight (several votes per
transaction) and offers one sync level, would allow that, I think.
Now, one thing similar to your idea that I can see a need for is a
multi-part quorum target: where you currently say that you want 2 votes
for sync, you would be able to say you want 2 votes for recv, 2 for fsync
and 1 for replay. Remember that any slave is set up to offer only one
level of synchronicity, but can offer multiple votes.
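A multi-part quorum check could look roughly like the following minimal sketch. It assumes that a standby's votes count only at the one level it offers (a reading of "any slave offers only one level", since otherwise `recv=2` would be implied by `fsync=2`), and that an acknowledgment means the standby has reached its own level for this transaction. `Standby` and `quorum_reached` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standby:
    name: str
    level: str  # the single level this standby offers: "recv", "fsync" or "replay"
    votes: int  # how many votes its acknowledgment is worth


def quorum_reached(target: dict[str, int], standbys: list[Standby], acked: set[str]) -> bool:
    """True once every level in `target` has collected enough votes.

    Votes are counted only at the level each standby offers; standbys
    that have not acknowledged this transaction yet contribute nothing.
    An empty target means there is nothing to wait for.
    """
    collected: dict[str, int] = {}
    for standby in standbys:
        if standby.name in acked:
            collected[standby.level] = collected.get(standby.level, 0) + standby.votes
    return all(collected.get(level, 0) >= needed for level, needed in target.items())
```

For example, with standbys `a` and `b` at recv (1 vote each), `c` at fsync (2 votes) and `d` at replay (1 vote), the target `{"recv": 2, "fsync": 2, "replay": 1}` is met once all four have acknowledged, but not while `b` (recv) or `d` (replay) is still missing.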
What would this look like in the setup? Best would be to register the
different service levels your application needs. Time to bikeshed a
little?
    sync_rep_services = {critical: recv=2, fsync=2, replay=1;
                         important: fsync=3;
                         reporting: recv=2, replay=1}
Well, you get the idea; it could maybe be stored in a catalog somewhere,
with nice SQL commands etc. The goal is then for the application to
handle a much simpler GUC, for example sync_rep_service = important. The
reserved label would be off, the default value.
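The registry-plus-simple-GUC idea can be sketched as follows, assuming the bikeshed syntax above (`{label: level=votes, ...; ...}`) and that `off` is reserved and means "don't wait". `parse_services` and `resolve_service` are hypothetical names; the parser deliberately does not validate level names, so it only illustrates the shape of the mapping from a session setting to a quorum target.

```python
def parse_services(text: str) -> dict[str, dict[str, int]]:
    """Parse '{label: level=votes, ...; label: ...}' into nested dicts."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected {label: level=votes, ...; ...}")
    services: dict[str, dict[str, int]] = {}
    for entry in body[1:-1].split(";"):
        entry = entry.strip()
        if not entry:
            continue  # allow a trailing ';'
        label, sep, spec = entry.partition(":")
        label = label.strip()
        if not sep or not label:
            raise ValueError(f"bad service entry: {entry!r}")
        if label == "off":
            raise ValueError("'off' is a reserved label")
        target: dict[str, int] = {}
        for pair in spec.split(","):
            level, eq, votes = pair.partition("=")
            if not eq:
                raise ValueError(f"bad level spec: {pair!r}")
            target[level.strip()] = int(votes)  # int() ignores surrounding spaces
        services[label] = target
    return services


def resolve_service(services: dict[str, dict[str, int]], setting: str) -> dict[str, int]:
    """Map the per-session service label to the quorum target to wait for."""
    if setting == "off":
        return {}  # reserved default: don't wait for any standby
    try:
        return services[setting]
    except KeyError:
        raise ValueError(f"unknown sync_rep_service: {setting!r}") from None
```

An application would then only ever set the label (`important`, `reporting`, `off`), while the vote arithmetic lives in one place that the DBA can change without touching application code.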
> For the control between async/recv/fsync/replay, I like to think in terms of
> a) asynchronous vs synchronous
> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
Same here.
> I think it makes most sense to set sync vs. async in the master, and the
> level of synchronicity in the slave.
Yeah, exactly.
If you give each slave a weight and then do quorum commit, you don't
change the implementation complexity and you offer a lot of setup
flexibility. If the slave sync level and weight are SIGHUP settings, it
even becomes rather easy to switch roles online, to add new servers, or
to organise a maintenance window. The quorum to reach is a
per-transaction GUC on the master, too, right?
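The maintenance-window argument can be illustrated with a minimal sketch, assuming each standby's level and weight are reloadable settings on the standby and that the master only compares collected votes against the per-transaction quorum. For brevity the master reads the standby settings from a dict here; in the distributed design the standby would report them with its acknowledgment. `StandbyConfig` and `can_release_commit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StandbyConfig:
    """Settings living on the standby, changeable with a reload (SIGHUP)."""

    level: str   # when this standby acknowledges: "recv", "fsync" or "replay";
                 # the master never inspects it
    weight: int  # votes its acknowledgment is worth; 0 takes it out of the quorum


def can_release_commit(quorum: int, acked: set[str], standbys: dict[str, StandbyConfig]) -> bool:
    """Per-transaction check on the master.

    `quorum` is the per-transaction setting; a quorum of 0 means the
    transaction does not wait for any standby at all.
    """
    votes = sum(standbys[name].weight for name in acked if name in standbys)
    return votes >= quorum
```

With `failover` at replay (weight 2) and `reporting1`/`reporting2` at recv (weight 1 each), a quorum of 2 is met by the two reporting servers. Reloading `reporting2` with weight 0 before maintenance means the same quorum now needs `failover` (or another weighted server), and no setting on the master had to change.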
Regards,
--
dim