From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Synchronization levels in SR |
Date: | 2010-06-02 09:25:20 |
Message-ID: | 4C062380.8090108@enterprisedb.com |
Lists: | pgsql-hackers |
On 02/06/10 10:22, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> The possibilities are endless... Your proposal above covers a pretty
>> good set of scenarios, but it's by no means complete. If we try to
>> solve everything the configuration will need to be written in a
>> Turing-complete Replication Description Language. We'll have to pick a
>> useful, easy-to-understand subset that covers the common scenarios. To
>> handle the more exotic scenarios, you can write a proxy that sits in
>> front of the master, and implements whatever rules you wish, with the
>> rules written in C.
>
> I was thinking about this a bit recently. As I see it, there are three
> fundamental parts of this:
>
> 1) We have a transaction that is being committed. The rest of the
> computations here are all relative to it.
Agreed.
> So in a 3 node case, the internal state table might look like this after
> a bit of data had been committed:
>
> node | location | state
> ----------------------------------
> a | local | fsync
> b | remote | recv
> c | remote | async
>
> This means that the local node has a fully persistent copy, but the best
> either remote one has done is received the data, it's not on disk at all
> yet at the remote data center. Still working its way through.
>
> 3) The decision about whether the data has been committed to enough
> places to be considered safe by the master is computed by a function
> that is passed this internal table as something like a SRF, and it
> returns a boolean. Once that returns true, saying it's satisfied, the
> transaction closes on the master and continues to percolate out from
> there. If it's false, we wait for another state change to come in and
> return to (2).
You can't implement "wait for X to ack the commit, but if that doesn't
happen in Y seconds, time out and return true anyway" with that.
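To make that limitation concrete, here is a minimal sketch in Python rather than SQL. Everything in it is an assumption for illustration: the names (NodeState, is_safe, wait_for_commit, is_safe_or_timed_out), the example policy, and the ordering async < recv < fsync. None of it is an existing PostgreSQL interface. The predicate sees only the state table and is re-run only when a state change arrives, so it has no way to notice that time has passed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence

# Assumed ordering of the states in the quoted table, weakest to strongest.
LEVELS = {"async": 0, "recv": 1, "fsync": 2}


@dataclass(frozen=True)
class NodeState:
    node: str
    location: str  # "local" or "remote"
    state: str     # a key of LEVELS


Predicate = Callable[[Sequence[NodeState]], bool]


def is_safe(rows: Sequence[NodeState]) -> bool:
    """Hypothetical policy: local fsync plus one remote at recv or better."""
    local_ok = any(r.location == "local" and r.state == "fsync" for r in rows)
    remote_ok = any(
        r.location == "remote" and LEVELS[r.state] >= LEVELS["recv"]
        for r in rows
    )
    return local_ok and remote_ok


def wait_for_commit(
    snapshots: Iterable[Sequence[NodeState]], predicate: Predicate
) -> Optional[int]:
    """Run the predicate once per state change, as in step (3).

    Returns how many state changes were consumed before the predicate was
    satisfied, or None if the state changes ran out first.
    """
    for count, rows in enumerate(snapshots, start=1):
        if predicate(rows):
            return count
    return None


def is_safe_or_timed_out(
    rows: Sequence[NodeState], waited_s: float, timeout_s: float
) -> bool:
    """Timeout variant: needs elapsed time, which is not in the table."""
    return is_safe(rows) or waited_s >= timeout_s


# The table from the quoted message.
acked = [
    NodeState("a", "local", "fsync"),
    NodeState("b", "remote", "recv"),
    NodeState("c", "remote", "async"),
]

# Same commit, but no remote node has acknowledged receipt yet.
stalled = [
    NodeState("a", "local", "fsync"),
    NodeState("b", "remote", "async"),
    NodeState("c", "remote", "async"),
]
```

If the only snapshot ever seen is `stalled`, `wait_for_commit([stalled], is_safe)` never gets a true answer, because no further state change arrives to trigger another evaluation. The timeout variant needs two things the table-in, boolean-out design does not provide: the elapsed time as an input, and something that wakes the waiter when the timer expires.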
> While exposing the local state and running this computation isn't free,
> in situations where there truly are remote nodes in here being
> communicated with the network overhead is going to dwarf that. If there
> were a fast path for the simplest cases and this complicated one for the
> rest, I think you could get the fully programmable behavior some people
> want using simple SQL, rather than having to write a new "Replication
> Description Language" or something so ambitious. This data about what's
> been replicated to where looks an awful lot like a set of rows you can
> operate on using features already in the database to me.
Yeah, if we want to provide full control over when a commit is
acknowledged to the client, there's certainly no reason we can't expose
that using a hook or something.
It's pretty scary to call a user-defined function at that point in
the transaction. Even if we document that you must refrain from doing nasty
stuff like modifying tables in that function, it's still scary.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com