From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Christopher Browne <cbbrowne(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com> |
Subject: | Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node |
Date: | 2012-06-20 13:02:28 |
Message-ID: | CA+TgmobFriNxSVZdcFqXK5QE9Rhv9MPkFGNkpdRvU8FMFZuMcg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 20, 2012 at 5:15 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> As I said before, I definitely agree that we want to have a separate transport
> format once we have decoding nailed down. We still need to ship wal around if
> the decoding happens in a different instance, but *after* that it can be
> shipped in something more convenient/appropriate.
Right, OK, we agree on this then.
> One bit is fine if you have only very simple replication topologies. Once you
> think about globally distributed databases it's a bit different. You describe
> some of that below, but just to reiterate:
> Imagine having 6 nodes, 3 on each of two continents (ABC in North America, DEF
> in Europe). You may only want to have full intercontinental interconnect
> between two of those (say A and D). If you only have one bit to represent the
> origin, that's not going to work because you won't be able to discern the changes
> from BC on A from the changes originating on DEF.
I don't see the problem. A certainly knows via which link the LCRs arrived.
So: change happens on A. A sends the change to B, C, and D. B and C
apply the change. One bit is enough to keep them from regenerating
new LCRs that get sent back to A. So they're fine. D also receives
the changes (from A) and applies them, but it also does not need to
regenerate LCRs. Instead, it can take the LCRs that it has already
got (from A) and send those to E and F.
Or: change happens on B. B sends the changes to A. Since A knows the
network topology, it sends the changes to C and D. D sends them to E
and F. Nobody except B needs to *generate* LCRs. All any other node
needs to do is suppress *redundant* LCR generation.
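A minimal sketch of the routing argued above, assuming the topology is a tree (A links to B, C and D; D links to E and F), so that "never send an LCR back over the link it arrived on" is enough to avoid loops. `LCR`, `Node`, `build` and `propagate` are hypothetical names for illustration, not PostgreSQL code. Only the origin node runs LCR generation; every other node applies the LCR with the "local" bit clear and just forwards what it received.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LCR:
    origin: str   # hypothetical: node where the change was made
    payload: str


@dataclass
class Node:
    name: str
    links: list = field(default_factory=list)
    # Each WAL entry is (change, local_bit); the bit is the only origin info kept.
    wal: list = field(default_factory=list)
    generated: int = 0  # how many times this node generated LCRs

    def local_change(self, payload):
        self.wal.append((payload, True))  # local bit set
        self.generated += 1               # only the origin produces LCRs
        return LCR(self.name, payload)

    def apply(self, lcr):
        self.wal.append((lcr.payload, False))  # local bit clear: no regeneration


def build():
    """Hypothetical tree: A hub for B, C; D hub for E, F; single A-D link."""
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("D", "F")]
    nodes = {n: Node(n) for n in "ABCDEF"}
    for a, b in edges:
        nodes[a].links.append(b)
        nodes[b].links.append(a)
    return nodes


def propagate(nodes, origin, payload):
    """Generate the LCR once at the origin, then forward it hop by hop,
    never back over the link it arrived on (sufficient in a tree)."""
    lcr = nodes[origin].local_change(payload)
    queue = deque((nbr, origin) for nbr in nodes[origin].links)
    while queue:
        name, came_from = queue.popleft()
        nodes[name].apply(lcr)
        queue.extend((nbr, name) for nbr in nodes[name].links if nbr != came_from)
```

For a change made on B, `propagate(build(), "B", "...")` delivers it to A, then C and D, then E and F; each node applies it exactly once and only B ever generates LCRs.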
> Another topology which is interesting is circular replication (i.e. changes
> get shipped A->B, B->C, C->A), which is a sensible topology if you only have a
> low change rate and a relatively high number of nodes, because you don't need
> the full combinatorial number of connections.
I think this one is OK too. You just generate LCRs on the origin node
and then pass them around the ring at every step. When the next hop
would be the origin node then you're done.
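A small sketch of the ring rule, assuming each LCR carries the name of the node that generated it and that `ring[i]` ships to `ring[(i + 1) % len(ring)]`. `LCR` and `ring_propagate` are hypothetical names; the point is only that forwarding stops when the next hop would be the origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LCR:
    origin: str   # hypothetical: node where the LCR was generated
    payload: str


def ring_propagate(ring: list[str], lcr: LCR) -> list[str]:
    """Return the nodes that apply the LCR, in hop order.

    Each node applies the LCR and passes it to its successor; nobody
    regenerates it. Forwarding stops when the next hop is the origin.
    """
    pos = ring.index(lcr.origin)
    applied = []
    while True:
        nxt = ring[(pos + 1) % len(ring)]
        if nxt == lcr.origin:  # next hop would be the origin node: done
            break
        applied.append(nxt)    # nxt applies the LCR and forwards it
        pos = (pos + 1) % len(ring)
    return applied
```

With the ring A->B->C->A, an LCR generated on B is applied by C and then A, and is never sent back to B.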
I think you may be imagining that A generates LCRs and sends them to
B. B applies them, and then from the WAL just generated, it produces
new LCRs which then get sent to C. If you do that, then, yes,
everything that you need to disentangle various network topologies
must be present in WAL. But what I'm saying is: don't do it like
that. Generate the LCRs just ONCE, at the origin node, and then pass
them around the network, applying them at every node. Then, the
information that is needed in WAL is confined to one bit: the
knowledge of whether or not a particular transaction is local (and
thus LCRs should be generated) or non-local (and thus they shouldn't,
because the origin already generated them and thus we're just handing
them around to apply everywhere).
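A sketch of what the one-bit scheme asks of the decoding side, assuming a hypothetical commit record that carries an `is_local` flag. `CommitRecord` and `decode` are illustrative names, not PostgreSQL's actual WAL format or decoding API. Transactions applied on behalf of another node are simply skipped, because their origin already generated the LCRs.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class CommitRecord:
    xid: int
    changes: tuple[str, ...]
    is_local: bool  # the single bit: True if the transaction originated here


def decode(wal: list[CommitRecord]) -> Iterator[tuple[int, str]]:
    """Yield (xid, change) LCRs only for locally originated transactions."""
    for rec in wal:
        if not rec.is_local:
            continue  # non-local: the origin already generated these LCRs
        for change in rec.changes:
            yield (rec.xid, change)
```

Given records for xids 100 (local), 101 (applied from another node) and 102 (local), only the changes from 100 and 102 come out of `decode`.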
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company