Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Farina <daniel(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date: 2012-06-19 17:24:13
Message-ID: CA+Tgmoa0SbgMi39jeMHLbqxhWWojWVgOHiPLwtZeBH4QM3QGTw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 19, 2012 at 12:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
>>>> ...  (If you are thinking
>>>> of something sufficiently high-level that merging could possibly work,
>>>> then it's not WAL, and we shouldn't be trying to make the WAL
>>>> representation cater for it.)
>
>> Do you really see this as such a big problem?
>
> It looks suspiciously like "I have a hammer, therefore every problem
> must be a nail".  I don't like the design concept of cramming logical
> replication records into WAL in the first place.

Me, neither. I think it's necessary to try to find a way of
generating logical replication records from WAL. But once generated,
I think those records should form their own stream, independent of
WAL. If you take the contrary position that they should be included
in WAL, then when you filter the WAL stream down to just the records
of interest to logical replication, you end up with a WAL stream with
holes in it, which is one of the things that Andres listed as an
unresolved design problem in his original email.
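To make the "holes" point concrete, here is a toy Python model. It is an illustration, not PostgreSQL code. WalRecord, filter_logical, holes and to_logical_stream are hypothetical names. Positions are simplified to consecutive integers, whereas real WAL LSNs are byte offsets. Filtering WAL down to logical records leaves gaps that a consumer cannot tell apart from lost records. Renumbering the surviving records into an independent stream does not leave such gaps.

```python
from dataclasses import dataclass


@dataclass
class WalRecord:
    pos: int   # simplified position: 1, 2, 3, ... (real LSNs are byte offsets)
    kind: str  # e.g. "heap", "index", "logical"


def filter_logical(wal):
    """Keep only the records of interest to logical replication."""
    return [r for r in wal if r.kind == "logical"]


def holes(records):
    """Return (before, after) pairs wherever positions are not consecutive."""
    return [(a.pos, b.pos) for a, b in zip(records, records[1:]) if b.pos != a.pos + 1]


def to_logical_stream(wal):
    """Give logical records their own dense sequence numbers, independent of WAL."""
    return list(enumerate(filter_logical(wal)))
```

Given records at positions 1 to 5, of which 2, 4 and 5 are logical, the filtered WAL has a hole between 2 and 4. The separate stream numbers the same records 0, 1, 2 with no gaps.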

Moreover, this isn't necessary at all for single-master replication,
or even multi-source replication where each table has a single master.
It's only necessary for full multi-master replication, which we have
no consensus to include in core, and even if we did have a consensus
to include it in core, it certainly shouldn't be the first feature we
design.

> However, if we're dead set on doing it that way, let us put information
> that is only relevant to logical replication records into only the
> logical replication records.

Right. If we decide we need this, and if we do decide to conflate
the WAL stream, both of which I disagree with as noted above, then we
still don't need it on every record. It would probably be sufficient
for local transactions to do nothing at all (and we can implicitly
assume that they have master node ID = local node ID) and for
transactions which are replaying remote changes to emit one record per
XID per checkpoint cycle containing the remote node ID.
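Here is a rough sketch of that scheme, under stated assumptions. OriginWriter and decode_origins are hypothetical names, and the WAL is modelled as a Python list of tuples. The once-per-checkpoint-cycle rule is assumed here to exist so that a reader starting at the latest checkpoint still sees the ORIGIN record for a transaction that spans it. Local transactions emit no origin information, and the reader defaults them to the local node ID.

```python
from typing import Optional


class OriginWriter:
    """Hypothetical writer-side model: only replaying transactions log an origin."""

    def __init__(self):
        self.wal = []          # simplified WAL: list of tuples
        self.cycle = 0         # current checkpoint cycle
        self.logged = set()    # (xid, cycle) pairs whose origin is already logged

    def checkpoint(self):
        self.cycle += 1
        self.wal.append(("CHECKPOINT", self.cycle))

    def write(self, xid, change, remote_node_id: Optional[int] = None):
        # Local transactions (remote_node_id is None) add no origin record.
        # A replaying transaction logs its origin once per XID per checkpoint cycle.
        if remote_node_id is not None and (xid, self.cycle) not in self.logged:
            self.wal.append(("ORIGIN", xid, remote_node_id))
            self.logged.add((xid, self.cycle))
        self.wal.append(("CHANGE", xid, change))


def decode_origins(wal, local_node_id, start=0):
    """Reader side: a transaction with no ORIGIN record seen is assumed local."""
    origin = {}
    result = []
    for rec in wal[start:]:
        if rec[0] == "ORIGIN":
            origin[rec[1]] = rec[2]
        elif rec[0] == "CHANGE":
            xid = rec[1]
            result.append((xid, rec[2], origin.get(xid, local_node_id)))
    return result
```

If decoding begins at a checkpoint record, the re-emitted ORIGIN record lets a long-running replay transaction still be attributed to its remote node instead of defaulting to the local one.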

> Saving a couple bytes in each such record
> is penny-wise and pound-foolish, I'm afraid; especially when you're
> nailing down hard, unexpansible limits at the very beginning of the
> development process in order to save those bytes.

I completely agree. I think that, as Dan said upthread, having a
64- or 128-bit ID so that it can be generated automatically rather than
configured by an administrator who must be careful not to duplicate
node IDs in any pair of systems that could ever end up talking to each
other would be a vast usability improvement. Perhaps systems A, B,
and C are replicating to each other today, as are systems D and E.
But now suppose that someone decides they want to replicate one table
between A and D. Suddenly the node IDs have to be distinct where they
didn't before, and now there's potentially a problem to hassle with
that wouldn't have been an issue if the node IDs had been wide enough
to begin with. It is not unusual for people to decide after the fact
to begin replicating between machines where this wasn't originally
anticipated and which may even be under different administrative
control.
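The birthday bound can put a number on the usability argument. This Python sketch uses hypothetical helpers: new_node_id, collision_probability and clashing_ids. The 16-bit width is only an example of a narrow ID, not a claim about the patch. The sketch generates a random 128-bit ID with no administrator involved. It also estimates the chance that independently chosen random IDs collide, and shows how hand-assigned small IDs clash once two previously separate groups start talking.

```python
import math
import secrets


def new_node_id() -> int:
    """Generate a random 128-bit node ID without administrator coordination."""
    return secrets.randbits(128)


def collision_probability(nodes: int, bits: int) -> float:
    """Birthday-bound approximation: P(any two of `nodes` random IDs are equal)."""
    return -math.expm1(-nodes * (nodes - 1) / (2 * 2 ** bits))


def clashing_ids(group1: dict, group2: dict) -> set:
    """IDs assigned independently in two groups that would now collide."""
    return set(group1.values()) & set(group2.values())
```

Suppose A, B and C were hand-numbered 1, 2, 3 and D and E were numbered 1, 2. Then clashing_ids reports {1, 2}, and replicating a table between A and D immediately conflicts. With random IDs for the five systems, collision_probability(5, 16) is about 1.5e-4, while collision_probability(5, 128) is on the order of 1e-38.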

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
