Re: [PATCH 8/8] Introduce wal decoding via catalog timetravel

From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>,"Greg Stark" <stark(at)mit(dot)edu>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Peter Geoghegan" <peter(at)2ndquadrant(dot)com>,"Andres Freund" <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, hlinnakangas(at)vmware(dot)com
Subject: Re: [PATCH 8/8] Introduce wal decoding via catalog timetravel
Date: 2012-10-22 14:17:01
Message-ID: 20121022141701.224550@gmx.com
Lists: pgsql-hackers

Simon Riggs wrote:
> Greg Stark <stark(at)mit(dot)edu> wrote:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Isn't there an even more serious problem, namely that this
>>> assumes *all* transactions are serializable?

Do you mean in terms of the serializable transaction isolation level,
or something else?

I haven't read the patches, but I've been trying to follow the
discussion and I don't recall any hint of basing this on serializable
transactions on each source. Of course, when it comes down to
commits, both where a change is committed and where the work is
copied, there must be a commit order; and with asynchronous work
where data isn't partitioned such that there is a single clear owner
for each partition, there will be conflicts which must be resolved. I
don't get the impression that this point has been lost on Simon and
Andres.

>>> What happens when they aren't? Or even just that the effective
>>> commit order is not XID order?
>>
>> Firstly, I haven't read the code but I'm confident it doesn't make
>> the elementary error of assuming commit order == xid order. I
>> assume it's applying the reassembled transactions in commit order.

Same here.
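A minimal sketch of what "applying the reassembled transactions in commit order" means, assuming a simplified record stream in which every record carries its xid and a kind of "change", "commit", or "abort". The names Change and reassemble are hypothetical; this is not PostgreSQL's WAL format or the patch's actual code. Changes are buffered per xid and released as a unit when the commit record is seen, so the output follows commit order even where xid order differs:

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical simplified record, not the real WAL record layout.
    xid: int
    kind: str          # "change", "commit" or "abort"
    payload: str = ""


def reassemble(records):
    """Buffer interleaved changes per xid and yield (xid, changes) for each
    transaction when its commit record arrives, i.e. in commit order.
    Aborted transactions are dropped without being emitted."""
    pending = {}
    for rec in records:
        if rec.kind == "change":
            pending.setdefault(rec.xid, []).append(rec.payload)
        elif rec.kind == "commit":
            yield rec.xid, pending.pop(rec.xid, [])
        elif rec.kind == "abort":
            pending.pop(rec.xid, None)


wal = [
    Change(100, "change", "INSERT a"),
    Change(101, "change", "INSERT b"),
    Change(101, "commit"),
    Change(100, "change", "UPDATE a"),
    Change(102, "change", "INSERT c"),
    Change(102, "abort"),
    Change(100, "commit"),
]
print(list(reassemble(wal)))
# [(101, ['INSERT b']), (100, ['INSERT a', 'UPDATE a'])]
```

Here xid 101 is emitted before xid 100 because it committed first, and xid 102 never appears because it aborted.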

>> I don't think it assumes the transactions are serializable because
>> it's only concerned with writes, not reads. So the transaction
>> it's replaying may or may not have been able to view the data
>> written by other transactions that committed earlier but it doesn't
>> matter when trying to reproduce the effects using constants.

IIRC, the developers of this feature have explicitly said that they
will defer any consideration of trying to extend serializable
transaction isolation behavior to a multi-server basis until after
they have other things working. (Frankly, to do otherwise would not
be sane.) It appears to me that it can't be managed in a general
sense without destroying almost all the advantages of multi-master
replication, at least (as I said before) where data isn't partitioned
such that there is a single clear owner for each partition.

Where such partitioning is present and there are data sets maintained
exclusively by serializable transactions, anomaly-free reads of the
data could be accomplished by committing transactions on the replicas
in "apparent order of execution" rather than "commit order". Apparent
order of execution must take both commit order and read-write
dependencies into consideration.
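Apparent order of execution can be sketched as a topological sort of the dependency graph, using commit order only to break ties. This assumes the caller already knows every wr, ww, and rw-antidependency edge between the transactions; apparent_order and the transaction names are hypothetical illustrations, not part of PostgreSQL or the patch. A cycle means no apparent order exists, which is the situation serializable isolation has to prevent by rolling a transaction back:

```python
import heapq


def apparent_order(commit_order, deps):
    """Return an order that respects every (before, after) dependency edge,
    preferring earlier commit position when several transactions are ready.
    `deps` is assumed to contain the wr, ww and rw-antidependency edges.
    Raises ValueError if the edges form a cycle."""
    pos = {xid: i for i, xid in enumerate(commit_order)}
    succ = {xid: [] for xid in commit_order}
    indeg = {xid: 0 for xid in commit_order}
    for before, after in deps:
        succ[before].append(after)
        indeg[after] += 1

    # Min-heap keyed on commit position, so ties follow commit order.
    ready = [(pos[x], x) for x in commit_order if indeg[x] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, x = heapq.heappop(ready)
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                heapq.heappush(ready, (pos[y], y))

    if len(order) != len(commit_order):
        raise ValueError("dependency cycle: no apparent order of execution")
    return order


# "receipt" read the batch-control row before "close" updated it
# (an rw-antidependency), so it must come first despite committing second.
print(apparent_order(["close", "receipt"], [("receipt", "close")]))
# ['receipt', 'close']
```

Adding a report that saw the closed batch but not the late receipt would create the cycle receipt, close, report, receipt, and the function would raise instead of returning an order.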

>> The data written by this transaction is either written or not when
>> the commit happens and it's all written or not at that time. Even
>> in non-serializable mode updates take row locks and nobody can see
>> the data or modify it until the transaction commits.

As with read-only transactions and hot standbys, the problem comes in
when a transaction commits and is replicated while another
transaction, which bases its updates on the earlier state of the
data, remains uncommitted. It gets even more exciting with MMR since the
transaction working with the old version of the data might be on a
different machine, on another continent. With certain types of
workloads, it seems to me that it could get pretty crazy if certain
closely-related actions are not kept within a single database (like
controlling the active batch and adding items to a batch).
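A toy replica illustrates the batch case, assuming a control row holding the current batch number and a list of (batch, amount) receipts; every name here is hypothetical. On the origin, the receipt transaction read current_batch == 1 before the close transaction changed it, but the close commits and replicates first:

```python
# Hypothetical replica state: batch control row plus (batch, amount) receipts.
replica = {"current_batch": 1, "receipts": [(1, 100)]}


def apply(state, txn):
    """Apply one replicated transaction's writes to the replica."""
    for op, value in txn:
        if op == "set_batch":
            state["current_batch"] = value
        elif op == "add_receipt":
            state["receipts"].append(value)


def batch_total(state, batch):
    return sum(amount for b, amount in state["receipts"] if b == batch)


# Writes as replayed from the origin, in commit order.
t_close = [("set_batch", 2)]             # commits first
t_receipt = [("add_receipt", (1, 50))]   # based on the old current_batch

apply(replica, t_close)
report_after_close = batch_total(replica, 1)  # batch 1 looks closed: 100
apply(replica, t_receipt)
report_final = batch_total(replica, 1)        # batch 1 later becomes 150
```

A reader on the replica that sees current_batch == 2 would reasonably treat batch 1 as closed with a total of 100, yet the total later becomes 150 without the control row changing again.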

In the "wild, half-baked, hand-wavey suggestions" department -- maybe
there should be some consideration of a long-term way within MMR to
direct activities to certain logical nodes, each of which could be
mapped to a single physical node at any one time. Basically, to route
a request through the MMR network to the current logical node for
handling something, and have the effects ripple back out through all
nodes.
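One way to picture the suggestion is a routing table with two levels of indirection: each activity maps to a logical node, and each logical node maps to whichever physical node currently hosts it. This is a hand-wavey sketch in the same spirit; Router, route, fail_over, and the node names are hypothetical and say nothing about how such routing would actually be built into an MMR system:

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    # Hypothetical: logical node name -> physical node currently hosting it.
    placement: dict = field(default_factory=dict)
    # Hypothetical: activity key -> logical node that owns it.
    owners: dict = field(default_factory=dict)

    def route(self, key):
        """Return (logical node, physical node) that should handle `key`."""
        logical = self.owners[key]
        return logical, self.placement[logical]

    def fail_over(self, logical, new_physical):
        """Remap one logical node; which activities it owns is unchanged."""
        self.placement[logical] = new_physical


router = Router(
    placement={"batches": "eu-1"},
    owners={"batch-control": "batches", "receipts": "batches"},
)
before = router.route("receipts")   # ('batches', 'eu-1')
router.fail_over("batches", "us-2")
after = router.route("receipts")    # ('batches', 'us-2')
```

Failing over remaps only the logical node, so closely related activities such as batch control and receipt entry stay together on one physical node.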

> This uses Commit Serializability, which is valid, as you say.

Well, it is a type of concurrency control with a name and a
definition, if that's what you mean. I agree it provides sufficient
guarantees to create a workable MMR system, if you have adequate
conflict resolution. My concern is that it not be confused with
serializability in the mathematical sense or in the sense of
transaction isolation levels.
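The gap between commit-order replay and serializability can be shown with the classic write-skew example, assuming snapshot isolation on the origin and replication that replays each transaction's writes as constants in commit order. The on-call rule and all names are hypothetical. Both transactions read the same snapshot and write disjoint rows, so both commit and the replay is faithful, but the result matches no serial execution of the same transactions:

```python
import itertools


def go_off_call(state, doctor):
    """Take `doctor` off call only if at least two doctors are on call."""
    if sum(state.values()) >= 2:
        return {doctor: False}
    return {}


initial = {"alice": True, "bob": True}

# Snapshot isolation: both transactions read the same snapshot and their
# write sets are disjoint, so neither blocks or aborts the other.
writes_t1 = go_off_call(dict(initial), "alice")
writes_t2 = go_off_call(dict(initial), "bob")

# Replaying each transaction's writes as constants, in commit order.
replayed = dict(initial)
for writes in (writes_t1, writes_t2):
    replayed.update(writes)

# Every genuinely serial execution of the same two transactions.
serial_results = []
for order in itertools.permutations(["alice", "bob"]):
    state = dict(initial)
    for doctor in order:
        state.update(go_off_call(state, doctor))
    serial_results.append(state)

print(replayed)                    # {'alice': False, 'bob': False}
print(replayed in serial_results)  # False
```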

In general on this thread, when I've seen the terms "serializable"
and "serializability" I haven't been clear on whether the words are
being used in their more general sense as words in the English
language, or in a more particular technical sense.

-Kevin