Re: The plan for FDW-based sharding

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The plan for FDW-based sharding
Date: 2016-02-27 19:14:39
Message-ID: 56D1F59F.1060204@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Neither pg_dtm, neither pg_tsdtm supports serializable isolation level.
We implemented distributed snapshot isolation - repeatable-read isolation level.
We also do not support read-committed isolation level now.

We do not try to preserve transaction commit order at all nodes.
But in principle it can be implemented using XTM API: it allows to redefine function which actually sets transaction status. pg_dtm performs 2PC here.
And in principle it is possible to enforce commits in any particular order.

Concerning CSNs, may be you are right and it is not correct to use this notion in this case. Actually there are many "CSNs" involved in transaction commit.
First of all each transaction is assigned local CSN (timestamp) when it is ready to commit. Then CSNs of all nodes are exchanged and maximal CSN is chosen.
This maximum is writen as final transaction CSN and is used in visibility check.

On 02/27/2016 01:48 AM, Kevin Grittner wrote:
> On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>> pg_tsdtm is based on another approach: it is using system time
>> as CSN
> Which brings up an interesting point, if we want logical
> replication to be free of serialization anomalies for those using
> serializable transactions, we need to support applying transactions
> in an order which may not be the same as commit order -- CSN (as
> such) would be the wrong thing. If serializable transaction 1 (T1)
> modifies a row and concurrent serializable transaction 2 (T2) reads
> the old version of the row, and modifies something based on that,
> T2 must be applied to a logical replica first even if T1 commits
> before it; otherwise the logical replica could see a state not
> consistent with business rules and which could not have been seen
> (due to SSI) on the source database. Any DTM API which does not
> support some mechanism to rearrange the order of transactions from
> commit order to some other order (based on, for example, read-write
> dependencies) is not complete. If it does support that, it gives
> us a way forward for presenting consistent data on logical
> replicas.
>
> To avoid confusion, it might be best to reserve CSN for actual
> commit sequence numbers, or at least values which increase
> monotonically with each commit. The term of art for what I
> described above is "apparent order of execution", so maybe we want
> to use AOE or AOoE for the order we choose to use in a particular
> implementation. It doesn't seem to me to be outright inaccurate
> for cases where the system time on the various systems is used.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yury Zhuravlev 2016-02-27 19:20:16 Re: PostgreSQL extension API? Documentation?
Previous Message Álvaro Hernández Tortosa 2016-02-27 19:11:35 Re: PostgreSQL extension API? Documentation?