RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-09-10 01:13:08
Message-ID: TYAPR01MB2990DBB9A8281DCA014FFA1CFE270@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Alexey-san, Sawada-san,
cc: Fujii-san,

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
> But if we
> implement 2PC as the improvement on FDW independently from PostgreSQL
> sharding, I think that it's necessary to support other FDW. And this is our
> direction, isn't it?

I understand it the same way as Fujii-san. 2PC for FDW is useful in itself, so I think we should pursue a tidy FDW interface and good performance within the FDW framework. "Tidy" means that many other FDWs should be able to implement it. I guess XA/JTA is the only material we can use to judge whether the FDW interface is good.

> Sawada-san's patch supports that case by implementing some components
> for that also in PostgreSQL core. For example, with the patch, all the remote
> transactions that participate at the transaction are managed by PostgreSQL
> core instead of postgres_fdw layer.
>
> Therefore, at least regarding the difference 2), I think that Sawada-san's
> approach is better. Thought?

I think so. Sawada-san's patch needs to address the design issues I posed before we dig into the code for a thorough review, though.

BTW, is there something Sawada-san can take from Alexey-san's patch? I'm concerned about the performance in practical use. Do your two patches differ on the following points, for instance? The first two items are often cited when evaluating an algorithm's performance, as you know.

* The number of round trips to remote nodes.
* The number of disk I/Os on each node and all nodes in total (WAL, two-phase file, pg_subtrans file, CLOG?).
* Are prepare and commit executed in parallel on remote nodes? (serious DBMSs do so)
* Is there any serialization point in the processing? (Sawada-san's has one)
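
To illustrate why the round-trip and parallelism questions above matter, here is a rough back-of-the-envelope sketch in Python. It is only a model under my own assumptions, not a measurement of either patch: the function name and the per-node round-trip times are hypothetical, each remote node costs exactly one round trip for PREPARE and one for COMMIT PREPARED, and disk I/O and local processing are ignored.

```python
def two_phase_commit_latency(rtts_ms, parallel):
    """Estimate the network latency (ms) of PREPARE + COMMIT PREPARED.

    rtts_ms  -- hypothetical round-trip time to each remote node
    parallel -- True if each phase is sent to all nodes concurrently
    """
    # Serial: each phase waits for every node in turn.
    # Parallel: each phase waits only for the slowest node.
    per_phase = max(rtts_ms) if parallel else sum(rtts_ms)
    return 2 * per_phase  # two phases: prepare, then commit


nodes = [1.0, 1.0, 1.0]  # three remote nodes, 1 ms round trip each (assumed)
print(two_phase_commit_latency(nodes, parallel=False))
print(two_phase_commit_latency(nodes, parallel=True))
```

Under this simplified model the serial cost grows with the number of participants, while the parallel cost stays at two round trips to the slowest node.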

I'm sorry to repeat myself, but I don't think we can compromise on 2PC performance. Of course, we recommend that users design a schema that co-locates the data each transaction accesses so as to avoid 2PC, but that's not always possible (e.g., when secondary indexes are used).

Plus, as the following quote from the TPC-C specification shows, TPC-C requires 15% of (Payment?) transactions to do 2PC. (I learned this from Microsoft's, CockroachDB's, or Citus Data's site.)

--------------------------------------------------
Independent of the mode of selection, the customer resident
warehouse is the home warehouse 85% of the time and is a randomly selected remote warehouse 15% of the time.
This can be implemented by generating two random numbers x and y within [1 .. 100];

. If x <= 85 a customer is selected from the selected district number (C_D_ID = D_ID) and the home warehouse
number (C_W_ID = W_ID). The customer is paying through his/her own warehouse.

. If x > 85 a customer is selected from a random district number (C_D_ID is randomly selected within [1 .. 10]),
and a random remote warehouse number (C_W_ID is randomly selected within the range of active
warehouses (see Clause 4.2.2), and C_W_ID ≠ W_ID). The customer is paying through a warehouse and a
district other than his/her own.
--------------------------------------------------
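
The selection rule quoted above can be sketched in Python as follows. This is only an illustration under my own assumptions: the function names are hypothetical, warehouses are numbered 1..num_warehouses, the second random number y is ignored because the quoted excerpt does not use it, and the single-warehouse fallback is my guess rather than something the excerpt states. Whether a remote-warehouse payment actually needs 2PC depends on the two warehouses being placed on different nodes.

```python
import random


def select_customer_location(w_id, d_id, num_warehouses, rng):
    """Return (C_W_ID, C_D_ID) for a transaction issued at (w_id, d_id)."""
    x = rng.randint(1, 100)
    # Assumption: with a single warehouse there is no remote one to choose.
    if x <= 85 or num_warehouses == 1:
        # Home case: the customer pays through his/her own warehouse.
        return w_id, d_id
    # Remote case: random district in [1 .. 10], random warehouse != w_id.
    c_d_id = rng.randint(1, 10)
    remote = [w for w in range(1, num_warehouses + 1) if w != w_id]
    return rng.choice(remote), c_d_id


def remote_fraction(trials=100_000, num_warehouses=10, seed=0):
    """Estimate the share of transactions that touch a remote warehouse."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        c_w_id, _c_d_id = select_customer_location(1, 1, num_warehouses, rng)
        if c_w_id != 1:
            hits += 1
    return hits / trials


print(remote_fraction())  # should come out close to 0.15
```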

Regards
Takayuki Tsunakawa
