From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-07-05 06:28:58
Message-ID: c043e14f-2c63-786f-9284-fdf8d6760835@oss.nttdata.com
Lists: pgsql-hackers
On 2021/06/30 10:05, Masahiko Sawada wrote:
> On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>
>> Hi Jamison-san, sawada-san,
>>
>> Thanks for testing!
>>
>> FWIW, I tested using pgbench with the "--rate=" option to confirm that the
>> server can execute transactions with stable throughput. As Sawada-san said,
>> the latest patch resolves the second phase of 2PC asynchronously, so
>> it's difficult to keep the throughput stable without the "--rate=" option.
>>
>> I also wondered what to do when the error happens, because increasing
>> "max_prepared_foreign_transactions" doesn't help. Since overloading can
>> trigger the error, is it better to add this case to the HINT message?
>>
>> BTW, if Sawada-san has already developed the code to run the resolver
>> processes in parallel, why not measure the performance improvement? Although
>> Robert-san, Tsunakawa-san and others are discussing which architecture is
>> best, one discussion point is that there is a performance risk in adopting
>> an asynchronous approach. If we have promising solutions, I think we can
>> move the discussion forward.
>
> Yeah, if we can asynchronously resolve the distributed transactions
> without worrying about max_prepared_foreign_transaction error, it
> would be good. But we will need synchronous resolution at some point.
> I think we at least need to discuss it at this point.
>
> I've attached the new version patch that incorporates the comments
> from Fujii-san and Ikeda-san I got so far. We launch a resolver
> process per foreign server, committing prepared foreign transactions
> on foreign servers in parallel. To get better performance with the
> current architecture, we could have multiple resolver processes per
> foreign server, but that seems hard to tune in practice. Perhaps
> it is better if we simply have a pool of resolver processes and
> assign a resolver process to the resolution of one distributed
> transaction at a time? That way, we need to launch as many resolver
> processes as there are concurrent backends using 2PC.
Thanks for updating the patches.
I have tested on my local laptop; a summary follows.
(1) The latest patch (v37) improves throughput by 1.5x compared to v36. I
expected a 2.0x improvement because each transaction accesses two remote
servers, but I think the reason it falls short is that the disk is the
bottleneck and I couldn't prepare a separate disk for each PostgreSQL server.
With separate disks, I think the throughput could improve by 2.0x.
(2) With the latest patch (v37), the throughput with foreign_twophase_commit
= required is about 36% of that with foreign_twophase_commit = disabled.
Although the throughput is improved, the absolute performance is not good; it
may be the fate of 2PC. I think the reason is that the number of WAL writes
increases significantly, and disk writes are the bottleneck on my laptop. I
would like to see results from richer environments if someone can test there.
(3) The latest patch (v37) has no overhead when foreign_twophase_commit =
disabled. On the contrary, the performance improved by 3%, which may be
within the margin of error.
The test details are as follows.
# condition
* 1 coordinator and 3 foreign servers
* 4 instances share one SSD disk.
* each transaction updates two different foreign servers.
``` fxact_update.pgbench
\set id random(1, 1000000)
\set partnum 3
\set p1 random(1, :partnum)
\set p2 ((:p1 + 1) % :partnum) + 1
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```
* pgbench generates the load. I increased ${RATE} little by little until the
"maximum number of foreign transactions reached" error happened.
```
pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
```
* parameters
max_prepared_transactions = 100
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 4
# test source code patterns
1. 2PC patches (v36) based on 6d0eb385 (foreign_twophase_commit = required).
2. 2PC patches (v37) based on 2595e039 (foreign_twophase_commit = required).
3. 2PC patches (v37) based on 2595e039 (foreign_twophase_commit = disabled).
4. 2595e039 without the 2PC patches (v37).
# results
1. tps = 241.8000
latency average = 10.413 ms
2. tps = 359.017519 (1.5x of 1.; 36% of 3.)
latency average = 15.427 ms
3. tps = 987.372220 (3% higher than 4.)
latency average = 8.102 ms
4. tps = 955.984574
latency average = 8.368 ms
The disk is the bottleneck in my environment because disk utilization is
almost 100% in every pattern. If a separate disk could be prepared for each
instance, I think we could expect further performance improvements.
>> In my understanding, there are three improvement ideas. The first is to
>> make the resolver processes run in parallel. The second is to send
>> "COMMIT/ABORT PREPARED" to remote servers in bulk. The third is to stop
>> syncing the WAL in remove_fdwxact() after resolution is done, which I
>> addressed in the mail sent on June 3rd, 13:56. Since the third idea has
>> not yet been discussed, I may be misunderstanding something.
>
> Yes, those optimizations are promising. On the other hand, they could
> introduce complexity to the code and APIs. I'd like to keep the first
> version simple. I think we need to discuss them at this stage but can
> leave the implementation of both parallel execution and batch
> execution as future improvements.
OK, I agree.
> For the third idea, I think the implementation was wrong; it removes
> the state file then flushes the WAL record. I think these should be
> performed in the reverse order. Otherwise, FdwXactState entry could be
> left on the standby if the server crashes between them. I might be
> missing something though.
Oh, I see. I think you're right, though I think what you meant is that it
should flush the WAL records and then remove the state file. If the
"COMMIT/ABORT PREPARED" statements are executed in bulk, it seems enough to
sync the WAL only once and then remove all the related state files.
BTW, I built the binary with -O2 and got the following warning, which needs
to be fixed.
```
fdwxact.c: In function 'PrepareAllFdwXacts':
fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this
function [-Wmaybe-uninitialized]
897 | canceled = SyncRepWaitForLSN(flush_lsn, false);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION