Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-07-09 11:36:30
Message-ID: CAD21AoAyJnnuEt1i0X=aQ7aBFNqCgujToi_U3f8VzNS2ayLU0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for the late reply.

On Mon, Jul 5, 2021 at 3:29 PM Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2021/06/30 10:05, Masahiko Sawada wrote:
> > On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
> >>
> >> Hi Jamison-san, sawada-san,
> >>
> >> Thanks for testing!
> >>
> >> FWIF, I tested using pgbench with "--rate=" option to know the server
> >> can execute transactions with stable throughput. As sawada-san said,
> >> the latest patch resolved second phase of 2PC asynchronously. So,
> >> it's difficult to control the stable throughput without "--rate=" option.
> >>
> >> I also worried what I should do when the error happened because to increase
> >> "max_prepared_foreign_transaction" doesn't work. Since too overloading may
> >> show the error, is it better to add the case to the HINT message?
> >>
> >> BTW, if sawada-san already develop to run the resolver processes in parallel,
> >> why don't you measure performance improvement? Although Robert-san,
> >> Tunakawa-san and so on are discussing what architecture is best, one
> >> discussion point is that there is a performance risk if adopting asynchronous
> >> approach. If we have promising solutions, I think we can make the discussion
> >> forward.
> >
> > Yeah, if we can asynchronously resolve the distributed transactions
> > without worrying about max_prepared_foreign_transaction error, it
> > would be good. But we will need synchronous resolution at some point.
> > I think we at least need to discuss it at this point.
> >
> > I've attached the new version patch that incorporates the comments
> > from Fujii-san and Ikeda-san I got so far. We launch a resolver
> > process per foreign server, committing prepared foreign transactions
> > on foreign servers in parallel. To get a better performance based on
> > the current architecture, we can have multiple resolver processes per
> > foreign server but it seems not easy to tune it in practice. Perhaps
> > is it better if we simply have a pool of resolver processes and we
> > assign a resolver process to the resolution of one distributed
> > transaction one by one? That way, we need to launch resolver processes
> > as many as the concurrent backends using 2PC.
>
> Thanks for updating the patches.
>
> I have tested in my local laptop and summary is the following.

Thank you for testing!

>
> (1) The latest patch(v37) can improve throughput by 1.5 times compared to v36.
>
> Although I expected it improves by 2.0 times because the workload is that one
> transaction access two remote servers... I think the reason is that the disk
> is bottleneck and I couldn't prepare disks for each postgresql servers. If I
> could, I think the performance can be improved by 2.0 times.
>
>
> (2) The latest patch(v37) throughput of foreign_twophase_commit = required is
> about 36% compared to the case if foreign_twophase_commit = disabled.
>
> Although the throughput is improved, the absolute performance is not good. It
> may be the fate of 2PC. I think the reason is that the number of WAL writes is
> much increase and, the disk writes in my laptop is the bottleneck. I want to
> know the result testing in richer environments if someone can do so.
>
>
> (3) The latest patch(v37) has no overhead if foreign_twophase_commit =
> disabled. On the contrary, the performance improved by 3%. It may be within
> the margin of error.
>
>
>
> The test detail is following.
>
> # condition
>
> * 1 coordinator and 3 foreign servers
>
> * 4 instance shared one ssd disk.
>
> * one transaction queries different two foreign servers.
>
> ``` fxact_update.pgbench
> \set id random(1, 1000000)
>
> \set partnum 3
> \set p1 random(1, :partnum)
> \set p2 ((:p1 + 1) % :partnum) + 1
>
> BEGIN;
> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> COMMIT;
> ```
>
> * pgbench generates load. I increased ${RATE} little by little until "maximum
> number of foreign transactions reached" error happens.
>
> ```
> pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
> ```
>
> * parameters
> max_prepared_transactions = 100
> max_prepared_foreign_transactions = 200
> max_foreign_transaction_resolvers = 4
>
>
> # test source code patterns
>
> 1. 2pc patches(v36) based on 6d0eb385 (foreign_twophase_commit = required).
> 2. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = required).
> 3. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = disabled).
> 4. 2595e039 without 2pc patches(v37).
>
>
> # results
>
> 1. tps = 241.8000TPS
> latency average = 10.413ms
>
> 2. tps = 359.017519 ( by 1.5 times compared to 1. by 0.36% compared to 3.)
> latency average = 15.427ms
>
> 3. tps = 987.372220 ( by 1.03% compared to 4. )
> latency average = 8.102ms
>
> 4. tps = 955.984574
> latency average = 8.368ms
>
> The disk is the bottleneck in my environment because disk util is almost 100%
> in every pattern. If disks for each instance can be prepared, I think we can
> expect more performance improvements.

It seems still not good performance. I'll also test using your script.

>
>
> >> In my understanding, there are three improvement idea. First is that to make
> >> the resolver processes run in parallel. Second is that to send "COMMIT/ABORT
> >> PREPARED" remote servers in bulk. Third is to stop syncing the WAL
> >> remove_fdwxact() after resolving is done, which I addressed in the mail sent
> >> at June 3rd, 13:56. Since third idea is not yet discussed, there may
> >> be my misunderstanding.
> >
> > Yes, those optimizations are promising. On the other hand, they could
> > introduce complexity to the code and APIs. I'd like to keep the first
> > version simple. I think we need to discuss them at this stage but can
> > leave the implementation of both parallel execution and batch
> > execution as future improvements.
>
> OK, I agree.
>
>
> > For the third idea, I think the implementation was wrong; it removes
> > the state file then flushes the WAL record. I think these should be
> > performed in the reverse order. Otherwise, FdwXactState entry could be
> > left on the standby if the server crashes between them. I might be
> > missing something though.
>
> Oh, I see. I think you're right though what you wanted to say is that it
> flushes the WAL records then removes the state file. If "COMMIT/ABORT
> PREPARED" statements execute in bulk, it seems enough to sync the wal only
> once, then remove all related state files.
>
>
> BTW, I tested the binary building with -O2, and I got the following warnings.
> It's needed to be fixed.
>
> ```
> fdwxact.c: In function 'PrepareAllFdwXacts':
> fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> 897 | canceled = SyncRepWaitForLSN(flush_lsn, false);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ```

Thank you for the report. I'll fix it in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-07-09 11:38:36 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Georgios 2021-07-09 11:26:58 Introduce pg_receivewal gzip compression tests