Re: Perform streaming logical transactions by background workers and parallel apply

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-10-28 00:47:08
Message-ID: CAD21AoALU3Pee9Hi673PVVNTr1Z_73cqCR5LvF3fw0Dgo_FHvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 27, 2022 at 11:34 AM shiy(dot)fnst(at)fujitsu(dot)com
<shiy(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wed, Oct 26, 2022 7:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Oct 25, 2022 at 8:38 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Oct 21, 2022 at 6:32 PM houzj(dot)fnst(at)fujitsu(dot)com
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > I've started to review this patch. I tested v40-0001 patch and have
> > > one question:
> > >
> > > IIUC even when most of the changes in the transaction are filtered out
> > > in pgoutput (eg., by relation filter or row filter), the walsender
> > > sends STREAM_START. This means that the subscriber could end up
> > > launching parallel apply workers also for almost empty (and streamed)
> > > transactions. For example, I created three subscriptions each of which
> > > subscribes to a different table. When I loaded a large amount of data
> > > into one table, all three (leader) apply workers received START_STREAM
> > > and launched their parallel apply workers.
> > >
> >
> > The apply workers will be launched just the first time then we
> > maintain a pool so that we don't need to restart them.
> >
> > > However, two of them
> > > finished without applying any data. I think this behaviour looks
> > > problematic since it wastes workers and rather decreases the apply
> > > performance if the changes are not large. Is it worth considering a
> > > way to delay launching a parallel apply worker until we find out the
> > > amount of changes is actually large?
> > >
> >
> > I think even if changes are less there may not be much difference
> > because we have observed that the performance improvement comes from
> > not writing to file.
> >
> > > For example, the leader worker
> > > writes the streamed changes to files as usual and launches a parallel
> > > worker if the amount of changes exceeds a threshold or the leader
> > > receives the second segment. After that, the leader worker switches to
> > > send the streamed changes to parallel workers via shm_mq instead of
> > > files.
> > >
> >
> > I think writing to file won't be a good idea as that can hamper the
> > performance benefit in some cases and not sure if it is worth.
> >
>
> I tried to test some cases that only a small part of the transaction or an empty
> transaction is sent to subscriber, to see if using streaming parallel will bring
> performance degradation.
>
> The test was performed ten times, and the average was taken.
> The results are as follows. The details and the script of the test is attached.
>
> 10% of rows are sent
> ----------------------------------
> HEAD 24.4595
> patched 18.4545
>
> 5% of rows are sent
> ----------------------------------
> HEAD 21.244
> patched 17.9655
>
> 0% of rows are sent
> ----------------------------------
> HEAD 18.0605
> patched 17.893
>
>
> It shows that when only 5% or 10% of rows are sent to subscriber, using parallel
> apply takes less time than HEAD, and even if all rows are filtered there's no
> performance degradation.

Thank you for the testing!

I think this performance improvement comes from both applying changes
in parallel to receiving changes and avoiding writing a file. I'm
happy to know there is also a benefit also for small streaming
transactions. I've also measured the overhead when processing
streaming empty transactions and confirmed the overhead is negligible.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-10-28 01:24:23 Re: Allow file inclusion in pg_hba and pg_ident files
Previous Message Peter Smith 2022-10-28 00:38:50 Re: Support logical replication of DDLs