RE: Perform streaming logical transactions by background workers and parallel apply

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-10-27 02:34:24
Message-ID: OSZPR01MB631065ECB16022359116FDE6FD339@OSZPR01MB6310.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wed, Oct 26, 2022 7:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 25, 2022 at 8:38 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 21, 2022 at 6:32 PM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > I've started to review this patch. I tested v40-0001 patch and have
> > one question:
> >
> > IIUC even when most of the changes in the transaction are filtered out
> > in pgoutput (eg., by relation filter or row filter), the walsender
> > sends STREAM_START. This means that the subscriber could end up
> > launching parallel apply workers also for almost empty (and streamed)
> > transactions. For example, I created three subscriptions each of which
> > subscribes to a different table. When I loaded a large amount of data
> > into one table, all three (leader) apply workers received START_STREAM
> > and launched their parallel apply workers.
> >
>
> The apply workers will be launched just the first time then we
> maintain a pool so that we don't need to restart them.
>
> > However, two of them
> > finished without applying any data. I think this behaviour looks
> > problematic since it wastes workers and rather decreases the apply
> > performance if the changes are not large. Is it worth considering a
> > way to delay launching a parallel apply worker until we find out the
> > amount of changes is actually large?
> >
>
> I think even if changes are less there may not be much difference
> because we have observed that the performance improvement comes from
> not writing to file.
>
> > For example, the leader worker
> > writes the streamed changes to files as usual and launches a parallel
> > worker if the amount of changes exceeds a threshold or the leader
> > receives the second segment. After that, the leader worker switches to
> > send the streamed changes to parallel workers via shm_mq instead of
> > files.
> >
>
> I think writing to file won't be a good idea as that can hamper the
> performance benefit in some cases and not sure if it is worth.
>

I tested some cases in which only a small part of the transaction, or an empty
transaction, is sent to the subscriber, to see whether streaming in parallel
mode causes any performance degradation.

Each test was run ten times and the average was taken.
The results are as follows. The details and the test script are attached.

Rows sent    HEAD       patched
----------------------------------
10%          24.4595    18.4545
 5%          21.244     17.9655
 0%          18.0605    17.893

This shows that when only 5% or 10% of rows are sent to the subscriber, parallel
apply takes less time than HEAD, and that even when all rows are filtered out
there is no performance degradation.
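
For reference, the relative reduction implied by these averages can be worked
out as below. This is only a sketch over the numbers reported above; the names
`results` and `reduction_pct` are illustrative and are not part of the attached
script, and the figures are used exactly as reported (no unit is assumed).

```python
# Averages reported above, keyed by the percentage of rows sent.
# Each value is (HEAD, patched).
results = {
    10: (24.4595, 18.4545),
    5: (21.244, 17.9655),
    0: (18.0605, 17.893),
}


def reduction_pct(head: float, patched: float) -> float:
    """Return how much lower `patched` is than `head`, as a percentage of `head`."""
    return (head - patched) / head * 100.0


for rows_sent, (head, patched) in results.items():
    pct = reduction_pct(head, patched)
    print(f"{rows_sent:>2}% of rows sent: patched is {pct:.2f}% lower than HEAD")
```

A positive value means the patched build took less time than HEAD for that case.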

Regards
Shi yu

Attachment: script.zip (application/x-zip-compressed, 3.0 KB)
