Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-05-02 09:09:36
Message-ID: CAA4eK1+4qAc8vbma3obtzrzJOTx_w-DJvBzxY9JuUG_uCP9OiQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 2, 2022 at 11:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Apr 8, 2022 at 6:14 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, April 6, 2022 1:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > In this email, I would like to discuss allowing streaming logical
> > > transactions (large in-progress transactions) by background workers
> > > and parallel apply in general. The goal of this work is to improve the
> > > performance of the apply work in logical replication.
> > >
> > > Currently, for large transactions, the publisher sends the data in
> > > multiple streams (changes divided into chunks depending upon
> > > logical_decoding_work_mem), and then on the subscriber-side, the apply
> > > worker writes the changes into temporary files and, once it receives
> > > the commit, it reads from the file and applies the entire transaction. To
> > > improve the performance of such transactions, we can instead allow
> > > them to be applied via background workers. There could be multiple
> > > ways to achieve this:
> > >
> > > Approach-1: Assign a new bgworker (if available) as soon as the xact's
> > > first stream arrives, and the main apply worker will send changes to this
> > > new worker via shared memory. We keep this worker assigned until the
> > > transaction's commit arrives, and also wait for the worker to finish at
> > > commit. This preserves commit ordering and avoids writing to and
> > > reading from files in most cases. We still need to spill if there is no
> > > worker available. We also need the main apply worker to wait at
> > > stream_stop for the background worker to finish the current stream, to
> > > avoid deadlocks, because T-1's current stream of changes can update rows
> > > in a conflicting order with T-2's next stream of changes.
> > >
> >
> > Attached is the POC patch for Approach-1 of "Perform streaming logical
> > transactions by background workers". The patch is still WIP, as
> > there are several TODO items left, including:
> >
> > * error handling for bgworker
> > * support for skipping the transaction (SKIP) in the bgworker
> > * handle the case when no more workers are available
> > (might need to spill the data to a temp file in this case)
> > * some potential bugs
>
> Are you planning to support the "Transaction dependency" handling Amit
> mentioned in his first mail in this patch? IIUC, since the background
> apply worker applies the streamed changes as soon as it receives them
> from the main apply worker, a conflict that doesn't happen in the current
> streaming logical replication could happen.
>

This patch seems to wait for stream_stop to finish, so I don't see how
the issues related to "Transaction dependency" can arise. What type of
conflicts/issues do you have in mind?

--
With Regards,
Amit Kapila.
