Re: Perform streaming logical transactions by background workers and parallel apply

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-05-03 07:16:41
Message-ID: CAHut+PvPwTQ_GpVMw3JgZzhWcwTrAiT-_5M-XzEO7GUnHH8hGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 3, 2022 at 2:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 2, 2022 at 5:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, May 2, 2022 at 6:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 2, 2022 at 11:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > Are you planning to support "Transaction dependency" Amit mentioned in
> > > > his first mail in this patch? IIUC since the background apply worker
> > > > applies the streamed changes as soon as receiving them from the main
> > > > apply worker, a conflict that doesn't happen in the current streaming
> > > > logical replication could happen.
> > > >
> > >
> > > This patch seems to be waiting for stream_stop to finish, so I don't
> > > see how the issues related to "Transaction dependency" can arise? What
> > > type of conflict/issues you have in mind?
> >
> > Suppose we set both publisher and subscriber:
> >
> > On publisher:
> > create table test (i int);
> > insert into test values (0);
> > create publication test_pub for table test;
> >
> > On subscriber:
> > create table test (i int primary key);
> > create subscription test_sub connection '...' publication test_pub; --
> > value 0 is replicated via initial sync
> >
> > Now, both 'test' tables have value 0.
> >
> > And suppose two concurrent transactions are executed on the publisher
> > in following order:
> >
> > TX-1:
> > begin;
> > insert into test select generate_series(0, 10000); -- changes will be streamed;
> >
> > TX-2:
> > begin;
> > delete from test where c = 0;
> > commit;
> >
> > TX-1:
> > commit;
> >
> > With the current streaming logical replication, these changes will be
> > applied successfully since the deletion is applied before the
> > (streamed) insertion. Whereas with the apply bgworker, it fails due to
> > an unique constraint violation since the insertion is applied first.
> > I've confirmed that it happens with v5 patch.
> >
>
> Good point but I am not completely sure if doing transaction
> dependency tracking for such cases is really worth it. I feel for such
> concurrent cases users can anyway now also get conflicts, it is just a
> matter of timing. One more thing to check transaction dependency, we
> might need to spill the data for streaming transactions in which case
> we might lose all the benefits of doing it via a background worker. Do
> we see any simple way to avoid this?
>

Avoiding unexpected differences like this is why I suggested the
option should have to be explicitly enabled instead of being on by
default as it is in the current patch. See my review comment #14 [1].
It means the user won't have to change their existing code as a
workaround.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuqYP5eD5wcSCtk%3Da6KuMjat2UCzqyGoE7sieCaBsVskQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2022-05-03 07:46:10 Re: failures in t/031_recovery_conflict.pl on CI
Previous Message Amul Sul 2022-05-03 06:50:43 Re: Proposal for internal Numeric to Uint64 conversion function.