Re: Make default subscription streaming option as Parallel

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make default subscription streaming option as Parallel
Date: 2024-10-21 15:09:53
Message-ID: CALDaNm1Wz91NZwMUgypuMNi9BQFVLT4Dg1TtO4zHo3nR+4ZRZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 21 Oct 2024 at 14:36, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Oct 7, 2024 at 11:05 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > The tests demonstrate a significant performance improvement when using
> > the parallel streaming option, insert shows 40-48 %improvement, delete
> > shows 34-39 %improvement, update shows 26-30 %improvement. In the case
> > of rollback the improvement is between 12-44%, the improvement
> > slightly reduces with larger amounts of data being rolled back in this
> > case. If there's a significant amount of data to roll back, the
> > performance of streaming in parallel may be comparable to or slightly
> > lower in some instances. However, this is acceptable since commit
> > operations are generally more frequent than rollback operations.
> >
> > One key point to consider is that the lock on transaction objects will
> > be held for a longer duration when using streaming in parallel. This
> > occurs because the parallel apply worker initiates the transaction as
> > soon as streaming begins, maintaining the lock until the transaction
> > is fully completed. As a result, for long-running transactions, this
> > extended lock can hinder concurrent access that requires a lock.
> >
>
> The longer-running transactions will anyway have a risk of deadlocks
> or longer waits if there are concurrent operations on the subscribers.
> However, with parallel apply, there is a risk of deadlock among the
> leader and parallel workers when the schema in publisher and
> subscriber is different. Say the subscriber has a unique constraint
> that the publisher doesn't have. See the comments in this regard atop
> applyparallelworker.c in the "Locking Considerations" section. Having
> said that, the apply workers will detect deadlock in such cases and
> will retry to apply the errored-out transaction. So, there is a
> self-healing in-built mechanism and in such cases, we anyway have a
> risk of UNIQUE_KEY conflict ERRORs which in most cases would require
> manual intervention.
>
> > Since there is a significant percentage improvement, we should make
> > the default subscription streaming option parallel.
> >
>
> This makes sense to me. I think it would be better to add a Note or
> Warning in the docs for the risk of deadlock when the schema of
> publisher and subscriber is not the same even though such cases should
> be less.

Yes this can happen like scenarios below(with deadlock_timeout = 10ms):
Publisher:
CREATE TABLE t1(c1 int);
create publication pub1 for table t1;

Subscriber has an addition index on the table:
CREATE TABLE t1(c1 int);
CREATE UNIQUE INDEX idx1 on t1(c1);
Create subscription ...;

Publisher:
Session1:
Begin;
INSERT INTO t1 SELECT i FROM generate_series(1, 5000) s(i);

Session2:
-- Insert the record that is already inserted in session1
INSERT INTO t1 value(1);

Session1:
Commit;

In this case a deadlock will occur.

Attached v3 version patch has a caution added for the same.

Regards,
Vignesh

Attachment Content-Type Size
v3-0001-Make-default-value-for-susbcription-streaming-opt.patch text/x-patch 21.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2024-10-21 15:11:16 Re: New "raw" COPY format
Previous Message Joel Jacobson 2024-10-21 14:37:52 Re: [PATCH] Add array_reverse() function