From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make default subscription streaming option as Parallel
Date: 2024-10-21 15:09:53
Message-ID: CALDaNm1Wz91NZwMUgypuMNi9BQFVLT4Dg1TtO4zHo3nR+4ZRZA@mail.gmail.com
Lists: pgsql-hackers
On Mon, 21 Oct 2024 at 14:36, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Oct 7, 2024 at 11:05 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > The tests demonstrate a significant performance improvement when using
> > the parallel streaming option: insert shows a 40-48% improvement,
> > delete 34-39%, and update 26-30%. For rollback, the improvement is
> > between 12-44%, and it decreases slightly as the amount of data being
> > rolled back grows. If there is a significant amount of data to roll
> > back, the performance of parallel streaming may be comparable to, or
> > slightly lower than, non-parallel streaming in some instances. However,
> > this is acceptable since commit operations are generally more frequent
> > than rollback operations.
> >
> > One key point to consider is that the lock on transaction objects will
> > be held for a longer duration when using streaming in parallel. This
> > occurs because the parallel apply worker initiates the transaction as
> > soon as streaming begins, maintaining the lock until the transaction
> > is fully completed. As a result, for long-running transactions, this
> > extended lock can hinder concurrent access that requires a lock.
> >
>
> The longer-running transactions will anyway have a risk of deadlocks
> or longer waits if there are concurrent operations on the subscribers.
> However, with parallel apply, there is a risk of deadlock among the
> leader and parallel workers when the schema in publisher and
> subscriber is different. Say the subscriber has a unique constraint
> that the publisher doesn't have. See the comments in this regard atop
> applyparallelworker.c in the "Locking Considerations" section. Having
> said that, the apply workers will detect a deadlock in such cases and
> will retry applying the errored-out transaction. So there is a built-in
> self-healing mechanism, and in such cases we anyway have a risk of
> UNIQUE_KEY conflict ERRORs, which in most cases would require manual
> intervention.
>
> > Since there is a significant percentage improvement, we should make
> > the default subscription streaming option parallel.
> >
>
> This makes sense to me. I think it would be better to add a Note or
> Warning in the docs about the risk of deadlock when the schema of the
> publisher and subscriber is not the same, even though such cases should
> be rare.
Yes, this can happen in scenarios like the one below (with deadlock_timeout = 10ms):
Publisher:
CREATE TABLE t1(c1 int);
CREATE PUBLICATION pub1 FOR TABLE t1;
The subscriber has an additional unique index on the table:
CREATE TABLE t1(c1 int);
CREATE UNIQUE INDEX idx1 ON t1(c1);
CREATE SUBSCRIPTION ...;
Publisher:
Session1:
BEGIN;
INSERT INTO t1 SELECT i FROM generate_series(1, 5000) s(i);
Session2:
-- Insert a record that session1 has already inserted
INSERT INTO t1 VALUES(1);
Session1:
COMMIT;
In this case, a deadlock will occur on the subscriber.
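If that risk is a concern for a particular subscription, parallel apply can still be disabled explicitly; a minimal illustration, assuming a subscription named sub1:
-- Switch the subscription back to non-parallel streaming; streamed changes
-- are then spilled and applied by the leader apply worker at commit.
ALTER SUBSCRIPTION sub1 SET (streaming = on);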
The attached v3 patch adds a caution for this to the documentation.
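For clarity, a sketch of what the default change means in practice (sub1, pub1 and the connection string below are only illustrative; the streaming option keeps its existing off/on/parallel values):
-- Currently, parallel apply of in-progress (streamed) transactions must be requested explicitly:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=publisher' PUBLICATION pub1
    WITH (streaming = parallel);
-- With the patch, a CREATE SUBSCRIPTION that omits the streaming option behaves as if
-- streaming = parallel had been specified, and streaming = on or off can still be given
-- explicitly to retain the previous behaviour.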
Regards,
Vignesh
Attachment: v3-0001-Make-default-value-for-susbcription-streaming-opt.patch (text/x-patch, 21.9 KB)