Re: Make default subscription streaming option as Parallel

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make default subscription streaming option as Parallel
Date: 2024-10-21 09:06:30
Message-ID: CAA4eK1Keu9jiJVCRac7a77+q29z8pGiJQocPO26uqAkdSEyVRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 7, 2024 at 11:05 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> The tests demonstrate a significant performance improvement when using
> the parallel streaming option, insert shows 40-48 %improvement, delete
> shows 34-39 %improvement, update shows 26-30 %improvement. In the case
> of rollback the improvement is between 12-44%, the improvement
> slightly reduces with larger amounts of data being rolled back in this
> case. If there's a significant amount of data to roll back, the
> performance of streaming in parallel may be comparable to or slightly
> lower in some instances. However, this is acceptable since commit
> operations are generally more frequent than rollback operations.
>
> One key point to consider is that the lock on transaction objects will
> be held for a longer duration when using streaming in parallel. This
> occurs because the parallel apply worker initiates the transaction as
> soon as streaming begins, maintaining the lock until the transaction
> is fully completed. As a result, for long-running transactions, this
> extended lock can hinder concurrent access that requires a lock.
>

The longer-running transactions will anyway have a risk of deadlocks
or longer waits if there are concurrent operations on the subscribers.
However, with parallel apply, there is a risk of deadlock among the
leader and parallel workers when the schema in publisher and
subscriber is different. Say the subscriber has a unique constraint
that the publisher doesn't have. See the comments in this regard atop
applyparallelworker.c in the "Locking Considerations" section. Having
said that, the apply workers will detect deadlock in such cases and
will retry to apply the errored-out transaction. So, there is a
self-healing in-built mechanism and in such cases, we anyway have a
risk of UNIQUE_KEY conflict ERRORs which in most cases would require
manual intervention.

> Since there is a significant percentage improvement, we should make
> the default subscription streaming option parallel.
>

This makes sense to me. I think it would be better to add a Note or
Warning in the docs for the risk of deadlock when the schema of
publisher and subscriber is not the same even though such cases should
be less.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2024-10-21 09:30:24 Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Previous Message Aleksander Alekseev 2024-10-21 09:06:29 [PATCH] Add array_reverse() function