Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-01-08 10:33:05
Message-ID: CAD21AoBEpr8RHYzXyn3udWgb4fRkqqaSKup4PxWPqSofHSNvnQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >
> > > Here is further performance test analysis with v16 patch-set.
> > >
> > > In the test scenarios already shared on -hackers [1], where pgbench was run only on the publisher node in a pub-sub setup, no performance degradation was observed on either node.
> > >
> > > In contrast, when pgbench was run only on the subscriber side with detect_update_deleted=on [2], TPS was reduced due to dead tuple accumulation. This performance drop depended on wal_receiver_status_interval: larger intervals resulted in more dead tuple accumulation on the subscriber node. However, with the improvement in patch v16-0002, which dynamically tunes the status request, the TPS reduction at the default setting was limited to only 1%.
> > >
> > > We performed more benchmarks with the v16 patches where pgbench was run on both the publisher and subscriber, focusing on TPS performance. To summarize the key observations:
> > >
> > > - No performance impact on the publisher as dead tuple accumulation does not occur on the publisher.
> >
> > Nice. It means that frequently getting in-commit-phase transactions by
> > the subscriber didn't have a negative impact on the publisher's
> > performance.
> >
> > >
> > > - The performance is reduced on the subscriber side (~50% TPS reduction [3]) due to dead tuple retention for conflict detection when detect_update_deleted=on.
> > >
> > > - The performance reduction happens only on the subscriber side: the workload on the publisher is quite high, so the apply workers must wait for all transactions with earlier timestamps to be applied and flushed before advancing the non-removable XID to remove dead tuples.
> >
> > Assuming that the performance dip happened due to dead tuple retention
> > for the conflict detection, would TPS on other databases also be
> > affected?
> >
>
> As we use slot->xmin to retain dead tuples, shouldn't the impact be
> global (i.e., on all databases)?

I think so too.
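
For what it's worth, the cluster-wide effect is visible from the stock
catalog views; a quick sketch (nothing patch-specific is assumed here):

```sql
-- A replication slot's xmin holds back the dead-tuple horizon for the
-- entire cluster, not just the database the apply worker connects to.
SELECT slot_name, xmin, catalog_xmin
FROM pg_replication_slots;

-- So tables in every database retain dead tuples until the oldest
-- slot xmin advances; per-table accumulation can be watched with:
SELECT relname, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```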

>
> > >
> > > [3] Test with pgbench run on both publisher and subscriber.
> > >
> > > Test setup:
> > > - Tests performed on pgHead + v16 patches
> > > - Created a pub-sub replication system.
> > > - Parameters for both instances were:
> > >
> > > shared_buffers = 30GB
> > > min_wal_size = 10GB
> > > max_wal_size = 20GB
> > > autovacuum = false
> >
> > Since you disabled autovacuum on the subscriber, dead tuples created
> > by non-HOT updates accumulate anyway regardless of the
> > detect_update_deleted setting, is that right?
> >
> >
>
> I think the hot-pruning mechanism during the update operation will
> remove dead tuples even when autovacuum is disabled.

True, but why was autovacuum disabled in the first place? It seems that
case1-2_setup.sh doesn't specify a fillfactor, which makes HOT updates
less likely to happen.
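
(For reference, a sketch of what I mean; the fillfactor value is
illustrative and the table name assumes the standard pgbench schema:)

```sql
-- Leaving free space on each heap page lets updates stay on-page and
-- qualify as HOT, so hot-pruning can reclaim old versions even with
-- autovacuum off. pgbench can do this at init time with "pgbench -i -F 80".
ALTER TABLE pgbench_accounts SET (fillfactor = 80);

-- After a run, the HOT-update ratio shows whether it helped:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';
```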

I understand that a certain performance dip happens due to dead tuple
retention, which is fine, but I'm surprised that TPS decreased by 50%
within 120 seconds. Does TPS get even worse in a longer test? I did a
quick benchmark where I completely disabled the removal of dead tuples
(with autovacuum=off and a logical slot) and ran pgbench, but I didn't
see such a precipitous dip.
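
(Roughly what I did, with an illustrative slot name; test_decoding
ships in contrib:)

```sql
-- With autovacuum = off in postgresql.conf, a logical slot pins the
-- horizon so dead tuples from pgbench updates cannot be removed,
-- mimicking the retention that detect_update_deleted would cause.
SELECT pg_create_logical_replication_slot('retain_test', 'test_decoding');

-- Then run pgbench against this server for a while, e.g.:
--   pgbench -c 8 -j 8 -T 600 postgres
-- and watch TPS over time; drop the slot afterwards with:
--   SELECT pg_drop_replication_slot('retain_test');
```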

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
