Re: Conflict detection and logging in logical replication

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-08-13 03:57:23
Message-ID: CABdArM6gULXDHKwpuWWfLeHCpkrnbv4oOUw7igiW7ziPxLp5Gg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 5, 2024 at 10:05 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Aug 5, 2024 at 9:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Aug 2, 2024 at 6:28 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >
> > > Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
> > >
> >
> > Thanks for doing the detailed tests for this patch.
> >
> > > The purpose of the performance tests is to measure the impact on
> > > logical replication with track_commit_timestamp enabled, as this
> > > involves fetching the commit_ts data to determine
> > > delete_differ/update_differ conflicts.
> > >
> > > Fortunately, we did not see any noticeable overhead from the new
> > > commit_ts fetch and comparison logic. The only notable impact is
> > > potential overhead from logging conflicts if they occur frequently.
> > > Therefore, enabling conflict detection by default seems feasible, and
> > > introducing a new detect_conflict option may not be necessary.
> > >
> > ...
> > >
> > > Test 1: create conflicts on Sub using pgbench.
> > > ----------------------------------------------------------------
> > > Setup:
> > > - Both publisher and subscriber have pgbench tables created as-
> > > pgbench -p $node1_port postgres -qis 1
> > > - At Sub, a subscription created for all the changes from Pub node.
> > >
> > > Test Run:
> > > - To test, ran pgbench for 15 minutes on both nodes simultaneously,
> > > which led to concurrent updates and update_differ conflicts on the
> > > Subscriber node.
> > > Command used to run pgbench on both nodes-
> > > ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
> > >
> > > Results:
> > > For each case, note the “tps” and total time taken by the apply-worker
> > > on Sub to apply the changes coming from Pub.
> > >
> > > Case1: track_commit_timestamp = off, detect_conflict = off
> > > Pub-tps = 9139.556405
> > > Sub-tps = 8456.787967
> > > Time of replicating all the changes: 19min 28s
> > > Case 2 : track_commit_timestamp = on, detect_conflict = on
> > > Pub-tps = 8833.016548
> > > Sub-tps = 8389.763739
> > > Time of replicating all the changes: 20min 20s
> > >
> >
> > Why is there a noticeable tps (~3%) reduction in publisher TPS? Is it
> > the impact of track_commit_timestamp = on or something else?

When both the publisher and subscriber nodes are on the same machine,
we observe a decrease in the publisher's TPS when
'track_commit_timestamp' is ON for the subscriber. Testing on pgHead
(without the patch) also showed a similar reduction in the publisher's
TPS.

Test Setup: The test was conducted with the same setup as Test-1.

Results:
Case 1: pgHead - 'track_commit_timestamp' = OFF
- Pub TPS: 9306.25
- Sub TPS: 8848.91
Case 2: pgHead - 'track_commit_timestamp' = ON
- Pub TPS: 8915.75
- Sub TPS: 8667.12

On pgHead too, there was a ~400 TPS reduction on the publisher when
'track_commit_timestamp' was enabled on the subscriber.
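
As a quick cross-check of the pgHead figures above, the drop can be expressed both in absolute TPS and as a percentage of the 'track_commit_timestamp' = OFF baseline. This is only a sketch of the arithmetic; the helper name relative_change is made up for illustration and is not part of any test script used here.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# pgHead TPS figures copied from the results above
pub_off, pub_on = 9306.25, 8915.75
sub_off, sub_on = 8848.91, 8667.12

pub_drop_tps = pub_off - pub_on          # absolute drop on the publisher
sub_drop_tps = sub_off - sub_on          # absolute drop on the subscriber
pub_change = relative_change(pub_off, pub_on)
sub_change = relative_change(sub_off, sub_on)

print(f"Pub: {pub_drop_tps:.2f} TPS lower ({pub_change:.2f}%)")
print(f"Sub: {sub_drop_tps:.2f} TPS lower ({sub_change:.2f}%)")
```

The publisher works out to roughly a 4% drop and the subscriber to roughly 2%, which is in the same range as the reduction seen with the patch applied.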

Additionally, code profiling of the walsender on the publisher showed
that the overhead in Case-2 was mainly in the DecodeCommit() call
stack, causing slower write operations, especially in
logicalrep_write_update() and OutputPluginWrite().

Case 1: 'track_commit_timestamp' = OFF
--11.57%--xact_decode
| | DecodeCommit
| | ReorderBufferCommit
...
| | --6.10%--pgoutput_change
| | |
| | |--3.09%--logicalrep_write_update
| | ....
| | |--2.01%--OutputPluginWrite
| | |--1.97%--WalSndWriteData

Case 2: 'track_commit_timestamp' = ON
|--53.19%--xact_decode
| | DecodeCommit
| | ReorderBufferCommit
...
| | --30.25%--pgoutput_change
| | |
| | |--15.23%--logicalrep_write_update
| | ....
| | |--9.82%--OutputPluginWrite
| | |--9.57%--WalSndWriteData

-- In Case 2, the subscriber's process of writing timestamp data for
millions of rows appears to have impacted all write operations on the
machine.

To confirm the profiling results, we conducted the same test with the
publisher and subscriber on separate machines.

Results:
Case 1: 'track_commit_timestamp' = OFF
- Run 1: Pub TPS: 2144.10, Sub TPS: 2216.02
- Run 2: Pub TPS: 2159.41, Sub TPS: 2233.82

Case 2: 'track_commit_timestamp' = ON
- Run 1: Pub TPS: 2174.39, Sub TPS: 2226.89
- Run 2: Pub TPS: 2148.92, Sub TPS: 2224.80

Note: The machines used in this test were not as powerful as the one
used in the earlier tests, resulting in lower overall TPS (~2k vs.
~8-9k). However, the results show no significant reduction in the
publisher's TPS, indicating minimal impact when the nodes are run on
separate machines.
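
To put "no significant reduction" in numbers, one can average the two runs per case and compare that difference against the run-to-run spread within a single case. The following is a rough sketch of that comparison using only the figures above; the dictionary layout and the helper name mean_change_pct are illustrative assumptions, and two runs per case are of course too few for any real statistical claim.

```python
from statistics import mean

# TPS figures copied from the separate-machine results above,
# keyed by the subscriber's track_commit_timestamp setting.
runs = {
    "off": {"pub": [2144.10, 2159.41], "sub": [2216.02, 2233.82]},
    "on":  {"pub": [2174.39, 2148.92], "sub": [2226.89, 2224.80]},
}


def mean_change_pct(node: str) -> float:
    """Percent change of the mean TPS for `node` going from OFF to ON."""
    off = mean(runs["off"][node])
    on = mean(runs["on"][node])
    return (on - off) / off * 100.0


pub_pct = mean_change_pct("pub")
sub_pct = mean_change_pct("sub")

# Run-to-run spread of the publisher within Case 1 (OFF), as a percentage
# of that case's mean, to give a sense of the noise level.
pub_off = runs["off"]["pub"]
pub_spread_pct = (max(pub_off) - min(pub_off)) / mean(pub_off) * 100.0

print(f"Pub mean change OFF -> ON: {pub_pct:+.2f}%")
print(f"Sub mean change OFF -> ON: {sub_pct:+.2f}%")
print(f"Pub run-to-run spread (OFF): {pub_spread_pct:.2f}%")
```

With these figures the OFF -> ON change in mean publisher TPS is smaller than the spread between the two OFF runs, which is consistent with the difference being noise.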

> Was track_commit_timestamp enabled only on subscriber (as needed) or
> on both publisher and subscriber? Nisha, can you please confirm from
> your logs?

Yes, track_commit_timestamp was enabled only on the subscriber.

> > > Case3: track_commit_timestamp = on, detect_conflict = off
> > > Pub-tps = 8886.101726
> > > Sub-tps = 8374.508017
> > > Time of replicating all the changes: 19min 35s
> > > Case 4: track_commit_timestamp = off, detect_conflict = on
> > > Pub-tps = 8981.924596
> > > Sub-tps = 8411.120808
> > > Time of replicating all the changes: 19min 27s
> > >
> > > **The difference of TPS between each case is small. While I can see a
> > > slight increase of the replication time (about 5%), when enabling both
> > > track_commit_timestamp and detect_conflict.
> > >
> >
> > The difference in TPS between case 1 and case 2 is quite visible.
> > IIUC, the replication time difference is due to the logging of
> > conflicts, right?
> >

Right, the major difference is due to the logging of conflicts.

--
Thanks,
Nisha
