Re: Conflict detection and logging in logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-08-05 04:35:01
Message-ID: CAJpy0uDmp26MdGKJSR7H9mFSpB7jFuSypPU7HoZGcU1fiNbnUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 5, 2024 at 9:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Aug 2, 2024 at 6:28 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
> >
>
> Thanks for doing the detailed tests for this patch.
>
> > The purpose of the performance tests is to measure the impact on
> > logical replication with track_commit_timestamp enabled, as this
> > involves fetching the commit_ts data to determine
> > delete_differ/update_differ conflicts.
> >
> > Fortunately, we did not see any noticeable overhead from the new
> > commit_ts fetch and comparison logic. The only notable impact is
> > potential overhead from logging conflicts if they occur frequently.
> > Therefore, enabling conflict detection by default seems feasible, and
> > introducing a new detect_conflict option may not be necessary.
> >
> ...
> >
> > Test 1: create conflicts on Sub using pgbench.
> > ----------------------------------------------------------------
> > Setup:
> > - Both publisher and subscriber have pgbench tables created as-
> > pgbench -p $node1_port postgres -qis 1
> > - At Sub, a subscription created for all the changes from Pub node.
> >
> > Test Run:
> > - To test, ran pgbench for 15 minutes on both nodes simultaneously,
> > which led to concurrent updates and update_differ conflicts on the
> > Subscriber node.
> > Command used to run pgbench on both nodes-
> > ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
> >
> > Results:
> > For each case, note the “tps” and total time taken by the apply-worker
> > on Sub to apply the changes coming from Pub.
> >
> > Case1: track_commit_timestamp = off, detect_conflict = off
> > Pub-tps = 9139.556405
> > Sub-tps = 8456.787967
> > Time of replicating all the changes: 19min 28s
> > Case 2 : track_commit_timestamp = on, detect_conflict = on
> > Pub-tps = 8833.016548
> > Sub-tps = 8389.763739
> > Time of replicating all the changes: 20min 20s
> >
>
> Why is there a noticeable tps (~3%) reduction in publisher TPS? Is it
> the impact of track_commit_timestamp = on or something else?

Was track_commit_timestamp enabled only on subscriber (as needed) or
on both publisher and subscriber? Nisha, can you please confirm from
your logs?

> > Case3: track_commit_timestamp = on, detect_conflict = off
> > Pub-tps = 8886.101726
> > Sub-tps = 8374.508017
> > Time of replicating all the changes: 19min 35s
> > Case 4: track_commit_timestamp = off, detect_conflict = on
> > Pub-tps = 8981.924596
> > Sub-tps = 8411.120808
> > Time of replicating all the changes: 19min 27s
> >
> > **The difference of TPS between each case is small. While I can see a
> > slight increase of the replication time (about 5%), when enabling both
> > track_commit_timestamp and detect_conflict.
> >
>
> The difference in TPS between case 1 and case 2 is quite visible.
> IIUC, the replication time difference is due to the logging of
> conflicts, right?
>
> > Test2: create conflict using a manual script
> > ----------------------------------------------------------------
> > - To measure the precise time taken by the apply-worker in all cases,
> > create a test with a table having 10 million rows.
> > - To record the total time taken by the apply-worker, dump the
> > current time in the logfile for apply_handle_begin() and
> > apply_handle_commit().
> >
> > Setup:
> > Pub : has a table ‘perf’ with 10 million rows.
> > Sub : has the same table ‘perf’ with its own 10 million rows (inserted
> > by 1000 different transactions). This table is subscribed for all
> > changes from Pub.
> >
> > Test Run:
> > At Pub: run UPDATE on the table ‘perf’ to update all its rows in a
> > single transaction. (this will lead to update_differ conflict for all
> > rows on Sub when enabled).
> > At Sub: record the time(from log file) taken by the apply-worker to
> > apply all updates coming from Pub.
> >
> > Results:
> > Below table shows the total time taken by the apply-worker
> > (apply_handle_commit Time - apply_handle_begin Time ).
> > (Two test runs for each of the four cases)
> >
> > Case1: track_commit_timestamp = off, detect_conflict = off
> > Run1 - 2min 42sec 579ms
> > Run2 - 2min 41sec 75ms
> > Case 2 : track_commit_timestamp = on, detect_conflict = on
> > Run1 - 6min 11sec 602ms
> > Run2 - 6min 25sec 179ms
> > Case3: track_commit_timestamp = on, detect_conflict = off
> > Run1 - 2min 34sec 223ms
> > Run2 - 2min 33sec 482ms
> > Case 4: track_commit_timestamp = off, detect_conflict = on
> > Run1 - 2min 35sec 276ms
> > Run2 - 2min 38sec 745ms
> >
> > ** In the case-2 when both track_commit_timestamp and detect_conflict
> > are enabled, the time taken by the apply-worker is ~140% higher.
> >
> > Test3: Case when no conflict is detected.
> > ----------------------------------------------------------------
> > To measure the time taken by the apply-worker when there is no
> > conflict detected. This test is to confirm if the time overhead in
> > Test1-Case2 is due to the new function GetTupleCommitTs() which
> > fetches the origin and timestamp information for each row in the table
> > before applying the update.
> >
> > Setup:
> > - The Publisher and Subscriber both have an empty table to start with.
> > - At Sub, the table is subscribed for all changes from Pub.
> > - At Pub: Insert 10 million rows and the same will be replicated to
> > the Sub table as well.
> >
> > Test Run:
> > At Pub: run an UPDATE on the table to update all rows in a single
> > transaction. (This will NOT hit the update_differ on Sub because now
> > all the tuples have the Pub’s origin).
> >
> > Results:
> > Case1: track_commit_timestamp = off, detect_conflict = off
> > Run1 - 2min 39sec 261ms
> > Run2 - 2min 30sec 95ms
> > Case 2 : track_commit_timestamp = on, detect_conflict = on
> > Run1 - 2min 38sec 985ms
> > Run2 - 2min 46sec 624ms
> > Case3: track_commit_timestamp = on, detect_conflict = off
> > Run1 - 2min 59sec 887ms
> > Run2 - 2min 34sec 336ms
> > Case 4: track_commit_timestamp = off, detect_conflict = on
> > Run1 - 2min 33sec 477min
> > Run2 - 2min 37sec 677ms
> >
> > Test Summary -
> > -- The duration for case-2 was reduced to 2-3 minutes, matching the
> > times of the other cases.
> > -- The test revealed that the overhead in case-2 was not due to
> > commit_ts fetching (GetTupleCommitTs).
> > -- The additional action in case-2 was the error logging of all 10
> > million update_differ conflicts.
> >
>
> According to me, this last point is key among all tests which will
> decide whether we should have a new subscription option like
> detect_conflict or not. I feel this is the worst case where all the
> row updates have conflicts and the majority of time is spent writing
> LOG messages. Now, for this specific case, if one wouldn't have
> enabled track_commit_timestamp then there would be no difference as
> seen in case-4. So, I don't see this as a reason to introduce a new
> subscription option like detect_conflicts, if one wants to avoid such
> an overhead, she shouldn't have enabled track_commit_timestamp in the
> first place to detect conflicts. Also, even without this, we would see
> similar overhead in the case of update/delete_missing where we LOG
> when the tuple to modify is not found.
>

Overall, it looks okay to get rid of the 'detect_conflict' parameter.
My only concern here is the purpose/use-cases of
'track_commit_timestamp'. Is the only purpose of enabling
'track_commit_timestamp' is to detect conflicts? I couldn't find much
in the doc on this. Can there be a case where a user wants to enable
'track_commit_timestamp' for any other purpose without enabling
subscription's conflict detection?

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-08-05 04:44:03 Re: Logical Replication of sequences
Previous Message vignesh C 2024-08-05 04:29:54 Re: Logical Replication of sequences