Re: Conflict detection and logging in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-08-05 03:48:55
Message-ID: CAA4eK1Jb7ipDpRpqDg-zBehe1rNj4vv5gLsQjo9OvLGc9+ZKsg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 2, 2024 at 6:28 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
>

Thanks for doing the detailed tests for this patch.

> The purpose of the performance tests is to measure the impact on
> logical replication with track_commit_timestamp enabled, as this
> involves fetching the commit_ts data to determine
> delete_differ/update_differ conflicts.
>
> Fortunately, we did not see any noticeable overhead from the new
> commit_ts fetch and comparison logic. The only notable impact is
> potential overhead from logging conflicts if they occur frequently.
> Therefore, enabling conflict detection by default seems feasible, and
> introducing a new detect_conflict option may not be necessary.
>
...
>
> Test 1: create conflicts on Sub using pgbench.
> ----------------------------------------------------------------
> Setup:
> - Both publisher and subscriber have pgbench tables, created with:
> pgbench -p $node1_port postgres -qis 1
> - At Sub, a subscription is created for all changes from the Pub node.
>
> Test Run:
> - To test, ran pgbench for 15 minutes on both nodes simultaneously,
> which led to concurrent updates and update_differ conflicts on the
> Subscriber node.
> Command used to run pgbench on both nodes-
> ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
>
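For reference, a minimal sketch of the setup above (object names and the
connection string are placeholders; detect_conflict is the subscription
option added by the v8 patch):

-- On the publisher: publish the pgbench tables.
CREATE PUBLICATION pub_pgbench FOR ALL TABLES;

-- On the subscriber: commit timestamps are required to detect
-- update_differ/delete_differ conflicts (needs a server restart).
ALTER SYSTEM SET track_commit_timestamp = on;

-- On the subscriber: subscribe to all changes coming from the publisher.
CREATE SUBSCRIPTION sub_pgbench
    CONNECTION 'host=localhost port=8833 dbname=postgres'
    PUBLICATION pub_pgbench
    WITH (detect_conflict = on);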
> Results:
> For each case, note the “tps” and total time taken by the apply-worker
> on Sub to apply the changes coming from Pub.
>
> Case1: track_commit_timestamp = off, detect_conflict = off
> Pub-tps = 9139.556405
> Sub-tps = 8456.787967
> Time of replicating all the changes: 19min 28s
> Case 2 : track_commit_timestamp = on, detect_conflict = on
> Pub-tps = 8833.016548
> Sub-tps = 8389.763739
> Time of replicating all the changes: 20min 20s
>

Why is there a noticeable (~3%) reduction in publisher TPS? Is it the
impact of track_commit_timestamp = on or something else?

> Case3: track_commit_timestamp = on, detect_conflict = off
> Pub-tps = 8886.101726
> Sub-tps = 8374.508017
> Time of replicating all the changes: 19min 35s
> Case 4: track_commit_timestamp = off, detect_conflict = on
> Pub-tps = 8981.924596
> Sub-tps = 8411.120808
> Time of replicating all the changes: 19min 27s
>
> ** The difference in TPS between the cases is small, but I can see a
> slight increase in replication time (about 5%) when both
> track_commit_timestamp and detect_conflict are enabled.
>

The difference in TPS between case 1 and case 2 is quite visible.
IIUC, the replication time difference is due to the logging of
conflicts, right?

> Test 2: create conflicts using a manual script
> ----------------------------------------------------------------
> - To measure the precise time taken by the apply-worker in all cases,
> create a test with a table having 10 million rows.
> - To record the total time taken by the apply-worker, log the
> current time from apply_handle_begin() and apply_handle_commit().
>
> Setup:
> Pub : has a table ‘perf’ with 10 million rows.
> Sub : has the same table ‘perf’ with its own 10 million rows (inserted
> by 1000 different transactions). This table is subscribed for all
> changes from Pub.
>
> Test Run:
> At Pub: run an UPDATE on the table ‘perf’ to update all its rows in a
> single transaction. (This will lead to an update_differ conflict for
> every row on Sub when conflict detection is enabled.)
> At Sub: record the time(from log file) taken by the apply-worker to
> apply all updates coming from Pub.
>
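A minimal sketch of what this manual test would look like (the table
definition and column names are my assumptions; the actual test loaded the
subscriber's rows in 1000 separate transactions rather than one INSERT):

-- On both nodes: the 'perf' table with 10 million rows. The subscriber's
-- copy is populated locally, so its rows carry the subscriber's own
-- origin and commit timestamp.
CREATE TABLE perf (id int PRIMARY KEY, val int DEFAULT 0);
INSERT INTO perf (id) SELECT i FROM generate_series(1, 10000000) i;

-- On the publisher: update every row in a single transaction. Each
-- replicated update then finds a locally-originated row on the subscriber
-- and, with detection enabled, reports an update_differ conflict there.
UPDATE perf SET val = val + 1;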
> Results:
> The table below shows the total time taken by the apply-worker
> (apply_handle_commit time - apply_handle_begin time).
> (Two test runs for each of the four cases.)
>
> Case1: track_commit_timestamp = off, detect_conflict = off
> Run1 - 2min 42sec 579ms
> Run2 - 2min 41sec 75ms
> Case 2 : track_commit_timestamp = on, detect_conflict = on
> Run1 - 6min 11sec 602ms
> Run2 - 6min 25sec 179ms
> Case3: track_commit_timestamp = on, detect_conflict = off
> Run1 - 2min 34sec 223ms
> Run2 - 2min 33sec 482ms
> Case 4: track_commit_timestamp = off, detect_conflict = on
> Run1 - 2min 35sec 276ms
> Run2 - 2min 38sec 745ms
>
> ** In case 2, when both track_commit_timestamp and detect_conflict
> are enabled, the time taken by the apply-worker is ~140% higher.
>
> Test 3: case when no conflict is detected.
> ----------------------------------------------------------------
> This test measures the time taken by the apply-worker when no
> conflict is detected. It is meant to confirm whether the time overhead
> in Test2-Case2 is due to the new function GetTupleCommitTs(), which
> fetches the origin and timestamp information for each row in the table
> before applying the update.
>
> Setup:
> - The Publisher and Subscriber both have an empty table to start with.
> - At Sub, the table is subscribed for all changes from Pub.
> - At Pub: insert 10 million rows; these are replicated to the Sub
> table as well.
>
> Test Run:
> At Pub: run an UPDATE on the table to update all rows in a single
> transaction. (This will NOT hit the update_differ on Sub because now
> all the tuples have the Pub’s origin).
>
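In other words, something like the following (again only a sketch, reusing
the same hypothetical 'perf' table as above):

-- On the publisher only: insert the 10 million rows. They reach the
-- subscriber through replication, so on the subscriber they carry the
-- publisher's origin.
INSERT INTO perf (id) SELECT i FROM generate_series(1, 10000000) i;

-- A later full-table UPDATE on the publisher therefore finds no
-- locally-originated rows on the subscriber, and no update_differ
-- conflict is raised.
UPDATE perf SET val = val + 1;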
> Results:
> Case1: track_commit_timestamp = off, detect_conflict = off
> Run1 - 2min 39sec 261ms
> Run2 - 2min 30sec 95ms
> Case 2 : track_commit_timestamp = on, detect_conflict = on
> Run1 - 2min 38sec 985ms
> Run2 - 2min 46sec 624ms
> Case3: track_commit_timestamp = on, detect_conflict = off
> Run1 - 2min 59sec 887ms
> Run2 - 2min 34sec 336ms
> Case 4: track_commit_timestamp = off, detect_conflict = on
> Run1 - 2min 33sec 477ms
> Run2 - 2min 37sec 677ms
>
> Test Summary -
> -- The duration for case-2 was reduced to 2-3 minutes, matching the
> times of the other cases.
> -- The test revealed that the overhead in case-2 was not due to
> commit_ts fetching (GetTupleCommitTs).
> -- The additional action in case-2 was the logging of all 10
> million update_differ conflicts.
>

To me, this last point is the key result among all the tests and will
decide whether we should have a new subscription option like
detect_conflict or not. I feel this is the worst case, where every row
update has a conflict and the majority of the time is spent writing
LOG messages. Now, for this specific case, if one hadn't enabled
track_commit_timestamp, there would be no difference, as seen in
case-4. So, I don't see this as a reason to introduce a new
subscription option like detect_conflict; if one wants to avoid such
overhead, she shouldn't have enabled track_commit_timestamp in the
first place to detect conflicts. Also, even without this, we would see
similar overhead in the case of update/delete_missing, where we LOG
when the tuple to modify is not found.

--
With Regards,
Amit Kapila.
