From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-08-02 12:58:13
Message-ID: CABdArM5gjD_BB3VjVYkrHLnEORX0sgKO4sgnhgaDfkpqB41APw@mail.gmail.com
Lists: pgsql-hackers
Performance tests were done on the v8-0001 and v8-0002 patches, available at [1].
The purpose of the tests is to measure the impact on logical
replication with track_commit_timestamp enabled, as this involves
fetching the commit_ts data to determine delete_differ/update_differ
conflicts.
Fortunately, we did not see any noticeable overhead from the new
commit_ts fetch and comparison logic. The only notable impact is
potential overhead from logging conflicts if they occur frequently.
Therefore, enabling conflict detection by default seems feasible, and
introducing a new detect_conflict option may not be necessary.
Please refer to the following for detailed test results:
For all the tests, we created two server nodes, one as publisher and
one as subscriber. Both nodes are configured with the settings below -
wal_level = logical
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
~~~~
Test 1: create conflicts on Sub using pgbench.
----------------------------------------------------------------
Setup:
- Both publisher and subscriber have pgbench tables created as-
pgbench -p $node1_port postgres -qis 1
- At Sub, a subscription was created for all the changes from the Pub node.
Test Run:
- For the test, we ran pgbench for 15 minutes on both nodes
simultaneously, which led to concurrent updates and update_differ
conflicts on the subscriber node.
Command used to run pgbench on both nodes:
./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
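For reference, the node setup can be sketched as below. This is a rough sketch only: the ports, the publication/subscription names, and the copy_data setting are assumptions, since the exact commands are not shown in this message.

```shell
# Rough sketch of the two-node setup (ports and object names are placeholders).
# Create identical pgbench tables on both nodes:
pgbench -p 8833 postgres -qis 1   # publisher
pgbench -p 8844 postgres -qis 1   # subscriber

# Publish all tables from the publisher:
psql -p 8833 -d postgres -c "CREATE PUBLICATION pub_all FOR ALL TABLES;"

# Subscribe on the subscriber; copy_data = off is assumed here because
# the tables were initialized independently on both nodes:
psql -p 8844 -d postgres -c "CREATE SUBSCRIPTION sub_all CONNECTION 'host=localhost port=8833 dbname=postgres' PUBLICATION pub_all WITH (copy_data = off);"
```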
Results:
For each case, we note the "tps" and the total time taken by the
apply-worker on Sub to apply the changes coming from Pub.
Case1: track_commit_timestamp = off, detect_conflict = off
Pub-tps = 9139.556405
Sub-tps = 8456.787967
Time of replicating all the changes: 19min 28s
Case 2 : track_commit_timestamp = on, detect_conflict = on
Pub-tps = 8833.016548
Sub-tps = 8389.763739
Time of replicating all the changes: 20min 20s
Case3: track_commit_timestamp = on, detect_conflict = off
Pub-tps = 8886.101726
Sub-tps = 8374.508017
Time of replicating all the changes: 19min 35s
Case 4: track_commit_timestamp = off, detect_conflict = on
Pub-tps = 8981.924596
Sub-tps = 8411.120808
Time of replicating all the changes: 19min 27s
** The TPS difference between the cases is small. However, there is a
slight increase in replication time (about 5%) when both
track_commit_timestamp and detect_conflict are enabled.
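The ~5% figure can be checked with quick arithmetic on the timings quoted above (a standalone sketch, not part of the test harness):

```python
# Compare Test 1 replication times: Case 1 (both settings off)
# vs Case 2 (track_commit_timestamp = on, detect_conflict = on).
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds

case1 = to_seconds(19, 28)  # 19min 28s, both off
case2 = to_seconds(20, 20)  # 20min 20s, both on

increase = (case2 - case1) / case1 * 100
print(f"replication time increase: {increase:.1f}%")  # -> about 4.5%, i.e. ~5%
```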
Test2: create conflict using a manual script
----------------------------------------------------------------
- To measure the precise time taken by the apply-worker in all cases,
we created a test with a table having 10 million rows.
- To record the total time taken by the apply-worker, we dump the
current time into the logfile in apply_handle_begin() and
apply_handle_commit().
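As an illustration of that measurement, the two logged timestamps can be subtracted as below. The timestamp format here is an assumption for illustration; the actual instrumentation simply logs the current time in those two callbacks.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def apply_duration(begin_ts: str, commit_ts: str) -> float:
    """Seconds elapsed between apply_handle_begin() and apply_handle_commit()."""
    begin = datetime.strptime(begin_ts, FMT)
    commit = datetime.strptime(commit_ts, FMT)
    return (commit - begin).total_seconds()

# Example: a 2min 42sec 579ms apply, as in Test 2, Case 1, Run 1.
print(apply_duration("2024-08-02 10:00:00.000", "2024-08-02 10:02:42.579"))
# -> 162.579
```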
Setup:
Pub : has a table ‘perf’ with 10 million rows.
Sub : has the same table ‘perf’ with its own 10 million rows (inserted
by 1000 different transactions). This table is subscribed for all
changes from Pub.
Test Run:
At Pub: run UPDATE on the table ‘perf’ to update all its rows in a
single transaction. (this will lead to update_differ conflict for all
rows on Sub when enabled).
At Sub: record the time(from log file) taken by the apply-worker to
apply all updates coming from Pub.
Results:
The table below shows the total time taken by the apply-worker
(apply_handle_commit time - apply_handle_begin time).
(Two test runs for each of the four cases)
Case1: track_commit_timestamp = off, detect_conflict = off
Run1 - 2min 42sec 579ms
Run2 - 2min 41sec 75ms
Case 2 : track_commit_timestamp = on, detect_conflict = on
Run1 - 6min 11sec 602ms
Run2 - 6min 25sec 179ms
Case3: track_commit_timestamp = on, detect_conflict = off
Run1 - 2min 34sec 223ms
Run2 - 2min 33sec 482ms
Case 4: track_commit_timestamp = off, detect_conflict = on
Run1 - 2min 35sec 276ms
Run2 - 2min 38sec 745ms
** In case-2, when both track_commit_timestamp and detect_conflict
are enabled, the time taken by the apply-worker is ~140% higher.
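Comparing the runs pairwise gives a quick sanity check on that estimate (a standalone sketch using the figures quoted above):

```python
# Per-run overhead of Test 2 Case 2 (both on) vs Case 1 (both off).
def ms(minutes, seconds, millis):
    return (minutes * 60 + seconds) * 1000 + millis

case1 = [ms(2, 42, 579), ms(2, 41, 75)]   # Run1, Run2 with both off
case2 = [ms(6, 11, 602), ms(6, 25, 179)]  # Run1, Run2 with both on

for c1, c2 in zip(case1, case2):
    print(f"overhead: {(c2 / c1 - 1) * 100:.0f}%")  # -> ~129% and ~139% higher
```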
Test3: Case when no conflict is detected.
----------------------------------------------------------------
To measure the time taken by the apply-worker when no conflict is
detected. This test confirms whether the time overhead in
Test2-Case2 is due to the new function GetTupleCommitTs(), which
fetches the origin and timestamp information for each row in the table
before applying the update.
Setup:
- The Publisher and Subscriber both have an empty table to start with.
- At Sub, the table is subscribed for all changes from Pub.
- At Pub: Insert 10 million rows and the same will be replicated to
the Sub table as well.
Test Run:
At Pub: run an UPDATE on the table to update all rows in a single
transaction. (This will NOT hit the update_differ on Sub because now
all the tuples have the Pub’s origin).
Results:
Case1: track_commit_timestamp = off, detect_conflict = off
Run1 - 2min 39sec 261ms
Run2 - 2min 30sec 95ms
Case 2 : track_commit_timestamp = on, detect_conflict = on
Run1 - 2min 38sec 985ms
Run2 - 2min 46sec 624ms
Case3: track_commit_timestamp = on, detect_conflict = off
Run1 - 2min 59sec 887ms
Run2 - 2min 34sec 336ms
Case 4: track_commit_timestamp = off, detect_conflict = on
Run1 - 2min 33sec 477ms
Run2 - 2min 37sec 677ms
Test Summary -
-- The duration for case-2 was reduced to 2-3 minutes, matching the
times of the other cases.
-- The test revealed that the overhead in case-2 was not due to
commit_ts fetching (GetTupleCommitTs).
-- The additional action in case-2 was the error logging of all 10
million update_differ conflicts.
-- To confirm that the additional time was due to logging, I conducted
another test: I removed the "ReportApplyConflict()" call for
update_differ in the code and re-ran Test2-Case2 (which initially took
~6 minutes). Without conflict logging, the duration was reduced to
"2min 56sec 758ms".
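A back-of-envelope check on those numbers (Run 1 figures from Test 2) confirms that logging dominates the overhead:

```python
# How much of the Case 2 apply-time overhead is attributable to
# conflict logging? All numbers are quoted from this message.
with_logging    = 6 * 60 + 11 + 0.602  # Case 2: 6min 11sec 602ms
without_logging = 2 * 60 + 56 + 0.758  # ReportApplyConflict() removed: 2min 56sec 758ms
baseline        = 2 * 60 + 42 + 0.579  # Case 1: 2min 42sec 579ms

total_overhead   = with_logging - baseline
logging_overhead = with_logging - without_logging
print(f"logging share of overhead: {logging_overhead / total_overhead:.0%}")  # -> ~93%
```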
Test4 - Code Profiling
----------------------------------------------------------------
To narrow down the cause of the time overhead in Test2-Case2, we
profiled the code, using the same setup and test script as Test2.
The overhead in the (track_commit_timestamp=on and detect_conflict=on)
case is not introduced by the commit timestamp fetching (e.g. GetTupleCommitTs).
The main overhead comes from the log reporting which happens when
applying each change:
|--16.57%--ReportApplyConflict
   |--13.17%--errfinish
      --11.53%--EmitErrorReport
         --11.41%--send_message_to_server_log
...
|--0.74%--GetTupleCommitTs
Thank you Hou-San for helping in Test1 and conducting Test4.
Thanks,
Nisha