From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Subject: | RE: long-standing data loss bug in initial sync of logical replication |
Date: | 2025-03-13 08:02:50 |
Message-ID: | OSCPR01MB149661EA973D65EBEC2B60D98F5D32@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: pgsql-hackers
Hi Hackers,
Our team (mainly Shlok) ran performance tests with several workloads. Let me
share the results on -hackers. We tested both the master and REL_17 branches;
this post discusses the results for master.
The observed trend is:
We observed that the performance regression exists primarily during frequent
execution of publication DDL statements that modify published tables. This is
expected due to the necessary cache rebuild and distribution overhead involved.
The regression is small or nearly nonexistent in scenarios where DDLs do not
affect published tables or when the frequency of such DDL statements is low.
Sources used
========
The base code was HEAD plus some modifications that allow selectively
invalidating relsync cache entries; these have since been pushed as commit
3abe9dc. The patch being compared was v16.
We ran five benchmarks; let me share them one by one.
-----
Workload A: No DDL operations done in concurrent sessions
======================================
In this workload, the number of concurrent transactions was varied, but none of
them contained DDL commands. The decoding time of all transactions was measured
and compared. We expected no performance change, because no caches would be
invalidated. The actual workload is described in [1], and the runner script is
attached.
The table below shows the results; degradation is computed as
(Patch - Head) / Head * 100, so negative values mean the patch was faster. We
did not observe any notable degradation.
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013196 | 0.013314 | 0.8968
100 | 0.014531 | 0.014728 | 1.3558
500 | 0.018079 | 0.017969 | -0.6066
1000 | 0.023087 | 0.023772 | 2.9670
2000 | 0.031311 | 0.031750 | 1.4010
-----
Workload B: DDL is happening but is unrelated to the publication
=======================================
In this workload, one of the concurrent transactions contained a DDL statement,
but it was unrelated to the publication and the published tables. We again
expected no performance change.
The actual workload is described in [2], and the runner script is attached.
The table below shows the results. The proposed patch distributes invalidation
messages to the concurrent transactions being decoded, so the overhead should
be roughly proportional to the concurrency, and the observed numbers match that
theory. Since these invalidation messages do not invalidate relsync cache
entries, the difference is not large.
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013410 | 0.013217 | -1.4417
100 | 0.014694 | 0.015260 | 3.8496
1000 | 0.023211 | 0.025376 | 9.3289
2000 | 0.032954 | 0.036322 | 10.2213
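As a rough check of the proportionality, computed from the table above: the
absolute gap between patch and head grows with the concurrency, from about
0.0006 sec at 100 transactions to about 0.0022 sec at 1000 and 0.0034 sec at
2000.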
-----
Workload C: DDL is happening on the publication but on an unrelated table
===========================================
In this workload, one of the concurrent transactions contained a DDL statement
that altered the publication being used, but it only added/dropped a table that
was not being decoded.
The actual workload is described in [3], and the runner script is attached.
The table below shows the results. Since commit 3abe9dc, the whole relsync
cache no longer needs to be rebuilt for unrelated publish actions, so the
degradation was mostly the same as in B.
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013546 | 0.013409 | -1.0089
100 | 0.015225 | 0.015357 | 0.8648
500 | 0.017848 | 0.019300 | 8.1372
1000 | 0.023430 | 0.025152 | 7.3497
2000 | 0.032041 | 0.035877 | 11.9723
-----
Workload D: DDL is happening on the related published table,
and one insert is done per invalidation
=========================================
In this workload, one of the concurrent transactions contained a DDL statement
that altered the publication being used, and it dropped/added the very table
that was being decoded. The actual workload is described in [4], and the runner
script is attached.
The table below shows the results. Unlike B and C, we expected a large
degradation for this workload, because each distributed invalidation message
requires a rebuild of the relsync cache; the cache is discarded and rebuilt for
every transaction. Indeed, the results show around a 300% regression for 2000
concurrent transactions.
IIUC it is difficult to avoid this regression with the current design.
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013944 | 0.016460 | 18.0384
100 | 0.014952 | 0.020160 | 34.8322
500 | 0.018535 | 0.043122 | 132.6577
1000 | 0.023426 | 0.072215 | 208.2628
2000 | 0.032055 | 0.131884 | 311.4314
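A back-of-the-envelope check on the numbers above: the absolute overhead comes
out to roughly 50 microseconds per decoded transaction at every concurrency
level (0.0246 sec / 500, 0.0488 sec / 1000, 0.0998 sec / 2000), which is
consistent with one relsync cache rebuild being paid per transaction.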
-----
Workload E: DDL is happening on the related published table,
and 1000 inserts are done per invalidation
===========================================
This workload was mostly the same as D, but the number of inserted tuples was
1000x larger. We expected that rebuilding the caches would no longer dominate
the workload, so the regression would be small.
The actual workload is described in [5], and the runner script is attached.
The table below shows the results. Unlike D, there was no huge regression. This
is a reasonable result, because decoding the 1000 insertions consumes most of
the CPU time.
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.093019 | 0.108820 | 16.9869
100 | 0.188367 | 0.199621 | 5.9741
500 | 0.967896 | 0.970674 | 0.2870
1000 | 1.658552 | 1.803991 | 8.7691
2000 | 3.482935 | 3.682771 | 5.7376
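Note that the absolute gap at 2000 transactions (~0.20 sec) is of the same
order as in D (~0.10 sec); it is the much larger base decoding time (~3.5 sec
vs. ~0.03 sec) that shrinks the relative regression.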
Thanks again to Shlok for measuring the data.
[1]:
1. Created a publication on a single table, say 'tab_conc1';
2. 'n + 1' sessions are run in parallel
3. Now:
All 'n' sessions:
BEGIN;
Insert a row in table 'tab_conc1';
In the extra session:
Insert a row in table 'tab_conc1';
Insert a row in table 'tab_conc1';
All 'n' sessions:
Insert a row in table 'tab_conc1';
COMMIT;
4. Run 'pg_logical_slot_get_binary_changes' to get the decoded changes.
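For reference, below is a minimal SQL sketch of the skeleton shared by all five
workloads. The table definition and the publication/slot names here are
illustrative assumptions; the attached runner scripts drive the actual
concurrent sessions.

-- One-time setup (illustrative names).
CREATE TABLE tab_conc1 (a int);
CREATE PUBLICATION regress_pub1 FOR TABLE tab_conc1;
SELECT pg_create_logical_replication_slot('regress_slot1', 'pgoutput');

-- Step 4: after the concurrent transactions have committed, consume and
-- time the decoded changes.
SELECT count(*)
FROM pg_logical_slot_get_binary_changes('regress_slot1', NULL, NULL,
'proto_version', '4', 'publication_names', 'regress_pub1');

Workloads [2]-[5] reuse this skeleton and differ only in the statements run in
the extra session.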
[2]:
1. Created a publication on a single table, say 'tab_conc1';
2. 'n + 1' sessions are run in parallel
3. Now:
All 'n' sessions:
BEGIN;
Insert a row in table 'tab_conc1';
In the extra session:
BEGIN; ALTER TABLE t1 ADD COLUMN b int; COMMIT;
BEGIN; ALTER TABLE t1 DROP COLUMN b; COMMIT;
All 'n' sessions:
Insert a row in table 'tab_conc1';
COMMIT;
4. Run 'pg_logical_slot_get_binary_changes' to get the decoded changes.
[3]:
1. Created a publication on two tables, say 'tab_conc1' and 't1';
2. 'n + 1' sessions are run in parallel
3. Now:
All 'n' sessions:
BEGIN;
Insert a row in table 'tab_conc1';
In the extra session:
BEGIN; ALTER PUBLICATION regress_pub1 DROP TABLE t1; COMMIT;
BEGIN; ALTER PUBLICATION regress_pub1 ADD TABLE t1; COMMIT;
All 'n' sessions:
Insert a row in table 'tab_conc1';
COMMIT;
4. Run 'pg_logical_slot_get_binary_changes' to get the decoded changes.
[4]:
1. Created a publication on a single table, say 'tab_conc1';
2. 'n + 1' sessions are run in parallel
3. Now:
All 'n' sessions:
BEGIN;
Insert a row in table 'tab_conc1';
In the extra session:
BEGIN; ALTER PUBLICATION regress_pub1 DROP TABLE tab_conc1; COMMIT;
BEGIN; ALTER PUBLICATION regress_pub1 ADD TABLE tab_conc1; COMMIT;
All 'n' sessions:
Insert a row in table 'tab_conc1';
COMMIT;
4. Run 'pg_logical_slot_get_binary_changes' to get the decoded changes.
[5]:
1. Created a publication on a single table, say 'tab_conc1';
2. 'n + 1' sessions are run in parallel
3. Now:
All 'n' sessions:
BEGIN;
Insert 1000 rows in table 'tab_conc1';
In the extra session:
BEGIN; ALTER PUBLICATION regress_pub1 DROP TABLE tab_conc1; COMMIT;
BEGIN; ALTER PUBLICATION regress_pub1 ADD TABLE tab_conc1; COMMIT;
All 'n' sessions:
Insert 1000 rows in table 'tab_conc1';
COMMIT;
4. Run 'pg_logical_slot_get_binary_changes' to get the decoded changes.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size
------------ | -------------- | ------
A.pl.txt | text/plain | 2.5 KB
B.pl.txt | text/plain | 2.6 KB
C.pl.txt | text/plain | 2.6 KB
D.pl.txt | text/plain | 2.6 KB
E.pl.txt | text/plain | 2.8 KB