From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: long-standing data loss bug in initial sync of logical replication
Date: 2025-03-13 08:42:07
Message-ID: OSCPR01MB14966EB5F3B416E4689FB5A67F5D32@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi hackers,
> Our team (mainly Shlok) did a performance testing with several workloads.
> Let me share them on -hackers. We did it for master/REL_17 branches, and
> in this post master's one will be discussed.
I posted benchmark results for master in [1]. This post contains the results for
a back branch, specifically REL_17_STABLE.
The observed trend is the same as on master:
frequent DDL on published tables can cause a large regression, but this is expected.
In the other cases the regression is small or absent.
Used source
===========
The base code was the HEAD of REL_17_STABLE, and the patch compared against it was v16.
The main difference is that master tries to preserve relsync cache entries as much as
possible, whereas REL_17_STABLE discards them more aggressively; please refer to the
recent commits 3abe9d and 588acf6.
The executed workloads were mostly the same as in the master case.
-----
Workload A: No DDL operation done in concurrent session
======================================
No regression was observed in this workload. (A hedged sketch of this baseline setup follows the table.)
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013706 | 0.013398 | -2.2496
100 | 0.014811 | 0.014821 | 0.0698
500 | 0.018288 | 0.018318 | 0.1640
1000 | 0.022613 | 0.022622 | 0.0413
2000 | 0.031812 | 0.031891 | 0.2504
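
For reference, here is a minimal sketch of what a workload-A style run could look like.
The object names (tab_pub, pub, slot) and the exact statements are my own illustrative
assumptions, not the scripts that produced the numbers above:

    -- Setup (hypothetical names): one published table and one pgoutput slot.
    CREATE TABLE tab_pub (id int, val text);
    CREATE PUBLICATION pub FOR TABLE tab_pub;
    SELECT pg_create_logical_replication_slot('slot', 'pgoutput');

    -- Each of the N concurrent sessions commits a small transaction, e.g.:
    BEGIN;
    INSERT INTO tab_pub VALUES (1, 'x');
    COMMIT;

    -- Measured step: decode everything accumulated on the slot.
    SELECT count(*)
      FROM pg_logical_slot_get_binary_changes('slot', NULL, NULL,
           'proto_version', '1', 'publication_names', 'pub');
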
-----
Workload B: DDL is happening but is unrelated to publication
========================================
A small regression was observed when concurrency was high, because the DDL
transaction sends invalidation messages to all concurrent transactions. (A sketch
of the DDL side of this workload follows the table.)
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013159 | 0.013305 | 1.1120
100 | 0.014718 | 0.014725 | 0.0476
500 | 0.018134 | 0.019578 | 7.9628
1000 | 0.022762 | 0.025228 | 10.8324
2000 | 0.032326 | 0.035638 | 10.2467
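
To illustrate, the DDL session in a workload-B style run could repeatedly touch a
table that is not part of any publication. Again, tab_unpub and the specific DDL
are assumptions for illustration only:

    CREATE TABLE tab_unpub (id int);              -- not in any publication
    -- Per iteration, some catalog-touching DDL on the unrelated table:
    ALTER TABLE tab_unpub ADD COLUMN extra int;
    ALTER TABLE tab_unpub DROP COLUMN extra;
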
-----
Workload C. DDL is happening on publication but on unrelated table
============================================
We did not run this workload because we expected the results to be the same as
workload D's; the optimization from commit 588acf6 would be needed to improve this case.
-----
Workload D. DDL is happening on the related published table,
and one insert is done per invalidation
=========================================
This workload showed a large regression, the same as on the master branch. This is
expected because the distributed invalidation messages force all concurrent
transactions to rebuild their relsync caches. (A sketch of one iteration follows
the table.)
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013496 | 0.015588 | 15.5034
100 | 0.015112 | 0.018868 | 24.8517
500 | 0.018483 | 0.038714 | 109.4536
1000 | 0.023402 | 0.063735 | 172.3524
2000 | 0.031596 | 0.110860 | 250.8720
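
A hedged sketch of one iteration of a workload-D style loop, reusing the objects
from the earlier sketch; tab_other and the choice of ALTER PUBLICATION as the
invalidating DDL are illustrative assumptions:

    CREATE TABLE tab_other (id int);              -- setup, done once
    -- Per iteration: each DDL invalidates the relsync entries of the decoded
    -- publication, and exactly one insert is decoded per invalidation.
    ALTER PUBLICATION pub ADD TABLE tab_other;
    INSERT INTO tab_pub VALUES (1, 'x');
    ALTER PUBLICATION pub DROP TABLE tab_other;   -- keeps the loop repeatable
    INSERT INTO tab_pub VALUES (2, 'x');
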
-----
Workload E. DDL is happening on the related published table,
and 1000 inserts are done per invalidation
============================================
The regression seen in workload D is no longer observable. This matches the master
case and is expected, because decoding 1000 tuples takes far longer than rebuilding
the relsync caches, so the invalidation overhead is hidden. (A sketch of one
iteration follows the table.)
Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.093019 | 0.108820 | 16.9869
100 | 0.188367 | 0.199621 | 5.9741
500 | 0.967896 | 0.970674 | 0.2870
1000 | 1.658552 | 1.803991 | 8.7691
2000 | 3.482935 | 3.682771 | 5.7376
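
One iteration of a workload-E style loop has the same shape as the workload-D
sketch, but with a batch of 1000 rows per invalidation (again an illustrative
assumption, not the exact script):

    ALTER PUBLICATION pub ADD TABLE tab_other;
    INSERT INTO tab_pub SELECT g, 'x' FROM generate_series(1, 1000) g;
    ALTER PUBLICATION pub DROP TABLE tab_other;
    INSERT INTO tab_pub SELECT g, 'x' FROM generate_series(1, 1000) g;
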
Best regards,
Hayato Kuroda
FUJITSU LIMITED