Re: speed up a logical replica setup

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-06-30 13:00:00
Message-ID: 0dffca12-bf17-4a7a-334d-225569de5e6e@gmail.com
Lists: pgsql-hackers

Hello Peter and Euler,

17.06.2024 14:04, Peter Eisentraut wrote:
> On 07.06.24 05:49, Euler Taveira wrote:
>> Here it is a patch series to fix the issues reported in recent discussions. The
>> patches 0001 and 0003 aim to fix the buildfarm issues. The patch 0002 removes
>> synchronized failover slots on subscriber since it has no use. I also included
>> an optional patch 0004 that improves the usability by checking both servers if
>> it already failed in any subscriber check.
>
> I have committed 0001, 0002, and 0003.  Let's keep an eye on the buildfarm to see if that stabilizes things.  So far
> it looks good.
>
> For 0004, I suggest inverting the result values from check_publisher() and create_subscriber() so that it returns true
> if the check is ok.

As a recent buildfarm failure [1] shows, that test addition introduced
new instability:
### Starting node "node_s"
# Running: pg_ctl -w -D /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_createsubscriber/data/t_040_pg_createsubscriber_node_s_data/pgdata -l /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_createsubscriber/log/040_pg_createsubscriber_node_s.log -o --cluster-name=node_s start
waiting for server to start.... done
server started
# Postmaster PID for node "node_s" is 416482
error running SQL: 'psql:<stdin>:1: ERROR:  skipping slot synchronization as the received slot sync LSN 0/30047F0 for slot "failover_slot" is ahead of the standby position 0/3004708'
while running 'psql -XAtq -d port=51506 host=/tmp/pqWohdD5Qj dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots()' at /home/bf/bf-build/piculet/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2126.
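
To put a number on the gap reported in the error above, here is a minimal Python sketch. It is not part of the test suite, and the helper name parse_lsn is made up for illustration. It relies only on the fact that PostgreSQL prints an LSN as two hexadecimal numbers, the high and low 32 bits of the 64-bit WAL position, separated by a slash.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/30047F0' into a 64-bit byte position.

    parse_lsn is a hypothetical helper written for this illustration only.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Both values are copied from the error message in the buildfarm log.
slot_sync_lsn = parse_lsn("0/30047F0")  # LSN received for "failover_slot"
standby_lsn = parse_lsn("0/3004708")    # standby position at that moment

gap = slot_sync_lsn - standby_lsn
print(f"slot sync LSN is ahead of the standby by {gap} bytes")
```

The difference is only a couple of hundred bytes of WAL, which fits a timing race: the standby had simply not yet reached that position when pg_sync_replication_slots() was called.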

I could reproduce this failure with:
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -517,6 +517,7 @@ WalReceiverMain(char *startup_data, size_t startup_data_len)
                      * let the startup process and primary server know about
                      * them.
                      */
+pg_usleep(300000);
                     XLogWalRcvFlush(false, startpointTLI);

make -s check -C src/bin/pg_basebackup/ PROVE_TESTS="t/040*"

# +++ tap check in src/bin/pg_basebackup +++
t/040_pg_createsubscriber.pl .. 22/? # Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 23.
t/040_pg_createsubscriber.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 23 subtests passed

Test Summary Report
-------------------
t/040_pg_createsubscriber.pl (Wstat: 7424 Tests: 23 Failed: 0)
  Non-zero exit status: 29
  Parse errors: No plan found in TAP output
Files=1, Tests=23,  4 wallclock secs ( 0.01 usr  0.01 sys +  0.49 cusr  0.44 csys =  0.95 CPU)

Moreover, this test may suffer from autovacuum:
echo "
autovacuum_naptime = 1
autovacuum_analyze_threshold = 1
" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C src/bin/pg_basebackup/ PROVE_TESTS="t/040*"

# +++ tap check in src/bin/pg_basebackup +++
t/040_pg_createsubscriber.pl .. 24/?
#   Failed test 'failover slot is synced'
#   at t/040_pg_createsubscriber.pl line 273.
#          got: ''
#     expected: 'failover_slot'
t/040_pg_createsubscriber.pl .. 28/? # Looks like you failed 1 test of 33.
t/040_pg_createsubscriber.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/33 subtests
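
For readers comparing the two prove outputs above, the wstat numbers can be decoded with a few lines of Python. This is a sketch assuming the conventional POSIX wait-status layout (exit code in the high byte, terminating signal in the low seven bits); the function names are made up for illustration.

```python
def exit_code(wstat: int) -> int:
    """Exit code stored in the high byte of a conventional POSIX wait status."""
    return (wstat >> 8) & 0xFF


def term_signal(wstat: int) -> int:
    """Terminating signal number stored in the low seven bits (0 means none)."""
    return wstat & 0x7F


# wstat values copied from the two test runs quoted in this message.
for wstat in (7424, 256):
    print(f"wstat {wstat} ({wstat:#x}): "
          f"exit code {exit_code(wstat)}, signal {term_signal(wstat)}")
```

Under that assumption, 7424 (0x1d00) corresponds to the test script exiting with status 29 before done_testing(), while 256 (0x100) is an exit status of 1, matching the single failed subtest.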

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-06-28%2004%3A42%3A48

Best regards,
Alexander
