RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Alexander Lakhin' <exclusion(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-07-03 05:12:25
Message-ID: OSBPR01MB25528300C71FDD83EA1DCA12F5DD2@OSBPR01MB2552.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Alexander and other hackers,

> As a recent buildfarm failure [1] shows, that test addition introduced
> new instability:
> ### Starting node "node_s"
> # Running: pg_ctl -w -D
> /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_c
> reatesubscriber/data/t_040_pg_createsubscriber_node_s_data/pgdata
> -l
> /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_c
> reatesubscriber/log/040_pg_createsubscriber_node_s.log
> -o --cluster-name=node_s start
> waiting for server to start.... done
> server started
> # Postmaster PID for node "node_s" is 416482
> error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as the
> received slot sync LSN 0/30047F0 for
> slot "failover_slot" is ahead of the standby position 0/3004708'
> while running 'psql -XAtq -d port=51506 host=/tmp/pqWohdD5Qj
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT
> pg_sync_replication_slots()' at
> /home/bf/bf-build/piculet/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster
> .pm line 2126.
>
> I could reproduce this failure with:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -517,6 +517,7 @@ WalReceiverMain(char *startup_data, size_t
> startup_data_len)
> * let the startup process and primary server know
> about
> * them.
> */
> +pg_usleep(300000);
> XLogWalRcvFlush(false, startpointTLI);
>
> make -s check -C src/bin/pg_basebackup/ PROVE_TESTS="t/040*"
>
> # +++ tap check in src/bin/pg_basebackup +++
> t/040_pg_createsubscriber.pl .. 22/? # Tests were run but no plan was declared
> and done_testing() was not seen.
> # Looks like your test exited with 29 just after 23.
> t/040_pg_createsubscriber.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 23 subtests passed
>
> Test Summary Report
> -------------------
> t/040_pg_createsubscriber.pl (Wstat: 7424 Tests: 23 Failed: 0)
> Non-zero exit status: 29
> Parse errors: No plan found in TAP output
> Files=1, Tests=23, 4 wallclock secs ( 0.01 usr 0.01 sys + 0.49 cusr 0.44
> csys = 0.95 CPU)

I thought it was not related with the feature of pg_createsubscriber. It was
related with the slotsync function. In the first place, the error message was
output with the below backtrace.

```
synchronize_one_slot
synchronize_slots
pg_sync_replication_slots
```

And I found the error could be reproduced by the attached script with the source
modification by Alexander [1]. According to my analysis, this could be caused when
1) a replication slot is created with failover = true and XLOG_RUNNING_XACT record is generated, and
2) pg_sync_replication_slots() is called *before* the record is flushed on the
standby. To stabilize the test, we can call wait_for_replay_catchup() after the slot
creation on the primary.

> Moreover, this test may suffer from autovacuum:
> echo "
> autovacuum_naptime = 1
> autovacuum_analyze_threshold = 1
> " > /tmp/temp.config
> TEMP_CONFIG=/tmp/temp.config make -s check -C src/bin/pg_basebackup/
> PROVE_TESTS="t/040*"
>
> # +++ tap check in src/bin/pg_basebackup +++
> t/040_pg_createsubscriber.pl .. 24/?
> # Failed test 'failover slot is synced'
> # at t/040_pg_createsubscriber.pl line 273.
> # got: ''
> # expected: 'failover_slot'
> t/040_pg_createsubscriber.pl .. 28/? # Looks like you failed 1 test of 33.
> t/040_pg_createsubscriber.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/33 subtests

Thanks for reporting. I could reproduce the failure, and some BF animals said NG.
According to them, the root cause seemed that slot synchronization was failed.

```
LOG: statement: SELECT pg_sync_replication_slots()
LOG: could not sync slot "failover_slot" as remote slot precedes local slot
DETAIL: Remote slot has LSN 0/3006890 and catalog xmin 743, but local slot has LSN 0/3006890 and catalog xmin 744.
```

Based on that, I considered a scenario why the slot could not be synchronized.
I felt this was not caused by the pg_createsubscriber.

1. At initial stage, the xmin of the physical slot is 743, and nextXid of the
primary is also 743.
2. Autovacuum worker starts a new transaction. nextXid is incremented to 744.
3. Tries to creates a logical replication slot with failover=true *before the
transaction at step2 is replicated to the standby*.
4. While creating the slot, the catalog_xmin must be determined.
The initial candidate is nextXid (= 744), but the oldest xmin of replication
slots (=743) is used if it is older than nextXid. So 743 is chosen in this case.
This operaion is done in CreateInitDecodingContext()->GetOldestSafeDecodingContext().
5. After that, the transaction at step2 is reached to the standby node and it
updates the nextXid.
6. Finally runs pg pg_sync_replication_slots() on the standby. It finds a failover
slot on the primary and tries to create on the standby. However, the
catalog_xmin on the primary (743) is older than the nextXid of the standby (744)
so that it skips to create a slot.

To avoid the issue, we can disable the autovacuuming while testing.

# Descriptions for attached files

An attached script can be used to reproduce the first failure without pg_createsubscriber.
It requires to modify the code like [1].

0001 patch can reduce the wait time of test. See [3]. I know the approach is bit hacky but
it worked.
0002 patch waits until the WAL record is replicated. This fixes a first failure.
0003 patch disables autovacuum for node_p and node_s. I think node_p is enough, but did
like that just in case. This fixes a second failure.

[1]: https://www.postgresql.org/message-id/0dffca12-bf17-4a7a-334d-225569de5e6e%40gmail.com
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-07-02%2008%3A45%3A39
[3]: https://www.postgresql.org/message-id/OSBPR01MB25521B15BF950D2523BBE143F5D32%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
v2-0001-emit-dummy-message-while-setting-up-the-publisher.patch application/octet-stream 1.6 KB
v2-0002-wait-until-RUNNING_XACT-is-replicated.patch application/octet-stream 1018 bytes
v2-0003-disable-autovacuum-while-testing.patch application/octet-stream 1.1 KB
repro_skipping_slot_sync.sh application/octet-stream 767 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2024-07-03 05:16:58 Re: Conflict Detection and Resolution
Previous Message Hayato Kuroda (Fujitsu) 2024-07-03 03:51:48 RE: speed up a logical replica setup