Re: speed up a logical replica setup

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speed up a logical replica setup
Date: 2024-07-09 11:00:00
Message-ID: bde6ac67-69cc-c104-5ab6-dd4f5deadf24@gmail.com
Lists: pgsql-hackers

Hello Amit and Kuroda-san,

03.07.2024 14:02, Amit Kapila wrote:
> Pushed 0002 and 0003. Let's wait for a discussion on 0001.

Please look at another failure of the test [1]:
[13:28:05.647](2.460s) not ok 26 - failover slot is synced
[13:28:05.648](0.001s) #   Failed test 'failover slot is synced'
#   at /home/bf/bf-build/skink-master/HEAD/pgsql/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl line 307.
[13:28:05.648](0.000s) #          got: ''
#     expected: 'failover_slot'

with 040_pg_createsubscriber_node_s.log containing:
2024-07-08 13:28:05.369 UTC [3985464][client backend][0/2:0] LOG: statement: SELECT pg_sync_replication_slots()
2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] LOG: could not sync slot "failover_slot" as remote slot precedes local slot
2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] DETAIL:  Remote slot has LSN 0/30047B8 and catalog xmin 743, but local slot has LSN 0/30047B8 and catalog xmin 744.
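
To make the DETAIL line above concrete, here is a minimal Python sketch (not
the server code) that parses such a line and reports which of the two values
puts the remote slot behind the local one. It assumes that "precedes" means
either a smaller LSN or an older catalog xmin, and that xids are compared
modulo 2^32; the helper names parse_lsn, xid_precedes and
remote_precedes_local are hypothetical.

```python
import re

# Matches the DETAIL text quoted above; whitespace is normalized first,
# so a line wrapped by the mail client still parses.
DETAIL_RE = re.compile(
    r"Remote slot has LSN (?P<r_lsn>[0-9A-Fa-f]+/[0-9A-Fa-f]+) "
    r"and catalog xmin (?P<r_xmin>\d+), "
    r"but local slot has LSN (?P<l_lsn>[0-9A-Fa-f]+/[0-9A-Fa-f]+) "
    r"and catalog xmin (?P<l_xmin>\d+)"
)


def parse_lsn(text):
    """Convert an LSN printed as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def xid_precedes(a, b):
    """Assumed wraparound-aware comparison: a is older than b modulo 2^32."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def remote_precedes_local(detail):
    """Return (lsn_behind, xmin_behind) for one DETAIL message."""
    m = DETAIL_RE.search(" ".join(detail.split()))
    if m is None:
        raise ValueError("unrecognized DETAIL line")
    lsn_behind = parse_lsn(m["r_lsn"]) < parse_lsn(m["l_lsn"])
    xmin_behind = xid_precedes(int(m["r_xmin"]), int(m["l_xmin"]))
    return lsn_behind, xmin_behind


if __name__ == "__main__":
    line = ("Remote slot has LSN 0/30047B8 and catalog xmin 743, "
            "but local slot has LSN 0/30047B8 and catalog xmin 744.")
    print(remote_precedes_local(line))
```

Under these assumptions the LSNs in the skink log are equal, so only the
catalog xmin (743 on the remote side versus 744 locally) can account for the
"remote slot precedes local slot" message.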

I could not reproduce it locally, but I've discovered that this subtest
somehow depends on the pg_createsubscriber run executed for the
'primary contains unmet conditions on node P' check. For example, with this
test modification:
@@ -249,7 +249,7 @@ command_fails(
         $node_p->connstr($db1), '--socket-directory',
         $node_s->host, '--subscriber-port',
         $node_s->port, '--database',
-        $db1, '--database',
+        'XXX', '--database',
         $db2
     ],
     'primary contains unmet conditions on node P');

I see the same failure:
2024-07-09 10:19:43.284 UTC [938890] 040_pg_createsubscriber.pl LOG:  statement: SELECT pg_sync_replication_slots()
2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl LOG:  could not sync slot "failover_slot" as remote slot precedes local slot
2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl DETAIL:  Remote slot has LSN 0/3004780 and catalog xmin 743, but local slot has LSN 0/3004780 and catalog xmin 744.

Thus maybe even a normal pg_createsubscriber run can affect the primary
server (its catalog xmin) differently?

One difference I found in the logs is that the skink failure's
regress_log_040_pg_createsubscriber contains:
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 1 remain

Though for a successful run I see locally (I can't find logs of
successful test runs on skink):
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-07-08%2013%3A16%3A35

Best regards,
Alexander
