Re: speed up a logical replica setup

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-05-24 09:37:05
Message-ID: CANhcyEWGfp7_AGg2zZUgJF_VYTCix01yeY8ZX9woz+03WCMPRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 22 May 2024 at 16:50, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 20, 2024 at 4:30 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > >
> > > I was trying to test this utility when 'sync_replication_slots' is on
> > > and it gets in an ERROR loop [1] and never finishes. Please find the
> > > postgresql.auto used on the standby attached. I think if the standby
> > > has enabled sync_slots, you need to pass dbname in
> > > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > > there are already synced slots on the standby (either due to
> > > 'sync_replication_slots' or users have used
> > > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > > those would be retained as it is on new subscriber and lead to
> > > unnecessary WAL retention and dead rows.
> > >
> > > [1]
> > > 2024-04-30 11:50:43.239 IST [12536] LOG: slot sync worker started
> > > 2024-04-30 11:50:43.247 IST [12536] ERROR: slot synchronization
> > > requires dbname to be specified in primary_conninfo
> >
> > Hi,
> >
> > I tested the scenario posted by Amit in [1], in which retaining synced
> > slots lead to unnecessary WAL retention and ERROR. This is raised as
> > the second open point in [2].
> > The steps to reproduce the issue:
> > (1) Setup physical replication with sync slot feature turned on by
> > setting sync_replication_slots = 'true' or using
> > pg_sync_replication_slots() on the standby node.
> > For physical replication setup, run pg_basebackup with -R and -d option.
> > (2) Create a logical replication slot on primary node with failover
> > option as true. A corresponding slot is created on standby as part of
> > sync slot feature.
> > (3) Run pg_createsubscriber on standby node.
> > (4) On Checking for the replication slot on standby node, I noticed
> > that the logical slots created in step 2 are retained.
> > I have attached the script to reproduce the issue.
> >
> > I and Kuroda-san worked to resolve open points. Here are patches to
> > solve the second and third point in [2].
> > Patches proposed by Euler are also attached just in case, but they
> > were not modified.
> >
> > v2-0001: not changed
> >
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> constent_lsn is present on standby.
>

I have made changes as per your suggestion. I have used
pg_last_wal_receive_lsn to get lsn last wal received on standby and
check if consistent_lsn is already present on the standby.
I have only made changes in v3-0001. v3-0002, v3-0003, v3-0004 and
v3-0005 are not modified.

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v3-0005-Avoid-outputing-some-messages-in-dry_run-mode.patch application/octet-stream 7.7 KB
v3-0003-Disable-slot-sync-during-the-convertion.patch application/octet-stream 1.5 KB
v3-0001-Improve-the-code-that-checks-if-the-recovery-is-f.patch application/octet-stream 4.9 KB
v3-0004-Drop-replication-slots-which-had-been-synchronize.patch application/octet-stream 3.5 KB
v3-0002-Improve-the-code-that-checks-if-the-primary-slot-.patch application/octet-stream 2.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-05-24 09:58:40 Re: Convert node test compile-time settings into run-time parameters
Previous Message Anthonin Bonnefoy 2024-05-24 09:35:10 Possible incorrect row estimation for Gather paths