Re: speed up a logical replica setup

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "Bharath Rupireddy" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "Andres Freund" <andres(at)anarazel(dot)de>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, "vignesh C" <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-05-23 15:13:07
Message-ID: 7aa94b4a-9139-4b24-a184-dd17b38a0c8f@app.fastmail.com
Lists: pgsql-hackers

On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
> On Wed, May 22, 2024 at 8:46 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> >
> > Following the same line that simplifies the code, we can: (a) add a loop in
> > check_subscriber() that waits until walreceiver is available on the subscriber or
> > (b) use a timeout. The main advantage of (a) is that the primary slot is already
> > available, but I'm afraid we need an escape mechanism for the loop (timeout?).
> >
>
> Sorry, it is not clear to me why we need any additional loop in
> check_subscriber(). Aren't we talking about the problem in the
> check_publisher() function?

The idea is to use check_subscriber() to check pg_stat_walreceiver. Once this
view returns a row and primary_slot_name is set on the standby, the referenced
replication slot should be active on the primary. Hence, the query in
check_publisher() makes sure that the referenced replication slot is in use on
the primary.
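A minimal sketch of option (a), using a timeout as the loop's escape mechanism, may make the two-step check easier to follow. It is not the pg_createsubscriber code. `query_standby` and `query_primary` are hypothetical callables that run a SQL statement and return a list of row tuples (a real tool would go through libpq), and the `%s` placeholder style, timeout and polling interval are illustrative assumptions. The standby side polls `pg_stat_walreceiver` until it returns a row and reads its `slot_name` column. The primary side then checks the `active` column of `pg_replication_slots` for that slot.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical query runner: (sql, params) -> list of row tuples.
QueryFn = Callable[[str, Sequence[object]], List[Tuple]]

# Run on the standby: one row while a walreceiver is running.
WALRECEIVER_SQL = "SELECT slot_name FROM pg_catalog.pg_stat_walreceiver"
# Run on the primary: is the slot present and currently in use?
SLOT_ACTIVE_SQL = (
    "SELECT active FROM pg_catalog.pg_replication_slots WHERE slot_name = %s"
)


def wait_for_walreceiver(
    query_standby: QueryFn,
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the standby until pg_stat_walreceiver returns a row.

    Returns the slot name reported by the walreceiver (may be None if no
    slot is reported). Raises TimeoutError once timeout_s has elapsed,
    which is the escape mechanism for the loop.
    """
    deadline = clock() + timeout_s
    while True:
        rows = query_standby(WALRECEIVER_SQL, ())
        if rows:
            return rows[0][0]
        if clock() >= deadline:
            raise TimeoutError("walreceiver did not start on the standby")
        sleep(interval_s)


def slot_is_active_on_primary(query_primary: QueryFn, slot_name: str) -> bool:
    """True only if the slot exists on the primary and is in use."""
    rows = query_primary(SLOT_ACTIVE_SQL, (slot_name,))
    return bool(rows) and bool(rows[0][0])
```

Injecting `clock` and `sleep` keeps the loop testable without a server: a fake standby that returns no rows for the first two polls and then `[("foo",)]` makes `wait_for_walreceiver` return `"foo"` after two sleeps.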

> Why do we need to ensure in the first place that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention, but I don't see how that is related to this tool's
> functionality. If we are concerned about WAL retention on the
> primary at all, it should be the WAL corresponding to the consistent_lsn
> computed by setup_publisher(), but this check doesn't seem to ensure
> that.

Maybe it is too many checks. I'm afraid there isn't a simple way to obtain the
replication slot name and make sure it is the one used by physical replication.
I mean, if primary_slot_name = 'foo' is set on the standby, there is no
guarantee that the replication slot 'foo' exists on the primary. The idea is to
get the exact replication slot name used by physical replication so it can be
dropped. It should become clearer once I post a patch. (Another idea is to relax
this check and rely only on primary_slot_name to drop this replication slot on
the primary. The replication slot might not exist, in which case it shouldn't
return an error.)
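The relaxed alternative in the parenthetical can be sketched as follows. This sketch assumes the slot name is taken from primary_slot_name on the standby. Calling `pg_drop_replication_slot()` directly raises an error when the slot does not exist. Selecting it from `pg_replication_slots`, filtered by name, means the function is only called when a matching row exists, so a missing slot yields zero rows instead of an error. The drop would still fail if the slot were active. `query_primary` is again a hypothetical query runner, and the `%s` placeholder style is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical query runner: (sql, params) -> list of row tuples.
QueryFn = Callable[[str, Sequence[object]], List[Tuple]]

# The drop function is evaluated once per matching row, so when no slot
# with that name exists it is never called and the query returns no rows.
DROP_IF_EXISTS_SQL = (
    "SELECT pg_catalog.pg_drop_replication_slot(slot_name) "
    "FROM pg_catalog.pg_replication_slots WHERE slot_name = %s"
)


def drop_primary_slot_if_exists(query_primary: QueryFn, primary_slot_name: str) -> bool:
    """Drop primary_slot_name on the primary; return False if it was absent."""
    rows = query_primary(DROP_IF_EXISTS_SQL, (primary_slot_name,))
    return len(rows) > 0
```

The boolean result lets the caller log whether a slot was actually removed without treating a missing slot as a failure.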

--
Euler Taveira
EDB https://www.enterprisedb.com/