Re: speed up a logical replica setup

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "Bharath Rupireddy" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "Andres Freund" <andres(at)anarazel(dot)de>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, "vignesh C" <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-05-22 15:16:24
Message-ID: 5da4b286-e570-4341-8262-92d763f6b1a4@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> >
> > v2-0001: not changed
> >
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> constent_lsn is present on standby.

That's exactly what I want to propose as Tomas convinced me offlist that less is
better when we don't have a useful recovery progress reporting mechanism to make
sure it is still working on the recovery and we should wait.

> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until walreceiver is available on subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available but I'm afraid we need a escape mechanism for the loop (timeout?).

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-05-22 15:29:34 Re: An implementation of multi-key sort
Previous Message Robert Haas 2024-05-22 15:16:01 Re: Shared detoast Datum proposal