From: | "Euler Taveira" <euler(at)eulerto(dot)com> |
---|---|
To: | "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Cc: | "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "Bharath Rupireddy" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "Andres Freund" <andres(at)anarazel(dot)de>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, "vignesh C" <vignesh21(at)gmail(dot)com> |
Subject: | Re: speed up a logical replica setup |
Date: | 2024-05-22 15:16:24 |
Message-ID: | 5da4b286-e570-4341-8262-92d763f6b1a4@app.fastmail.com |
Lists: | pgsql-hackers |
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> >
> > v2-0001: not changed
> >
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on standby.
That's exactly what I want to propose. Tomas convinced me offlist that less is
better: without a useful recovery progress reporting mechanism, we cannot tell
whether recovery is still making progress, so we should simply wait.
> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.
Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until walreceiver is available on subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available but I'm afraid we need an escape mechanism for the loop (timeout?).
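To make the "loop plus escape mechanism" idea concrete, here is a minimal
sketch of a polling loop bounded by a timeout. It is not code from the patch:
wait_for_condition(), condition_fn and countdown_reached_zero() are
hypothetical names, and the countdown callback only stands in for a real check
such as server_is_in_recovery() or a walreceiver lookup on the subscriber.
Treating a non-positive timeout as "wait forever" is an assumption of this
sketch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type: returns true once the awaited condition holds. */
typedef bool (*condition_fn) (void *arg);

/*
 * Poll "cond" every poll_interval_ms milliseconds until it returns true or
 * until timeout_sec seconds have elapsed.  Returns true if the condition was
 * met, false if we gave up.  In this sketch, timeout_sec <= 0 means "no
 * timeout" (an assumption, not something taken from the patch).
 */
static bool
wait_for_condition(condition_fn cond, void *arg,
				   int timeout_sec, long poll_interval_ms)
{
	struct timespec delay;
	time_t		start = time(NULL);

	assert(poll_interval_ms > 0);

	delay.tv_sec = poll_interval_ms / 1000;
	delay.tv_nsec = (poll_interval_ms % 1000) * 1000000L;

	for (;;)
	{
		/* Check first, so an already-satisfied condition never sleeps. */
		if (cond(arg))
			return true;

		/* Escape mechanism: stop once the timeout has elapsed. */
		if (timeout_sec > 0 &&
			difftime(time(NULL), start) >= (double) timeout_sec)
			return false;

		nanosleep(&delay, NULL);
	}
}

/*
 * Stand-in condition for demonstration only: reports true once the counter
 * pointed to by "arg" has been decremented to zero.  A real caller would
 * query the server here instead.
 */
static bool
countdown_reached_zero(void *arg)
{
	int		   *remaining = arg;

	if (*remaining <= 0)
		return true;

	(*remaining)--;
	return false;
}
```

A caller would pass its own check function in place of
countdown_reached_zero() and report an error when wait_for_condition() returns
false. Since time() has one-second resolution, the effective timeout can be
shorter than requested by up to a second; a monotonic clock would be a better
choice in real code.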
I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.
--
Euler Taveira
EDB https://www.enterprisedb.com/