Re: speed up a logical replica setup

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Euler Taveira <euler(at)eulerto(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject:	Re: speed up a logical replica setup
Date:	2024-03-26 21:29:19
Message-ID:	5d5dd4cd-6359-4109-88e8-c8e13035ae16@enterprisedb.com
Lists:	pgsql-hackers

On 3/26/24 21:17, Euler Taveira wrote:
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
>> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
>> waited long enough?
>
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out. The
> recovery-timeout is 0 (infinite) by default so you have an infinite wait without
> this check. The idea behind this implementation is to avoid exiting in this
> critical code path. If it times out here you might have to rebuild the standby
> and start again.

- This seems like something that should definitely be documented in the
comment before wait_for_end_recovery(). At the moment it only talks
about timeout, and nothing about NUM_CONN_ATTEMPTS.

- The NUM_CONN_ATTEMPTS name seems rather misleading, considering it
does not really count connection attempts, but the number of times we
have not seen a row in pg_catalog.pg_stat_wal_receiver.

- Not sure I follow the logic - it tries to avoid exiting by setting
infinite timeout, but it still exits based on NUM_CONN_ATTEMPTS. Isn't
that somewhat contradictory?

- Isn't the NUM_CONN_ATTEMPTS actually making it more fragile, i.e. more
likely to exit? For example, what if there's a short networking hiccup,
so that the standby can't connect to the primary?

- It seems a bit strange that even with the recovery timeout set, having
the limit of 10 "connection attempts" effectively establishes a separate
hard-coded limit of 10 seconds. Seems a bit surprising if I set recovery
timeout to 1 minute, and it just dies after 10 seconds.
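
To make the last two points concrete, here is how I read the loop, as a
rough simulation rather than the actual code (Python for brevity, the
tool itself is C). NUM_CONN_ATTEMPTS and wait_for_end_recovery() are
from the patch; the one-second poll interval, the counter reset on
reconnect, the two callables and the return values are my assumptions.

```python
NUM_CONN_ATTEMPTS = 10   # limit discussed in this thread
WAIT_INTERVAL = 1        # seconds per loop iteration (assumed)


def wait_for_end_recovery(in_recovery, receiver_connected, recovery_timeout=0):
    """Simulate the wait loop; no real sleeping or database access.

    in_recovery(t) and receiver_connected(t) are hypothetical callables
    that report the standby state at simulated second t.
    recovery_timeout == 0 means "wait forever".
    Returns (outcome, seconds_waited).
    """
    timer = 0
    count = 0
    while True:
        if not in_recovery(timer):
            return ("done", timer)

        # Count consecutive polls without a running walreceiver
        # (assumed to reset once the standby reconnects).
        if receiver_connected(timer):
            count = 0
        else:
            count += 1
            if count > NUM_CONN_ATTEMPTS:
                return ("no connection", timer)

        if recovery_timeout > 0 and timer >= recovery_timeout:
            return ("timeout", timer)

        timer += WAIT_INTERVAL


# Primary gone, recovery timeout set to 1 minute.
print(wait_for_end_recovery(lambda t: True, lambda t: False, 60))
# -> ('no connection', 10)

# Recovery would finish at t=30, but the network is down from t=5 to t=19.
print(wait_for_end_recovery(lambda t: t < 30, lambda t: not (5 <= t < 20)))
# -> ('no connection', 15)
```

So with the timeout at 60 seconds the simulated tool still gives up
after about 10 seconds, and a 15-second hiccup is enough to abort a
recovery that would otherwise have finished on its own.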

> Amit suggested [1] that we use a value as recovery-timeout but
> how high is a good value? I've already seen some long recovery processes using
> pglogical equivalent that timed out after hundreds of minutes. Maybe I'm too
> worried about a small percentage of cases and we should use 1h as default, for
> example. It would reduce the complexity since the recovery process lacks some
> progress indicators (LSN is not sufficient in this case and there isn't a
> function to provide the current state -- stop applying WAL, reach target, new
> timeline, etc).
>
> If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery
> by default otherwise we will have some reports saying the tool is hanging when
> in reality the primary has gone and WAL should be streamed.
>

I don't think there's a default timeout value that would work for
everyone. Either it's going to be too short for some cases, or it'll
take too long for some other cases.

I think there are two obvious default values for the timeout - infinity,
and 60 seconds, which is the default we use for other CLI tools (like
pg_ctl and so on). Considering the negative impact of exiting, I'd say
it's better to default to infinity. It's always possible to Ctrl-C or
terminate the process in some other way, if needed.

As for people complaining about infinite recovery - perhaps it'd be
sufficient to mention this in the messages printed by the tool, to make
it clearer. Or maybe even print something in the loop, because right now
it's entirely silent so it's easy to believe it's stuck. Perhaps not on
every loop, but at least in verbose mode it should print something.
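
For the verbose-mode message, something along these lines might be
enough. Again just a sketch in Python, not the tool's code;
progress_message(), the 10-second interval and the message text are all
made up for illustration.

```python
def progress_message(elapsed, verbose, every=10):
    """Return a status line for this loop iteration, or None to stay silent.

    elapsed is the number of seconds waited so far. A line is produced
    only in verbose mode, and only once every `every` seconds.
    """
    if not verbose or elapsed == 0 or elapsed % every != 0:
        return None
    return f"still waiting for the target server to end recovery ({elapsed} s elapsed)"


# Simulated 30-second wait in verbose mode: prints at 10, 20 and 30 seconds.
for second in range(31):
    line = progress_message(second, verbose=True)
    if line is not None:
        print(line)
```

That keeps the default output quiet, but makes it obvious in verbose
mode that the tool is waiting and not stuck.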

>> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
>> pg_createsubscriber, and that should do the trick.
>
> That's a good idea. Tests are not exercising the recovery-timeout option.
>
>> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
>> things like ubsan/valgrind already use to deal with exactly this kind of
>> timeout problem.
>>
>> Or is there a deeper problem with deciding if the system is in recovery?
>
> As I said with some recovery progress indicators it would be easier to make some
> decisions like wait a few seconds because the WAL has already been applied and
> it is creating a new timeline. The recovery timeout decision is a shot in the
> dark because we might be aborting pg_createsubscriber when the target server is
> about to set RECOVERY_STATE_DONE.
>

Isn't it enough to check data in pg_stat_replication on the primary?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: speed up a logical replica setup at 2024-03-26 20:17:15 from Euler Taveira
