From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Switching XLog source from archive to streaming when primary available |
Date: | 2024-03-23 09:22:55 |
Message-ID: | CALj2ACVVM8c9pbCgmyxvViFy=f==QNHhPFSGjvq0Sz7Y0beZsw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 18, 2024 at 11:38 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote:
> > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
> > conflict in meson.build. Please see the attached v23 patch.
>
> I've been reading this patch, and this is a very tricky one. Please
> be *very* cautious.
Thanks for looking into this.
> +#streaming_replication_retry_interval = 0 # time after which standby
> + # attempts to switch WAL source from archive to
> + # streaming replication in seconds; 0 disables
>
> This stuff allows a minimal retry interval of 1s. Could it be useful
> to have more responsiveness here and allow lower values than that?
> Why not switching the units to be milliseconds?
Nathan had a different view on this to have it on the order of seconds
- https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13.
If set to a too low value, the frequency of standby trying to connect
to primary increases. IMO, the order of seconds seems fine.
> + if (streaming_replication_retry_interval <= 0 ||
> + !StandbyMode ||
> + currentSource != XLOG_FROM_ARCHIVE)
> + return SWITCH_TO_STREAMING_NONE;
>
> Hmm. Perhaps this should mention why we don't care about the
> consistent point.
Are you asking why we don't care whether the standby reached a
consistent point when switching to streaming mode due to this new
parameter? If this is the ask, the same applies when a standby
typically switches to streaming replication (get WAL
from primary) today, that is when receive from WAL archive finishes
(no more WAL left there) or fails for any reason. The standby doesn't
care about the consistent point even today, it just trusts the WAL
source and makes the switch.
> + /* See if we can switch WAL source to streaming */
> + if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE)
> + wal_source_switch_state = SwitchWALSourceToPrimary();
>
> Rather than a routine that returns as result the value to use for the
> GUC, I'd suggest to let this routine set the GUC as there is only one
> caller of SwitchWALSourceToPrimary(). This can also include a check
> on SWITCH_TO_STREAMING_NONE, based on what I'm reading that.
Firstly, wal_source_switch_state is not a GUC, it's a static variable
to be used across WaitForWALToBecomeAvailable calls. And, if you are
suggesting to turn SwitchWALSourceToPrimary so that it sets
wal_source_switch_state directly, I'd not do that because when
wal_source_switch_state is not SWITCH_TO_STREAMING_NONE, the function
gets called unnecessarily.
> - if (lastSourceFailed)
> + if (lastSourceFailed ||
> + wal_source_switch_state == SWITCH_TO_STREAMING)
>
> Hmm. This one may be tricky. I'd recommend a separation between the
> failure in reading from a source and the switch to a new "forced"
> source.
Separation would just add duplicate code. Moreover, the code wrapped
within if (lastSourceFailed) doesn't do any error handling or such, it
just resets a few stuff from the previous source and sets the next
source.
FWIW, please check [1] (and the discussion thereon) for how the
lastSourceFailed flag is being used to consume all the streamed WAL in
pg_wal directly upon detecting promotion trigger file. Therefore, I
see no problem with the way it is right now for this new feature.
[1]
/*
* Data not here yet. Check for trigger, then wait for
* walreceiver to wake us up when new WAL arrives.
*/
if (CheckForStandbyTrigger())
{
/*
* Note that we don't return XLREAD_FAIL immediately
* here. After being triggered, we still want to
* replay all the WAL that was already streamed. It's
* in pg_wal now, so we just treat this as a failure,
* and the state machine will move on to replay the
* streamed WAL from pg_wal, and then recheck the
* trigger and exit replay.
*/
lastSourceFailed = true;
> + if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING)
> + readFrom = XLOG_FROM_PG_WAL;
> + else
> + readFrom = currentSource == XLOG_FROM_ARCHIVE ?
> + XLOG_FROM_ANY : currentSource;
>
> WALSourceSwitchState looks confusing here, and are you sure that this
> is actualy correct? Shouldn't we still try a READ_FROM_ANY or a read
> from the archives even with a streaming pending.
Please see the discussion starting from
https://www.postgresql.org/message-id/20221008215221.GA894639%40nathanxps13.
We wanted to keep the existing behaviour the same when we
intentionally switch source to streaming from archive due to the
timeout. The existing behaviour is to exhaust WAL in pg_wal before
switching the source to streaming after failure to fetch from archive.
When wal_source_switch_state is SWITCH_TO_STREAMING_PENDING, the
currentSource is already XLOG_FROM_ARCHIVE (To clear the dust off
here, I've added an assert now in the attached new v24 patch). And, we
don't want to pass XLOG_FROM_ANY to XLogFileReadAnyTLI to again fetch
from the archive. Hence, we choose readFrom = XLOG_FROM_PG_WAL to
specifically tell XLogFileReadAnyTLI read from pg_wal directly.
> By the way, I am not convinced that what you have is the best
> interface ever. This assumes that we'd always want to switch to
> streaming more aggressively. Could there be a point in also
> controlling if we should switch to pg_wal/ or just to archiving more
> aggressively as well, aka be able to do the opposite switch of WAL
> source? This design looks somewhat limited to me. The origin of the
> issue is that we don't have a way to control the order of the sources
> consumed by WAL replay. Perhaps something like a replay_source_order
> that uses a list would be better, with elements settable to archive,
> pg_wal and streaming?
Intention of this feature is to provide a way for the streaming
standby to quickly detect when the primary is up and running without
having to wait until either all the WAL in the archive is over or a
failure to fetch from archive happens. Advantages of this feature are:
1) it can make the recovery a bit faster (if fetching from archive
adds up costs with different storage types, IO costs and network
delays), thus can reduce the replication lag 2) primary (if using
replication slot based streaming replication setup) doesn't have to
keep the required WAL for the standby for longer durations, thus
reducing the risk of no space left on disk issues.
IMHO, it makes sense to have something like replay_source_order if
there's any use case that arises in future requiring the standby to
intentionally switch to pg_wal or archive. But not as part of this
feature.
Please see the attached v24 patch. I've added an assertion that the
current source is archive before calling XLogFileReadAnyTLI if
wal_source_switch_state is SWITCH_TO_STREAMING_PENDING. I've also
added the new enum WALSourceSwitchState to typedefs.list to make
pgindent adjust it correctly.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v24-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch | application/octet-stream | 21.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksander Alekseev | 2024-03-23 10:59:17 | Re: MIN/MAX functions for a record |
Previous Message | Bertrand Drouvot | 2024-03-23 09:04:45 | Re: Introduce XID age and inactive timeout based replication slot invalidation |