Quick Links

Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

From:	"Hsu, John" <hsuchen(at)amazon(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date:	2022-02-25 21:52:03
Message-ID:	b7561922-c241-2e3c-0da9-cd01bf4e6ebf@amazon.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello,

On 2/25/22 11:38 AM, Nathan Bossart wrote:

> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
>> Thanks Satya and others for the inputs. Here's the v1 patch that
>> basically allows async wal senders to wait until the sync standbys
>> report their flush lsn back to the primary. Please let me know your
>> thoughts.
> I haven't had a chance to look too closely yet, but IIUC this adds a new
> function that waits for synchronous replication. This new function
> essentially spins until the synchronous LSN has advanced.
>
> I don't think it's a good idea to block sending any WAL like this. AFAICT
> it is possible that there will be a lot of synchronously replicated WAL
> that we can send, and it might just be the last several bytes that cannot
> yet be replicated to the asynchronous standbys. І believe this patch will
> cause the server to avoid sending _any_ WAL until the synchronous LSN
> advances.
>
> Perhaps we should instead just choose the SendRqstPtr based on the current
> synchronous LSN. Presumably there are other things we'd need to consider,
> but in general, I think we ought to send as much WAL as possible for a
> given call to XLogSendPhysical().

I think you're right that we'll avoid sending any WAL until sync_lsn
advances. We could setup a contrived situation where the async-walsender
never advances because it terminates before the flush_lsn of the
synchronous_node catches up. And when the async-walsender restarts,
it'll start with the latest flushed on the primary and we could go into
a perpetual loop.

I took a look at the patch and tested basic streaming with async
replicas ahead of the synchronous standby and with logical clients as
well and it works as expected.

>
> ereport(LOG,
>            (errmsg("async standby WAL sender with request LSN %X/%X
is waiting as sync standbys are ahead with flush LSN %X/%X",
>                    LSN_FORMAT_ARGS(flushLSN),
LSN_FORMAT_ARGS(sendRqstPtr)),
>            errhidestmt(true)));

I think this log formatting is incorrect.
s/sync standbys are ahead/sync standbys are behind/ and I think you need
to swap flushLsn and sendRqstPtr

When a walsender is waiting for the lsn on the synchronous replica to
advance and a database stop is issued to the writer, the pg_ctl stop
isn't able to proceed and the database seems to never shutdown.

> Assert(priority >= 0);

What's the point of the assert here?

Also the comments/code refer to AsyncStandbys, however it's also used
for logical clients, which may or may not be standbys. Don't feel too
strongly about the naming here but something to note.

> if (!ShouldWaitForSyncRepl())
> return;
> ...
> for (;;)
> {
> // rest of work
> }

If we had a walsender already waiting for an ack, and the conditions of
ShouldWaitForSyncRepl() change, such as disabling
async_standbys_wait_for_sync_replication or synchronous replication
it'll still wait since we never re-check the condition.

postgres=# select wait_event from pg_stat_activity where wait_event like
'AsyncWal%';
wait_event
--------------------------------------
AsyncWalSenderWaitForSyncReplication
AsyncWalSenderWaitForSyncReplication
AsyncWalSenderWaitForSyncReplication
(3 rows)

postgres=# show synchronous_standby_names;
synchronous_standby_names
---------------------------

(1 row)

postgres=# show async_standbys_wait_for_sync_replication;
async_standbys_wait_for_sync_replication
------------------------------------------
off
(1 row)

>    LWLockAcquire(SyncRepLock, LW_SHARED);
>    flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
>    LWLockRelease(SyncRepLock);

Should we configure this similar to the user's setting of
synchronous_commit instead of just flush? (SYNC_REP_WAIT_WRITE,
SYNC_REP_WAIT_APPLY)

Thanks,

John H

In response to

Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) at 2022-02-25 19:38:19 from Nathan Bossart

Responses

Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) at 2022-02-26 09:16:56 from Bharath Rupireddy

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2022-02-25 22:00:12	Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message	Melanie Plageman	2022-02-25 21:31:23	Re: why do hash index builds use smgrextend() for new splitpoint pages