
Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

From:	"Hsu, John" <hsuchen(at)amazon(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date:	2022-03-01 02:04:59
Message-ID:	e87ddfa6-18a2-4093-737d-e031b94b1a7e@amazon.com
Lists:	pgsql-hackers

> The async walsender looks at flush LSN from
> walsndctl->lsn[SYNC_REP_WAIT_FLUSH]; after it comes up and decides to
> send the WAL up to it. If there are no sync replicas after it comes
> up (users can make sync standbys async without postmaster restart
> because synchronous_standby_names is effective with SIGHUP), then it
> doesn't wait at all and continues to send WAL. I don't see any problem
> with it. Am I missing something here?

Assuming I understand the code correctly, we have:

> SendRqstPtr = GetFlushRecPtr(NULL);

In this contrived example, let's say walsndctl->lsn[SYNC_REP_WAIT_FLUSH]
is always 60s behind GetFlushRecPtr() and, for whatever reason, if the
walsender hasn't replicated anything in 30s it'll terminate and
reconnect. If GetFlushRecPtr() keeps advancing and is always 60s ahead
of the sync LSNs, then we would never stream anything, even though the
sync LSN has by then advanced past the request LSN from the previous
attempt, which would have been safe to stream.
> I will correct it. "async standby WAL sender with request LSN %X/%X is
> waiting as sync standbys are ahead with flush LSN %X/%X",
> LSN_FORMAT_ARGS(sendRqstP), LSN_FORMAT_ARGS(flushLSN). I will think
> more about having better wording of these messages, any suggestions
> here?

"async standby WAL sender with request LSN %X/%X is waiting for sync
standbys at LSN %X/%X to advance past it"

Not sure if that's really clearer...

> I too observed this once or twice. It looks like the walsender isn't
> detecting postmaster death in for (;;) with WalSndWait. Not sure if
> this is expected or true with other wait-loops in walsender code. Any
> more thoughts here?

Unfortunately, I haven't had a chance to dig into it more, although
IIRC I hit it fairly often.

Thanks,
John H

In response to

Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) at 2022-02-28 18:57:32 from Nathan Bossart