From: | "Hsu, John" <hsuchen(at)amazon(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) |
Date: | 2022-03-01 02:04:59 |
Message-ID: | e87ddfa6-18a2-4093-737d-e031b94b1a7e@amazon.com |
Lists: | pgsql-hackers |
> The async walsender looks at flush LSN from
> walsndctl->lsn[SYNC_REP_WAIT_FLUSH]; after it comes up and decides to
> send the WAL up to it. If there are no sync replicas after it comes
> up (users can make sync standbys async without postmaster restart
> because synchronous_standby_names is effective with SIGHUP), then it
> doesn't wait at all and continues to send WAL. I don't see any problem
> with it. Am I missing something here?

Assuming I understand the code correctly, we have:

> SendRqstPtr = GetFlushRecPtr(NULL);

In this contrived example, let's say walsndctl->lsn[SYNC_REP_WAIT_FLUSH]
is always 60s behind GetFlushRecPtr(), and that, for whatever reason, the
walsender terminates and reconnects if it hasn't replicated anything in
30s. If GetFlushRecPtr() keeps advancing and stays 60s ahead of the sync
LSNs, we would never stream anything: each reconnect recomputes
SendRqstPtr from GetFlushRecPtr(), so the target is always 60s ahead and
the 30s timeout fires before the sync standbys catch up to it. That
happens even though the sync LSNs have long since advanced past what was
safe to stream on an earlier connection.
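To make the contrived example concrete, here is a small standalone C model of it. Everything in it is hypothetical: primary_flush(), sync_flush() and the two simulate_*() functions are illustrative stand-ins, not walsender code, and the 60s lag / 30s reconnect are just the made-up numbers from above. The second policy (capping what we send at the sync standbys' flush LSN instead of waiting for the whole request) is only one possible alternative for illustration, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* same width as PostgreSQL's XLogRecPtr */

/* Hypothetical model: the primary flushes `rate` bytes of WAL per second. */
static XLogRecPtr primary_flush(long t, XLogRecPtr rate)
{
    return (XLogRecPtr) t * rate;
}

/* Hypothetical model: the sync standbys' flush LSN trails the primary by `lag` seconds. */
static XLogRecPtr sync_flush(long t, long lag, XLogRecPtr rate)
{
    return t > lag ? (XLogRecPtr) (t - lag) * rate : 0;
}

/*
 * Policy from the example: pick a request LSN once (like
 * SendRqstPtr = GetFlushRecPtr(NULL)), wait until the sync standbys have
 * flushed past it, and reconnect (re-picking the request LSN) after
 * idle_timeout seconds without replicating anything.
 * Returns the highest LSN streamed by time `horizon`.
 */
static XLogRecPtr simulate_wait_for_request(long horizon, long lag,
                                            long idle_timeout, XLogRecPtr rate)
{
    assert(lag >= 0 && idle_timeout > 0);
    XLogRecPtr sent = 0;
    long timer_start = 0;
    XLogRecPtr request = primary_flush(0, rate);

    for (long t = 0; t <= horizon; t++)
    {
        if (request > sent && sync_flush(t, lag, rate) >= request)
        {
            sent = request;                    /* now safe to stream up to request */
            timer_start = t;
            request = primary_flush(t, rate);  /* next target */
        }
        else if (t - timer_start >= idle_timeout)
        {
            timer_start = t;                   /* reconnect ... */
            request = primary_flush(t, rate);  /* ... with a fresh, later target */
        }
    }
    return sent;
}

/*
 * Hypothetical alternative (not from the patch): on every tick, stream up
 * to min(primary flush LSN, sync flush LSN), so progress never stalls.
 */
static XLogRecPtr simulate_cap_at_sync(long horizon, long lag, XLogRecPtr rate)
{
    assert(lag >= 0);
    XLogRecPtr sent = 0;

    for (long t = 0; t <= horizon; t++)
    {
        XLogRecPtr request = primary_flush(t, rate);
        XLogRecPtr safe = sync_flush(t, lag, rate);
        XLogRecPtr upto = request < safe ? request : safe;

        if (upto > sent)
            sent = upto;
    }
    return sent;
}
```

With the numbers above, simulate_wait_for_request(600, 60, 30, 1000) returns 0: every reconnect moves the target further ahead before the standbys reach the old one. With a 10s lag (shorter than the 30s reconnect) the same policy does make progress, and simulate_cap_at_sync(600, 60, 1000) returns 540000, i.e. it simply trails the primary by the 60s lag.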
> I will correct it. "async standby WAL sender with request LSN %X/%X is
> waiting as sync standbys are ahead with flush LSN %X/%X",
> LSN_FORMAT_ARGS(sendRqstP), LSN_FORMAT_ARGS(flushLSN). I will think
> more about having better wording of these messages, any suggestions
> here?

"async standby WAL sender with request LSN %X/%X is waiting for sync
standbys at LSN %X/%X to advance past it"

Not sure if that's really clearer...
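For what it's worth, this is how the suggested wording would render. It's a standalone sketch: LSN_FORMAT_ARGS below is a simplified stand-in for PostgreSQL's macro of the same name (which splits the 64-bit LSN into its high and low 32-bit halves for the %X/%X format), and format_wait_message() is a hypothetical helper, not a function in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/* Simplified stand-in for PostgreSQL's LSN_FORMAT_ARGS: high half, low half. */
#define LSN_FORMAT_ARGS(lsn) ((unsigned int) ((lsn) >> 32)), ((unsigned int) (lsn))

/* Hypothetical helper: renders the suggested message into buf. */
static int format_wait_message(char *buf, size_t len,
                               XLogRecPtr sendRqstPtr, XLogRecPtr flushLSN)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "async standby WAL sender with request LSN %X/%X is waiting "
                    "for sync standbys at LSN %X/%X to advance past it",
                    LSN_FORMAT_ARGS(sendRqstPtr), LSN_FORMAT_ARGS(flushLSN));
}
```

For example, a request LSN of 0x10000ABCD and a sync flush LSN of 0xFF0000 render as "... request LSN 1/ABCD is waiting for sync standbys at LSN 0/FF0000 to advance past it".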
> I too observed this once or twice. It looks like the walsender isn't
> detecting postmaster death in for (;;) with WalSndWait. Not sure if
> this is expected or true with other wait-loops in walsender code. Any
> more thoughts here?

Unfortunately I haven't had a chance to dig into it more, although IIRC
I hit it fairly often.

Thanks,
John H