From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com> |
Subject: | Re: Logical replication timeout problem |
Date: | 2023-02-08 20:02:35 |
Message-ID: | 20230208200235.esfoggsmuvf4pugt@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> > consider what happens if e.g. the origin filter filters out entire
> > transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
> > we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> > reliably.
>
> Is it actually the right thing to check SyncRepRequested() in that logic? It's
> quite common to set up syncrep so that individual users or transactions opt
> into syncrep, but to leave the default disabled.
>
> I don't really see an alternative to making this depend solely on
> sync_standbys_defined.
Hacking on a rough prototype how I think this should rather look, I had a few
questions / remarks:
- We probably need to call UpdateProgress from a bunch of places in decode.c
as well? Indicating that we're lagging by a lot just because all
transactions were in another database seems decidedly suboptimal.
- Why should lag tracking only be updated at commit-like points? That seems
like it adds odd discontinuities?
- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems
somewhat odd. Their meanings overlap heavily IMO.
- There are no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there
should be? It's legit progress.
- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite
confusing, naming-wise. IIUC "reading" is about receiving confirmation
messages, "writing" about the time the record was generated. ISTM that the
current time is a quite poor approximation in XLogSendPhysical(), but pretty
much meaningless in WalSndUpdateProgress()? Am I missing something?
- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
WalSndWriteData() missing wal_sender_timeout <= 0 checks?
- I don't really understand why f95d53edged55 added !end_xact to the if
condition for ProcessPendingWrites(). Is the theory that we'll end up in an
outer loop soon?
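To illustrate the wal_sender_timeout question: a value of 0 disables the timeout, but a check of the form `now >= last_reply + wal_sender_timeout / 2` then degenerates to `now >= last_reply`, which is true on practically every call. A minimal C sketch, assuming microsecond timestamps; the type and function names are hypothetical and this is not walsender.c's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microsecond timestamps, loosely modeled on TimestampTz (assumption). */
typedef int64_t TimestampUs;

/* Unguarded form: with timeout_ms == 0 (timeout disabled) the threshold
 * collapses to last_reply itself, so this fires on nearly every call. */
static bool
half_timeout_elapsed_unguarded(TimestampUs now, TimestampUs last_reply,
                               int timeout_ms)
{
    return now >= last_reply + (int64_t) (timeout_ms / 2) * 1000;
}

/* Guarded form: a non-positive timeout means "disabled", so never fire. */
static bool
half_timeout_elapsed(TimestampUs now, TimestampUs last_reply, int timeout_ms)
{
    if (timeout_ms <= 0)
        return false;
    return now >= last_reply + (int64_t) (timeout_ms / 2) * 1000;
}
```

With a 60 s timeout both forms agree (they fire 30 s after the last reply); they differ only when the timeout is disabled, where the unguarded one would force pending writes to be processed on every progress update.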
Attached is a current, quite rough, prototype. It addresses some of the points
raised, but far from all. There are also several XXXs/FIXMEs in it. I changed
the file-ending to .txt to avoid hijacking the CF entry.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
v1-0001-WIP-Initial-sketch-of-progress-update-rework.txt | text/plain | 26.4 KB |