Re: Logical replication timeout problem

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-02-08 05:27:52
Message-ID: 20230208052752.h5ugnhd5ew655ymv@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> I am planning to push this to HEAD sometime next week (by Wednesday).
> To backpatch this, we need to fix it in some non-standard way, like
> without introducing a callback which I am not sure is a good idea. If
> some other committers vote to get this in back branches with that or
> some different idea that can be backpatched then we can do that
> separately as well. I don't see this as a must-fix in back branches
> because we have a workaround (increase timeout) or users can use the
> streaming option (for >=14).

I just saw the commit go in, and a quick scan over it makes me think neither
this commit, nor f95d53eded, which unfortunately was already backpatched, is
the right direction. The wrong direction likely started quite a bit earlier,
with 024711bb544.

It feels quite fundamentally wrong that bascially every output plugin needs to
call a special function in nearly every callback.

In 024711bb544 there was just one call to OutputPluginUpdateProgress() in
pgoutput.c. Quite tellingly, it just updated pgoutput, without touching
test_decoding.

Then a8fd13cab0b added to more calls. 63cf61cdeb7 yet another.

This makes no sense. There's lots of output plugins out there. There's an
increasing number of callbacks. This isn't a maintainable path forward.

If we want to call something to maintain state, it has to be happening from
central infrastructure.

It feels quite odd architecturally that WalSndUpdateProgress() ends up
flushing out writes - that's far far from obvious.

I don't think:
/*
* Wait until there is no pending write. Also process replies from the other
* side and check timeouts during that.
*/
static void
ProcessPendingWrites(void)

Is really a good name. What are we processing? What are we actually waiting
for - because we don't actually wait for the data to sent out or anything,
just that they're in a network buffer.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-02-08 05:36:08 Re: Exit walsender before confirming remote flush in logical replication
Previous Message Dilip Kumar 2023-02-08 05:03:38 Re: Improve WALRead() to suck data directly from WAL buffers when possible