Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication timeout problem
Date: 2022-01-22 11:11:42
Message-ID: CAA4eK1L2xNjQ7A6Wok00ai=7YB+YbVZCP18LywntDiFhFazqtA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 21, 2022 at 10:45 PM Fabrice Chapuis
<fabrice636861(at)gmail(dot)com> wrote:
>
> I keep your patch 0001 and I add these two calls in function WalSndUpdateProgress without modifying WalSndKeepaliveIfNecessary, it works too.
> What do you think of this patch?
>

I think this will also work. The point here was just to check what
the exact problem is and what approach could solve it; the actual
patch might differ from these ideas. So, let me try to summarize
the problem and the possible approach to solve it so that others can
also share their opinion.

The problem is that, during logical replication, we don't send
keep-alive messages for a long time while processing a large
transaction for which we don't send any data (say, because the
table modified in the transaction is not published). We do try to send
a keep_alive if necessary at the end of the transaction (via
WalSndWriteData()), but by that time the subscriber side can time out
and exit.

Now, one idea to solve this problem could be that whenever we skip
sending a change, we update the plugin progress via
OutputPluginUpdateProgress() (for walsender, this invokes
WalSndUpdateProgress()), and there we process replies and send a
keep_alive if necessary, as we do when we send some data via
OutputPluginWrite() (for walsender, this invokes WalSndWriteData()). I
don't know whether it is a good idea to invoke such a mechanism for
every change we skip, or whether we should do it only after skipping
some threshold of consecutive changes. I think the latter would be
preferred. Also, we might want to introduce a new parameter,
send_keep_alive, to this API so that callers can choose whether to
invoke this mechanism: we don't need it while we are actually sending
data, because there we just update the progress via this API before
the write.
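
To make the threshold idea concrete, here is a rough standalone sketch
in C. It is not the patch and does not use the real walsender code:
SkipTracker, update_progress(), handle_change(), simulate() and the
threshold value of 100 are all hypothetical names and numbers chosen
for illustration. update_progress() only stands in for
OutputPluginUpdateProgress() with the proposed send_keep_alive flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; the thread does not propose a specific value. */
#define SKIPPED_CHANGES_THRESHOLD 100

/* Hypothetical bookkeeping for one decoded transaction. */
typedef struct SkipTracker
{
    int skipped_since_update;   /* consecutive changes skipped so far */
    int progress_updates;       /* calls made to update_progress() */
    int keepalive_requests;     /* calls that asked for a keep-alive check */
} SkipTracker;

/*
 * Stand-in for OutputPluginUpdateProgress() extended with the proposed
 * send_keep_alive flag. The real function would update progress and, when
 * asked, process replies and send a keep-alive if necessary; here we only
 * count the calls.
 */
static void
update_progress(SkipTracker *t, bool send_keep_alive)
{
    t->progress_updates++;
    if (send_keep_alive)
        t->keepalive_requests++;
}

/* Called once per decoded change. */
static void
handle_change(SkipTracker *t, bool published)
{
    assert(t != NULL);

    if (published)
    {
        /* Data is sent; the normal write path already handles keep-alives. */
        t->skipped_since_update = 0;
        return;
    }

    /* Change is skipped: only poke the keep-alive logic every N skips. */
    if (++t->skipped_since_update >= SKIPPED_CHANGES_THRESHOLD)
    {
        update_progress(t, true);
        t->skipped_since_update = 0;
    }
}

/*
 * Simulate a transaction of n_changes changes. Every publish_every-th
 * change is published; publish_every == 0 means nothing is published.
 */
SkipTracker
simulate(int n_changes, int publish_every)
{
    SkipTracker t = {0, 0, 0};

    for (int i = 1; i <= n_changes; i++)
        handle_change(&t, publish_every > 0 && i % publish_every == 0);

    return t;
}
```

With this sketch, a fully unpublished transaction of 1000 changes
triggers the keep-alive check once per 100 skipped changes instead of
once per change, and a transaction that publishes something at least
every 100 changes never takes the extra path.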

Thoughts?

Note: I have added Simon and Petr J. to this thread as they introduced
the API OutputPluginUpdateProgress in commit 024711bb54 and know this
part of the code/design well, but ideas and suggestions from everyone
are welcome.

--
With Regards,
Amit Kapila.
