From: | "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com> |
Subject: | RE: Logical replication timeout problem |
Date: | 2022-03-09 02:26:14 |
Message-ID: | OS3PR01MB62750A1360AB7DF6E8F40A909E0A9@OS3PR01MB6275.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Tue, Mar 8, 2022 at 3:52 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I've looked at the patch and have a question:
Thanks for your review and comments.
> +void
> +SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped) {
> + static int skipped_changes_count = 0;
> +
> + /*
> + * skipped_changes_count is reset when processing changes that do not
> + * need to be skipped.
> + */
> + if (!skipped)
> + {
> + skipped_changes_count = 0;
> + return;
> + }
> +
> + /*
> + * After continuously skipping SKIPPED_CHANGES_THRESHOLD changes, try to
> + * send a keepalive message.
> + */
> + #define SKIPPED_CHANGES_THRESHOLD 10000
> +
> + if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
> + {
> + /* Try to send a keepalive message. */
> + OutputPluginUpdateProgress(ctx, true);
> +
> + /* After trying to send a keepalive message, reset the flag. */
> + skipped_changes_count = 0;
> + }
> +}
>
> Since we send a keepalive after continuously skipping 10000 changes, the
> originally reported issue can still occur if skipping 10000 changes takes longer
> than the timeout and the walsender doesn't send any change during that time. Is
> that right?
Yes, in theory that can still happen.
However, based on my testing, I think this threshold is conservative enough
that the bug should not reproduce in practice.
Following the previous discussion[1], the current view is that directly setting
a conservative threshold is better than calculating the threshold from
wal_sender_timeout.
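To make the counting behaviour easier to reason about, here is a minimal standalone sketch of the same logic. It is not the patch itself: try_send_keepalive() is a hypothetical stand-in for OutputPluginUpdateProgress(ctx, true) that only counts attempts, and simulate_skipped_run() is a helper invented here for illustration.

```c
#include <stdbool.h>

/* Same value as in the quoted patch. */
#define SKIPPED_CHANGES_THRESHOLD 10000

/* Number of keepalive attempts made so far (illustration only). */
static int keepalive_attempts = 0;

/* Hypothetical stand-in for OutputPluginUpdateProgress(ctx, true). */
static void
try_send_keepalive(void)
{
    keepalive_attempts++;
}

/* Mirrors the counting logic of SendKeepaliveIfNecessary() in the patch. */
static void
send_keepalive_if_necessary(bool skipped)
{
    static int skipped_changes_count = 0;

    /* A change that is not skipped resets the run of skipped changes. */
    if (!skipped)
    {
        skipped_changes_count = 0;
        return;
    }

    /* After a full run of skipped changes, try a keepalive and start over. */
    if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
    {
        try_send_keepalive();
        skipped_changes_count = 0;
    }
}

/*
 * Hypothetical helper: reset the counter, feed n consecutive skipped
 * changes, and return how many keepalive attempts that run triggered.
 */
int
simulate_skipped_run(int n)
{
    int before = keepalive_attempts;

    send_keepalive_if_necessary(false);     /* reset the static counter */
    for (int i = 0; i < n; i++)
        send_keepalive_if_necessary(true);

    return keepalive_attempts - before;
}
```

With this sketch, a run of 9999 skipped changes triggers no keepalive attempt, and a run of 10000 triggers exactly one. So the gap between attempts is the time needed to skip 10000 changes. Assuming the default wal_sender_timeout of 60s, the timeout could only be hit if skipping took more than about 6ms per change on average.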
Regards,
Wang wei