From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | petr(dot)jelinek(at)2ndquadrant(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de |
Subject: | Re: Walsender timeouts and large transactions |
Date: | 2017-05-30 09:02:19 |
Message-ID: | 20170530.180219.188282269.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote in <e082a56a-fd95-a250-3bae-0fff93832510(at)2ndquadrant(dot)com>
> Hi,
>
> We have had an issue with the walsender timeout when logical decoding
> is used and a transaction takes a long time to decode (because it
> contains many changes).
>
> I was looking at the walsender code today and realized that this happens
> because, if the network and downstream are fast enough, we always take
> the fast path in WalSndWriteData, which does no reply or keepalive
> processing; that processing is only reached by other code once the
> transaction has finished. So, paradoxically, we die of a timeout because
> everything was fast enough to never fall back to the slow code path.
>
> I propose we only use the fast path if the last processed reply is not
> older than half of the walsender timeout; if it is, we force the slow
> code path to process the replies again. This is similar to the logic we
> use to decide whether to send a keepalive message. I also added a
> CHECK_FOR_INTERRUPTS call to the fast code path because otherwise the
> walsender might ignore interrupts for too long on large transactions.
>
> Thoughts?
+ TimestampTz now = GetCurrentTimestamp();
I think reading the current time too frequently is not recommended,
especially inside a loop that is sensitive to slowdown. (I suppose
that a loop that can fill up a send queue falls into that
category.) If you don't mind a certain amount of additional
complexity to eliminate the possible slowdown caused by the check,
a timeout (timer) could be used instead. The attached patch does
almost the same thing as your patch, but without the time check in
the busy loop.
What do you think about this?
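To make the difference between the two approaches concrete, here is a minimal standalone sketch. It is not the attached patch and uses no PostgreSQL APIs: every name in it (SketchSender, sketch_fast_path_by_clock, sketch_timer_fired and so on) is hypothetical, and time is just a millisecond counter supplied by the caller. The first variant decides by comparing the current time with the last reply time, so the caller has to read the clock on every call. The second variant only tests a flag, which is assumed to be set by a timer armed for half of the timeout.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; none of them are PostgreSQL APIs. */
typedef int64_t sketch_ms;              /* time in milliseconds */

typedef struct SketchSender
{
    sketch_ms   timeout;                /* stands in for the walsender timeout */
    sketch_ms   last_reply;             /* when a reply was last processed */
    volatile sig_atomic_t reply_check_due;  /* set by the timer callback */
} SketchSender;

void
sketch_init(SketchSender *s, sketch_ms timeout, sketch_ms now)
{
    s->timeout = timeout;
    s->last_reply = now;
    s->reply_check_due = 0;
}

/*
 * Variant 1: the caller reads the clock on every call and passes it in.
 * The fast path is allowed while the last reply is not older than half
 * of the timeout.
 */
bool
sketch_fast_path_by_clock(const SketchSender *s, sketch_ms now)
{
    return now - s->last_reply <= s->timeout / 2;
}

/*
 * Variant 2: a timer armed for timeout / 2 is assumed to call this when
 * it expires.  It only sets a flag, so it is cheap and handler-safe.
 */
void
sketch_timer_fired(SketchSender *s)
{
    s->reply_check_due = 1;
}

/* Variant 2 hot path: no clock read, only a flag test. */
bool
sketch_fast_path_by_flag(const SketchSender *s)
{
    return !s->reply_check_due;
}

/*
 * Called after the slow path has processed replies: remember the time
 * and clear the flag.  A real implementation would re-arm the timer here.
 */
void
sketch_reply_processed(SketchSender *s, sketch_ms now)
{
    s->last_reply = now;
    s->reply_check_due = 0;
}
```

With a 60000 ms timeout and the last reply at time 0, the clock variant allows the fast path up to 30000 ms and refuses it afterwards, while the flag variant keeps allowing it until sketch_timer_fired() has run. The trade-off is the one described above: the flag variant needs the extra timer bookkeeping but keeps the clock read out of the hot loop.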
# I saw that SIGQUIT doesn't work for an active publisher, which I
# think is mentioned in another thread.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
Fix-walsender-timeouts-by-timeout.patch | text/x-patch | 2.9 KB |