From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de |
Subject: | Re: Walsender timeouts and large transactions |
Date: | 2017-05-30 13:44:25 |
Message-ID: | 97e5e70a-87d2-dee1-0482-7e3609822095@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
> At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote in <e082a56a-fd95-a250-3bae-0fff93832510(at)2ndquadrant(dot)com>
>> Hi,
>>
>> We have had issue with walsender timeout when used with logical decoding
>> and the transaction is taking long time to be decoded (because it
>> contains many changes)
>>
>> I was looking today at the walsender code and realized that it's because
>> if the network and downstream are fast enough, we'll always take fast
>> path in WalSndWriteData which does not do reply or keepalive processing
>> and is only reached once the transaction has finished by other code. So
>> paradoxically we die of timeout because everything was fast enough to
>> never fall back to slow code path.
>>
>> I propose we only use fast path if the last processed reply is not older
>> than half of walsender timeout, if it is then we'll force the slow code
>> path to process the replies again. This is similar logic that we use to
>> determine if to send keepalive message. I also added CHECK_INTERRUPRS
>> call to fast code path because otherwise walsender might ignore them for
>> too long on large transactions.
>>
>> Thoughts?
>
> + TimestampTz now = GetCurrentTimestamp();
>
> I think it is not recommended to read the current time too
> frequently, especially within a loop that hates slowness. (I
> suppose that a loop that can fill up a send queue falls into that
Yeah that was my main worry for the patch as well, although given that
the loop does tuple processing it might not be very noticeable.
> category.) If you don't mind a certain amount of additional
> complexity for eliminating the possible slowdown by the check,
> timeout would be usable. Attached patch does almost the same
> thing with your patch but without busy time check.
>
> What do you think about this?
>
I think we could do it that way.
> # I saw that SIGQUIT doens't work for active publisher, which I
> # think mention in another thread.
Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS();
in the fast-path which btw your updated patch is missing as well.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-05-30 14:02:56 | Re: [HACKERS] [PATCH] relocation truncated to fit: citus build failure on s390x |
Previous Message | Tom Lane | 2017-05-30 13:41:57 | Re: [COMMITTERS] Re: pgsql: Code review focused on new node types added by partitioning supp |