Re: Logical replication keepalive flood

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: abbas(dot)butt(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, zahid(dot)iqbal(at)enterprisedb(dot)com
Subject: Re: Logical replication keepalive flood
Date: 2021-06-07 07:23:53
Message-ID: 20210607.162353.1202919828973013934.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com> wrote in
> Hi,
> I have observed the following behavior with PostgreSQL 13.3.
>
> The WAL sender process sends approximately 500 keepalive messages per
> second to pg_recvlogical.
> These keepalive messages are totally un-necessary.
> Keepalives should be sent only if there is no network traffic and a certain
> time (half of wal_sender_timeout) passes.
> These keepalive messages not only choke the network but also impact the
> performance of the receiver,
> because the receiver has to process the received message and then decide
> whether to reply to it or not.
> The receiver remains busy doing this activity 500 times a second.

I can reproduce the problem.

> On investigation it is revealed that the following code fragment in
> function WalSndWaitForWal in file walsender.c is responsible for sending
> these frequent keepalives:
>
>     if (MyWalSnd->flush < sentPtr &&
>         MyWalSnd->write < sentPtr &&
>         !waiting_for_ping_response)
>         WalSndKeepalive(false);

The immediate cause is that pg_recvlogical doesn't send a reply before
sleeping. Currently it sends replies only at 10-second intervals.
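
To illustrate the point above, here is a minimal sketch of the two reply
policies on the receiver side. It is not taken from pg_recvlogical or from
the attached patch; the struct, its fields and both function names are
hypothetical, and the only number borrowed from this message is the
10-second interval. The idea is that with a timer-only policy the sender's
"flush < sentPtr" check can stay true for up to 10 seconds, while a reply
sent before sleeping reports the new position right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver state; these names are not from pg_recvlogical. */
typedef struct ReceiverState
{
    uint64_t written_lsn;       /* last position written locally */
    uint64_t reported_lsn;      /* last position reported to the sender */
    int64_t  last_reply_ms;     /* time of the last reply */
    int64_t  reply_interval_ms; /* periodic reply interval (10 s here) */
} ReceiverState;

/* Timer-only policy: reply only once the interval has elapsed. */
bool
reply_due_timer_only(const ReceiverState *rs, int64_t now_ms)
{
    assert(rs != NULL);
    return now_ms - rs->last_reply_ms >= rs->reply_interval_ms;
}

/*
 * Reply-before-sleep policy (assumed shape of the fix): also reply when
 * there is progress that the sender has not been told about yet.
 */
bool
reply_due_before_sleep(const ReceiverState *rs, int64_t now_ms)
{
    assert(rs != NULL);
    return rs->reported_lsn < rs->written_lsn ||
           reply_due_timer_only(rs, now_ms);
}
```

With unreported progress and only 500 ms since the last reply, the first
function says "no reply yet" and the second says "reply now".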

So the attached first patch stops the flood.

That said, I don't think it is intended that logical walsender
sends keep-alive packets with such a high frequency. It happens
because walsender actually doesn't wait at all: it waits on
WL_SOCKET_WRITEABLE, and the keep-alive packet inserted just before
is always pending.

So, as in the attached second patch, we should try to flush out the
keep-alive packets if possible before checking pq_is_send_pending().
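
The busy loop described above, and the effect of flushing first, can be
sketched with a toy model. This is not walsender code and not the attached
patch: every name below is hypothetical, the socket is assumed to accept
writes at all times, and the receiver is assumed never to confirm sentPtr
during the run. The model does not try to reproduce the 500 per second
figure; it only shows that a pending keepalive makes the wait return
without sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy sender; none of these names exist in PostgreSQL. */
typedef struct ToySender
{
    int  pending_bytes;     /* output queued but not yet sent */
    long keepalives_sent;   /* keepalives queued so far */
} ToySender;

static void
toy_queue_keepalive(ToySender *s)
{
    s->pending_bytes += 1;
    s->keepalives_sent++;
}

/* Assumption: the socket is always writable, so a flush always succeeds. */
static void
toy_flush(ToySender *s)
{
    s->pending_bytes = 0;
}

/*
 * Model of the wait: with pending output the sender also waits for
 * "socket writable", which is satisfied at once, so it does not sleep.
 * Returns true only if the sender actually slept.
 */
static bool
toy_wait(const ToySender *s)
{
    return s->pending_bytes == 0;
}

/* Run the loop over a simulated time window; return keepalives sent. */
long
toy_run(bool flush_before_wait, int window_ms, int sleep_ms,
        long max_iterations)
{
    ToySender s = {0, 0};
    int       now_ms = 0;
    long      i;

    assert(sleep_ms > 0 && max_iterations >= 0);

    for (i = 0; i < max_iterations && now_ms < window_ms; i++)
    {
        /* Receiver has not confirmed sentPtr: queue a keepalive. */
        toy_queue_keepalive(&s);

        if (flush_before_wait)
            toy_flush(&s);

        if (toy_wait(&s))
            now_ms += sleep_ms; /* really slept until the timeout */
        else
            toy_flush(&s);      /* woke immediately, sent the data */
    }
    return s.keepalives_sent;
}
```

In this model, toy_run(false, 1000, 100, 5000) never advances the clock
and stops only at the iteration cap, while toy_run(true, 1000, 100, 5000)
sends one keepalive per sleep.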

Either one alone can "fix" the issue, but I think each of them is
reasonable by itself.

Any thoughts, suggestions and/or opinions?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:
pg_recvlogical_send_reply_before_sleep.patch (text/x-patch, 465 bytes)
walsender_flush_keepalive_packet_before_sleep.patch (text/x-patch, 590 bytes)
