Quick Links

Re: Logical replication keepalive flood

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>
Subject:	Re: Logical replication keepalive flood
Date:	2021-06-10 06:48:00
Message-ID:	CAA4eK1LZ0dPr43w2-K0svhYM6y9+9pTiYRxOcXuuQqEgCfQKGA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Jun 10, 2021 at 11:42 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 10 Jun 2021 15:00:16 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > At Wed, 9 Jun 2021 17:32:25 +0500, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com> wrote in
> > >
> > > On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > Is it possible that the write/flush location is not
> > > > updated at the pace at which we expect?
> >
> > Yes. MyWalSnd->flush/write are updated far frequently but still
> > MyWalSnd->write is behind sentPtr by from thousands of bytes up to
> > less than 1 block (1block = 8192 bytes). (Flush lags are larger than
> > write lags, of course.)
>
> For more clarity, I changed the previous patch a bit and retook numbers.
>
> Total records: 19476
> 8: 2 / 4 / 2: 4648 / 302472
> 16: 5 / 10 / 5: 5427 / 139872
> 24: 3006 / 6015 / 3028: 4739 / 267215
> 187: 2 / 0 / 50: 1 / 398
>
> While a 10 seconds run of pgbench, it walsender reads 19476 records
> and calls logical_read_xlog_page() 3028 times, and the mean of write
> lag is 4739 bytes and flush lag is 267215 bytes (really?), as the
> result most of the record fetch causes a keep alive. (The WAL contains
> many FPIs).
>

Good analysis. I think this analysis has shown that walsender is
sending messages at top speed as soon as they are generated. So, I am
wondering why there is any need to wait/sleep in such a workload. One
possibility that occurred to me RecentFlushPtr is not updated and or
we are not checking it aggressively. To investigate on that lines, can
you check the behavior with the attached patch? This is just a quick
hack patch to test whether we need to really wait for WAL a bit
aggressively.

--
With Regards,
Amit Kapila.

Attachment	Content-Type	Size
walsnd_check_wait_required_1.patch	application/octet-stream	1002 bytes

In response to

Re: Logical replication keepalive flood at 2021-06-10 06:12:31 from Kyotaro Horiguchi

Responses

Re: Logical replication keepalive flood at 2021-06-11 01:37:47 from Kyotaro Horiguchi

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	tsunakawa.takay@fujitsu.com	2021-06-10 07:08:37	RE: Transactions involving multiple postgres foreign servers, take 2
Previous Message	Kyotaro Horiguchi	2021-06-10 06:12:31	Re: Logical replication keepalive flood