Re: Logical replication keepalive flood

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Logical replication keepalive flood
Date: 2021-09-19 05:46:49
Message-ID: CAA4eK1Ja2XmK59Czv1V+tfOgU4mcFfDwTtTgO02Wd=rO02JbiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 17, 2021 at 3:03 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > I think here the reason is that the first_lsn of a transaction is
> > always equal to end_lsn of the previous transaction (See comments
> > above first_lsn and end_lsn fields of ReorderBufferTXN).
>
> That may be the case, but those comments certainly don't make this clear.
>
> >I have not
> > debugged but I think in StreamLogicalLog() the cur_record_lsn after
> > receiving 'w' message, in this case, will be equal to endpos whereas
> > we expect to be greater than endpos to exit. Before the patch, it will
> > always get the 'k' message where we expect the received lsn to be
> > equal to endpos to conclude that we can exit. Do let me know if your
> > analysis differs?
> >
>
> Yes, pg_recvlogical seems to be relying on receiving a keepalive for
> its "--endpos" logic to work (and the 006 test is relying on '' record
> output from pg_recvlogical in this case).
> But is it correct to be relying on a keepalive for this?
>

I don't think this experiment/test indicates that pg_recvlogical's
"--endpos" relies on keepalive. It would just print the records till
--endpos and then exit. In the test under discussion, as per
confirmation by Hou-San, the BEGIN record received has the same LSN as
--endpos, so it would just output that and exit which is what is
mentioned in pg_recvlogical docs as well (If there's a record with LSN
exactly equal to lsn, the record will be output).

I think here the test case could be a culprit. In the original commit
eb2a6131be [1], where this test of the second time using
pg_recvlogical was added there were no additional Inserts (# Insert
some rows after $endpos, which we won't read.) which were later added
by a different commit 8222a9d9a1 [2]. I am not sure if the test added
by commit [2] was a good idea. It seems to be working due to the way
keepalives are being sent but otherwise, it can fail as per the
current design of pg_recvlogical.

[1]:
commit eb2a6131beccaad2b39629191508062b70d3a1c6
Author: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Date: Tue Mar 21 14:04:49 2017 +0000

Add a pg_recvlogical wrapper to PostgresNode

[2]:
commit 8222a9d9a12356349114ec275b01a1a58da2b941
Author: Noah Misch <noah(at)leadboat(dot)com>
Date: Wed May 13 20:42:09 2020 -0700

In successful pg_recvlogical, end PGRES_COPY_OUT cleanly.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-09-19 14:56:08 Re: Undocumented AT TIME ZONE INTERVAL syntax
Previous Message A Z 2021-09-19 05:42:32 Improved PostgreSQL Mathematics Support.