Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced
Date: 2018-06-06 14:57:22
Message-ID: 4ff6b139-1647-cad9-1477-3b6aa44c666d@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 06/06/18 04:04, Michael Paquier wrote:
> On Tue, Jun 05, 2018 at 01:00:30PM +0200, Petr Jelinek wrote:
>> I didn't say anything about CreateDecodingContext though. I am talking
>> about the fact that we then use the same variable as input to
>> XLogReadRecord later in the logical slot code path. The whole point of
>> having restart_lsn for logical slot is to have correct start point for
>> XLogReadRecord so using the confirmed_flush there is wrong (and has been
>> wrong since the original commit).
>
> OK, I finally see the point you are coming at after a couple of hours
> brainstorming about it, and indeed using confirmed_lsn is logically
> incorrect so the current code is inconsistent with what
> DecodingContextFindStartpoint() does, so we rely on restart_lsn to scan
> for all the records and the decoder state is here to make sure that the
> slot's confirmed_lsn is updated to a consistent position. I have added
> your feedback in the attached (indented and such), which results in some
> simplifications with a couple of routines.
>
> I am attaching as well the patch I sent yesterday. 0001 is candidate
> for a back-patch, 0002 is for HEAD to fix the slot advance stuff.
>
> There is another open item related to slot advancing here:
> https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru
> And per my tests the patch is solving this item as well. I have just
> used the test mentioned in the thread which:
> 1) creates a slot
> 2) does some activity to generate a couple of WAL pages
> 3) advances the slot at page boundary
> 4) Moves again the slot.
> This test crashes on HEAD at step 4, and not with the attached.
>
> What do you think?

Seems reasonable to me.

I think the only thing to note about the patches from my side is that we
probably don't want to default to restart_lsn for the
pg_logical_replication_slot_advance() return value (when nothing was
done) but rather the confirmed_lsn. As it is in current patch if we call
the function repeatedly and one call moved slot forward but the next one
didn't the return value will go backwards as restart_lsn tends to be
behind the confirmed one.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-06-06 15:10:11 Re: PATCH pass PGOPTIONS to pg_regress
Previous Message Peter Eisentraut 2018-06-06 14:53:01 Re: Why is fncollation in FunctionCallInfoData rather than fmgr_info?