Quick Links

Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced
Date:	2018-06-05 04:20:58
Message-ID:	20180605042058.GA5840@paquier.xyz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Jun 04, 2018 at 11:51:35AM +0200, Petr Jelinek wrote:
> On 01/06/18 21:13, Michael Paquier wrote:
>> - startlsn = MyReplicationSlot->data.confirmed_flush;
>> + if (OidIsValid(MyReplicationSlot->data.database))
>> + startlsn = MyReplicationSlot->data.confirmed_flush;
>> + else
>> + startlsn = MyReplicationSlot->data.restart_lsn;
>> +
>> if (moveto < startlsn)
>> {
>> ReplicationSlotRelease();
>
> This part looks correct for the checking that we are not moving
> backwards. However, there is another existing issue with this code which
> is that we are later using the confirmed_flush (via startlsn) as start
> point of logical decoding (XLogReadRecord parameter in
> pg_logical_replication_slot_advance) which is not correct. The
> restart_lsn should be used for that. I think it would make sense to fix
> that as part of this patch as well.

I am not sure I understand what you are coming at here. Could you
explain your point a bit more please as 9c7d06d is yours? When creating
the decoding context, all other code paths use the confirmed LSN as a
start point if not explicitely defined by the caller of
CreateDecodingContext, as it points to the last LSN where a transaction
has been committed and decoded. The backward check is also correct to
me, for which I propose to add a comment block like that:
+ /*
+ * Check if the slot is not moving backwards. Physical slots rely
+ * simply on restart_lsn as a minimum point, while logical slots
+ * have confirmed consumption up to confirmed_lsn, meaning that
+ * in both cases data older than that is not available anymore.
+ */
+ if (OidIsValid(MyReplicationSlot->data.database))
+ minlsn = MyReplicationSlot->data.confirmed_flush;
+ else
+ minlsn = MyReplicationSlot->data.restart_lsn;

Any tests I do are showing me that using confirmed_lsn would not matter
much? as we want the slot's consumer to still decode transactions whose
commits happened after the point where the slot has been advanced to.
So let's make sure that we are on the same page for the starting
LSN used.

On top of that, the locking issues in CreateInitDecodingContext() and
DecodingContextFindStartpoint go back to 9.4. So I would be inclined to
get 0001 applied first as a bug fix on all branches, still that's a
minor issue so there could be arguments for just doing it on HEAD. I am
as well fully open to suggestions for the extra comments which document
the use of ReplicationSlotControlLock and mutex for in-memory slot data.
Any thoughts about those two last points?
--
Michael

Attachment	Content-Type	Size
0001-Fix-and-document-lock-handling-for-in-memory-replica.patch	text/x-diff	4.7 KB

In response to

Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced at 2018-06-04 09:51:35 from Petr Jelinek

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Masahiko Sawada	2018-06-05 04:22:58	Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Previous Message	Melanie Plageman	2018-06-05 04:10:38	Bug in either collation docs or code