Re: Possible bug in logical replication.

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Subject: Re: Possible bug in logical replication.
Date: 2018-06-20 07:59:28
Message-ID: 87woutq467.fsf@ars-thinkpad
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Michael Paquier <michael(at)paquier(dot)xyz> writes:

> On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
>> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
>> It seems to me that we still want to have the slot forwarding finish in
>> this case even if this is interrupted. Petr, isn't that the intention
>> here?
>
> I have been chewing a bit more on the proposed patch, finishing with the
> attached to close the loop. Thoughts?

Sorry for being pedantic, but it seems to me worthwhile to mention *why*
we need decoding machinery at all -- like I wrote:

+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).

Also,

> * The slot's restart_lsn is used as start point for reading records,

This is clearly seen from the code, I propose to remove this.

> * while confirmed_lsn is used as base point for the decoding context.

And as I wrote, this doesn't matter as changes are not produced.

> * The LSN position to move to is checked by doing a per-record scan and
> * logical decoding which makes sure that confirmed_lsn is updated to a
> * LSN which allows the future slot consumer to get consistent logical
> - * changes.
> + * changes. As decoding is done with fast_forward mode, no changes are
> + * actually generated.

confirmed_lsn is always updated to `moveto` unless we run out of WAL
earlier (and unless we try to move slot backwards, which is obviously
forbidden) -- consistent changes are practically irrelevant
here. Moreover, we can directly set confirmed_lsn and still have
consistent changes further as restart_lsn and xmin of the slot are not
touched. What we actually do here is trying to advance *restart_lsn and
xmin* as far as we can but up to the point which ensures that decoding
can assemble a consistent snapshot allowing to fully decode all COMMITs
since updated `confirmed_flush_lsn`. All this happens in
SnapBuildProcessRunningXacts.

> It seems to me that we still want to have the slot forwarding finish in
> this case even if this is interrupted. Petr, isn't that the intention
> here?

Probably, though I am not sure what is the point of this. Ok, I keep
this check in the updated (with your comments) patch and CC'ing Petr.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
lslot_advance_comments.patch text/x-diff 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2018-06-20 08:02:05 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Rajkumar Raghuwanshi 2018-06-20 07:52:26 ERROR: ORDER/GROUP BY expression not found in targetlist