From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Possible bug in logical replication. |
Date: | 2018-06-21 10:18:44 |
Message-ID: | 70586d95-965e-18b0-ac99-3050ec1fe88e@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 20/06/18 09:59, Arseny Sher wrote:
>
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
>
>> On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
>>> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
>>> It seems to me that we still want to have the slot forwarding finish in
>>> this case even if this is interrupted. Petr, isn't that the intention
>>> here?
>>
>> I have been chewing a bit more on the proposed patch, finishing with the
>> attached to close the loop. Thoughts?
>
> Sorry for being pedantic, but it seems to me worthwhile to mention *why*
> we need decoding machinery at all -- like I wrote:
>
> + * While we could just do LogicalConfirmReceivedLocation updating
> + * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
> + * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).
>
+1
> Also,
>
>> * The slot's restart_lsn is used as start point for reading records,
>
> This is clearly seen from the code, I propose to remove this.
Given there was just bug fix done for this I guess the extra clarity
there does not hurt.
>
>> * The LSN position to move to is checked by doing a per-record scan and
>> * logical decoding which makes sure that confirmed_lsn is updated to a
>> * LSN which allows the future slot consumer to get consistent logical
>> - * changes.
>> + * changes. As decoding is done with fast_forward mode, no changes are
>> + * actually generated.
>
> confirmed_lsn is always updated to `moveto` unless we run out of WAL
> earlier (and unless we try to move slot backwards, which is obviously
> forbidden) -- consistent changes are practically irrelevant
> here. Moreover, we can directly set confirmed_lsn and still have
> consistent changes further as restart_lsn and xmin of the slot are not
> touched. What we actually do here is trying to advance *restart_lsn and
> xmin* as far as we can but up to the point which ensures that decoding
> can assemble a consistent snapshot allowing to fully decode all COMMITs
> since updated `confirmed_flush_lsn`. All this happens in
> SnapBuildProcessRunningXacts.
>
Those are two different things, here is consistent snapshot for logical
decoding without which we can't decode and that's handled by restart_lsn
and xmin. But the consistent stream of data for the consumer is handled
by confirmed_lsn (and the comment says that). You have to take into
account that next use of the slot can consume data (ie may be done via
one of the other interfaces and not by move). So I think what Michael
has here is correct.
>> It seems to me that we still want to have the slot forwarding finish in
>> this case even if this is interrupted. Petr, isn't that the intention
>> here?
>
Well, it seems wasteful to just exit there if we already finished all
the requested work, also gives some consistency with the coding of
get/peek_changes. Not very important for the functionality either way.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2018-06-21 10:28:01 | Re: Sample values for pg_stat_statements |
Previous Message | Amit Langote | 2018-06-21 09:48:50 | Re: bug with expression index on partition |