Re: Physical replication slot advance is not persistent

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, simon(at)2ndquadrant(dot)com
Subject: Re: Physical replication slot advance is not persistent
Date: 2020-01-28 08:01:14
Message-ID: 20200128080114.GB145179@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 21, 2020 at 02:07:30PM +0900, Michael Paquier wrote:
> On Mon, Jan 20, 2020 at 11:00:14AM -0800, Andres Freund wrote:
>> Hm. I'm think testing this with real LSNs is a better idea. What if the
>> node actually already is past FF/FFFFFFFF at this point? Quite unlikely,
>> I know, but still. I.e. why not get the current LSN after the INSERT,
>> and assert that the slot, after the restart, is that?
>
> Sure. If not disabling autovacuum in the test, we'd just need to make
> sure if that advancing is at least ahead of the INSERT position.

Actually, as the advancing happens only up to this position we just
need to make sure that the LSN reported by the slot is the same as the
position advanced to. I have switched the test to just do that
instead of using a fake LSN.

> Anyway, I am still not sure if we should got down the road to just
> mark the slot as dirty if any advancing is done and let the follow-up
> checkpoint to the work, if the advancing function should do the slot
> flush, or if we choose one and make the other an optional choice (not
> for back-branches, obviously. Based on my reading of the code, my
> guess is that a flush should happen at the end of the advancing
> function.

I have been chewing on this point for a couple of days, and as we may
actually crash between the moment the slot is marked as dirty and the
moment the slot information is made consistent, we still have a risk
to have the slot go backwards even if the slot information is saved.
The window is much narrower, but well, the docs of logical decoding
mention that this risk exists. And the patch becomes much more
simple without changing the actual behavior present since the feature
has been introduced for logical slots. There could be a point in
having a new option to flush the slot information, or actually a
separate function to flush the slot information, but let's keep that
for a future possibility.

So attached is an updated patch which addresses the problem just by
marking a physical slot as dirty if any advancing is done. Some
documentation is added about the fact that an advance is persistent
only at the follow-up checkpoint. And the tests are fixed to not use
a fake LSN but instead advance to the latest LSN position produced.

Any objections?
--
Michael

Attachment Content-Type Size
v7-0001-Make-physical-slot-advance-to-be-persistent.diff text/x-diff 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-01-28 08:02:36 Re: Allow to_date() and to_timestamp() to accept localized names
Previous Message Amit Kapila 2020-01-28 08:00:29 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions