Re: Unnecessary confirm work on logical replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: emre(at)hasegeli(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unnecessary confirm work on logical replication
Date: 2023-11-03 21:34:43
Message-ID: bccca330-6958-d01f-09e1-7042f2dcf895@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/11/23 14:58, Emre Hasegeli wrote:
>> In fact, the WAL sender always starts reading WAL from restart_lsn,
>> which in turn is always <= confirmed_flush_lsn. While reading WAL, WAL
>> sender may read XLOG_RUNNING_XACTS WAL record with lsn <=
>> confirmed_flush_lsn. While processing XLOG_RUNNING_XACTS record it may
>> update its restart_lsn and catalog_xmin with current_lsn = lsn fo
>> XLOG_RUNNING_XACTS record. In this situation current_lsn <=
>> confirmed_flush_lsn.
>
> This can only happen when the WAL sender is restarted. However in
> this case, the restart_lsn and catalog_xmin should have already been
> persisted by the previous run of the WAL sender.
>
> I still doubt these calls are necessary. I think there is a
> complicated chicken and egg problem here. Here is my logic:
>
> 1) LogicalConfirmReceivedLocation() is called explicitly when
> confirmed_flush is sent by the replication client.
>
> 2) LogicalConfirmReceivedLocation() is the only place that updates
> confirmed_flush.
>
> 3) The replication client can only send a confirmed_flush for a
> current_lsn it has already received.
>
> 4) These two functions have already run for any current_lsn the
> replication client has received.
>
> 5) These two functions call LogicalConfirmReceivedLocation() only if
> current_lsn <= confirmed_flush.
>
> Thank you for your patience.
>

Hi Emre,

I was going through my TODO list of messages, and I stumbled on this
thread from a couple months back. Do you still think this is something
we should do?

I see there was some discussion about whether this update is needed, or
whether current_lsn can ever be <= confirmed_flush_lsn. I think you may
be right this can't happen, but I wonder if we could verify that by an
assert in a convenient place (instead of just removing the update).

Also, did you try to quantify how expensive this is? The update seems
very cheap, but I guess you just noticed by eye-balling the code, which
is fine ofc. Even if it's cheap/not noticeable, it still may be worth
removing so as not to confuse people improving the code in the future.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-03 21:56:42 Draft back-branch release notes are up
Previous Message Andres Freund 2023-11-03 21:16:01 Re: meson documentation build open issues