Re: Taking into account syncrep position in flush_lsn reported by apply worker

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Arseny Sher <ars(at)neon(dot)tech>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Taking into account syncrep position in flush_lsn reported by apply worker
Date: 2024-08-20 20:55:12
Message-ID: ff0c1433-6294-4643-9f4d-b445913dd9e4@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14/08/2024 16:54, Arseny Sher wrote:
> On 8/13/24 06:35, Amit Kapila wrote:
>> On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher <ars(at)neon(dot)tech> wrote:
>>>
>>> Sorry for the poor formatting of the message above, this should be
>>> better:
>>>
>>> Hey. Currently synchronous_commit is disabled for logical apply worker
>>> on the ground that reported flush_lsn includes only locally flushed data
>>> so slot (publisher) preserves everything higher than this, and so in
>>> case of subscriber restart no data is lost. However, imagine that
>>> subscriber is made highly available by standby to which synchronous
>>> replication is enabled. Then reported flush_lsn is ignorant of this
>>> synchronous replication progress, and in case of failover data loss may
>>> occur if subscriber managed to ack flush_lsn ahead of syncrep.
>>
>> Won't the same can be achieved by enabling the synchronous_commit
>> parameter for a subscription?
>
> Nope, because it would force WAL flush and wait for replication to the
> standby in the apply worker, slowing down it. The logic missing
> currently is not to wait for the synchronous commit, but still mind its
> progress in the flush_lsn reporting.

I think this patch makes sense. I'm not sure we've actually made any
promises on it, but it feels wrong that the slot's LSN might be advanced
past the LSN that's been has been acknowledged by the replica, if
synchronous replication is configured. I see little downside in making
that promise.

> + /*
> + * If synchronous replication is configured, take into account its position.
> + */
> + if (SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
> + {
> + LWLockAcquire(SyncRepLock, LW_SHARED);
> + local_flush = Min(local_flush, WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH]);
> + LWLockRelease(SyncRepLock);
> + }
> +

Should probably use the SyncStandbysDefined() macro here. Or check
WalSndCtl->sync_standbys_defined like SyncRepWaitForLSN() does; not sure
which would be more appropriate here.

Should the synchronous_commit setting also affect this?

Please also check if the docs need to be updated, or if a paragraph
should be added somewhere on this behavior.

A TAP test case would be nice. Not sure how complicated it will be, but
if not too complicated, it'd be nice to include it in check-world.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-08-20 21:21:33 Re: Remove dependence on integer wrapping
Previous Message Jelte Fennema-Nio 2024-08-20 19:55:31 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs