Re: protocol-level wait-for-LSN

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: protocol-level wait-for-LSN
Date: 2024-11-04 10:08:03
Message-ID: CAEze2WibWJVmjtSzsSgirvxbNQz_1W8KcuEmC4Ro5X=ODqhffw@mail.gmail.com
Lists: pgsql-hackers

On Wed, 30 Oct 2024, 18:45 Jelte Fennema-Nio, <postgres(at)jeltef(dot)nl> wrote:
>
> On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> wrote:
> > The idea is great, I have been wanting something like this for a long
> > time. For future proofing it might be a good idea to not require the
> > communicated-waited value to be a LSN.
>
> Yours and Matthias' feedback make total sense I think. From an
> implementation perspective I think there are a few things necessary to
> enable these wider usecases:
> 1. The token should be considered opaque for clients (should be documented)

I disagree. It is critical that a consumer knows what to do with the
output. Blindly passing it around is not a valid strategy. In my
example of keeping track of replication slots, the client also has to
keep track of every cluster ID to make it work correctly, as every
Postgres instance may only know about a subset of the other PG
instances. A client would have to know how to discern the returned set
of [cluster_id, LSN] pairs and how to merge it into its own view of
global progress:

Say you connect to cluster A, which receives changes from clusters X
and Y; cluster B, which receives from X and Z; and cluster C, which
receives from all of X, Y, and Z. Cluster B should ignore [Y_ID, LSN],
as keeping the [cluster ID, LSN] pair around would be sensitive to
resource attacks, but the client will have to merge the response from
that cluster to make sure it doesn't accidentally "go back in time"
when it switches from cluster A or B to another cluster with the "wait
for this minimal replication state" 'token'.
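
To make that concrete, here is a minimal sketch of the client-side
bookkeeping. It is an illustration only, not a proposed wire format: the
function names (merge_progress, is_caught_up) and the dict-of-cluster-IDs
representation are assumptions. The one thing taken from PostgreSQL itself
is the textual pg_lsn form, two hexadecimal numbers separated by a slash.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert the textual pg_lsn form 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def merge_progress(view: Dict[str, int], reported: Dict[str, int]) -> Dict[str, int]:
    """Merge a server's reported [cluster_id, LSN] pairs into the client's view.

    Hypothetical helper: keeps the highest LSN seen per cluster ID, so the
    client's view never moves backwards, whichever server it talked to last.
    """
    merged = dict(view)
    for cluster_id, lsn in reported.items():
        merged[cluster_id] = max(merged.get(cluster_id, 0), lsn)
    return merged


def is_caught_up(replayed: Dict[str, int], required: Dict[str, int]) -> bool:
    """Server-side check (hypothetical): has this server replayed enough?

    Cluster IDs the server does not receive from are ignored, as cluster B
    ignores [Y_ID, LSN] in the example above.
    """
    return all(
        replayed[cluster_id] >= lsn
        for cluster_id, lsn in required.items()
        if cluster_id in replayed
    )


# Client talks to A (receives from X and Y), then to B (receives from X and Z).
view: Dict[str, int] = {}
view = merge_progress(view, {"X": parse_lsn("0/3000"), "Y": parse_lsn("0/1000")})
view = merge_progress(view, {"X": parse_lsn("0/2000"), "Z": parse_lsn("0/500")})
```

After the second merge the client still holds X at 0/3000 rather than B's
older 0/2000, which is what keeps it from going back in time when it then
asks cluster C to wait for this state. That only works because the client
understands the pairs, which is the point against treating the token as
opaque.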

> > Even without sharding LSN might not be a final choice. Right now on
> > the primary the visibility order is not LSN order. So if a connection
> > does synchronous_commit = off commit, the write location is not even
> > going to see the commit. By publishing the end of the commit record it
> > would be better. But I assume at some point we would like to have a
> > consistent visibility order, which quite likely means using something
> > other than LSN as the logical clock.

Or have CSN=LSN-based snapshots on the primary, too, as that would
also solve the unordered visibility issue on the primary, as well as
the unacknowledged read issue.

> I was going to say that the default could probably still be LSN, but
> this makes me doubt that. Is there some other token that we can send
> now that we could "wait" on instead of the LSN, which would work for.
> If not, I think LSN is still probably a good choice as the default. Or
> maybe only as a default in case synchronous_commit != off.

I don't see how we can have anything but LSN as the 'wait-for-this'
condition, as everything else could appear out-of-order in the WAL (we
don't allow the record to be modified during
XLogInsert()/ReserveXLogInsertLocation()), and WAL is our one source
of truth for change capture.

PS. I have other complaints about timestamp-based
replication/snapshots, but unless someone thinks otherwise and/or it
is made relevant I'll consider that off-topic.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
