From: | Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: protocol-level wait-for-LSN |
Date: | 2024-10-30 17:17:47 |
Message-ID: | CANwKhkO1nKj4EdY_s4hFte3RTZpZvAZ2z0yG2ezbnqCrms+K+w@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, 28 Oct 2024 at 17:51, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> This is something I hacked together on the way back from pgconf.eu.
> It's highly experimental.
>
> The idea is to do the equivalent of pg_wal_replay_wait() on the protocol
> level, so that it is ideally fully transparent to the application code.
> The application just issues queries, and they might be serviced by a
> primary or a standby, but there is always a correct ordering of reads
> after writes.
The idea is great; I have been wanting something like this for a long
time. For future-proofing, it might be a good idea not to require the
communicated and waited-for value to be an LSN.
In a sharded database, a Lamport timestamp would allow for sequential
consistency. A Lamport timestamp is just a monotonically increasing
value that is eagerly shared between all communicating participants,
including clients. For a single cluster, LSNs work fine for this
purpose. But with multiple shards, LSNs will not work unless they are
arranged as a vector clock, which I think is what Matthias proposed.
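To illustrate the difference, here is a minimal Python sketch of a Lamport clock as each participant (including a client) might carry it, next to a vector clock that keeps one entry per shard, for example that shard's LSN. The class names and shard keys are hypothetical and do not come from the patch or from libpq.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LamportClock:
    """Single monotonically increasing logical clock (hypothetical sketch)."""
    value: int = 0

    def tick(self) -> int:
        # Local event, e.g. a commit: advance the clock by one.
        self.value += 1
        return self.value

    def observe(self, received: int) -> int:
        # Value received from another participant: take the max, then advance.
        self.value = max(self.value, received) + 1
        return self.value


@dataclass
class VectorClock:
    """One counter per shard, e.g. that shard's LSN (hypothetical sketch)."""
    entries: dict[str, int] = field(default_factory=dict)

    def merge(self, other: VectorClock) -> None:
        # Element-wise maximum: keep the newest value seen for every shard.
        for shard, v in other.entries.items():
            self.entries[shard] = max(self.entries.get(shard, 0), v)

    def dominates(self, other: VectorClock) -> bool:
        # True if this clock has seen at least everything `other` has seen.
        return all(self.entries.get(s, 0) >= v for s, v in other.entries.items())
```

A single Lamport value gives one total order and can only express "at least this new"; the vector clock can additionally show that two histories are concurrent (neither dominates), at the cost of carrying one entry per shard.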
Even without sharding, LSN might not be the final choice. Right now,
visibility order on the primary is not LSN order. So if a connection
commits with synchronous_commit = off, the write location is not even
going to cover the commit. Publishing the end of the commit record
would be better. But I assume that at some point we would like to have
a consistent visibility order, which quite likely means using something
other than LSN as the logical clock.
I see that the patch names the field LSN, but at the protocol level and
for the client library this is just an opaque 127-byte token. So
basically I'm thinking the naming could be more generic. And for a
complete Lamport timestamp implementation, we would need the ability to
extract the last seen value, plus a set-if-greater update operation.
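The two client-side operations mentioned above could look roughly like this. This is a hypothetical Python sketch, not the patch's API: the connection treats the token as opaque bytes, and the ordering used by set-if-greater is assumed to be supplied by whatever defines the token format (here a big-endian unsigned integer, purely as an illustrative assumption). With a vector-clock token, set-if-greater would instead become an element-wise merge.

```python
from typing import Callable, Optional

Token = bytes


def be_uint_greater(a: Token, b: Token) -> bool:
    # Assumed encoding for illustration only: big-endian unsigned integer.
    return int.from_bytes(a, "big") > int.from_bytes(b, "big")


class ConsistencyTokenHolder:
    """Hypothetical per-session holder for the opaque wait-for token."""

    def __init__(self, greater: Callable[[Token, Token], bool] = be_uint_greater):
        self._greater = greater
        self._last_seen: Optional[Token] = None

    def last_seen(self) -> Optional[Token]:
        # Extract the last seen value, e.g. to hand it to another session.
        return self._last_seen

    def set_if_greater(self, token: Token) -> bool:
        # Advance only if `token` is newer; never move the clock backwards.
        if self._last_seen is None or self._greater(token, self._last_seen):
            self._last_seen = token
            return True
        return False
```

For example, session `b` can adopt what session `a` has seen with `b.set_if_greater(a.last_seen())`, after which its reads would wait for at least `a`'s position, while an older token arriving later is ignored.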
--
Ants Aasma
www.cybertec-postgresql.com