Re: Sequence Access Methods, round two

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sequence Access Methods, round two
Date: 2024-06-20 06:12:32
Message-ID: ZnPIUPMmp5TzBPC2@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 26, 2024 at 03:21:29PM +0900, Michael Paquier wrote:
> The next plan is to hopefully be able to trigger a discussion at the
> next pgconf.dev at the end of May, but let's see how it goes.

I am a bit behind an update of this thread, but there has been an
unconference on the topic at the last pgconf.dev. This is based on my
own notes written down after the session, so if there are gaps, feel
free to correct me. The session was called "Sequences & Clusters",
and was in two parts, with the first part covering this thread, and
the second part covering the problem of sequences with logical
replication for upgrade cases. I've taken a lot of time with the 1st
part (sorry about that Amit K.!) still the second part has reached an
agreement about what to do next there, and this is covered by this
thread these days:
https://www.postgresql.org/message-id/CAA4eK1LC%2BKJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ%40mail.gmail.com

My overall feeling before this session was that I did not feel that
folks grabbed the problem I was trying to solve, and, while it did not
feel that the end of the session completely filled the gaps, and least
folks finished with some idea of the reason why I've been trying
something here.

First, I have spoken for a few minutes about the use-cases I've been
trying to solve, where parts of it involve Postgres-XC, an
auto-proclaimed multi-master solution fork of Postgres, where
sequences are handled by patching src/backend/commands/sequence.c to
retrieve values from a GTM (global transaction manager, source of
truth for value uniqueness shared by all the nodes), something I got
my hands on between 2009~2012 (spoiler: people tend to like more
scaling out clusters 12 years later). Then explained why Postgres is
not good in this area. The original idea is that we want to be able
for some applications to scale out Postgres across many hosts while
making it transparent to the user's applications. By that, imagine a
huge big box where users can connect to a single point, but
underground any connection could involve a connection to a cluster of
N PostgreSQL nodes, N being rather large (say N > 10k?).

Why would we want that? One problem behind such configurations is
that there is no way to make the values transparent for the
application without applying schema changes (attribute defaults, UUIDs
but these are large, for example), meaning that schemas cannot really
be migrated as-they-are from one space (be it a Postgres cluster of 1
or more nodes) to a second space (with less more or more nodes), and
having to manipulate clusters with ALTER SEQUENCE commands to ensure
that there is no overlap in value does not help much to avoid support
at 3AM in case of sudden value conflicts because an application has
gone wild, especially if the node fleet needs to be elastic and
flexible (yep, there's also that). Note that there are also limits
with generated columns that feed from the in-core sequence computation
of Postgres where all the sequence data is stored in a pseudo-heap
table, relying on buffer locks to make sure that in-place updates are
concurrent safe. So this thread is about extending the set of
possibilities in this area for application developers to control how
sequences are computed.

First here is a summary of the use cases that have been mentioned
where a custom computation is handy, based on properties that I've
understood from the conversation:
- Control of computation of values on a node and cluster-basis,
usually coming with three properties (put a snowflake ID here):
-- Global component, usually put in the first bits to force an
ordering of the values across all the nodes. For snowflakes, this is
covered by a timestamp, to which an offset can be applied.
-- Local component, where a portion of the value bits are decided
depending on the node where the value is computed.
-- Local incrementation, where the last bits in the value are used to
loop if the two first ones happen to be equal, to ensure uniqueness.
- Cache range of values at node level or session level, retrieved from
a unique source shared by multiple nodes. The range of values is
retrieved from a single source (PostgreSQL node itself), cached in a
shared pool in a node or just a backend context for consumption by a
session.
- Transactional behavior to minimize value gaps, which is something I
have mentioned but I'm a bit meh on this property as value uniqueness
is key, while users have learnt to live with value gaps. Still the
APIs can make that possible if autovacuum is able to understand that
some clean up needs to happen.

Another set of things that have been mentioned:
- Is it even correct to call this concept an access method? Should a
different keyword be used? This depends on the stack layer where the
callbacks associated to a sequence are added, I assume. Still, based
on the infrastructure that we already have in place for tables and
indexes (commands, GUCs), this is still kind of the correct concept to
me because we can rely on a lot of existing infrastructure, but I also
get that depending on one's view the opinion diverges.
- More pluggable layers. The final picture will most likely involve
multiple layers of APIs, and not only what's proposed here, with
xpoints mentioned about:
-- Custom data types. My answer on this one is that this will need to
be controlled by a different clause. I think that this is a different
feature than the "access method" approach proposed here that would
need to happen on top of what's here, where the point is to control
the computation (and anything I've seen lately would unlock up to 64b
of computation space hidden behind integer-like data types). Other
cluster products out there have also a concept of user-related data
types, which have to be integer-like.
-- Custom nextval() functions. Here we are going to need a split
between the in-core portion of sequences related to system catalogs
and the facilities that can be accessed once a sequence OID is known.
The patch proposed plugs into nextval_internal() for two reasons:
being able to let CACHE be handled by the core code and not the AM,
and easier support for generated columns with the existing types where
nextval_internal() is called from the executor. This part, also,
is going to require a new SQL clause. Perhaps something will happen
at some point in the SQL specification itself to put some guidelines,
who knows.

While on it, I have noticed a couple of conflicts while rebasing, so
attached is a refreshed patch set.

Thanks,
--
Michael

Attachment Content-Type Size
v6-0001-Switch-pg_sequence_last_value-to-report-a-tuple-a.patch text/x-diff 5.8 KB
v6-0002-Remove-FormData_pg_sequence_data-from-init_params.patch text/x-diff 9.2 KB
v6-0003-Integrate-addition-of-attributes-for-sequences-wi.patch text/x-diff 11.1 KB
v6-0004-Refactor-code-for-in-core-local-sequences.patch text/x-diff 54.0 KB
v6-0005-Sequence-access-methods-backend-support.patch text/x-diff 60.9 KB
v6-0006-Sequence-access-methods-dump-restore-support.patch text/x-diff 21.0 KB
v6-0007-Sequence-access-methods-core-documentation.patch text/x-diff 9.6 KB
v6-0008-snowflake-Add-sequence-AM-based-on-it.patch text/x-diff 28.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2024-06-20 06:32:57 Re: Add pg_get_acl() function get the ACL for a database object
Previous Message Peter Eisentraut 2024-06-20 05:55:07 pg_combinebackup --clone doesn't work