From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Sequence Access Methods, round two |
Date: | 2024-02-27 01:27:13 |
Message-ID: | Zd06cdyl0sqPrrVC@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 22, 2024 at 05:36:00PM +0100, Tomas Vondra wrote:
> I took a quick look at this patch series, mostly to understand how it
> works and how it might interact with the logical decoding patches
> discussed in a nearby thread.
Thanks. Both discussions are linked.
> 0001
> ------
>
> I think this bit in pg_proc.dat is not quite right:
>
> proallargtypes => '{regclass,bool,int8}', proargmodes => '{i,o,o}',
> proargnames => '{seqname,is_called,last_value}',
>
> the first argument should not be "seqname" but rather "seqid".
Ah, right. There are not many system functions that use regclass as
arguments, but the existing ones refer more to IDs, not names.
> 0002, 0003
> ------------
> seems fine, cosmetic changes
Applied these ones as 449e798c77ed and 6e951bf98e2e.
> 0004
> ------
>
> I don't understand this bit in AlterSequence:
>
> last_value = newdataform->last_value;
> is_called = newdataform->is_called;
>
> UnlockReleaseBuffer(buf);
>
> /* Check and set new values */
> init_params(pstate, stmt->options, stmt->for_identity, false,
> seqform, &last_value, &reset_state, &is_called,
> &need_seq_rewrite, &owned_by);
>
> Why set the values only to pass them to init_params(), which will just
> overwrite them anyway? Or do I get this wrong?
The values of "last_value" and is_called may not get updated depending
on the options given in the ALTER SEQUENCE query, and they need to use
as initial state what's been returned from their last heap lookup.
> Also, isn't "reset_state" just a different name for (original) log_cnt?
Yep. That's quite the point. That's an implementation detail
depending on the interface a sequence AM should use, but the main
argument behind this change is that log_cnt is a counter to decide
when to WAL-log the changes of a relation, but I have noticed that all
the paths of init_params() don't care about log_cnt as being a counter
at all: we just want to know if the state of a sequence should be
reset. Form_pg_sequence_data is a piece that only the in-core "local"
sequence AM cares about in this proposal.
> 0005
> ------
>
> I don't quite understand what "elt" stands for :-(
>
> stmt->tableElts = NIL;
>
> Do we need AT_AddColumnToSequence? It seems to work exactly like
> AT_AddColumn. OTOH we do have AT_AddColumnToView too ...
Yeah, that's just cleaner to use a separate one, to be able to detect
the attributes in the DDL deparsing pieces when gathering these pieces
with event triggers. At least that's my take once you extract the
piece that a sequence AM may need a table AM to store its data with
its own set of attributes (a sequence AM may as well not need a local
table for its data).
> Thinking about this code:
>
> case T_CreateSeqStmt:
> EventTriggerAlterTableStart(parsetree);
> address = DefineSequence(pstate, (CreateSeqStmt *) parsetree);
> /* stashed internally */
> commandCollected = true;
> EventTriggerAlterTableEnd();
> break;
>
> Does this actually make sense? I mean, are sequences really relations?
> Or was that just a side effect of storing the state in a heap table
> (which is more of an implementation detail)?
This was becoming handy when creating custom attributes for the
underlying table used by a sequence.
Sequences are already relations (views are also relations), we store
them in pg_class. Now sequences can also use tables internally to
store their data, like the in-core "local" sequence AM defined in the
patch. At least that's the split done in this patch set.
> 0007
> ------
> I wonder why heap_create_with_catalog needs to do this (check that it's
> a sequence):
>
> if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
> relkind == RELKIND_SEQUENCE)
>
> Presumably this is to handle sequences that use heap to store the state?
> Maybe the comment should explain that. Also, will the other table AMs
> need to do something similar, just in case some sequence happens to use
> that table AM (which seems out of control of the table AM)?
Okay, I can see why this part can be confusing with the state of
things in v2. In DefineRelation(), heap_create_with_catalog() passes
down the OID of the sequence access method when creating a sequence,
not the OID of the table AM it may rely on. There's coverage for that
in the regression tests if you remove the check, see the "Try to drop
and fail on dependency" in create_am.sql.
You have a good point here: there could be a dependency between a
table AM and a sequence AM that may depend on it. The best way to
tackle that would be to add a DEPENDENCY_NORMAL on the amhandler of
the table AM when dealing with a sequence amtype in
CreateAccessMethod() in this design. Does that make sense?
(This may or may not make sense depending on how the design problem
related to the relationship between a sequence AM and its optional
table AM is tackled, of course, but at least it makes sense to me in
the scope of the design of this patch set.)
> I don't understand why DefineSequence need to copy the string:
>
> stmt->accessMethod = seq->accessMethod ? pstrdup(seq->accessMethod)
> : NULL;
That's required to pass down the correct sequence AM for
DefineRelation() when creating the pg_class entry of a sequence.
> RelationInitTableAccessMethod now does not need to handle sequences, or
> rather should not be asked to handle sequences. Is there a risk we'd
> pass a sequence to the function anyway? Maybe an assert / error would be
> appropriate?
Hmm. The risk sounds legit. This is something where an assertion
based on RELKIND_HAS_TABLE_AM() would be useful. Same argument for
RelationInitSequenceAccessMethod() with RELKIND_HAS_SEQUENCE_AM()
suggested below. I've added these, for now.
> This bit in RelationBuildLocalRelation looks a bit weird ...
>
> if (RELKIND_HAS_TABLE_AM(relkind))
> RelationInitTableAccessMethod(rel);
> else if (relkind == RELKIND_SEQUENCE)
> RelationInitSequenceAccessMethod(rel);
>
> It's not a fault of this patch, but shouldn't we now have something like
> RELKIND_HAS_SEQUENCE_AM()?
Perhaps, I was not sure. This would just be a check on
RELKIND_SEQUENCE, but perhaps that's worth having at the end, and this
makes the code more symmetric in the relcache, for one. The comment
at the top of RELKIND_HAS_TABLE_AM is wrong with 0007 in place anyway.
> logical decoding / replication
> --------------------------------
> Now, regarding the logical decoding / replication, would introducing the
> sequence AM interfere with that in some way? Either in general, or with
> respect to the nearby patch.
I think it does not. The semantics of the existing in-core "local"
sequence AM are not changed. So what's here is just a large
refactoring shaped around the current semantics of the existing
computation method. Perhaps it should be smarter about some aspects,
but that's not something we'll know about until folks start
implementing their own custom methods. On my side, being able to plug
in a custom callback into nextval_internal() is the main taker.
> That is, what would it take to support logical replication of sequences
> with some custom sequence AM? I believe that requires (a) synchronizing
> the initial value, and (b) decoding the sequence WAL and (c) apply the
> decoded changes. I don't think the sequence AM breaks any of this, as
> long as it allows selecting "current value", decoding the values from
> WAL, sending them to the subscriber, etc.
Sure, that may make sense to support, particularly if one uses a
sequence AM that uses a computation method that may not be unique
across nodes, and where you may want to copy them. I don't think that
this is problem for something like the proposal of this thread or
what the other thread does, they can tackle separate areas (the
logirep patch has a lot of value for rolling upgrades where one uses
logical replication to create the new node and somebody does not want
to bother with a custom computation).
> I guess the decoding would be up to the RMGR, and this patch maintains
> the 1:1 mapping of sequences to relfilenodes, right? (That is, CREATE
> and ALTER SEQUENCE would still create a new relfilenode, which is rather
> important to decide if a sequence change is transactional.)
Yeah, one "local" sequence would have one relfilenode. A sequence AM
may want something different, like not using shared buffers, or just
not use a relfilenode at all.
> It seems to me this does not change the non-transactional behavior of
> sequences, right?
This patch set does nothing about the non-transactional behavior of
sequences. That seems out of scope to me from the start of what I
have sent here.
> alternative sequence AMs
> --------------------------
> I understand one of the reasons for adding sequence AMs is to allow
> stuff like global/distributed sequences, etc. But will people actually
> use that?
Good question. I have users who would be happy with that, hiding
behind sequences custom computations rather than plug in a bunch of
default expressions to various attributes. You can do that today, but
this has this limitations depending on how much control one has over
their applications (for example this cannot be easily achieved with
generated columns in a schema).
> For example, I believe Simon originally proposed this in 2016 because
> the plan was to implement distributed sequences in BDR on top of it. But
> I believe BDR ultimately went with a very different approach, not with
> custom sequence AMs. So I'm a bit skeptical this being suitable for
> other active-active systems ...
Snowflake IDs are popular AFAIK, thanks to the unicity of the values
across nodes.
> Especially when the general consensus seems to be that for active-active
> systems it's much better to use e.g. UUID, because that does not require
> any coordination between the nodes, etc.
That means being able to support something larger than 64b values as
these are 128b.
> I'm not claiming there are no use cases for sequence AMs, of course. For
> example the PRNG-based sequences mentioned by Mattias seems interesting.
> I don't know how widely useful that is, though, and if it's worth it
> (considering they managed to implement it in a different way).
Right. I bet that they just plugged a default expression to the
attributes involved. When it comes to users at a large scale, a
sequence AM makes the change more transparent, especially if DDL
queries are replication across multiple logical nodes.
> But I think it might be a good idea to implement a PoC of such sequence
> AM, if only to verify it can be implemented using the proposed code.
You mean the PRNG idea or something else? I have a half-baked
implementation for snowflake, actually. Would that be enough? I
still need to spend more hours on it to polish it. One part I found
more difficult than necessary with the patch set of this thread is the
APIs used in commands/sequence.c for the buffer manipulations,
requiring more duplications. Not impossible to do outside core, but
I've wanted more refactoring of the routines used by the "local"
sequence AM of this patch.
Plugging in a custom data type on top of the existing sequence objects
is something entirely different, where we will need a callback
separation anyway at the end, IMHO. This seems like a separate topic
to me at the end, as custom computations with 64b to store them is
enough based on what I've heard even for hundreds of nodes. I may be
wrong and may not think big enough, of course.
Attaching a v3 set, fixing one conflict, while on it.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Switch-pg_sequence_last_value-to-report-a-tuple-a.patch | text/x-diff | 5.8 KB |
v3-0002-Remove-FormData_pg_sequence_data-from-init_params.patch | text/x-diff | 9.2 KB |
v3-0003-Integrate-addition-of-attributes-for-sequences-wi.patch | text/x-diff | 11.1 KB |
v3-0004-Move-code-for-local-sequences-to-own-file.patch | text/x-diff | 52.4 KB |
v3-0005-Sequence-access-methods-backend-support.patch | text/x-diff | 60.7 KB |
v3-0006-Sequence-access-methods-core-documentation.patch | text/x-diff | 9.6 KB |
v3-0007-Sequence-access-methods-dump-restore-support.patch | text/x-diff | 22.2 KB |
v3-0008-dummy_sequence_am-Example-of-sequence-AM.patch | text/x-diff | 10.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-02-27 01:27:24 | Re: Better error messages for %TYPE and %ROWTYPE in plpgsql |
Previous Message | David G. Johnston | 2024-02-27 00:54:05 | Re: Better error messages for %TYPE and %ROWTYPE in plpgsql |