Re: Index AM API cleanup

From: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alex Wang <alex(dot)wang(at)enterprisedb(dot)com>
Subject: Re: Index AM API cleanup
Date: 2025-01-05 23:20:07
Message-ID: c1b4f44f-8644-4513-a945-c1c60c79ee28@illuminatedcomputing.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/26/24 08:10, Mark Dilger wrote:
> Paul, it seems what you are doing in v39-0001-Add-stratnum-GiST-support-function.patch is similar
to what I am doing in v17-0012-Convert-strategies-to-and-from-row-compare-types.patch.

Thank you inviting me to share some thoughts here! The goals of this patch make a lot of sense to
me. I'm very interested in making the non-btree AMs more useful, both to users and to Postgres core
itself. Here is a breakdown of what I think so far:

- Strategy Numbers
- Can you create unique GiST indexes or not?
- Updating opclasses
- Logical replication
- Empty ranges

Most of this is pretty general and not a line-by-line review of the patches. I'd like to do that
too, but this email is already long and late.

# Strategy Numbers

I think this is the latest re strategy numbers:

On 12/4/24 06:49, Peter Eisentraut wrote:
> On 27.11.24 13:57, Peter Eisentraut wrote:
>> I think, however, that we should rename RowCompareType. Otherwise, it's just going to be
>> confusing forevermore. I suggest to rename it simply to CompareType.
>
>> I'm going to try to code up the gist support on top of this patch set to make sure that it will
>> fit well. I'll report back.
>
> Here is a patch set in that direction. It renames RowCompareType to CompareType and updates the
> surrounding commentary a bit. And then I'm changing the gist strategy mapping to use the
> CompareType values instead of the RT* strategy numbers. Seeing this now, I like this a lot better
> than what we have now, because it makes it clearer in the API and the code what is a real strategy
> number and what's a different kind of thing. (This isn't entirely the above-mentioned integration
> of the gist support into your patch set yet, but it's a meaningful part of it.)

I see that this supports COMPARE_OVERLAPS and COMPARE_CONTAINED_BY, which provides most of what
foreign keys require. Great!

But I also need the intersect operator (*). There is no stratnum for that. Even if I added a
stratnum, it is not really about an *index*. With my current approach, that made me uncomfortable
using pg_amop, since I'd need to invent a new amoppurpose. Intersect isn't used for search or
ordering, just for foreign keys (and later for temporal update/delete). Similarly, This new enum is
COMPARE_*, and intersect isn't a comparison.

Is there anything we can do about that? Is pg_amop really only for indexes, or can I add a new
amoppurpose and use it? And does this enum need to be only for indexes, or can it be for other
things too? Maybe instead of COMPARE_* it can be OPERATION_*.

Today I just hardcode the built-in intersect operator for ranges and multiranges, but I would rather
have a solution that lets people define foreign keys on arbitrary types. Temporal primary keys
support arbitrary types (well other than the empty-range issue, which is something I think we can
resolve), as do temporal UPDATE and DELETE, so getting foreign keys to support them would complete
the picture. To me this goal is more faithful to Postgres's heritage as an "object-relational"
database, a philosophy that has made it so extensible.

As I mentioned, UPDATE/DELETE FOR PORTION OF also needs intersect. But it only needs a proc, not an
operator. For that I added another GiST support proc, intersect, that is just the same proc that
implements the operator. But if foreign keys had a way to get an operator for any type, then FOR
PORTION OF could use that too, because an operator implies a proc (but not vice versa).
Then I'd need one less support function.

Or here is another approach: I was thinking that instead of a `stratnum` support proc, used to get
strategy numbers (whether by well-known stratnum or by COMPARE_*), we could have a support proc that
*returns operator oids* (again, whether by well-known strategy number or by COMPARE_*). Instead of
`stratnum` we could call it something like `get_operator`. I can't think of any other use for
strategy numbers than looking up operators. The existing callsites of GistTranslateStratnum all
immediately call get_opfamily_member (and Peter's patches don't change that). What if we had a
higher-level wrapper of get_opfamily_member that took the same parameters, but used this new support
function (if defined) when called for non-btree AMs?

Then (1) foreign keys could use it get the intersect operator (2) I could drop the FOR PORTION OF
support proc. (The stratnum support proc only exists in v18devel, so it's still easy to change.)

One problem is: how do extension authors know the oids of their operators? I guess they could look
them up in pg_operator (via pg_depend?) But we should acknowledge that it's harder for them to get
oids than for core. Or maybe this isn't such a big deal after all: pg_operator is unique on
(oprname, oprleft, oprright, oprnamespace).

Are there other problems? It does seem like a risk to have a second "source of truth", especially
one that is implemented as code rather than data. What if the code changes? But still I like this
approach. Non-btree indexes can't really use pg_amop anyway, because amopstrategy doesn't mean
anything. I'm happy to make a commit replacing the stratnum support proc with get_operator as I've
described it.

But then what is its input? If it's a stratnum, there is no problem: I'll add an
RTIntersectStrategyNumber. If it's a COMPARE_*, then we still want a non-index-specific name for
this enum.

Either approach to getting an intersect operator seems fine to me: using pg_amop with a third
amoppurpose, or replacing stratnum with get_operator. But both seem to expand an index-specific
abstraction to include something new. What do you think?

# Can you create unique GiST indexes or not?

Postgres lets you ask about the capabilities of different AMs. For instance:

postgres=# select amname, pg_indexam_has_property(oid, 'can_unique') from pg_am where amtype = 'i';
amname | pg_indexam_has_property
--------+-------------------------
btree | t
hash | f
gist | f
gin | f
spgist | f
brin | f
(6 rows)

So if GiST can't support unique indexes, why can you do this?:

postgres=# create extension btree_gist;
CREATE EXTENSION
postgres=# create table t (id int, valid_at daterange, unique (id, valid_at without overlaps));
CREATE TABLE
postgres=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
----------+-----------+-----------+----------+---------
id | integer | | |
valid_at | daterange | | |
Indexes:
"t_id_valid_at_key" UNIQUE (id, valid_at WITHOUT OVERLAPS)

And also:

postgres=# select indisunique from pg_index where indexrelid = 't_id_valid_at_key'::regclass;
indisunique
-------------
t

But:

postgres=# create unique index idx_t on t using gist (id, valid_at);
ERROR: access method "gist" does not support unique indexes

It seems like we have an inconsistency, as Matthew van de Meent brought up here (cc'd):
https://www.postgresql.org/message-id/CAEze2WiD%2BU1BuJDLGL%3DFXxa8hDxNALVE6Jij0cNXjp10Q%3DnZHw%40mail.gmail.com

The reason is that GiST *might* support unique indexes, but it depends on the opclass. (And note
GiST has supported effectively-unique indexes via exclusion constraints for a long time.)

From another perspective, the error message is correct: you can't create a unique GiST index using
CREATE UNIQUE INDEX. You can only create a UNIQUE/PRIMARY KEY *constraint*, which creates a unique
GiST index to implement it. But that is not a very satisfying answer.

But since we are improving stratnums, we could allow the above command, even with what is already
committed today: just see if we can find an equals operator, and create the index. I'm interested in
working on that myself, but the other temporal patches are a higher priority.

In the meantime perhaps we can improve the error message to sound less contradictory. How about
"access method 'gist' does not support unique indexes without a WITHOUT OVERLAPS constraint"?

And anyway, creating simply-unique GiST indexes is not very valuable, when you could just create a
btree index instead. The main reason is legalistic, so we aren't guilty of contradicting our error
message. It doesn't satisfy Matthew's use-case. He wants to avoid long exclusive locks by doing
CREATE INDEX CONCURRENTLY then ADD CONSTRAINT USING INDEX. But then you'd need to CREATE UNIQUE
INDEX with overlaps for the last element, not equals. There is no syntax for that. Perhaps we could
borrow the syntax for constraints: CREATE UNIQUE INDEX idx ON t USING gist (id, valid_at WITHOUT
OVERLAPS). It's not part of the standard though, and it might have parse conflicts with COLLATE or
an opclass name.

But that *still* doesn't get there, because (1) the index has to enforce (temporal) uniqueness
before you add the constraint (2) it needs to allow CONCURRENTLY. Exclusion constraints don't
enforce their checks while updating the index, but with a separate search (IIRC), and the details
would probably have to move from pg_constraint.conexclop to pg_index.indexclop (which doesn't exist
today). To pre-create *any* exclusion constraint index, we'd need a more general syntax than WITHOUT
OVERLAPS. Something like (id WITH =, valid_at WITH &&).

And then that has to work with CONCURRENTLY. You *can* create a GiST index CONCURRENTLY today, but
not one that enforces anything. There was work already to allow CONCURRENTLY to REINDEX exclusion
constraint indexes, but I assume it's tricky, because it has stalled twice:
see
https://www.postgresql.org/message-id/flat/CAB7nPqS%2BWYN021oQHd9GPe_5dSVcVXMvEBW_E2AV9OOEwggMHw%40mail.gmail.com#e1a372074cfdf37bf9e5b4e29ddf7b2d
from 2012 and
https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
from 2019. So there are some substantial gaps to fill, and I think the first step for now is just
updating the error message.

What else can we do here? We have pg_indexam_has_property, pg_index_has_property, and
pg_index_column_has_property. Do we want a pg_opclass_has_property?

Or here is a more modest suggestion: we could permit pg_index_has_property to answer can_unique
questions. Today it returns NULL for that property. Perhaps it should (by default) answer whatever
the AM can do. Then we could update gistproperty (which today only answers column-level inquiries),
and return indisunique. To be honest that feels like an oversight that should have been included in
the temporal primary key patch. I'll sign up to do that if people think it makes sense.

Another thought: should pg_indexam_has_property(783, 'can_unique') return NULL, on the theory that
NULL means "unknown", and as of pg 18 GiST uniqueness is no longer "false" but now "it depends"?
Maybe that is too cute. If we did that, probably it should be paired with a way to get a definitive
answer for the scenario you care about (without creating an index first). pg_opclass_has_property
could do that. Or maybe we add a second pg_indexam_has_property that also takes an array of opclass
oids?

# Logical replication

I had to teach logical replication how to handle temporal index constraints. It wants a unique
identifier for REPLICA IDENTITY. Even with WITHOUT OVERLAPS, the combination of index keys do
identify a unique record. But again, logical replication needs to know which operators implement
equals for non-btree indexes. I think the discussion in this thread has already covered this
sufficiently. Peter's COMPARE_* change doesn't introduce any problems from what I can see.

# Updating opclasses

Today you can alter an opfamily, but you can't alter an opclass. So to add stratnum support
functions, the btree_gist upgrade script does this:

ALTER OPERATOR FAMILY gist_int4_ops USING gist ADD
FUNCTION 12 (int4, int4) gist_stratnum_btree (int2) ;

Past upgrade scripts have done similar things.

But that doesn't achieve quite the same thing as putting the function in CREATE OPERATOR CLASS. Do
we want a way to attach a new support function to an *opclass*? There are some problems to solve
with staleness, the relcache for example I think, but nothing seems insurmountable. Maybe it risks
incoherent indexes, if you change how they are computed partway through, but if so it was probably
true already with ALTER OPERATOR FAMILY.

# Empty ranges

This is probably not relevant to this discussion, but just in case: in the v17 cycle we had to
revert temporal PKs because empty ranges allowed duplicate records (because 'empty' && 'empty' is
false, so there was no conflict). The solution was to reject empty ranges in a temporal PK or UNIQUE
constraint, where they really don't make sense. Is there something generalized the index AM API
could offer for this kind of thing?

Yours,

--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2025-01-06 01:57:23 Re: Proposal: add new API to stringinfo
Previous Message Peter Smith 2025-01-05 21:52:07 Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size