From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)enterprisedb(dot)com>, Alex Wang <alex(dot)wang(at)enterprisedb(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Subject: | Re: Index AM API cleanup |
Date: | 2024-10-30 10:54:22 |
Message-ID: | ee517290-bb63-41de-be95-c8733b337af7@eisentraut.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I want to call out one particular aspect that is central to this patch
series that needs more broader discussion and agreement.
The problem is that btree indexes (and to a lesser extent hash
indexes) are hard-coded in various places. This includes various
places in the optimizer (see patch v17-0018) that might be harder to
understand, but also more easy-to-understand places such as:
- Only hash and btree are supported for replica identity. (see patch
v17-0015)
- Only btree is supported for speculative insertion. (see patch v17-0016)
- Only btree is supported for foreign keys. (see patch v17-0017)
- Only btree can participate in merge join. (see patch v17-0020)
The problem in cases such as these, and some of them actually contain
code comments about this, is that we somehow need to know what the
equality operator (and in some cases ordering operator) for a type is.
And the way this is currently done is that the btree and hash strategy
numbers are hardcoded, and other index AMs are just not supported
because there is no way to find that out from those.
We faced a similar issue for the temporal primary keys feature. One
thing this feature had to overcome is that
- Only btree is supported for primary keys.
For that feature, we wanted to add support for gist indexes, and we
had to again find a way to get the equals operator (and also the
overlaps operator).
The solution we ended up with there is to add a support function to
gist that translates the strategy numbers used by the operator class
to "well-known" strategy numbers that we can understand (commit
7406ab623fe). It later turned out that this problem actually might be
a subset of the general problem being discussed here, so we can also
change that solution if we find a more general solution here.
So the question is, how do we communicate these fundamental semantics
of operators?
The solution we discussed for temporal primary keys is that we just
mandate that certain strategy numbers have fixed meanings. This is
how btree and hash work anyway, but this was never required for gist.
And there are many gist opclass implementations already out there that
don't use compatible strategy numbers, so that solution would not have
worked (at least without some upgrading pain). Hence the idea of
mapping the actual strategy numbers to a fixed set of well-known ones.
For the general case across all index AMs, we could similarly decree
that all index AMs use certain strategy numbers for fixed purposes.
But again, this already wouldn't work for gist and perhaps others
already out there. So here again we need some kind of mapping to
numbers with a fixed purpose. Or perhaps new index AMs can start out
with the "correct" numbers and only some existing ones would need the
mapping.
And then what do you map them to?
One idea would be to map them to the existing btree numbers. This
could work, but I always found it kind of confusing that you'd then
have multiple sets of strategy numbers in play, one native to the
index AM or opclass, and one sort of global one.
Another idea is to map them to another thing altogether, a new enum.
This is what this patch series does, essentially. But it turns out
(so argues the patch series), that this enum already exists, namely
typedef enum RowCompareType
{
/* Values of this enum are chosen to match btree strategy numbers */
ROWCOMPARE_LT = 1, /* BTLessStrategyNumber */
ROWCOMPARE_LE = 2, /* BTLessEqualStrategyNumber */
ROWCOMPARE_EQ = 3, /* BTEqualStrategyNumber */
ROWCOMPARE_GE = 4, /* BTGreaterEqualStrategyNumber */
ROWCOMPARE_GT = 5, /* BTGreaterStrategyNumber */
ROWCOMPARE_NE = 6, /* no such btree strategy */
} RowCompareType;
which in spite of the name, yes, sounds like exactly that kind of
thing. And it's also used in many places for this kind of thing
already.
The patch series adds index AM callbacks
amtranslatestrategy
amtranslaterctype
to convert between strategy number and RowCompareType, and also adds
ROWCOMPARE_NONE
ROWCOMPARE_INVALID
to handle cases where there is no mapping. I suppose for the temporal
PK use, we would need to add something like ROWCOMPARE_OVERLAPS.
So this is the idea. To take a step back, I can see the following
options:
1. Require all index AMs to use btree-compatible strategy numbers.
(Previously rejected, too much upgrading mess.)
2. Provide mapping between index AM strategy numbers and btree strategy
numbers. (This doesn't have a space for non-btree operations like
overlaps.)
3. Provide mapping between index AM strategy numbers and the existing
RT*StrategyNumber numbers. (We can expand the set as we want.)
(This is what the existing mapping function for gist does.)
4. Provide mapping between index AM strategy numbers and some
completely new set of numbers/symbols.
5. Provide mapping between index AM strategy numbers and
RowCompareType (with some small extensions). This is what this
patch does.
Other ideas? Thoughts?
From | Date | Subject | |
---|---|---|---|
Next Message | Nikita Malakhov | 2024-10-30 11:22:42 | Re: Avoid detoast overhead when possible |
Previous Message | Heikki Linnakangas | 2024-10-30 09:44:55 | Re: freespace.c modifies buffer without any locks |