Re: pgsql: Generalize hash and ordering support in amapi

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Generalize hash and ordering support in amapi
Date: 2025-03-04 17:35:42
Message-ID: 165328.1741109742@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Peter Eisentraut <peter(at)eisentraut(dot)org> writes:
> On 27.02.25 23:17, Mark Dilger wrote:
>> The logic in equality_ops_are_compatible() was trusting that equality
>> operators found in an opfamily for btree or hash were ok, but not
>> trusting operators found in opfamilies of other AMs. Now, after the
>> patch, other AMs can be marked as suitable. That's really the core of
>> what the flag means: "Can the system trust that equality operators
>> found in opfamilies of the AM are well-behaved", or something like
>> that.

> Yeah, what might be a good English identifier for that?

Looking at the call sites, the callers of equality_ops_are_compatible
basically are interested in whether the two operators agree on which
values are distinct. That can be rephrased as "equality satisfies
the transitive law": if A = B according to one operator, while B = C
according to the other operator, then A = C according to some relevant
member of the opfamily (which might be yet a third operator).

The single caller of comparison_ops_are_compatible is
ineq_histogram_selectivity, which knows it is dealing with
inequality operators (<, >, <=, or >=), and what it wants to know
is whether the two operators use the same sort ordering.

So really the properties we want the AM to promise are "all operators
within one of my opfamilies have the same notion of equality" or "all
operators within one of my opfamilies use the same sort ordering".
You could reduce this to one property that means slightly different
things depending on whether amcanorder, perhaps, since "same sort
ordering" implies "same notion of equality". Maybe call it
"amconsistentsemantics"? OTOH, requiring amcanorder might be
unpleasantly much, since an AM might have btree-like operator
semantics and yet not support ordered retrieval. So perhaps two
separate flags "amconsistentequality" and "amconsistentordering"
would be better.

In any case, my gripe was less about the name of the flag and more
about the lack of a clear specification of what it means. "does AM
support cross-type comparisons" doesn't get the job done. Maybe
we can fit

/* do operators within an opfamily have consistent equality semantics? */
bool amconsistentequality;
/* do operators within an opfamily have consistent ordering semantics? */
bool amconsistentordering;

>> * This is trivially true if they are the same operator. Otherwise,
>> * we look to see if they can be found in the same btree or hash
>> opfamily.
>>
>> * This is trivially true if they are the same operator. Otherwise,
>> * we look to see if they can be found in the same btree opfamily.
>>
>> I agree these comments need updating.

> Mark, can you suggest updated wording for those?

Maybe "Otherwise, we look to see if they both belong to an opfamily
that guarantees compatible semantics for equality" (or "for ordering"
in the second case).

BTW, the header comment for query_is_distinct_for also needs updated:

* would use itself. We use equality_ops_are_compatible() to check
* compatibility. That looks at btree or hash opfamily membership, and so
* should give trustworthy answers for all operators that we might need
* to deal with here.)

Also, I'm thinking that equality_ops_are_compatible and
comparison_ops_are_compatible now have a bit of a performance problem.
The original coding was intended to provide a cheap check before
expending the cycles to test op_in_opfamily. This patch has
completely blown that up, since GetIndexAmRoutineByAmId is *more*
expensive than op_in_opfamily. I suggest reversing things so that we
test op_in_opfamily first and only bother to look up the AM details
when we've verified that both operators belong to the same family.

regards, tom lane

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2025-03-04 17:58:20 pgsql: Add .gitignore entry for ecpg test detritus.
Previous Message Tomas Vondra 2025-03-04 17:34:32 pgsql: Make FP_LOCK_SLOTS_PER_BACKEND look like a function