Re: POC, WIP: OR-clause support for indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, jian he <jian(dot)universality(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-10-04 13:34:23
Message-ID: CA+TgmoZ3tQ3Zwzo+9uXQL_tf0rDHQHKeZ4bGt1LDvL+_dqghUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> Makes sense. Please, check the attached patch freeing the consts list
> while returning NULL from match_orclause_to_indexcol().

Some review comments:

I agree with the comments already given to the effect that the patch
looks much better now. I was initially surprised to see this happening
in match_clause_to_indexcol() but after studying it I think it looks
like the right place. I think it makes sense to think about moving
forward with this, although it would be nice to get Tom's take if we
can.

I see that the patch makes no update to the header comment for
match_clause_to_indexcol() nor to the comment just above the cascade
of if-statements. I think both need to be updated.

More generally, many of the comments in this patch seem to just
explain what the code does, and I'd like to reiterate my usual
complaint: as far as possible, comments should explain WHY the code
does what it does. Certainly, in some cases there's nothing to be said
about that e.g. /* Lookup for operator to fetch necessary information
for the SAOP node */ isn't really saying anything non-obvious but it's
reasonable to have the comment here anyway. However, when there is
something more interesting to be said, then we should do that rather
than just reiterate what the reader who knows C can anyway see. For
instance, the lengthy comment beginning with "Iterate over OR
entries." could either be shorter and recapitulate less of the code
that follows, or it could say something more interesting about why
we're doing it like that.

+ /* We allow constant to be Const or Param */
+ if (!IsA(constExpr, Const) && !IsA(constExpr, Param))
+ break;

This restriction is a lot tighter than the one mentioned in the header
comment of match_clause_to_indexcol ("Our definition of const is
exceedingly liberal"). If there's a reason for that, the comments
should talk about it. If there isn't, it's better to be consistent.

+ /*
+ * Check operator is present in the opfamily, expression collation
+ * matches index collation. Also, there must be an array type in
+ * order to construct an array later.
+ */
+ if (!IndexCollMatchesExprColl(index->indexcollations[indexcol],
inputcollid) ||
+ !op_in_opfamily(matchOpno, index->opfamily[indexcol]) ||
+ !OidIsValid(arraytype))
+ break;

I spent some time wondering whether this was safe. The
IndexCollMatchesExprColl() guarantees that either the input collation
is equal to the index collation, or the index collation is 0. If the
index collation is 0 then that I *think* that guarantees that the
indexed type is non-collatable, but this could be a cross-type
comparison, and it's possible that the other type is collatable. In
that case, I don't think anything would prevent us from merging a
bunch of OR clauses with different collations into a single SAOP. I
don't really see how that could be a problem, because if the index is
of a non-collatable type, then presumably the operator doesn't care
about what the collation is, so it should all be fine, I guess? But
I'm not very confident about that conclusion.

I'm unclear what the current thinking is about the performance of this
patch, both as to planning and as to execution. Do we believe that
this transformation is a categorical win at execution-time? In theory,
OR format alllows for short-circuit execution, but because of the
Const-or-Param restriction above, I don't think that's mostly a
non-issue. But maybe not completely, because I can see from the
regression test changes that it's possible for us to apply this
transformation when the Param is set by an InitPlan or SubPlan. If we
have something like WHERE tenthous = 1 OR tenthous =
(very_expensive_computation() + 1), maybe the patch could lose,
because we'll have to do the very expensive calculation to evaluate
the SAOP, and the OR could stop as soon as we establish that tenthous
!= 1. If we only did the transformation when the Param is an external
parameter, then we wouldn't have this issue. Maybe this isn't worth
worrying about; I'm not sure. Are there any other cases where the
transformation can produce something that executes more slowly?

As far as planning time is concerned, I don't think this is going to
be too bad, because most of the work only needs to be done if there
are OR-clauses, and my intuition is that the optimization will often
apply in such cases, so it seems alright. But I wonder how much
testing has been done of adversarial cases, e.g. lots of non-indexable
clause in the query; or lots of OR clauses in the query but all of
them turn out on inspection to be non-indexable. My expectation would
be that there's no real problem here, but it would be good to verify
that experimentally.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2024-10-04 13:55:23 New PostgreSQL Contributors
Previous Message Pavel Stehule 2024-10-04 13:28:52 Re: SET ROLE and parameter status