Re: POC, WIP: OR-clause support for indexes

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, jian he <jian(dot)universality(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-10-09 09:31:49
Message-ID: 5d7a66e7-b256-41a7-905a-728c7ae54bce@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/4/24 20:34, Robert Haas wrote:
> On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> Makes sense. Please, check the attached patch freeing the consts list
>> while returning NULL from match_orclause_to_indexcol().
> More generally, many of the comments in this patch seem to just
> explain what the code does, and I'd like to reiterate my usual
> complaint: as far as possible, comments should explain WHY the code
> does what it does. Certainly, in some cases there's nothing to be said
> about that e.g. /* Lookup for operator to fetch necessary information
> for the SAOP node */ isn't really saying anything non-obvious but it's
> reasonable to have the comment here anyway. However, when there is
> something more interesting to be said, then we should do that rather
> than just reiterate what the reader who knows C can anyway see. For
> instance, the lengthy comment beginning with "Iterate over OR
> entries." could either be shorter and recapitulate less of the code
> that follows, or it could say something more interesting about why
> we're doing it like that.
While I know Alexander is already working on this issue, the variants
provided in the attachment could offer some valuable insights (see 0001
and 0002).
>
> + /* We allow constant to be Const or Param */
> + if (!IsA(constExpr, Const) && !IsA(constExpr, Param))
> + break;
>
> This restriction is a lot tighter than the one mentioned in the header
> comment of match_clause_to_indexcol ("Our definition of const is
> exceedingly liberal"). If there's a reason for that, the comments
> should talk about it. If there isn't, it's better to be consistent.If we know the type of result we don't really need this additional
restriction. The only reason I had here is to avoid some strange and
ineffective cases like:

SELECT oid,typname FROM pg_type t1
WHERE typtypmod = ANY (ARRAY [1, 1+(
SELECT max(typtypmod) FROM pg_type t2
WHERE t1.typtypmod = t2.typtypmod)]);
QUERY PLAN
------------------------------------------------------------
Seq Scan on pg_type t1
Filter: (typtypmod = ANY (ARRAY[1, (1 + (SubPlan 2))]))
SubPlan 2
-> Result
InitPlan 1
-> Limit
-> Seq Scan on pg_type t2
Filter: (t1.typtypmod = typtypmod)

So, it is mostly about trade-off between benefit expected and planning
complexity. See a sketch of comment in 0003.

> I'm unclear what the current thinking is about the performance of this
> patch, both as to planning and as to execution. Do we believe that
> this transformation is a categorical win at execution-time? In theory,
> OR format alllows for short-circuit execution, but because of the
> Const-or-Param restriction above, I don't think that's mostly a
> non-issue. But maybe not completely, because I can see from the
> regression test changes that it's possible for us to apply this
> transformation when the Param is set by an InitPlan or SubPlan. If we
> have something like WHERE tenthous = 1 OR tenthous =
> (very_expensive_computation() + 1), maybe the patch could lose,
> because we'll have to do the very expensive calculation to evaluate
> the SAOP, and the OR could stop as soon as we establish that tenthous
> != 1. If we only did the transformation when the Param is an external
> parameter, then we wouldn't have this issue. Maybe this isn't worth
> worrying about; I'm not sure. Are there any other cases where the
> transformation can produce something that executes more slowly?
I have a couple of user reports in my pocket where changing the position
of the OR clause drastically (2-3 times) altered query execution time.
However, I think it is not a good way to optimise SQL queries the way
we use when coding in C.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
0001-Comments-for-the-0001-patch.patch text/x-patch 5.2 KB
0002-Comments-for-0002-patch.patch text/x-patch 2.9 KB
0003-Comment-on-restriction-of-OR-SAOP-element-type.patch text/x-patch 1.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Wienhold 2024-10-09 09:34:39 Re: pgindent fails with perl 5.40
Previous Message vignesh C 2024-10-09 09:23:57 Re: Pgoutput not capturing the generated columns