Re: POC, WIP: OR-clause support for indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-06-24 15:28:13
Message-ID: CA+TgmoY=NFe2gdLB5QOkU0ALfNUXrbdFWpVAd-WYRGAU1EZoCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 21, 2024 at 6:52 PM Alena Rybakina
<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> It's hard to tell, but I think it might be one of the good places to apply transformation. Let me describe a brief conclusion on the two approaches.

This explanation is somewhat difficult for me to follow. For example:

> In the first approach, we definitely did not process the extra "OR" expressions in the first approach, since they were packaged as an Array. It could lead to the fact that less planning time would be spent on the optimizer.

I don't know what the "first approach" refers to, or what processing
the extra "OR" expressions means, or what it would mean to package OR
expressions as an array. If you made them into an SAOP then you'd have
an array *instead of* OR expressions, not OR expressions "packaged as
an array" but even then, they'd still be processed somewhere, unless
the patch was just wrong.

I think you should try writing this summary again and see if you can
make it a lot clearer and more precise.

I'm suspicious based that we should actually be postponing the
transformation even further. If, for example, the transformation is
advantageous for index scans and disadvantageous for bitmap scans, or
the other way around, then this approach can't help much: it either
does the transformation and all scan types are affected, or it doesn't
do it and no scan types are affected. But if you decided for each scan
whether to transform the quals, then you could handle that. Against
that, there might be increased planning cost. But, perhaps that could
be avoided somehow.

> What exactly is the strategy around OR-clauses with type differences?
> If I'm reading the code correctly, the first loop requires an exact
> opno match, which presumably implies that the constant-type elements
> are of the same type. But then why does the second loop need to use
> coerce_to_common_type?
>
> It needs to transform all similar constants to one type, because some constants of "OR" expressions can belong others, like the numeric and int types. Due to the fact that array structure demands that all types must be belonged to one type, so for this reason we applied this procedure.

The alternative that should be considered is not combining things if
the types don't match. If we're going to combine such things, we need
to be absolutely certain that type conversion cannot fail.

> I do not think this is acceptable. We should find a way to get the
> right operator into the ScalarArrayOpExpr without translating the OID
> back into a name and then back into an OID.
>
> I don’t really understand the reason why it’s better not to do this. Can you explain please?

One reason is that it is extra work to convert things to a name and
then back to an OID. It's got to be slower than using the OID you
already have.

The other reason is that it's error-prone. If somehow the second
lookup doesn't produce the same OID as the first lookup, bad things
will happen, possibly including security vulnerabilities. I see you've
taken steps to avoid that, like nailing down the schema, and that's
good, but it's not a good enough reason to do it like this. If we
don't have a function that can construct the node we need with the OID
rather than the name as an argument, we should invent one, not do this
sort of thing.

> Also, why is the array built with eval_const_expressions instead of
> something like makeArrayResult? There should be no need for general
> expression evaluation here if we are just dealing with constants.
>
> I'm not ready to answer this question right now, I need time to study the use of the makeArrayResult function in the optimizer.

OK. An important consideration here is that eval_const_expressions()
is prone to *fail* because it can call user-defined functions. We
really don't want this optimization to cause planner failure (or
queries to error out at any other stage, either). We also don't want
to end up with any security problems, which is another possible danger
when we call a function that can execute arbitrary code. It's better
to keep it simple and only do things that we know are simple and safe,
like assembling a bunch of datums that we already have into an array.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-06-24 15:29:42 Re: basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE
Previous Message Tomas Vondra 2024-06-24 15:14:45 Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin