Re: Planner picks n² query plan when available

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Toto guyoyg <thomas(dot)bessou(at)hotmail(dot)fr>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner picks n² query plan when available
Date: 2024-11-22 16:54:25
Message-ID: 600982.1732294465@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> writes:
> On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas(dot)bessou(at)hotmail(dot)fr> wrote:
>> It looks like this could be improved/fixed by either/all of:
>>
>> 1. Using a hashset (or sort + binary search) for recheck (past a certain array length or even always) instead of always searching linearly

> IIUC, hashed "= ANY()" expressions were already implemented with
> commit 50e17ad2 (released in PG14) - look for
> EEOP_HASHED_SCALARARRAYOP in expression handling code.

I checked into that and verified that this test case isn't reaching
EEOP_HASHED_SCALARARRAYOP, because that commit only addressed the
scenario where the array argument of =ANY is a constant. (A
post-const-folding constant, but still a constant.)

To apply that optimization here, where the array is an output of a
subplan, we'd need some mechanism whereby ExecEvalHashedScalarArrayOp
could find out when the array has changed underneath it. That's
probably possible (by somehow extending the chgParam signaling logic),
but the details aren't obvious. The planner's decision about when
to apply hashing would become much less obvious, too.

>> 2. Fixing planner assuming that all arrays are of size 10, using instead actual or estimated sizes.

> IIUC, this was also already implemented with commit 9391f715 (released in PG17).

That's not helpful here either, since the arrays are built on-the-fly
rather than being stored in table columns (where stats could be
collected on them). You could imagine installing bespoke logic for
array_agg() that looks at the estimated rowcount for the sub-select,
perhaps.

The bottom line here is that improving these queries would take a
significant amount of work, and they just aren't very compelling
examples that seem to justify such effort. Folding the output of
a subquery into an array is largely an anti-pattern in SQL in the
first place.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-11-22 17:09:10 Re: Replace current implementations in crypt() and gen_salt() to OpenSSL
Previous Message Jacob Champion 2024-11-22 16:11:51 Re: Offsets of `struct Port` are no longer constant