From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Toto guyoyg <thomas(dot)bessou(at)hotmail(dot)fr>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner picks n² query plan when available
Date: 2024-11-21 12:58:43
Message-ID: CAEze2WjrSp9=8mF9PpwFD9EEY=s+=N-hWJKFBvRj0GBz9UX6uA@mail.gmail.com
Lists: pgsql-hackers
On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas(dot)bessou(at)hotmail(dot)fr> wrote:
>
> Offending O(n²) query:
I disagree with the O(n^2) claims. The number of live matched rows in
a single table's bitmap scan may be anywhere from 0 (leading to O(1)
complexity in the rescan) to 970_662_608_670 (= 226 tuples per page *
(2^32 - 1) pages). Given O(n) complexity for always choosing a linear
scan on the array to find matches, such a rescan's complexity would be
better parameterized as O(n*m), for m live tuples in the generated
bitmap, or even O(n*m + k), as the bitmap may contain k dead tuples
which also take some measurable amount of effort to scan.
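
To make that concrete, here is a rough sketch of how you could observe
that behaviour yourself (the table, column names and sizes below are
made up for illustration, not taken from your report):

CREATE TABLE t (id int, val int);
INSERT INTO t SELECT g, g % 100000 FROM generate_series(1, 1000000) g;
CREATE INDEX ON t USING hash (val);
ANALYZE t;

SET enable_seqscan = off;  -- steer the planner towards the bitmap path
SET work_mem = '64kB';     -- a small bitmap is more likely to go lossy

-- On lossy pages the recheck must test "val = ANY(...)" against every
-- tuple on the page, so the work grows with both the array length (n)
-- and the live (m) plus dead (k) tuples visited: O(n*m + k) with a
-- linear search of the array.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t
WHERE val = ANY (ARRAY(SELECT generate_series(1, 500)));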
> 4. The EXPLAIN ANALYZE output shows that the planner always supposes that arrays are of size 10, instead of using the estimated sizes of subqueries they are created from, or actual size provided as argument.
Did you recheck this on a recently released version of PostgreSQL?
IIRC this was updated in PG17, and with `= ANY (ARRAY[...])`
expressions you will get the expected number of rows for that array
expression (assuming accurate statistics on your table).
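
This is easy to re-test on a PG17 install. With something like the
following (names are again made up), you can compare the planner's row
estimate for the scan of t against the actual count:

CREATE TABLE ids (id int);
INSERT INTO ids SELECT generate_series(1, 50);
ANALYZE ids;

-- Compare the "rows=" estimate on the scan of t with the actual row
-- count; the question is whether the estimate reflects the (estimated)
-- size of the array built by the sub-select or a hard-coded default.
EXPLAIN ANALYZE
SELECT * FROM t                      -- t from the first sketch
WHERE val = ANY (ARRAY(SELECT id FROM ids));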
> It looks like this could be improved/fixed by either/all of:
>
> 1. Using a hashset (or sort + binary search) for recheck (past a certain array length or even always) instead of always searching linearly
IIUC, hashed "= ANY()" expressions were already implemented with
commit 50e17ad2 (released in PG14) - look for
EEOP_HASHED_SCALARARRAYOP in expression handling code.
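
EXPLAIN does not label the hashed path explicitly, but you can see its
effect indirectly: keep the "= ANY" out of the index condition so it is
evaluated as an ordinary filter, and the per-row cost should stop
growing linearly once the (constant) array is long enough to be hashed.
Another rough sketch, reusing the made-up table from above:

RESET enable_seqscan;
RESET work_mem;
SET enable_indexscan = off;    -- keep the "= ANY" out of the index so it
SET enable_bitmapscan = off;   -- is evaluated as an ordinary filter

EXPLAIN ANALYZE
SELECT count(*) FROM t               -- t from the first sketch
WHERE val = ANY (ARRAY[1,2,3,4,5,6,7,8,9,10,11,12]);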
> 2. Fixing planner assuming that all arrays are of size 10, using instead actual or estimated sizes.
IIUC, this was also already implemented with commit 9391f715 (released in PG17).
> Taking it into account for recheck cond cost estimation. (This wouldn't solve recheck perf, but at least would make the planner aware that lossy heap scans and Hash index bitmap heap scans are n times more expensive than exact BTree bitmap heap scans, and hopefully make it avoid picking them.)
>
> Is my understanding correct?
I believe your understanding may be quite out of date. Not all planner
or executor features and optimizations are explicitly present in the
output of EXPLAIN, and the examples all indicate you may be working
with an outdated view of Postgres' capabilities.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)