Re: BUG #17964: Missed query planner optimization

From: Mathias Kunter <mathiaskunter(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17964: Missed query planner optimization
Date: 2023-06-07 11:47:59
Message-ID: 94608857-707f-55ce-99a0-ac500aeb06f8@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> It's not a bug that the planner does not consider evaluating the join
> before the UNION

Yes, it's not a bug, but it's something which can be improved. If I
simply change the original query from this:

> SELECT ... WHERE id IN (SELECT ...);

into this:

> SELECT ... WHERE id = ANY(ARRAY(SELECT ...));

then Postgres uses an index scan, and the query is orders of magnitude
faster. Note that the planner actually correctly computes the estimated
costs for both variants, since I get:

> cost=769.11..1227.17 when using IN
> cost=86.45..86.65 when using ANY

See https://dbfiddle.uk/iOkiiTJJ

Also note that this issue doesn't only affect UNION queries. For
example, the following query will also execute orders of magnitude
faster if I simply replace IN with ANY:

> SELECT * FROM t WHERE x = 'a' OR y IN (SELECT ...);

Again, estimated costs say that using ANY should be faster:

> cost=8.30..2443.31 when using IN
> cost=56.45..350.69 when using ANY

See https://dbfiddle.uk/b9piwQr4

Hence, why doesn't the planner simply test whether it's beneficial to
replace IN with ANY? It seems that all which has to be done is to
compare the query plans for both possible execution variants. I guess
this should be rather simple to implement, isn't it?

Thanks

Mathias

Am 06.06.23 um 23:32 schrieb David Rowley:
> On Wed, 7 Jun 2023 at 04:44, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
>> In the example below, the query planner uses a sequential scan (query 1)
>> even though it could use an index scan (query 2).
>>
>> EXPLAIN ANALYZE SELECT id, name FROM (SELECT id, name FROM table1 UNION
>> SELECT id, name FROM table2) AS q
>> WHERE id IN (SELECT id FROM table3);
>
>> EXPLAIN ANALYZE SELECT id, name FROM (SELECT id, name FROM table1 UNION
>> SELECT id, name FROM table2) AS q
>> WHERE id IN (1538,8836,5486,3464,2673);
>
> It's not a bug that the plannSer does not consider evaluating the join
> before the UNION, it's just an optimisation opportunity we don't
> currently explore.
>
> If you want that, then write:
>
> EXPLAIN ANALYZE SELECT id, name FROM table1 WHERE id IN (SELECT id
> FROM table3) UNION SELECT id, name FROM table2 WHERE id IN (SELECT id
> FROM table3);
>
> David

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2023-06-07 12:36:03 Re: BUG #17964: Missed query planner optimization
Previous Message PG Bug reporting form 2023-06-07 09:57:49 BUG #17967: [msvc-x86] Failed to build with error LNK2019: unresolved external symbol __BitScanReverse64 referen