Re: BUG #17964: Missed query planner optimization

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Mathias Kunter <mathiaskunter(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17964: Missed query planner optimization
Date: 2023-06-07 12:36:03
Message-ID: CAApHDvq_5v=G=9aS3QQwRUH-h1nf09Ycnzh=GQeDJtsYsQvd_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, 7 Jun 2023 at 23:48, Mathias Kunter <mathiaskunter(at)gmail(dot)com> wrote:
> Yes, it's not a bug, but it's something which can be improved. If I
> simply change the original query from this:
>
> > SELECT ... WHERE id IN (SELECT ...);
>
> into this:
>
> > SELECT ... WHERE id = ANY(ARRAY(SELECT ...));
>
> then Postgres uses an index scan, and the query is orders of magnitude
> faster. Note that the planner actually correctly computes the estimated
> costs for both variants, since I get:

What's going on here is that there is code which will convert
supported IN clauses into semi-joins. The first of your queries has
this done, but the 2nd query does not. The 2nd query, since the
semi-join conversion is not done, the qual later becomes eligible to
be pushed down into the union subquery meaning the non-matching rows
get filtered before the UNION is evaluated. We'll never attempt to
push joins (semi or otherwise) down into subqueries, so this is not
done for the first query.

The main problem here is that in some cases the first of your queries
will be faster and in other cases the 2nd will be faster. It'll depend
on how many rows are in each table. So, really to find the fastest
plan, the planner would have to consider both options. That would
unfortunately mean that we'd have to perform the join search once
without the semi-join pushed down and again with the semi-join pushed
down. The join search is going to be the slowest part of planning
when there are a few tables to join, so doing that twice could add
quite a bit of overhead to the planner. You might also then consider
how many times you'd need to perform the join search if there were,
say, 5 IN clauses. To exhaustively find the best plan we'd need to
try all possible combinations of converting each IN clause to a
semi-join vs leaving the qual alone. If the main query already had,
say 5 tables to join then that suddenly becomes a hugely costly query
to plan. Given that, I'm not all that sure you're likely to see us
making any improvements here. I suggest just rewriting the query in a
way that it executes more quickly for you.

David

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2023-06-07 13:00:22 Re: BUG #17967: [msvc-x86] Failed to build with error LNK2019: unresolved external symbol __BitScanReverse64 referen
Previous Message Mathias Kunter 2023-06-07 11:47:59 Re: BUG #17964: Missed query planner optimization