From: | Mathias Kunter <mathiaskunter(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14107: Major query planner bug regarding subqueries and indices |
Date: | 2016-04-29 12:15:38 |
Message-ID: | e2090738-62b9-7697-886a-b2a01a8c9482@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
> If you know that the
> sub-select isn't going to return very many rows, you could do
>
> SELECT ... FROM a WHERE a.x = ? OR a.y = ANY(ARRAY(SELECT ...));
Isn't the planner already doing something like this, since the following
query is using the index as expected:
SELECT ... FROM a WHERE a.x IN (SELECT ...);
> but this would blow up rather badly with a large sub-select result,
> so I'm not sure I want to try to make the planner transform it that
> way automatically.
Wouldn't it be possible then to use this optimization based on the
estimated result size of the subquery? I think this would almost always
be faster than a sequential scan anyway. I observed that using the
ANY(ARRAY(SELECT...)) syntax on my small test tables (100 K rows)
already improves the query time by a factor of more than 100, and it
will be even more when tables are large. Please consider implementing
this optimization!
> I don't actually see any way to do very much with your second example at
> all:
>
>> SELECT ... FROM a JOIN b ON (...) WHERE a.x = ? OR b.y = ?;
Assuming both joined tables contain a PK (or another unique column),
then it should be possible by replanning the query as:
SELECT ... FROM a JOIN b ON ((join_cond AND a.x = ?) OR (join_cond AND
b.y = ?));
(Let "join_cond" denote the original join condition here.) Now, the JOIN
implementation must be smart enough to handle OR conditions: First,
obtain the rows satisfying
join_cond AND a.x = ?
by using the index as usual. For each matching row, create the tuple
(a.id, b.id) and insert it into a search tree (or hash or whatever).
Then, obtain the rows satisfying
join_cond AND b.y = ?
For each matching row, query the search tree whether it already contains
the tuple (a.id, b.id), and only add the current row to the final result
if it doesn't.
From | Date | Subject | |
---|---|---|---|
Next Message | John McKown | 2016-04-29 14:47:29 | Re: BUG #14121: Constraint UNIQUE |
Previous Message | Michael Paquier | 2016-04-29 12:08:55 | Re: BUG #14109: pg_rewind fails to update target control file in one scenario |