From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Frédéric TERRAZZONI <frederic(dot)terrazzoni(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimization idea: merging multiple EXISTS(...) with constraint-based join removal |
Date: | 2015-07-27 23:22:45 |
Message-ID: | CAKJS1f9iUXKnkUU6fv5QLgN74r33_cTpo+QCTxzHJ=CMkBqvdg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 28 July 2015 at 09:37, Frédéric TERRAZZONI <frederic(dot)terrazzoni(at)gmail(dot)com
> wrote:
>
> SELECT * FROM t1
> WHERE EXISTS(
> SELECT 1 FROM t2, t3, t4
> WHERE t2.id = t1.t2_id
> AND t3.id = t2.t3_id
> AND t4.id = t3.t4_id
> AND t4.val = 'XYZ'
> ) AND EXISTS(
> SELECT 1 FROM t2, t3, t5
> WHERE t2.id = t1.t2_id
> AND t3.id = t2.t3_id
> AND t5.id = t3.t5_id
> AND t5.val = 'Blablabla'
> ) AND EXISTS(
> SELECT 1 FROM t6
> WHERE t6.id = t1.t6_id
> AND t6.val = 'Hello'
> )
>
> ...
>
> The resulting query is:
>
> SELECT * FROM t1
> WHERE EXISTS(
> SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5,
> t6
> WHERE t2_a.id = t1.t2_id
> AND t3_a.id = t2_a.t3_id
> AND t4_a.id = t3_a.t4_id
> AND t4_a.val = 'XYZ'
> AND t2_b.id = t1.t2_id
> AND t3_b.id = t2_b.t3_id
> AND t5.id = t3_b.t5_id
> AND t5.val = 'Blablabla'
> AND t6.id = t1.t6_id
> AND t6.val = 'Hello'
> )
>
> My questions are:
> - Does PostgreSQL already supports this optimization ? Maybe it's not
> enabled in my case only?
>
No, there's nothing which supports that currently.
> - If yes, is my reasoning incorrect ? Can you point me my mistake ?
>
It sounds reasonable to me.
> - Otherwise is there any plan to add this optimization to PostgreSQL ?
>
>
I did propose the idea here
http://www.postgresql.org/message-id/CAApHDvopmWq4i2BCf0VqU4mYmxzHCYwfnUMat9TWuKzdvo7=8w@mail.gmail.com
but I didn't focus just with semi-joins. Without re-reading, I think I was
talking about any join that could be proved to not duplicate rows could be
"consolidated".
I do believe that this optimisation would be useful in more cases than most
people might think, for example:
UPDATE t1 SET col1 = (SELECT col1 FROM t2 WHERE t1.id=t2.id), col2 =
(SELECT col2 FROM t2 WHERE t1.id=t2.id), ...;
Of course, this query could have been written using UPDATE/FROM,
(non-standard), or UPDATE t1 SET (col1,col2) = (SELECT ...), which was only
added recently.
There's also the case of the view which just has 1 column missing, so the
consumer joins a table that's already been joined to in the view.
I think it would be quite nice to have this, and I don't think it would be
all that expensive for the planner to detect this.
I think the planner would have to do something like:
1. Scan simple_rte_array looking for relids which are the same as another
entry's.
2. If found, is the join condition the same as the other one?
3. Is there a unique index to prove that joining to this does not duplicate
rows, or is it a semi-join?
4. Remove relation and replace all Vars from the removed relation with the
one from the other table, mark relation as REL_DEAD.
I think 1 is quite cheap to perform, so normal queries wouldn't suffer much
of a slow-down from these extra checks, as most queries won't have self
joins.
Are you thinking of working on it?
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Nasby | 2015-07-27 23:40:42 | Re: Autonomous Transaction is back |
Previous Message | David Rowley | 2015-07-27 22:59:15 | Re: WIP: Make timestamptz_out less slow. |