From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Steve Crawford <scrawford(at)pinpointresearch(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Optimizing "exists" |
Date: | 2007-09-18 17:09:31 |
Message-ID: | 7267.1190135371@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Steve Crawford <scrawford(at)pinpointresearch(dot)com> writes:
> If the sub-select returns a large result set, will there be any benefit
> to adding "limit 1" to the sub-select or does the query planner
> automatically deduce that "limit 1" is the correct interpretation?
It does, although poking at it I notice a bit of a bug:
regression=# explain select * from tenk1 a join tenk1 b on a.unique1 = b.ten;
QUERY PLAN
--------------------------------------------------------------------------------------------
Merge Join (cost=2287.89..2438.58 rows=10000 width=488)
Merge Cond: (a.unique1 = b.ten)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..1702.22 rows=10000 width=244)
-> Materialize (cost=2287.89..2412.89 rows=10000 width=244)
-> Sort (cost=2287.89..2312.89 rows=10000 width=244)
Sort Key: b.ten
-> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=244)
(7 rows)
regression=# explain select exists(select * from tenk1 a join tenk1 b on a.unique1 = b.ten);
QUERY PLAN
---------------------------------------------------------------------------------------------
Result (cost=4822.00..4822.01 rows=1 width=0)
InitPlan
-> Nested Loop (cost=0.00..4822.00 rows=10000 width=488)
-> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=244)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..0.42 rows=1 width=244)
Index Cond: (a.unique1 = b.ten)
(6 rows)
The second case is correctly choosing a fast-start plan, but it's
reporting the sub-select's cost up to the next plan level as being the
full runtime instead of the expected partial runtime. That has no
bad effect here, but might in a more complex situation where the
estimated subselect cost affected upper join order or some such.
If you were up against such a situation, an explicit LIMIT 1 would
probably help:
regression=# explain select exists(select * from tenk1 a join tenk1 b on a.unique1 = b.ten limit 1);
QUERY PLAN
---------------------------------------------------------------------------------------------------
Result (cost=0.48..0.49 rows=1 width=0)
InitPlan
-> Limit (cost=0.00..0.48 rows=1 width=488)
-> Nested Loop (cost=0.00..4822.00 rows=10000 width=488)
-> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=244)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..0.42 rows=1 width=244)
Index Cond: (a.unique1 = b.ten)
(7 rows)
Same subplan, but a saner cost estimate at the upper level...
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Christian Schröder | 2007-09-18 17:14:33 | Performance Issues (was: "like" vs "substring" again) |
Previous Message | Martijn van Oosterhout | 2007-09-18 16:57:44 | Re: update command question? |