From: | Duane Lee - EGOVX <DLee(at)mail(dot)maricopa(dot)gov> |
---|---|
To: | "'Kevin Murphy'" <murphy(at)genome(dot)chop(dot)edu>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: performance of IN (subquery) |
Date: | 2004-08-27 17:44:51 |
Message-ID: | 64EDC403A1417B4299488BAE87CA7CBF01CD0F5A@maricopa_xcng0 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Have you thought about using existence checking: WHERE EXISTS (SELECT '1'
FROM FOO2 WHERE BAZ = BAZ2)
If the index exists on BAZ2 you might get away with a quick index only
check.
Duane
-----Original Message-----
From: Kevin Murphy [mailto:murphy(at)genome(dot)chop(dot)edu]
Sent: Thursday, August 26, 2004 3:24 PM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] performance of IN (subquery)
I'm using PG 7.4.3 on Mac OS X.
I am disappointed with the performance of queries like 'select foo from
bar where baz in (subquery)', or updates like 'update bar set foo = 2
where baz in (subquery)'. PG always seems to want to do a sequential
scan of the bar table. I wish there were a way of telling PG, "use the
index on baz in your plan, because I know that the subquery will return
very few results". Where it really matters, I have been constructing
dynamic queries by looping over the values for baz and building a
separate query for each one and combining with a UNION (or just
directly updating, in the update case). Depending on the size of the
bar table, I can get speedups of hundreds or even more than a thousand
times, but it is a big pain to have to do this.
Any tips?
Thanks,
Kevin Murphy
Illustrated:
The query I want to do is very slow:
select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
-----------
7644
7644
(2 rows)
Time: 518.242 ms
The subquery is fast:
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1;
------------
41209
25047
(2 rows)
Time: 3.268 ms
And using indexes on the main table is fast:
select bundle_id from build.elements
where elementid in (41209, 25047);
-----------
7644
7644
(2 rows)
Time: 2.468 ms
The plan for the slow query:
egenome_test=# explain analyze select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
egenome_test-# egenome_test(# egenome_test(# egenome_test(#
QUERY PLAN
\
------------------------------------------------------------------------
-------------------------------------------------------------
Hash Join (cost=70.33..72.86 rows=25 width=4) (actual
time=583.051..583.059 rows=2 loops=1)
Hash Cond: ("outer".element_id = "inner".elementid)
-> HashAggregate (cost=47.83..47.83 rows=25 width=4) (actual
time=0.656..0.658 rows=2 loops=1)
-> Hash Join (cost=22.51..47.76 rows=25 width=4) (actual
time=0.615..0.625 rows=2 loops=1)
Hash Cond: ("outer".superloc_id = "inner".superloc_id)
-> Seq Scan on superlocs_2 (cost=0.00..20.00 rows=1000
width=8) (actual time=0.004..0.012 rows=9 loops=1)
-> Hash (cost=22.50..22.50 rows=5 width=4) (actual
time=0.076..0.076 rows=0 loops=1)
-> Seq Scan on bundle_superlocs_2
(cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2
loops=1)
Filter: (protobundle_id = 1)
-> Hash (cost=20.00..20.00 rows=1000 width=8) (actual
time=581.802..581.802 rows=0 loops=1)
-> Seq Scan on elements (cost=0.00..20.00 rows=1000 width=8)
(actual time=0.172..405.243 rows=185535 loops=1)
Total runtime: 593.843 ms
(12 rows)
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2004-08-27 17:51:00 | Re: Stored procedure failure |
Previous Message | Joe Conway | 2004-08-27 17:28:58 | Re: R: R: space taken by a row & compressed data |