From: | Maxim Boguk <mboguk(at)masterhost(dot)ru> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Postgresql optimisator deoptimise queries sometime... |
Date: | 2008-09-03 11:19:55 |
Message-ID: | 48BE72DB.9090206@masterhost.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
postgresql version 8.3:
I found issue when optimisator had tried rollup subrequest (without attempt compare final cost with direct plan) and finished with bad plan.
The simplest test is below:
Preparing data:
testdb=# INSERT INTO table2 select (random()*99+1)::integer from generate_series(1,100000);
testdb=# drop table table2;
DROP TABLE
testdb=# drop table table1;
DROP TABLE
testdb=# CREATE TABLE table1 (id serial primary key);
CREATE TABLE
testdb=# INSERT INTO table1 select generate_series(1,50);
INSERT 0 50
testdb=# ANALYZE table1;
ANALYZE
testdb=# CREATE TABLE table2 (fk integer not null references table1(id));
CREATE TABLE
testdb=# INSERT INTO table2 select (random()*49+1)::integer from generate_series(1,50000);
INSERT 0 50000
testdb=# ANALYZE table2;
ANALYZE
Now lets try execute next query:
SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) from table1 where (select count(*) from table2 where table2.fk=table1.id)>1000
testdb=# EXPLAIN ANALYZE SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) from table1 where (select count(*) from table2 where table2.fk=table1.id)>1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on table1 (cost=0.00..56918.96 rows=17 width=4) (actual time=52.576..1450.798 rows=33 loops=1)
Filter: ((subplan) > 1000)
SubPlan
-> Aggregate (cost=849.50..849.51 rows=1 width=0) (actual time=17.459..17.460 rows=1 loops=50)
-> Seq Scan on table2 (cost=0.00..847.00 rows=1000 width=0) (actual time=0.029..16.022 rows=1000 loops=50)
Filter: (fk = $0)
-> Aggregate (cost=849.50..849.51 rows=1 width=0) (actual time=17.484..17.486 rows=1 loops=33)
-> Seq Scan on table2 (cost=0.00..847.00 rows=1000 width=0) (actual time=0.029..16.002 rows=1037 loops=33)
Filter: (fk = $0)
Total runtime: 1453.577 ms
(10 rows)
oops... grouping query executing twice per row... but this is ok i think.
Lets try more optimal query form (eliminate twice calling count(*) query...):
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as count from table1) as t1 where count>1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on table1 (cost=0.00..56918.96 rows=17 width=4) (actual time=53.473..1525.859 rows=33 loops=1)
Filter: ((subplan) > 1000)
SubPlan
-> Aggregate (cost=849.50..849.51 rows=1 width=0) (actual time=18.445..18.446 rows=1 loops=50)
-> Seq Scan on table2 (cost=0.00..847.00 rows=1000 width=0) (actual time=0.033..17.001 rows=1000 loops=50)
Filter: (fk = $0)
-> Aggregate (cost=849.50..849.51 rows=1 width=0) (actual time=18.262..18.263 rows=1 loops=33)
-> Seq Scan on table2 (cost=0.00..847.00 rows=1000 width=0) (actual time=0.033..16.777 rows=1037 loops=33)
Filter: (fk = $0)
Total runtime: 1526.027 ms
(10 rows)
Hey... i dont asked rollup subrequest... and calculate subplan twice again per row...
Workaround ofcource easy (add offset 0 to subquery) (And lead to lower estimated cost!):
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as count from table1 offset 0) as t1 where count>1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan t1 (cost=0.00..42477.75 rows=17 width=12) (actual time=35.393..908.265 rows=33 loops=1)
Filter: (t1.count > 1000)
-> Limit (cost=0.00..42477.12 rows=50 width=4) (actual time=16.925..908.092 rows=50 loops=1)
-> Seq Scan on table1 (cost=0.00..42477.12 rows=50 width=4) (actual time=16.921..907.950 rows=50 loops=1)
SubPlan
-> Aggregate (cost=849.50..849.51 rows=1 width=0) (actual time=18.148..18.149 rows=1 loops=50)
-> Seq Scan on table2 (cost=0.00..847.00 rows=1000 width=0) (actual time=0.032..16.719 rows=1000 loops=50)
Filter: (fk = $0)
Total runtime: 908.421 ms
(9 rows)
So i think it is bug if planner rollup internal query without even try compare cost with direct plan.
Writing such not easy to understand workarounds as 'offset 0' i think bad style.
==============================================================================================================================================================================
Even worse situation when internal subrequest can produce volatile results:
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id and random()>0.1) as count from table1) as t1 where count>900;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on table1 (cost=0.00..73557.24 rows=17 width=4) (actual time=47.385..1346.824 rows=32 loops=1)
Filter: ((subplan) > 900)
SubPlan
-> Aggregate (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.333..16.334 rows=1 loops=50)
-> Seq Scan on table2 (cost=0.00..1097.00 rows=333 width=0) (actual time=0.025..14.995 rows=900 loops=50)
Filter: ((fk = $0) AND (random() > 0.1::double precision))
-> Aggregate (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.537..16.539 rows=1 loops=32)
-> Seq Scan on table2 (cost=0.00..1097.00 rows=333 width=0) (actual time=0.028..15.141 rows=934 loops=32)
Filter: ((fk = $0) AND (random() > 0.1::double precision))
Total runtime: 1346.972 ms
(10 rows)
This plan can produce just wrong result!!!! (wich is clear bug).
VS right plan:
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id and random()>0.1) as count from table1 offset 0) as t1 where count>900;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Subquery Scan t1 (cost=0.00..54894.38 rows=17 width=12) (actual time=31.181..800.898 rows=35 loops=1)
Filter: (t1.count > 900)
-> Limit (cost=0.00..54893.75 rows=50 width=4) (actual time=14.992..800.754 rows=50 loops=1)
-> Seq Scan on table1 (cost=0.00..54893.75 rows=50 width=4) (actual time=14.988..800.602 rows=50 loops=1)
SubPlan
-> Aggregate (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.003..16.004 rows=1 loops=50)
-> Seq Scan on table2 (cost=0.00..1097.00 rows=333 width=0) (actual time=0.025..14.725 rows=898 loops=50)
Filter: ((fk = $0) AND (random() > 0.1::double precision))
Total runtime: 801.021 ms
(9 rows)
--
Maxim Boguk
From | Date | Subject | |
---|---|---|---|
Next Message | Asko Oja | 2008-09-03 13:17:54 | Re: Oracle and Postgresql |
Previous Message | Thomas Finneid | 2008-09-03 09:42:49 | Re: plpgsql returning resultset |