From: | Mark Kirkwood <markir(at)hnz(dot)co(dot)nz> |
---|---|
To: | undisclosed-recipients:; |
Subject: | slow sort for simple join |
Date: | 1999-11-30 23:14:41 |
Message-ID: | 199911302314.MAA05232@hudev0.hnz.co.nz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Dear List,
( I had submitted this in pg-general but it is more appropriate here ...
- well, hopefully anyway)
I have been attempting to get this simple query to execute in less
than 12 secs :
select d0.d0f1,
count(f.f1)
from dim0 d0,
fact0 f
where d0.d0key = f.d0key
and d0.d0f1 between '1996-05-01' and '1996-05-31'
group by d0.d0f1 ;
this scans 100000 rows from fact0 ( 3000000 row table indexed on d0key)
and 31 rows from dim0 ( 900 rows indexed on d0key as well )
- see my posting on "4 databases" in "pgsql-general" if you want more detail on the data
in these tables....
It gives the result :
d0f1 | count
-----------------------------+------
Thu May 30 00:00:00 1996 NZST|100000
( Note that only 1 of the 31 dim0 rows actually have corrosponding fact0 ones )
The query plan for this guy is :
Aggregate (cost=134804.38 rows=2289334 width=20)
-> Group (cost=134804.38 rows=2289334 width=20)
-> Sort (cost=134804.38 rows=2289334 width=20)
-> Nested Loop (cost=134804.38 rows=2289334 width=20)
-> Seq Scan on dim0 d0 (cost=36.70 rows=101 width=12)
-> Index Scan using fact0_q1 on fact0 f (cost=1334.33 rows=3000000 width=8)
After some fooling about I tried this query :
select max(d0.d0f1),
count(f.f1)
from dim0 d0,
fact0 f
where d0.d0key = f.d0key
and d0.d0f1 between '1996-05-01' and '1996-05-31' ;
which executes in 2 sec and gives the same result.
The query plan for this one is :
Aggregate (cost=134804.38 rows=2289334 width=20)
-> Nested Loop (cost=134804.38 rows=2289334 width=20)
-> Seq Scan on dim0 d0 (cost=36.70 rows=101 width=12)
-> Index Scan using fact0_q1 on fact0 f (cost=1334.33 rows=3000000 width=8)
therefore nested loop evaluation of the original query probably takes about
2 secs, and it is the sort / group by that takes the remaining 10 secs.
Is this the expected level of performance for sort / group ?
I suspect that it is possible to perform this sort etc more quickly...
( in part because Oracle and Sqlserver can do this same query in 1-2 sec...
and I dont really see why Postgresql needs to be slower )
Cheers
Mark
From | Date | Subject | |
---|---|---|---|
Next Message | Frank Bax | 1999-12-01 01:35:34 | RestrictionClauseSelectivity |
Previous Message | Tom Lane | 1999-11-30 22:56:35 | Re: [SQL] Addendum: PG6.5.3: CASE w. diff THEN types -- prob with Linux(?) |