Huge Performance Difference on Similar Query in Pg7.2

From: Heiko Klein <Heiko(dot)Klein(at)met(dot)no>
To: pgsql-general(at)postgresql(dot)org
Subject: Huge Performance Difference on Similar Query in Pg7.2
Date: 2002-03-22 09:04:20
Message-ID: 15514.62356.723349.852577@polar.oslo.dnmi.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I have two views:
LowHiSectorGridEmissionsView: > 1.000.000 rows
DataSetsView: > 30.000 rows

When I now perform the two similar (results equal) queries on those
rows:
1)
select count(*) from EmissionsView, DataSetsView
where DataSetsView.setid = EmissionsView.setid
and EmissionsView.setid = '4614' ;
-------------
2)
select count(*) from EmissionsView, DataSetsView
where DataSetsView.setid = EmissionsView.setid
and DataSetsView.setid = '4614' ;
------------

I have a huge performance difference. 1) Makes a index search on
EmissionsView and a Seq scan on DataSetsView: time: 1-2s
2) Makes a index search on DataSetsView and a Seq scan on EmissionsView:
time: ~50s

The count(*) isn't the point here, the same is true for every other type
of select.

I'm just porting the database from a Ora7 on 2x200Mhz SGI machine to a
Pg7.2 on a 850Mhz Athlon/Linux PC. On Ora7, both queries take about the
same time: 1-2s. I cannot easily change all queries, since I got only
time to work on the database, not on the frontend, which creates the
queries.

Why is Pg7.2 using a Seq scan at all? (Even after a vacuum analyze!) And
if a Seq scan is needed: Why doesn't the query-planer change the query
to the much faster version itself? And most important: How should I work
on, is there a Pg internal solution for it, or do I have to go the hard
way and modify the frontend?

Heiko

Below are the explain results:
1)
emep=> explain select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and EmissionsView.setid = '4614' ;
NOTICE: QUERY PLAN:

Aggregate (cost=5975.66..5975.66 rows=1 width=131)
-> Hash Join (cost=5958.18..5975.58 rows=35 width=131)
-> Hash Join (cost=5956.30..5972.99 rows=35 width=127)
-> Hash Join (cost=5954.41..5970.40 rows=35 width=123)
-> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8)
-> Hash (cost=5954.30..5954.30 rows=42 width=115)
-> Nested Loop (cost=4375.32..5954.30 rows=42 width=115)
-> Nested Loop (cost=4375.32..5812.95 rows=42 width=103)
-> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4)
-> Materialize (cost=5811.31..5811.31 rows=42 width=99)
-> Hash Join (cost=4375.32..5811.31 rows=42 width=99)
-> Hash Join (cost=4371.95..5796.47 rows=307 width=77)
-> Merge Join (cost=4370.73..5789.88 rows=307 width=50)
-> Sort (cost=4370.73..4370.73 rows=28192 width=38)
-> Hash Join (cost=24.06..1834.10 rows=28192 width=38)
-> Hash Join (cost=22.17..1268.37 rows=28192 width=34)
-> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22)
-> Hash (cost=21.23..21.23 rows=378 width=12)
-> Hash Join (cost=1.89..21.23 rows=378 width=12)
-> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8)
-> Hash (cost=1.71..1.71 rows=71 width=4)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Hash (cost=1.71..1.71 rows=71 width=4)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Index Scan using lhsecgriemis_setid_idx on emissions (cost=0.00..1343.91 rows=368 width=12)
-> Hash (cost=1.18..1.18 rows=18 width=27)
-> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27)
-> Hash (cost=3.09..3.09 rows=109 width=22)
-> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22)
-> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12)
-> Hash (cost=1.71..1.71 rows=71 width=4)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Hash (cost=1.71..1.71 rows=71 width=4)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)

EXPLAIN

2)
emep=> explain select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
NOTICE: QUERY PLAN:

Aggregate (cost=91660.36..91660.36 rows=1 width=131)
-> Hash Join (cost=2732.23..91660.35 rows=3 width=131)
-> Nested Loop (cost=2719.53..91058.61 rows=117798 width=103)
-> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4)
-> Materialize (cost=89290.63..89290.63 rows=117798 width=99)
-> Hash Join (cost=2719.53..89290.63 rows=117798 width=99)
-> Hash Join (cost=2716.17..56957.45 rows=863754 width=77)
-> Seq Scan on emissions (cost=0.00..18502.34 rows=1035134 width=12)
-> Hash (cost=2328.69..2328.69 rows=28192 width=65)
-> Hash Join (cost=25.29..2328.69 rows=28192 width=65)
-> Hash Join (cost=24.06..1834.10 rows=28192 width=38)
-> Hash Join (cost=22.17..1268.37 rows=28192 width=34)
-> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22)
-> Hash (cost=21.23..21.23 rows=378 width=12)
-> Hash Join (cost=1.89..21.23 rows=378 width=12)
-> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8)
-> Hash (cost=1.71..1.71 rows=71 width=4)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Hash (cost=1.71..1.71 rows=71 width=4)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Hash (cost=1.18..1.18 rows=18 width=27)
-> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27)
-> Hash (cost=3.09..3.09 rows=109 width=22)
-> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22)
-> Hash (cost=12.70..12.70 rows=1 width=28)
-> Hash Join (cost=10.62..12.70 rows=1 width=28)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Hash (cost=10.62..10.62 rows=1 width=24)
-> Nested Loop (cost=0.00..10.62 rows=1 width=24)
-> Nested Loop (cost=0.00..5.96 rows=1 width=16)
-> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12)
-> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
-> Index Scan using reports_pkey on reports (cost=0.00..4.64 rows=1 width=8)

EXPLAIN

Responses

Browse pgsql-general by date

  From Date Subject
Next Message mark 2002-03-22 09:17:59 ODBC problem
Previous Message Tom Lane 2002-03-22 06:20:19 Re: Postmaster processes running out of control?