From: | Kim Bisgaard <kib+pg(at)dmi(dot)dk> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | full outer performance problem |
Date: | 2005-06-08 09:37:40 |
Message-ID: | 42A6BC64.4010001@dmi.dk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
I'm having problems with the query optimizer and FULL OUTER JOIN on
PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins.
I might be naive, but I think that it should be possible?
I have two BIG tables (virtually identical) with 3 NOT NULL columns
Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id,
TimeObs) and valid ANALYSE (set statistics=100). I want to join the two
tables with a FULL OUTER JOIN.
When I specify the query as:
SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b
USING (station_id, timeobs)
WHERE station_id = 52981
AND timeobs = '2004-1-1 0:0:0'
I get the correct results
station_id | timeobs | temp_grass | temp_dry_at_2m
------------+---------------------+------------+----------------
52944 | 2004-01-01 00:10:00 | | -1.1
(1 row)
BUT LOUSY performance, and the following EXPLAIN:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Full Join (cost=1542369.83..1618958.58 rows=6956994 width=32) (actual time=187176.408..201436.264 rows=1 loops=1)
Merge Cond: (("outer".station_id = "inner".station_id) AND ("outer".timeobs = "inner".timeobs))
Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone))
-> Sort (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual time=145748.253..153851.607 rows=6956994 loops=1)
Sort Key: a.station_id, a.timeobs
-> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 loops=1)
-> Sort (cost=334456.38..340472.11 rows=2406292 width=16) (actual time=31668.876..34491.123 rows=2406292 loops=1)
Sort Key: b.station_id, b.timeobs
-> Seq Scan on temp_grass b (cost=0.00..40658.92 rows=2406292 width=16) (actual time=0.052..5484.489 rows=2406292 loops=1)
Total runtime: 201795.989 ms
(10 rows)
If I change the query (note the "b."s)
explain analyse SELECT b.station_id, b.timeobs,temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b
USING (station_id, timeobs)
WHERE b.station_id = 52981
AND b.timeobs = '2004-1-1 0:0:0'
I seem to destroy the FULL OUTER JOIN and get wrong results (nothing)
If I had happend to use "a.", and not "b.", I would have gotten correct
results (by accident).
The "a." variant gives this EXPLAIN:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..11.97 rows=1 width=20) (actual time=0.060..0.067 rows=1 loops=1)
-> Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m a (cost=0.00..5.99 rows=1 width=16) (actual time=0.033..0.036 rows=1 loops=1)
Index Cond: ((station_id = 52981) AND (timeobs = '2004-01-01 00:00:00'::timestamp without time zone))
-> Index Scan using temp_grass_idx on temp_grass b (cost=0.00..5.96 rows=1 width=16) (actual time=0.018..0.021 rows=1 loops=1)
Index Cond: (("outer".station_id = b.station_id) AND ("outer".timeobs = b.timeobs))
Total runtime: 0.140 ms
(6 rows)
Why will PostgreSQL not use the same plan for both these queries - they
are virtually identical??
I have tried to formulate the problem with left joins, but this demands
from me that I know which table has all the values (and thus has to go
first), and in practice no such table excists.
TIA,
Kim Bisgaard.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Stone | 2005-06-08 09:51:50 | Re: Postgresql on an AMD64 machine |
Previous Message | Steve Pollard | 2005-06-08 09:09:09 | Importing from pg_dump slow, low Disk IO |