Re: Index isn't used during a join.

From: Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: PGPerformance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index isn't used during a join.
Date: 2006-01-11 05:10:55
Message-ID: 20060110221055.053596ab@thunder.logicalchaos.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Ok, I'm back, and in a little better shape.

The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index.

Taken individually:

weather=# explain analyze select * from doy_agg where doy = extract( doy from now() );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=13750.67..13750.71 rows=2 width=20) (actual time=123.134..123.135 rows=1 loops=1)
-> Bitmap Heap Scan on readings (cost=25.87..13720.96 rows=3962 width=20) (actual time=6.384..116.559 rows=4175 loops=1)
Recheck Cond: (date_part('doy'::text, "when") = date_part('doy'::text, now()))
-> Bitmap Index Scan on readings_doy_index (cost=0.00..25.87 rows=3962 width=0) (actual time=5.282..5.282 rows=4215 loops=1)
Index Cond: (date_part('doy'::text, "when") = date_part('doy'::text, now()))
Total runtime: 123.366 ms

produces the data:

weather=# select * from doy_agg where doy = extract( doy from now() );
doy | avg_windspeed | max_windspeed
-----+------------------+---------------
10 | 8.53403056583666 | 59

and:

weather=# EXPLAIN ANALYZE
weather-# SELECT *,
weather-# unmunge_time( time_group ) AS time
weather-# FROM minute."windspeed"
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=595.33..595.77 rows=176 width=28) (actual time=4.762..4.828 rows=283 loops=1)
Sort Key: time_group
-> Bitmap Heap Scan on windspeed (cost=2.62..588.76 rows=176 width=28) (actual time=0.901..3.834 rows=283 loops=1)
Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
-> Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.62 rows=176 width=0) (actual time=0.745..0.745 rows=284 loops=1)
Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
Total runtime: 5.108 ms

produces:

time_group | min_reading | max_reading | avg_reading | time
------------+-------------------+-------------+-------------------+---------------------
1136869500 | 0.8 | 6 | 2.62193548387097 | 2006-01-09 22:05:00
1136869800 | 0 | 3 | 0.406021505376343 | 2006-01-09 22:10:00
1136870100 | 0 | 5 | 1.68 | 2006-01-09 22:15:00
...

But I want the composite of the two queries, and I'm stuck on:

weather=# EXPLAIN ANALYZE
weather-# SELECT *,
weather-# unmunge_time( time_group ) AS time
weather-# FROM minute."windspeed"
weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=153627.67..153628.48 rows=322 width=48) (actual time=10637.681..10637.748 rows=286 loops=1)
Sort Key: windspeed.time_group
-> Merge Join (cost=153604.82..153614.26 rows=322 width=48) (actual time=10633.375..10636.728 rows=286 loops=1)
Merge Cond: ("outer"."?column5?" = "inner".doy)
-> Sort (cost=594.89..595.33 rows=176 width=28) (actual time=5.539..5.612 rows=286 loops=1)
Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
-> Bitmap Heap Scan on windspeed (cost=2.62..588.32 rows=176 width=28) (actual time=0.918..4.637 rows=286 loops=1)
Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
-> Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.62 rows=176 width=0) (actual time=0.739..0.739 rows=287 loops=1)
Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
-> Sort (cost=153009.93..153010.84 rows=366 width=20) (actual time=10627.699..10627.788 rows=295 loops=1)
Sort Key: doy_agg.doy
-> HashAggregate (cost=152984.28..152990.69 rows=366 width=20) (actual time=10625.649..10626.601 rows=366 loops=1)
-> Seq Scan on readings (cost=0.00..145364.93 rows=1015914 width=20) (actual time=0.079..8901.123 rows=1015917 loops=1)
Total runtime: 10638.298 ms

Where:

weather=# \d doy_agg
View "public.doy_agg"
Column | Type | Modifiers
---------------+------------------+-----------
doy | double precision |
avg_windspeed | double precision |
max_windspeed | integer |
View definition:
SELECT doy_readings.doy, avg(doy_readings.windspeedaverage1) AS avg_windspeed, max(doy_readings.windspeedmax1) AS max_windspeed
FROM ONLY doy_readings
GROUP BY doy_readings.doy;

which I don't want because of the full scan on readings.

I can easily do the two queries seperately in the script utilizing this data, but want to do it in the db itself. I figure I'm just not seeing how to combine the two queries effectively.

Thoughts?

Thanks,
Rob

--
22:08:50 up 3 days, 14:35, 9 users, load average: 2.71, 2.48, 2.51
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron 2006-01-11 06:03:48 Re: help tuning queries on large database
Previous Message Tom Lane 2006-01-11 04:38:20 Re: Left Join Performance vs Inner Join Performance